You are on page 1of 215

Swansea University E-Theses

_________________________________________________________________________

Investigating ray tracing algorithms and data structures in the


context of visibility.

Kammaje, Ravi Prakash

How to cite:
_________________________________________________________________________
Kammaje, Ravi Prakash (2009) Investigating ray tracing algorithms and data structures in the context of visibility..
thesis, Swansea University.
http://cronfa.swan.ac.uk/Record/cronfa42220

Use policy:
_________________________________________________________________________
This item is brought to you by Swansea University. Any person downloading material is agreeing to abide by the terms
of the repository licence: copies of full text items may be used or reproduced in any format or medium, without prior
permission for personal research or study, educational or non-commercial purposes only. The copyright for any work
remains with the original author unless otherwise specified. The full-text must not be sold in any format or medium
without the formal permission of the copyright holder. Permission for multiple reproductions should be obtained from
the original author.

Authors are personally responsible for adhering to copyright and publisher restrictions when uploading content to the
repository.

Please link to the metadata record in the Swansea University repository, Cronfa (link given in the citation reference
above.)

http://www.swansea.ac.uk/library/researchsupport/ris-support/
Investigating Ray Tracing Algorithms and Data
Structures in the Context of Visibility

Ravi Prakash Kammaje

Submitted to the University o f W ales in fulfilment o f the requirements for the D egree o f
D octor o f Philosophy

V©/
Swansea University
Prifysgol Abertawe

Department o f Computer S cience


Sw ansea University

September 2009
ProQuest Number: 10797922

All rights reserved

INFORMATION TO ALL USERS


The quality of this reproduction is d e p e n d e n t upon the quality of the copy subm itted.

In the unlikely e v e n t that the a u thor did not send a c o m p le te m anuscript


and there are missing pages, these will be noted. Also, if m aterial had to be rem oved,
a n o te will ind ica te the deletion.

uest
ProQuest 10797922

Published by ProQuest LLC(2018). C opyright of the Dissertation is held by the Author.

All rights reserved.


This work is protected against unauthorized copying under Title 17, United States C o d e
M icroform Edition © ProQuest LLC.

ProQuest LLC.
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, Ml 4 8 1 0 6 - 1346
LIBRARY
£
Summary
Ray tracing is a popular rendering method with built in visibility determination. However, the
computational costs are significant. To reduce them, there has been extensive research leading to
innovative data structures and algorithms that optimally utilize both object and image coherence.
Investigating these from a visibility determination context without considering further optical ef­
fects is the main motivation of the research.

Three methods - one structure and two coherent tree traversal algorithms - are discussed. While
the structure aims to increase coherence, the algorithms aim to optimise utilization of coherence
provided by ray tracing structures (kd-trees, octrees).

RBSP trees - Restricted Binary Space Partitioning Trees - build upon the research in ray tracing
with kd-trees. A higher degree of freedom for split plane selection increases object coherence
implying a reduction in the number of node traversals and triangle intersections for most scenes.
Consequently, reduced ray casting times for scenes with predominantly non-axis-aligned triangles
is observed.

Coherent Rendering is a rendering method that shows improved complexity, but at an absolute
performance that is much slower than packet ray tracing. However, since it led to the creation of
the Row Tracing algorithm, it is described briefly.

Row Tracing can be considered as an adaptation of Coherent Rendering, scanline rendering or


packet ray tracing. One row of the image is considered and its pixels are determined. Similar to
Coherent Rendering, an adapted version of Hierarchical Occlusion Maps is used to identify and
skip occluded nodes. To maximize utilisation of coherence, the method is extended so that several
adjacent rows are traversed through the tree.

The two versions of Row Tracing demonstrate excellent performance, exceeding that of packet
ray tracing. Further, it is shown that for larger models (2 million+ triangles), Row Tracing and
Packet Row Tracing significantly outperform Z-buffer based methods (OpenGL). Row tracing
shows scalability over scene sizes leading to a rendering method that has fast rendering times
for both large and small models. In addition it has excellent parallelisation properties allowing
utilisation of multiple cores with ease. Thus, the Row Tracing and Packet Row Tracing algorithms
can be considered as the significant contributions of the Ph.D.

These data structures and algorithms demonstrate that ray tracing data structures and adaptations
of ray tracing algorithms exhibit excellent potential in a visibility context.
DECLARATION
This work has not previously been accepted in substance for any degree and is not being concur­
rently submitted in candidature for any degree.

Signed ......................... ^ ................ ...,................ (candidate)

Date

STATEMENT 1
This thesis is the result of my own investigations, except where otherwise stated. Where correction
services have been used, the extent and nature of the correction is clearly marked in a footnote(s).

Other sources are acknowledged by footnotes giving explicit references. A bibliography is ap­
pended.

Signed ........ -................/.................... (candidate)

Date

STATEMENT 2
I hereby give consent for my thesis, if accepted, to be available for photocopying and for inter-
library loan, and for the title and summary to be made available to outside organisations.

Signed : rrrr... (candidate)


Contents

List of Figures ix

List o f Tables x

Glossary xiii

1 Introduction 1

2 Ray Tracing - Algorithms and Data Structures 5


2.1 Ray T ra c in g ........................................................................................................................ 5
2.2 Ray In te rs e c tio n s............................................................................................................... 7
2.2.1 Ray-Triangle in tersectio n s................................................................................ 7
2.2.2 Ray-Sphere intersections.................................................................................... 10
2.2.3 Ray-Plane intersections .................................................................................. 11
2.2.3.1 Ray-Axis-Aligned Plane In te rs e c tio n s ......................................... 12
2.2.4 Ray-Box In te rse c tio n s....................................................................................... 12
2.3 Acceleration S tru c tu re s..................................................................................................... 15
2.3.1 Binary Space Partitioning Trees (BSP Trees) ................................................ 16
2.3.2 K d -trees................................................................................................................. 17
2.3.2.1 C o n stru c tio n ....................................................................................... 17
2.3.2.2 T ra v e rsa l.............................................................................................. 22
2.3.3 O c t r e e s ................................................................................................................. 25
2.3.4 G r i d s ..................................................................................................................... 27
2.3.5 Object S u b d iv isio n ............................................................................................. 28
2.3.5.1 Shapes of Bounding v o lu m e s .......................................................... 29
2.3.5.2 C o n stru c tio n ....................................................................................... 29
2.3.5.3 T ra v e rsa l.............................................................................................. 31
2.3.5.4 Advantages of BVHs ....................................................................... 32
2.4 Packet Ray T r a c in g ........................................................................................................... 33
2.5 Anti-Aliasing and Incoherent R a y s ................................................................................. 36
2.6 Dynamic Ray T ra c in g ........................................................................................................ 37
2.7 Other Visibility M e th o d s ................................................................................................. 38
2.7.1 Area Subdivision M e th o d s ................................................................................ 39
2.7.2 Scanline A lg o r ith m s .......................................................................................... 39
2.7.3 Visibility Determination by Depth S o rtin g ..................................................... 40
2.7.4 Visibility Determination using a BSP T r e e ...................................................... 40
2.7.5 Z -b u ffe r................................................................................................................. 41
2.7.6 Hierarchical M e th o d s.......................................................................................... 41

i
2.8 S u m m a r y .......................................................................................................................... 43

3 RBSP Trees 45
3.1 M otivation....................................................................... 46
3.2 RBSP Trees C o n c e p t...................................................................................................... 48
3.3 Data S tru c tu re ................................................................................................................... 50
3.4 C onstruction....................................................................................................................... 52
3.4.1 Space Median Construction .............................................................................. 54
3.4.2 Surface Area H e u r is tic ........................................................................................ 54
3.5 RBSP Tree T r a v e r s a l...................................................................................................... 57
3.5.1 Algorithm 1 .1 - Traversal by Linear Interpolation of Ray-Plane Intersec­
tion Parameter .................................................................................................... 57
3.5.2 Algorithm 1.2 - Traversal using SSE .......................................................... 61
3.5.3 Alternate Traversal - R ay-Plane Intersection at Each N o d e ......................... 61
3.5.3.1 Algorithm 2.1 - Entry and Exit Plane Determination with OpenGL 63
3.5.3.2 Algorithm 2.2 - Entry and Exit Plane Determination by Recur­
sive D iv id e 64
3.6 Data Structure Visualised for Various M o d e l s ........................................................... 65
3.7 R esu lts................................................................................................................................. 66
3.7.1 Node Traversals and Triangle In te r s e c tio n s ................................................... 68
3.7.2 Rendering T i m e s .................................................................................................. 70
3.7.3 Construction T im e s............................................................................................... 73
3.7.4 Number of Faces in Node ................................................................................. 74
3.8 Further Research on Structures with Non-Axis-Aligned Splitting P l a n e s ............. 74
3.8.1 Accelerated Building and Ray Tracing of Restricted BSP Trees - Budge
et al.......................................................................................................................... 74
3.8.2 Ray Tracing with the BSP Tree - Ize et al........................................................ 76
3.9 S u m m a r y .......................................................................................................................... 78

4 C oherent R endering 79
4.1 M otivation.......................................................................................................................... 80
4.1.1 Average C o m p le x ity ........................................................................................... 80
4.1.2 Object Order Ray C a s t i n g ................................................................................. 83
4.2 Coherent Rendering - Concept and High LevelAlgorithm ...................................... 85
4.3 Tree T rav ersal.................................................................................................................... 87
4.3.1 Node Projection .................................................................................................. 88
4.3.2 Frustum Visibility and Node Size T e s t .............................................................. 89
4.3.2.1 Frustum V is ib ility ............................................................................ 89
4.3.2.2 Node Size T e s t................................................................................... 91
4.4 Occlusion Detection - HierarchicalOcclusion M a p s ................................................... 92
4.4.1 Concept of HOMs ............................................................................................... 92
4.4.2 HOM U p d a te ......................................................................................................... 93
4.4.3 Occlusion Testing using HOMs ........................................................................ 94
4.5 Leaf Node Processing - Recursive R asterisatio n ........................................................ 95
4.6 R esu lts................................................................................................................................. 99
4.6.1 Empirical C o m p le x ity ........................................................................................ 99
4.6.2 Absolute P e rfo rm a n c e ........................................................................................ 103
4.7 D iscussion.......................................................................................................................... 104
4.7.1 C o m p le x ity ............................................................................................................ 104
4.7.2 Absolute P e rfo rm a n c e ....................................................................................... 105
4.8 S u m m a r y ............................................................................................................................ 106

5 Row Tracing 108


5.1 M o tivation............................................................................................................................ 108
5.2 C o n c e p t............................................................................................................................... 109
5.3 High Level A lg o rith m ........................................................................................................ Ill
5.4 Datastructures for Row T ra c in g ....................................................................................... 112
5.4.1 K d - t r e e .................................................................................................................. 113
5.4.2 O c tre e ..................................................................................................................... 113
5.5 Tree T ra v e rsa l..................................................................................................................... 113
5.5.1 Plane-Box in te rse c tio n ........................................................................................ 113
5.5.2 Initialisation ........................................................................................................ 115
5.5.3 Tree Traversal A lg o rith m .................................................................................... 116
5.5.4 Projection of a Point onto the R o w .................................................................... 117
5.5.5 Kd-tree Node P r o je c tio n .................................................................................... 117
5.5.6 Octree Node P ro jectio n ........................................................................................ 120
5.5.7 Row-Kd-tree Node I n te rs e c tio n ....................................................................... 120
5.5.8 Row-Octree Node In tersection.......................................................................... 121
5.6 Leaf Node P ro cessin g ........................................................................................................ 121
5.6.1 Row-Triangles Intersection .............................................................................. 122
5.6.2 Segments Partially in F ro n t................................................................................. 124
5.6.3 Clipping Intersection Segments ....................................................................... 124
5.6.4 Projection of Clipped S e g m e n ts ....................................................................... 125
5.6.5 Rasterising the S e g m e n ts.................................................................................... 126
5.7 Final Image G e n e ra tio n ..................................................................................................... 127
5.7.1 Simplified Shading for Row T ra c in g ................................................................ 127
5.8 ID Hierarchical Occlusion Maps ( H O M s ) .................................................................... 128
5.8.1 HOM U p d a te ........................................................................................................ 129
5.8.2 Occlusion Testing using HOMs ....................................................................... 130
5.8.3 Node Occlusion T e s t i n g .................................................................................... 132
5.8.4 Recursive Rasterisation of Leaf Node T ria n g le s ............................................ 132
5.9 Packet Row T ra c in g ............................................................................................................ 133
5.9.1 High Level Algorithm ........................................................................................ 133
5.9.2 Tree Traversal ..................................................................................................... 134
5.10 Low Level O p tim isatio n s................................................................................................ 138
5.10.1 M u lti-T h re a d in g .................................................................................................. 138
5.11 R esu lts.................................................................................................................................. 139
5.11.1 Row Tracing vs Packet Ray T r a c in g ................................................................ 139
5.11.2 Row Tracing vs O p e n G L ........................................................................... 141
5.11.3 Row Tracing Performance on Kd-trees vsO c tre e s.................................. 142
5.11.4 Performance of Row Tracing vs PacketRow T r a c in g ................................... 143
5.11.5 Multi-core Performance .................................................................................... 145
5.11.6 Performance vs Tree Size ................................................................................. 145
5.11.7 Row Tracing with and without H O M s ............................................................. 146
5.12 S u m m a r y ........................................................................................................................... 146

6 Conclusions and Future Work 149


Bibliography 162

Appendices 164

A Software Design 164


A .l Software A rchitecture...................................................................................................... 164
A. 1.1 Renderers and Data S tru c tu re s.......................................................................... 166
A. 1.2 Scene S tru c tu re ..................................................................................................... 169
A.2 Scene and Tree Data Structure R epresen tatio n ........................................................... 172
A.2.1 Scene S tru c tu re ..................................................................................................... 172
A.2.2 Kd-tree Data S tr u c tu r e ....................................................................................... 173
A.3 User Interface D e s ig n ...................................................................................................... 174
A.4 S u m m a r y .......................................................................................................................... 179

B Optimising Row Tracing with SSE instructions 181


B.0.1 SIMD In s tr u c tio n s .............................................................................................. 181

C Low level Optimizations 187


List of Figures

2.1 Ray tracing. The ray tracing method whereby a ray is traced from the viewpoint
through the pixel to find the first intersected object. At the object of intersection,
additional rays are spawned to generate reflections, refractions and shadows. . . . 6
2.2 Ray-box intersection, r a y i does not intersect the box as tentry > t exit . r a y 2
intersects the box as t Kntry < t exjt,................................................................................... 14
2.3 A 2D BSP Tree. The splitting planes (lines in 2D) are selected so that they are
aligned according to the edges of triangles..................................................................... 16
2.4 Kd-tree Construction with the Space Median heuristic and with termination crite­
ria - maximum triangles in leaf node - 2 ....................................................................... 18
2.5 SAH potential split positions. The potential split positions are shown as blue
points. For every triangle, the two extremeties of the triangle with respect to an
axis are considered as potential SAH split positions..................................................... 20
2.6 SAH potential split positions, showing possible incorrect split positions (in red) if
triangles are not clipped..................................................................................................... 21
2.7 Potential for incorrect primitive counts. If the triangles are not clipped, then they
can be incorrectly counted. The two triangles would be considered as being on
both sides, if they are not clipped..................................................................................... 22
2.8 Kd-tree Construction with the Surface Area Heuristic and termination criteria -
maximum triangles in leaf node = 2................................................................................ 23
2.9 Space median and SAH kd-trees constructed on the Dragon model. The SAH
kd-tree more closely wraps the model and reduces the void area............................... 24
2.10 Kd-tree ray traversal.......................................................................................................... 24
2.11 2D version of an Octree (Quadtree). Space is split at even locations and along all
the axes (in 2D)................................................................................................................... 26
2.12 2D version of a uniform grid. The entire bounding volume of the scene is divided
into many smaller even spaces.......................................................................................... 27
2.13 Bounding Volume Hierarchy. Each lower level consists of bounding volumes that
enclose smaller parts of the model until the leaf node level where each bounding
volume has a single primitive........................................................................................... 31
2.14 Kd-tree packet traversal.................................................................................................... 36

3.1 Kd-trees on scenes with predominantly non-axis-aligned triangles. Even though


the SAH kd-trees converge to the model quickly, their void area is still significant.
This is because kd-trees are restricted to using axis aligned splitting planes. . . . 47
3.2 RBSP trees on scenes with predominantly non-axis-aligned triangles. The use of
several additional splitting axes allows the RBSP tree to more closely wrap the
model and minimise void area.......................................................................................... 48

v
3.3 A 2D RBSP tree: In 2D, the splitting planes (lines) can directly be used instead
of the normals (as done in 3D). Hence, the potential set of partitioning planes are
shown (labelled 1,2,3,4). Using this set of axes, a tree that partitions space into
two parts at each step is built............................................................................................ 49
3.4 Leaf nodes and pointers to their component triangles. Nodes in green are leaf
nodes that point to an index in a global list of leaf node triangles.............................. 52
3.5 Space median RBSP trees on scenes with predominantly non-axis-aligned trian­
gles. The split planes are not intelligently placed and hence the space median
method to construct RBSP trees is not beneficial.......................................................... 55
3.6 SAH RBSP trees on scenes with predominantly non-axis-aligned triangles. U s­
ing the intelligence provided by the SAH, the RBSP tree constructed is of better
quality................................................................................................................................... 56
3.7 Potential split points for SAH along an axis. Projecting the end points of the
clipped triangles onto the axis in consideration gives the potential split plane po­
sitions. The SAH cost at these potential points are calculated and the minimum
point is selected as the locally optimal split position.................................................... 57
3.8 RBSP tree ray traversal (2D). Similar to the kd-tree traversal - if t spnt > £en*ry
the left node is traversed, and if t sput < t exa the right node is traversed. If both
conditions hold, as in the figure, both nodes are traversed........................................... 58
3.9 RBSP tree root node for the Bunny with each plane coloured differently................... 63
3.10 RBSP tree bounding volume / root node of the Bunny with each bounding plane
coloured differently illustrating the recursive divide method. If the four comer
pixels of a rectangular region have the same colour, then all the pixels inside this
region have the same entry plane. Otherwise, the region is subdivided into four
smaller regions and the four com er pixels are tested again.......................................... 65
3.11 RBSP tree on the Bunny using 3,8,16 and 24 planes at various depths. Images
show the non-empty nodes at the given depth and leaf nodes at higher depths. The
visualisation shows that RBSP trees built with more axes converge to the model
more quickly than those built with fewer axes............................................................... 66
3.12 Images showing the bounding box and the RBSP tree using 3,8,16 and 24 planes
for few other models. Images show the non empty nodes at depth 16 and leaf
nodes at higher depths. As more axes are used to build the RBSP trees, their
quality improves.................................................................................................................. 68
3.13 Variation of node traversals per pixel over number of split axes used to build the
RBSP tree for ray tracing several models. The more the number of axes, the lower
the number of node traversals (except for the Sponza scene)...................................... 69
3.14 Variation of triangle intersections per pixel over number of split axes used to build
the RBSP tree for ray tracing several models. As the number of axes used increase,
the number of triangle intersections decrease................................................................ 69
3.15 Variation of construction times of RBSP trees over number of split axes for various
models. From the data graphed it was deduced that empirical complexity of RBSP
tree construction is 0 { m } 6N l o g 2( N ) ) .......................................................................... 73
3.16 Faces per node of RBSP trees with various number of split axes for various models.
The number of faces per node appears to be very close to six, irrespective of the
number of splitting axes used............................................................................................ 74

4.1 Naive recursive ray tracing example on a 4 x 4 voxelized grid. The number of
distinctly traversed nodes is a geometric series (1 + 2 + .... + N /2 + N = 2N — 1). 81
4.2 Ray tracing the Sponza s c e n e .......................................................................................... 82
4.3 Object Order Ray Casting rendering t im e s ................................................................... 84
4.4 Node projection. Coherent Rendering projects all the eight vertices of the node
onto the image plane to obtain the node projection for the root nodes...................... 88
4.5 Node projection - projecting only the split plane. For non-root nodes, Coherent
Rendering projects only the split plane to reduce computations................................. 89
4.6 Frustum visibility of a node. Node can either be partially inside the frustum, fully
outside the frustum or fully inside the frustum, as shown in the diagram................. 90
4.7 Examples of nodes occurring between four pixels. Since Coherent Rendering
considers the pixels as points, geometry occuring between pixels are ignored. . . 91
4.8 HOM for the Armadillo model. The image with the green border is the actual
image of size 1024 x 1024. The image labeled 0 is the lowest level of the HOM
and so on. Level 0 of the HOM consists of 256 x 256 pixels. The highest level (5)
of the HOM has just one pixel.......................................................................................... 93
4.9 Region Of Influence in HOM. When a pixel is determined, the four closest pixels
(including itself) are deemed to be in the region of influence. These four pixels
can cause an increment in upto four upper level HOM s............................................... 94
4.10 Leaf node triangles needing clipping due to partly enclosed triangles. To ensure
accurate visibility, only parts of triangles that are fully contained by the node are
to be considered. If other parts are considered, they result in rendering artifacts. . 96
4.11 Rasterising triangles. A subdivision method is followed where the triangle is sub­
divided recursively until they span a single pixel. At this pixel, the triangle is
determined as being visible............................................................................................... 98
4.12 Linearity of the hierarchical rasterisation algorithm. The number of recurive ras­
terisation calls to rasterise the triangle (right) at different resolutions (from 82 to
40962) has been shown. It can be seen from the graph (left) that the number of
calls increase linearly according to image size.............................................................. 99
4.13 Synthetic Benchmarks, (a) original single plane mesh (no subdivision), (b) 16-
plane scene subdivided 6 times, (c) is the final image obtained from all synthetic
scenes rendered using a unique viewpoint, (d) and (e) are the variation of render­
ing times per pixel (/us) according to the scene complexity (i.e, number of subdi­
visions) and the image size for single plane and multi-plane respectively. Results
show that rendering times are constant until the scene complexity becomes greater
than the image complexity. At this point, rendering times become logarithmic.
The similarity of (d) and (e) indicates that only the visible triangles are relevant. . 100
4.14 Mesh-based Coherent Renderings. From left to right: Single triangle, Sponza ,
Sponza with Stanford M odels, Sponza from outside, David and Powerplanl. All
can be rendered at approximately the same speed, provided that the resulting im­
age is large enough............................................................................................................. 101
4.15 Analysis according to the image size, (a) Rendering times for the different scenes.
(b) Rendering times per non-black pixel showing convergence to the same ren­
dering time, (c) Number of nodes traversed - independent of the image size, (d)
Number of rasterisation steps per non-black pixel - convergence to the same ras­
terisation cost per pixel. Note that the two Sponza outside curves overlap on all
graphs................................................................................................................................... 102
4.16 Profiling the Coherent Rendering algorithm.................................................................. 105
5.1 A row of the image as a 2D plane. Tracing this 2D plane through the tree is the
basic idea of the Row Tracing algorithm......................................................................... 110
5.2 Intersection of a box and a plane. As shown, it is sufficient to determine if the two
extreme points of the box in relation to the normal of the plane are on the same
side or either side of the plane........................................................................................... 115
5.3 Node Projection onto a Row. x min and x max indicate the node’s overestimate. If
the overestimate is occluded, then the actual projection is also occluded. Hence,
this overestimate can be used for occlusion detection.................................................. 119
5.4 Calculating the octree node’s vertices, o is the mid-point of the octree, d i and d 2
are vectors to be added to o to obtain the vertices......................................................... 120
5.5 Row-Node intersection. If the two extreme vertices of the node with respect to
the row plane’s normal (shown in blue) are on opposite sides of the row plane,
then it intersects the node.................................................................................................. 121
5.6 Row-plane-triangle intersection. If the vertices of the triangle lie on opposite sides
of the row plane, there is an intersection......................................................................... 122
5.7 Row-leaf-node-triangles intersection. Intersecting the row-plane with the trian­
gles in a node gives several line segments...................................................................... 123
5.8 Intersection segment clipping. Intersection line segments need to be clipped to
ensure that only parts of triangle inside the node are considered................................ 125
5.9 Clipped segments projection. The clipped intersection segments are projected
onto the row to obtain the pixels affected by the triangles in the node...................... 126
5.10 Initial state of the HOM .................................................................................................... 128
5.11 Updating the HOM ............................................................................................................ 129
5.12 HOM update - combining bits horizontally................................................................... 130
5.13 Occlusion testing. If the blue part on the row is to be tested for occlusion, 2 bits
(shown in blue) at level 2 of the HOM is tested............................................................. 131
5.14 Final HOM on the Dragon model. Orange line represents the actual pixel. Each
horizontal line above the orange line is a level of the HOM. In the HOM, each
single coloured block indicates a bit of the HOM at that level.................................... 131
5.15 Leaf node intersection segments - Clipped and projected onto the row. These pro­
jections on the image row are tested for occlusion against the HOMs to determine
inter-node visibility............................................................................................................. 132
5.16 Row-Packet-Node intersection....................................................................................... 137
5.17 Load balancing of Row Tracing and Packet Row Tracing. Each coloured block is
the work allocated to a thread........................................................................................... 138
5.18 Performance of Row Tracing vs PacketRay Tracing and OpenGL............................ 140
5.19 Performance of Packet Row Tracing vs Packet Ray Tracing - Single threaded and
Multi-threaded versions respectively. For smaller sized models Packet Row Trac­
ing is much faster than packet ray tracing. For larger models, though the perfor­
mance difference is not that significant,Packet Row Tracing is still faster..................... 141
5.20 Performance of Row Tracing (Single threaded and Multi-threaded versions) vs
OpenGL. OpenGL performance for first three models clipped to 35 fps and 120
fps in both charts. For smaller models, OpenGL is much faster. However, for
larger models, Packet Row Tracing, especially when multi-threaded, is faster than
OpenGL................................................................................................................................ 142
5.21 Performance of Row Tracing on kd-trees and octrees - Single threaded and Multi­
threaded versions respectively. Shows that Row tracing works well on both kd-
trees and octrees.................................................................................................................. 143
5.22 Performance comparison between Row Tracing and Packet Row Tracing on both
octrees and kd-trees. It is clear that Packet R o m - Tracing is considerably faster than
single Row Tracing............................................................................................................. 144
5.23 Speed-ups achieved by Row Tracing by using four threads a quad core CPU. Ac­
celerations of close to 4 x suggest that R o m > Tracing ishighly parallelisable. . . . 145
5.24 Rendering times using Row Tracing, Packet Row Tracing and PacketRay Tracing
over tree sizes show that while packet ray tracing needs larger trees, Row Trac­
ing and Packet Row Tracing work very well with smaller trees pointing to the
possibility of using Row Tracing for dynamic scenes................................................... 146
5.25 Performance of Row Tracing on kd-trees and octrees with and without HOMs.
HOMs are shown to be highly effective as an occlusion detection m ethod............... 147

A .l Software Class diagram..................................................................................................... 165


A.2 Structure of a Scene in memory........................................................................................ 173
A.3 Obtaining the leaf node triangles..................................................................................... 175
A.4 Application W in d o w s...................................................................................................... 176
A.5 File M e n u .......................................................................................................................... 177
A.6 Setting the camera parameters ...................................................................................... 177
A .l Options to visualise the scene and the data s tru c tu re ................................................. 178
A.8 RBSP tree visualised on the Bunny.................................................................................. 179
A.9 Selecting a rendering m e th o d ......................................................................................... 180
A. 10 Selecting a data s tr u c tu r e ................................................................................................ 180
List of Tables

3.1 RBSP tree node s tr u c tu r e ............................................................................................... 51


3.2 RBSP tree leaf node stru c tu re ......................................................................................... 52
3.3 Scenes and viewpoints used to benchmark RBSP trees............................................... 67
3.4 Ray tracing times with different methods on RBSP trees with different number of
splitting axes for various scenes. The various algorithms are:
Algorithm 1 .1 - Traversal by Linear Interpolation of R ay-Plane Intersection Pa­
rameter.
Algorithm 1.2 - Traversal using SSE.
Algorithm 2.1 - Entry and Exit Plane Determination with OpenGL.
Algorithm 2.2 - Entry and Exit Plane Determination by Recursive Divide.............. 72

4.1 Comparison with ray tracing a p p ro a c h ........................................................................ 85


4.2 Number of nodes (in millions) traversed by a recursive ray tracer (without packets)
and the Coherent Rendering algorithm for a 1024 x 1024 im age.............................. 102
4.3 Performances of Coherent Rendering vs single ray tracing and packet ray tracing
(8 x 8 rays) in frames per second. It is to be noted that the packet ray tracing
implementation has been optimised through the use of SSE instructions. Absolute
performance of Coherent Rendering is observed to be slower than that of packet
ray tracing............................................................................................................................ 103

5.1 Comparison between Rasterisation, Ray Tracing and Row T r a c in g ....................... I l l


5.2 Best performances of packet ray tracing and Row Tracing in frames per second.
8 x 8 ray packets and Packet Row Tracing's kd-tree versions have been used re­
spectively.............................................................................................................................. 139
5.3 Performance comparison between Row Tracing and Packet Row Tracing....................144

A .l Kd-tree node structure....................................................................................................... 167


A .2 Kd-tree leaf node structure................................................................................................ 167
A .3 kd-tree node s tru c tu re ...................................................................................................... 174
A.4 kd-tree leaf node structure ............................................................................................ 174

x
Glossary

Notation Description Page


List
ID 1 dimensional 82
2D 2 dimensional 14
3D 3 dimensional 8
3DDDA 3D-Digital DifferentialAnalyser - Aray tracing algorithm that identifies 27
voxels traversed by a 3D line (ray)
3DS 3DS - The file format used by Adobe 3D Studio Max previously. This 164
format has been superceded by the .max format

AABB Axis Aligned Bounding Boxes - are boxes, or cuboids, that are aligned 29
along the coordinate axes.

BIH Bounding Interval Hierarchy - A hybrid data structure that adapts con- 38
cepts from the kd-tree and the BVH and uses the space median method to
select splitting planes.
BSP Tree Binary Space Partitioning Tree - A binary tree in which space is parti- 11
tioned into two parts and each part is a node of the tree.
BVH Bounding Volume Hierarchy - A ray tracing structure where objects are 14
enclosed by bounding boxes and a hierarchy is created out of these boxes.

CPU Central Processing Unit 13

FLTK Fast Light ToolKit - A cross platform toolkit that allows development of 175
GUIs.
FLUID FL User Interface Designer - The user interface designer that allows de- 176
sign of interfaces using FLTK.

GB Giga Byte - A measure of the amount of data. A Giga Byte equals 1024 99
Mega Bytes.
GHz Gigahertz - A measure of the speed of a CPU. 99
GUI Graphical User Interface - The user interface consisting mainly of win- 175
dows and interacted mostly using the mouse.

HOM Hierarchical Occlusion Map - An image that represents the already oc- 42
eluded parts of the image.

xi
Notation Description Page
List

IDE Integrated development environment - A tool that allows easy develop- 164
ment of applications.

k-DOP k discrete oriented poly tope - A poly tope created by the intersection of 75
planes aligned according to a predetermined set of k axes.
KB Kilo Byte - A measure of the amount of data. A Kilo Byte equals 1024 99
bytes.
kd-tree k Dimensional Tree - A generalised form of a binary space partitioning 11
tree where the splitting planes are aligned according to one of the X, Y or
Z axes.

LCTS Longest Common Traversal Sequence - The longest sequence of nodes 34


traversed by all the rays in a group of rays.

MB Mega Byte - A measure of the amount of data. A Mega Byte equals 1024 139
Kilo Bytes.
MIP Maximum Intensity Projection - A volume rendering method that 83
projects the voxel with the maximum intensity onto the image plane to
create the image.
MLRT, MLRTA Multi Level Ray Tracing Algorithm - A high performance ray tracing 34
algorithm that uses frustum culling and interval arithmetic to trace a group
of rays.

NV Query An OpenGL extension introduced by NVIDIA on their Geforce 3 graph- 43


ics cards.

OBB Oriented Bounding Boxes - are boxes, or cuboids, that need not be 29
aligned along the coordinate axes. Due to the freedom allowed, these
boxes more closely wrap the model.
Octree-R Octree for ray tracing - A structure similar to a normal octree, i.e. one 26
in which an internal node has eight child nodes, but where the subdi­
visions are adaptive. The adaptive nature means that splits need not be
placed evenly along the axes. This allows the octree to use the surface
area heuristic so that it is better suited for ray tracing.
OORC Object Order Ray Casting - A volume rendering method that shows con- 82
stant complexity according to number of primitives.
OpenGL Open Graphics Library - A standard for developing graphics applications. 43

QBVH Quad Bounding Volume Hierarchy - A bounding volume hierarchy where 37


each internal node has four child nodes.

R-tree Rectangle Tree - A tree where the nodes are rectangular and in which the 29
internal nodes may have more than two children.
Notation Description Page
List
RBSP Tree Restricted Binary Space Partitioning Tree - A form of binary space par- 46
titioning tree that allows the splitting plane to be selected by using a pre­
determined set of axes.
RGB Red Green Blue - A format to represent colours where a colour is given 170
as a combination of the colours red, green and blue

S-kd-tree Spatial kd-trees - A structure similar to bounding interval hierarchies. 38


The difference is that the S-kd-tree selects the splitting planes using the
surface area heuristic.
SAH Surface Area Heuristic - A heuristic to select the split position popu- 19
larly used to build hierarchical structures for ray tracing (e.g., kd-trees or
BVHs).
SIMD Single Instruction Multiple Data - Instruction sets that are popularised 13
recently that allow operating on multiple data using a single instruction.
SSE Streaming SIMD Extensions - Intel’s set of instructions that allows sev- 8
eral (four) calculations (either floating point or integer) with one instruc­
tion

VRML Virtual Reality Modelling Language - A language using which 3D scenes 164
are modelled.

XML extensible Markup Language - A markup language that allows users to 172
define their own parameters.

Z-buffer Depth Buffer - indicated as Z-buffer as in graphics, the Z axis represents 38


depth in a scene
Acknowledgements
The research would not have been possible without EPSRC’s funding. The grant has provided
finances with which I could work without worrying about finances. It has also provided funds for
equipment and travel to conferences that has enriched my research experience.

I would like to thank my supervisor Dr. Benjamin Mora who has been a valuable influence on me
for the last four years. He has supported me immensely - technically, academically and morally -
during the course of the research and has provided invaluable ideas and input. His patience is also
highly appreciated. I would also like to thank Prof. Min Chen, who as my second supervisor, has
provided inputs and feedback that has improved the quality of my work. Dr. Robert S. Laramee’s
initiative in organising the Visible Lunch - a platform for valuable discussions and preparation -
is also appreciated.

Swansea University’s Computer Science Department, with a research lab that is highly conducive
to research, has positively influenced my research, both due to the beautiful workspace as well as
due to the people in the lab. My lab-mates have, during the past four years, been wonderful. They
have made my stay in Swansea a very happy one. Special thanks should also go to a few lab-mates
(in no particular order) - Ben Spencer, Tony McLoughlin, Chitra Acharya, Liam O ’Reilly and Ed
Grundy - for proofreading my thesis. In addition, for supporting and maintaining the equipment
of the lab, the technicians of the lab are also appreciated.

Thanks should also go to my tennis and badminton partners - currently Salar Kasto and Ben
Spencer respectively. I would also like to remember my earlier partners - Jibu Panicker, Ganesh
M. and Rajmohan Subash. Regular tennis and badminton have kept me physically and mentally
fit and for this I am highly thankful to them. My friends, who have enriched my life, are also due
credit.

Finally, I would like to thank my parents - Yajnanarayana and Vasanthi Kammaje - without whose
support and encouragement, the Ph.D. would not have been possible. I would also like to acknowl­
edge my soon to be wife, Deepthi, for being supportive and helping me through periods of great
stress.
Chapter 1

Introduction

Computer Graphics are an integral part of modem life. Computer Graphics is defined as - “Most
simply, pictures that are generated by a computer” [HilOO]. The most visible uses of Computer
Graphics are in computer games and movies. However, they are also extensively used in areas
such as automobile engineering, architecture, photography, newspaper and magazines, medical
industry, etc. The predominance means that accuracy and performance are very important.

Rendering is the process of generating images from a model. The model normally represents a
3D object based on a physical entity. To render this model implies generating the image of the
model from a particular viewpoint. The rendering may either be a photo realistic rendering -
which produces images that aim to be close to reality, a non photo realistic rendering - which aim
to have more artistic or stylistic properties, or an interactive rendering - that sacrifices realism for
high interactive frame rates [PH04].

Depending on the type of renderings desired, the methods used to generate the rendering vary
accordingly. Photo realistic renderings use physical properties of surfaces, materials and light to
produce images that are very close to reality.

Ray tracing - which is the basis for several methods that generate photo realistic renderings -
utilises the principles based on the physical properties of light. In its most basic form, as given
by Appel [App68], it is a technique whereby rays are generated and traced, usually backwards -
from the eye / camera through the image plane to the model, to find the closest object that each ray
intersects. When all the rays corresponding to all the pixels of the image undergo a similar process,
the image is generated. The image thus generated does not include any shadows, reflections or
refractions. These additional physical effects are computed through further rays that are generated
at the point of intersection [Whi80].

Although ray tracing is physically based, it does not model indirect lighting - i.e., light that is
dispersed by other objects in the scene. Global Illumination is the name given to the class of
techniques that model both direct and indirect lighting - i.e., lighting that is reflected by diffuse
surfaces - in order to generate images that are highly photo realistic. Images generated with
global illumination look very realistic. Radiosity [GTGB84] was one of the first techniques to
model diffuse indirect reflections.

Rasterisation is the other major technique used to render images. It consists of taking the set of
polygons of the model and mapping them onto the pixels of the screen to generate the image. Since
it is not based on physical properties, optical features like shading, reflections and refractions are

1
2

generated with artificial techniques and are not as accurate as ray tracing. However, it is extremely
fast, making it highly suitable for use when interactive to real time frame rates are necessary. Due
to the high frame rates achieved, it is very popular and is used extensively in computer games.

Motivation and Aims

Irrespective of the rendering method used, the first major step of all rendering methods is to find
the visible surface. Methods that solve this problem are known as visible surface determination
methods or just visibility methods. Very simply, it can be considered as the process of finding the
closest object at every pixel of the image.

There have been several methods like Z-buffer, Area subdivision, BSP tree method, etc., to solve
the problem [FvDFH90]. Ray tracing / ray casting can also be considered as a visibility determina­
tion method when only primary rays are traced. Due to the fact that ray tracing is a very expensive
operation, several innovative structures and algorithms have been developed to improve its perfor­
mance. The underlying principle of most algorithms and data structures is to create coherence and
optimally utilise this coherence.

One of the best structures for ray tracing, especially for static scenes are kd-trees. The Surface
Area Heuristic to construct them has shown to create trees that significantly reduce the number of
ray-node traversals and ray-primitive intersections than naive construction heuristics. Kd-trees,
and other structures built on the scene, improve and allow utilisation o f object coherence by group­
ing closer objects into closer nodes creating coherence by ordering / sorting the scene.

The other form of coherence, more widely used in recent times together with object coherence,
is the use of image coherence by tracing groups of neighbouring rays through the structure. This
method is popularly called packet ray tracing. It works on the basis that rays that are closer in the
image traverse a similar path down the structure. The combination of object and image coherence
has resulted in impressive results. If the visibility problem is considered as a searching problem,
packet ray tracing is using a basic result from searching algorithms that searching k neighbouring
elements in a tree can be achieved in l og( N) + k steps [Ben79]. With this, the number of steps
needed to render the image are reduced significantly leading to accelerated performance.

For the visibility problem, finding only the first intersection for all the pixels is necessary. This
makes it possible to maximise the utilisation of the coherence provided by the structure and the
image. Investigating methods to maximise the coherence provided by ray tracing structures to the
visibility problem is the motivation of the thesis.

Thus, the aims of the research are to develop and investigate new visibility / rendering algorithms
that build upon and utilise the coherence of structures and algorithms currently popular for ray
tracing and to investigate them in that context.

Contributions of the Research

The main contributions of the research are:

• the development and study of a structure that aims to minimise the number of ray-node
traversals and ray-triangle intersections by providing a closer fitting structure - i.e., a struc­
ture that improves object coherence.
3

• the development and study of an algorithm that utilises the entire coherence provided by a
kd-tree and investigates empirically the complexity of this algorithm.

• the development and study of a new algorithm that utilises the coherence provided by a kd-
tree or octree by using a single row at a time - similar to scanline algorithms. The method
is extended so that groups of rows can be traced to maximise coherence utilisation.

Thesis Outline

The thesis starts off with this chapter - Introduction - that introduces the main motivations, aims
and contributions of the thesis.

Chapter 2 - describes the previous and related work in ray tracing. It also describes briefly the
popular visibility methods of the other popular method of rendering - rasterisation.

Chapter 3 - introduces and describes Restricted Binary Space Partitioning Trees (RBSP Trees) in
detail. It describes the motivation, the heuristics to construct them, the algorithms to traverse them
for producing images and finally compares it to kd-trees.

Chapter 4 - Introduces the concept of Coherent Rendering, develops the algorithm and investigates
it from the point of view of empirical complexity.

Chapter 5 - introduces the algorithm - Row Tracing - and its packet variant - Packet Row Tracing
- in detail and provides the results when the algorithm is used to generate an image.

Chapter 6 - briefly summarises the contributions of the thesis. It will also provide a brief list of fu­
ture work that can be attempted to realise the potential of the introduced structure and algorithms.

Publications

The following are the publications and technical reports achieved during the course of the Ph.D.
The thesis describes the methods and results of these these publications in greater detail.

Kammaje, R.P.; Mora, B., “A Study o f Restricted BSP Trees fo r Ray Tracing,” IEEE Symposium
on Interactive Ray Tracing, 2007. R T ’07., pp.55-62, 10-12 Sept. 2007. [KM07] - Chapter 3 is
based on this publication. It describes the structure and the methods described in the paper in
greater detail. The chapter also provides additional traversal methods, one of which significantly
improves upon the results presented in the paper.

Benjamin Mora, Ravi Kammaje and Mark W. Jones, ‘‘On the Lower Complexity o f Coherent
Renderings,” Swansea University, Technical Report, 2008. [MKJ08] - Chapter 4 provides the al­
gorithm in greater detail with the results pointing to the possibility of a lower complexity followed
by a discussion regarding the complexity of the algorithm.

Kammaje, Ravi P.; Mora, Benjamin, “Row tracing using Hierarchical Occlusion Maps,” IEEE
Symposium on Interactive Ray Tracing, 2008. R T ’08., pp.27-34, 9-10 Aug. 2008. [KM08] -
Chapter 5 describes the Row Tracing algorithm presented in this paper in much greater detail.
Row Tracing traverses a kd-tree or octree using an entire row of the image at a time to maximise
coherence. A ID version of Hierarchical Occlusion Maps is used to determine occluded nodes
and occluded parts of triangles. Hierarchical Occlusion Maps are shown to be responsible for both
4

accurate visibility determination as well as acceleration of performance. To maximise coherence


and performance, an adapted version of the algorithm that traces groups of rows is also developed
and investigated.
Chapter 2

Ray Tracing - Algorithms and Data


Structures

C o n ten ts
2.1 Ray T r a c in g ................................................................................................................ 5
2.2 Ray Intersections ...................................................................................................... 6
2.3 Acceleration S tru ctu res............................................................................................ 14
2.4 Packet Ray T r a c in g ................................................................................................... 32
2.5 Anti-Aliasing and Incoherent R a y s .............................................. 35
2.6 Dynamic Ray T r a cin g ............................................................................................... 36
2.7 Other Visibility M e t h o d s ......................................................................................... 37
2.8 S u m m ary.................................................................................................. 42

W hile rendering from a viewpoint, the first step is to determine the objects that are visible from the
viewpoint. There have been several algorithms and data structures that have been used to achieve
this and these are grouped into the class of visibility determination methods. Ray tracing / ray
casting can be considered as a visibility determination method when only primary rays are traced
to determine the closest object at each pixel. There has been extensive research in ray tracing lead­
ing to the development of many data structures and algorithms to efficiently perform ray tracing.
The other main form of rendering - rasterisation, also has several visibility determination meth­
ods. In this chapter, the data structures and algorithms for ray tracing will be described followed
briefly by other visibility determination methods.

2.1 Ray Tracing

Ray tracing as a method to determine visible surfaces was introduced by Appel [App68], The
method introduced by Appel is more commonly known as ray casting. Ray tracing is the rendering
method introduced by Whitted [Whi80], in which a ray from the eye / camera for each pixel of
the image is cast and the first object in the scene that intersects the ray is found. Upon finding this
intersection, further rays - a shadow ray from the intersection point to the light source, a reflection
ray determined by the rule that its angle is equal to the angle of incidence, and if the object is
transparent a refraction ray whose direction is found based on Snell’s law - are spawned and a

5
2. 1 R ay Tracing 6

tree o f rays is fo rm ed (F igure 2.1). T he sh ad e r then trav erses this ray tree, finds the in tersec tio n o f
each seco n d ary ray and g ath ers the c o n trib u tio n o f each seco n d ary ra y to u ltim a te ly sh ad e the pixel
in co n sid eratio n . D ue to the m e th o d 's close sim ilarity to p h y sical reality, the im ag es g en e ra te d are
o f excellent quality. H ow ever, ray tracin g as d escrib ed m o d els o n ly d irect illu m in atio n - sp ecu lar
and refracted c o m p o n en ts. In o rd e r to he m ore realistic, in d irec t illu m in a tio n also needs to be
m o delled and the m eth o d s to g en erate such im ag es are c lassified u n d e r the g ro u p o f m eth o d s
know n as G lo b a l Illu m in a tio n .

A n o b serv atio n to he noted is that the rays in ray trac in g , in c o n tra st to p h y sic a l reality, o rig in ate
from the cam e ra rath er than the light so u rce itself. T h is allow s o n ly the set o f relev an t rays to be
co n sid ered . A lso, several o b jects m ay be o c clu d e d by o b jects in fro n t o f th e m - in w h ic h case the
rays are b lock ed by o th e r o b jects in front. T h u s, tra c in g these ray s is u n n ec e ssa ry and w astefu l.
O nly the rays o f in terest - rays in itially o rig in atin g fro m the c a m e ra and p assin g th ro u g h the im ag e
plan e - are traced.

reflected ray refracted ray

intersected object

shadow ray

viewpoint

light source

F igure 2.1: R ay tracing. T he ray tracin g m ethod w h ereb y a ray is traced from the v iew p o in t
through the pixel to find the first in tersected object. A t the o b ject o f in tersec tio n , ad d itio n a l rays
are sp aw n ed to g en erate reflectio n s, refra c tio n s a n d sh ad o w s.

T he ren d erin g tim es to g en erate a few im ag es, 44 m in s, 74 m in s an d 122 m ins, are sp ecified by
W hitted. T he a lg o rith m w as im p le m e n te d in C on a V A X -1 1/780 c o m p u te r ru n n in g U N IX and
the im ag es had a reso lu tio n o f 4 8 0 x 640 w ith 9 bits p e r pixel. T h is relativ ely p o o r p e rfo rm a n ce
w as due to the h ard w are lim itatio n s as also due to the fact that it w as a new a lg o rith m w ith a
very large n u m b e r o f c a lc u latio n s p erfo rm ed . A ppel u sed sim p le sp h eres as b o u n d in g v o lu m es o f
the ob jects to sim p lify in tersec tio n c alc u latio n s. H ow ever, it w as rev ealed that fo r sim p le scen es,
in tersectio n c a lc u la tio n s fo rm ed 75 % o f the tim e spent by the ray tracer. T h is w as e x a c erb a ted for
m ore com plex scenes fo r w h ich the in tersec tio n c a lc u la tio n s fo rm ed u pto 9 5% o f the tim e spent,
ten d in g to 100% as the n u m b e r o f o b jects increase to a v ery large num ber. H ow ever, the im ag es
g en erated by the m eth o d w ere o f a very high quality, en g en d e rin g im m e n se in terest in a c celeratin g
ray tracing to g en erate sim ilarly h igh q u ality im ages, but w ith a fa ste r p e rfo rm a n c e .

T he id en tificatio n o f ray in te rse c tio n s as the m ain cau se o f the p o o r p e rfo rm an c e o f ray tra cin g led
to ex tensive re search - both in a cc e le ra tin g the actual o b je c l-ra y in te rse c tio n s and in the use o f
acceleratio n stru ctu res to red u ce the n u m b er o f o b je c l-ra y in te rse c tio n s. A s S ectio n 2.3 w ill show,
2.2 Ray Intersections 7

numerous acceleration structures have been invented and researched and are of varying quality
with respect to ray tracing performance. For a particular scene, the number of object intersections
that a ray tracer has to perform is highly dependent on the acceleration structure used. However,
irrespective of the acceleration structure used, the ray has to be intersected with the objects at
some stage of the algorithm. Hence, fast object-ray intersection calculations are imperative.

2.2 Ray Intersections

A ray is a semi-infinite line and can be represented by its parametric form as

r (t) = o + td (2.1)

where o - origin of the ray


t - is the parameter and mostly t > 0
d - direction of the ray

For primary rays, the origin is the viewpoint from which the scene is to be rendered. For secondary
rays, it is the intersection point of the primary ray and the object. Since the ray travels only in
one direction - forwards - objects that are behind the origin are discarded. The direction is given
by the vector from the source of the ray to one of the points that it passes through. It is usually
normalised. In case of primary rays, the direction of the ray, d, is given by normalizing the vector
from the viewpoint to the pixel being considered.

The problem of finding the intersection reduces to finding the parameter, t, at which the ray hits
an object in the scene. Several objects may intersect the ray along its path. But, only the first
object that it hits is relevant. This is determined by selecting the object with the minimum positive
intersection parameter. A negative intersection parameter indicates that the intersection is behind
the viewpoint and hence such intersections are disregarded.

Scenes are most commonly represented using triangles. Most popular acceleration structures use
boxes / rectangular cuboids. Hence, fast methods to intersect the ray with triangles and boxes are
important. One of the most popular acceleration structures - the kd-tree - uses an axis-aligned
plane and hence intersections between a ray and an axis-aligned plane are also considered. Finally,
spheres, in addition to being a common primitive, have also been used as bounding volumes in the
first ray tracers and hence ray-sphere intersections are also discussed.

2.2.1 R a y -T ria n g le intersectio n s

In a majority of scenes, objects consist exclusively of triangles. They are the simplest possible
polygon and can be compactly represented. Other polygons can be easily broken down into trian­
gles. Complex objects with curved surfaces can be approximated to fine detail depending on the
number of triangles used [SB87]. Thus, they are the predominant primitive in ray tracing scenes.
In many ray tracers (as in ours) triangles are the only primitive supported. Pharr et al. [PKGH97]
use a similar approach of rendering only triangle based scenes and found that any disadvantages
of this approach was outweighed by the advantages. The predominance of triangles is both the
cause and effect of extensive research in efficient ray-triangle intersections.
2.2 Ray Intersections 8

The most obvious ray-triangle intersection method intersects the ray with the plane of the triangle
and determines whether the intersection point is within the triangle [Bad90] [ray]. Haines [Hai94]
gives a few strategies to determine if a point is within a polygon.

One of the strategies to verify if a point is inside the triangle is given by Arenberg [Are88]. It
computes two of the three barycentric coordinates of the intersection point by using the triangle’s
normal - computed either at intersection time or pre-computed and stored earlier. Due to the
properties of barycentric coordinates, it is only necessary to verify if they are both positive and
their sum is less than one.

Moller [MT97] provides a method that eliminates the need for the triangle’s plane equations by
computing the barycentric coordinates and the intersection parameter t by translating the triangle
onto the Y Z plane with the ray along the X axis and then transforming it to a unit triangle. As this
method does not need the plane equation of the triangle, this method can be used when memory
consumption is a priority.

Wald, in his thesis [Wal04], provides a similar method that is also an optimisation of the barycen­
tric coordinate test. The method first computes the intersection between the triangle’s plane and
the ray. Then the barycentric coordinates are computed by projecting the triangle onto the axis-
aligned plane on which the triangle projects the maximum area. By mathematically simplifying
the expressions for the barycentric coordinates of the intersection point, a few per triangle con­
stants are identified. By pre-computing these constants, the number of operations for the test are
reduced to a small number (worst case: 10 multiplications, 1 division and 11 additions, best case:
4 multiplications, 5 additions and 1 division). The method is shown to be easily vectorised us­
ing SSE instructions (Intel’s Streaming SIMD Extensions) [Int08] [SSE09b] to intersect four rays
with one triangle.

Another method used for the intersection is the use of Plucker coordinates. It has been used
extensively to determine the ray-triangle intersections [Eri97] [Sho98] [TH99] [JonOO]. Plucker
coordinates are a mapping of a 3D line into a 5D coordinate system. The coordinates of the 5D
system are found by a cross product of the two end points and a subtraction of the two points. It
provides a simple method to determine the orientation of one line with respect to the other with an
inner product of the vectors. By considering the three edges of the triangle and the ray as vectors,
the intersection of the ray to the triangle is found.

Segura and Feito [SF01] provide a method that is faster than either M oller’s [MT97] or Badouel’s
[Bad90] that has been shown to be mathematically equivalent [ 0 ’R98] [KS06] [Eri07] to the
intersection test with Plucker coordinates with a better efficiency. The scalar triple product - rep­
resenting the signed volumes of the parallelepipeds defined by three vectors - is used to determine
if there is an intersection or not. By eliminating complex operations, the algorithm is simple, fast
and robust. As provided, the method only determines if there is an intersection or not and does
not calculate the actual point of intersection. However, if there is an intersection, the barycentric
coordinates and the intersection points are easily determined by the values calculated as a part of
the algorithm.

The scalar triple product can be defined as a function of three vectors, sig n S D and can be given
as:

s i gnZD( p, q, r) = p .( q x r) (2.2)

where
2.2 R ay Intersections 9

p . q and r are v ectors


T h e sy m b o ls . and x in d icate a v ecto r dot p ro d u c t an d a v ecto r c ro ss p ro d u ct respectively.

If o and d rep resen t the v iew p o in t and the d ire ctio n o f the ra y b ein g traced and a . b and c are
the three v ertices o f the trian g le, the in tersec tio n can be d e te rm in ed as show n by the p seu d o co d e
below :

side3 = w[2] > 0


side! = w [0] > 0
if (side3==sidei)

side2 = w [1] > 0


if (side3-— side2)

return true;

return false;

L istin g 2.1: T ria n g le -ra y in te rse c tio n u sing sc a la r trip le p ro d u cts

If there is an in tersec tio n, the b a ry cen tric c o o rd in a te s o f the in te rse c tio n point are given by the
valu es in w. U sing these, the actual in te rse c tio n p o in t can be e a sily c o m p u ted as sh o w n by the
p seu d o co d e below :

intersect i o n P o i n t [0]= w [0]* a [ X ] + w [1]* b [X ]+ w [2]* c [X]


intersect ionPoint [ n = w [ 0 ] * a [ Y]+w[l ]*b[Y]+ w[2]*c[ Y]
in t e r s e c t i o n P o i n t [2]=w[0]*a[Z]+w[l]*b[Z] >w[2]*c[Z]

L isting 2.2: C o m p u tin g the ra y -tria n g le in te rse c tio n point u sing th e b a ry c e n tric co o rd in a tes
c a lc u la te d in L istin g 2.1.

D ue to the sim p lic ity o f the m eth o d , it is used in o u r im p lem en tatio n to in tersec t w ith trian g les.
T h e m eth o d has been v ecto rise d using S S E in s tru c tio n s w hen p ack et ra y tra cin g is used to in tersect
fo u r rays w ith one trian g le. W h en larg er p ac k e ts are used, the p ack ets are split into g ro u p s o f four
ra y s and in tersec te d using the S S E version.

C lo se r o b serv a tio n o f the above in tersec tio n p se u d o c o d e , intersectionRayTriangie, reveals


that there ex ist a few o p tim isa tio n s in the p ro c e ss like c o m p u tin g the v alu es o f ( a — o ) . ( b —
o ) . (c — o ) ju s t once in the m ethod. F u rth e r re d u ctio n s in the n u m b er o f c a lc u la tio n s are also
p o ssib le using the p ro p erty o f sc a la r p ro d u cts and c ro ss p ro d u c ts giv en below .
2.2 Ray Intersections 10

• P -(q x r) = q .( r x p ) = r .( p x q)

• p x q = —q x p

In addition, when triangle meshes are used, the vertices and edges of a triangle are shared by other
triangles pointing to the possibility of several pre-computations as Amanatides and Choi [AC97]
suggest. By sharing these computations, they could reduce the worst case computations from 51
flops to 33 flops. However, in our implementation, these optimisations have not been applied as
we concentrate more on reducing the number of intersections performed.

In addition to the simplicity of triangles, the availability of several efficient intersection methods
has popularised the use of triangles as primitives. However, for a few other primitives like spheres
and boxes, efficient intersection methods are available that make it cheaper to intersect them as
whole primitives instead of splitting them into triangles.

2.2.2 R a y -S p h e r e in tersection s

Spheres are a common primitive in a few scenes. In addition, they can be used as bounding
volumes to accelerate ray tracing as ray-sphere intersections are reasonably fast.

The mathematical method to intersect a sphere with a ray, as given in Realtime Rendering [AMH02],
uses the implicit mathematical equation of the sphere:

/ ( p ) = Up - c|| - r = 0 (2.3)

where

c - is the center of the sphere


r - is the radius of the sphere
p - is a point on the sphere

At the intersection point, both the ray’s parametric equation and the sphere’s equation should hold.
Substituting the ray equation into the sphere equation at this point gives:

/ ( r(0 ) = ||r(*) - c|| - r = 0

expanding the parametric equation of a ray in the sphere equation provides the following at the
point of intersection.

||o + t d - c |) = r
(o -I- t d — c ).(o + td —c) = r2
t 2d 2 + 2 f(d .(o — c)) + (o —c ).(o — c ) — r 2 = 0 (2.4)

If we consider
b = (d .(o - c))
c = (o —c ).(o - c) - r 2
2.2 Ray Intersections 11

and when d is normalised, d 2 = 1

then, Equation 2.4 reduces to

t 2 -t- 2tb c — 0

adding b2 to both sides and simplifying, we have

t 2 +- 2
2tb
tb + bb2 c
t 2 + 2tb + b2
2
(t + b ) 2

t b ± y /b 2 —c
t —b ± \ J b 2 — c (2.5)

If the solutions are real, then the ray intersects the sphere and if the solutions are imaginary, then
the ray does not intersect. This can be easily computed by determining if b2 —c < 0 or not. Thus,
if as in the case of bounding spheres, it is only necessary to determine if the sphere is intersected
or not, then determining whether b2 — c < 0 is sufficient.

The book also discusses a geometric solution as given by Haines [Hai89] that uses the geometric
method rather than the algebraic method. The geometric solution considers the various cases
where a ray may not intersect and tries to determine these cases with minimal calculations resulting
in a more efficient method when the ray misses the sphere. However, in cases where the ray
intersects the sphere and the intersection points are necessary, the computations are of similar cost
to the algebraic method.

Although ray-sphere intersections are fairly fast, normally scenes in computer graphics do not
consist of many spheres. Also, using them as bounding volumes is not very efficient as they do
not closely fit most objects in the scene.

2.2.3 R a y -P la n e in tersection s

The ray-plane intersection is used by ray tracers either as a part of the ray-triangle intersections or
to determine if a ray intersects a node when a Binary Space Partitioning tree(BSP Tree) or kd-tree
is used. The mathematical equations of the ray and the plane are used to achieve the intersection.

If the plane to be intersected is given by

/ ( P) = n .( p - p 0) = 0

where n - is the normal to the plane


Po - is a point on the plane

then, at the intersection point, both the ray’s parametric equation and the plane’s equation should
hold. Hence, the equation of the ray can be substituted for the value of p to give:
2.2 Ray Intersections 12

f ( r ( t ) ) = n .( r ( t) - p 0) = 0

expanding the above equation gives

n .((o + td ) - p 0) = 0

solving this equation gives the solution for the parameter t at which the ray intersects the plane as

n .p n — n .o
t = (2.6)
n .d

that can be factorised to

t = n ( P 0 : o) a?)
n .d

2.2.3.1 Ray-Axis-Aligned Plane Intersections

Geometrically, a dot product indicates the projection of one vector onto the other. When only
axis-aligned planes are considered, the coordinate axes themselves are the normals of the plane.
The dot product of a vector with the axis (the projection of a point on to an axis) is just the
corresponding coordinate. For eg., the projection of point (1,2, 3) onto the X axis is 1, onto the
Y axis is 2 and onto the Z axis is 3. Using this result, Equation 2.6 reduces to:

, Paxis Vaxis
tint ~ ------- (Z-<V
(*aris
where
tint ~ is the parameter of intersection
Paxis ~ is the point’s coordinate along a xis
Oaxis - is the corresponding coordinate of the ray’s origin
daxis - is the corresponding coordinate of the normalised direction of the ray

As the equation shows, it is a very efficient method.

This simple and efficient intersection method between a ray and plane is one of the causes for the
popularity of the kd-tree - that consists entirely of axis-aligned split planes. In addition, it is also
used for one of the more popular ray-box intersection methods.

2.2.4 R a y -B o x In tersection s

One of the most popular methods to intersect a ray with a box is the method proposed by Kay [KK86],
more popularly called the slabs method. A pair of planes along an axis is defined as a slab. For
2.2 Ray Intersections 13

each slab, the parameters, te.ntry at which the ray enters the slab, and t exu at which the ray exits the
slab, is computed using the ray plane intersection method given by Equation2.7 (If the plane is axis
aligned, then Equation 2.8 can be used instead). The interval given by t entry and t exa determines
if the ray intersects the slab. If t entry < t exa, then there is an intersection between the ray and the
slab in consideration. A bounding volume is the intersection of several slabs and consequently, the
intersection of the ray with the bounding volume is the intersection of the intersection intervals.
Hence, the ray intersects the volume i f - in a x ( tentry o f all slabs) < rn in (texit o f all slabs). For
an axis aligned bounding box, this method is shown by the set of Equations 2.10.

It is a very simple method that calculates the intersection of the ray with the six planes of the box
and determines intersection by using the values of the intersection parameters. Additionally, it
also provides the intersection parameters at which the ray enters and exits the box.

The method has been improved by Williams et al. [WBMS05] by eliminating degenerate intervals
caused due to floating point values of —0.0. They also propose pre-computing the results of the
division operation to optimise it. However, it still has branches whose misprediction can be quite
detrimental. Hence, a branch-less version using the min and max operations provided by the SIMD
instruction set can result in much faster intersection with boxes [GM03] [BP04] [BP05].

The intersection between a ray and an axis-aligned plane is given by Equation 2.8. However, it
is well known that division is an expensive operation [SL96], even on modern CPUs, and thus
minimizing it is imperative. It may be observed that the term j — is a constant for a ray that can
be pre-computed and stored as drcc . The intersection operation can thus be rewritten as

linl — (Paxis Oaxis)d-re.r. (—-9)

At the root node, the parameters for the six planes - one entry plane and one exit plane along each
axis - of the bounding box / root node are computed as below.

tx — ( b b \ x en t ry\ Ox ) d r e e

tx = (bblxexit] -—Ox ) d r e c .
ty = ( b b [ y e n t r y \ —O y ) d t ( c
ty
II

Oy)drec
H

tz — ( b b [ z e nt , r y\ O z ) d re.r

tz = { b b [ z e x it] Oz ) d r e c

tentry — max(tx . ty ,t

l e. xi t = min(tx i I'll i Iz

where
bb - is an array of the six coordinates representing the maximum and minimum coordinate along
each axis ordered axis-wise, i.e., first two values are the minimum and maximum values for the X
axis, the next two are values for the Y axis and the final two are values over the Z axis
Xentry ,Xexit,yentry,yexit,Zentry ,zexit - are the indices of the entry and exit coordinates along
each axis.
tentry and t cxit - t parameters at which the ray enters and exits the box
2.2 R ay Intersectio ns 14

X2

Y
Y.2

For ray1 2 • terXry <t exit


=> no intersection => ray intersects box

F igure 2.2: R a y -b o x in tersec tio n , r a y ! d o es n o t in tersec t th e b o x as t entry > l e n t - r a y 2


in tersec ts the box as t cntry < t,cx1(.

O nce the t entry an d t ex.jt p a ra m e ters are co m p u ted , the r a y - b o x in te rse c tio n is d e te rm in ed by
sim p ly c o m p a rin g the tw o. If t.tniry > t exu , then there is no in te rse c tio n , o th e rw ise th ere is an
in tersec tio n w ith the ray e n terin g at t cntry and ex itin g the box at t exit- F ig u re 2.2 sh o w s th is in
2D.

T he entry and exit c o o rd in a te s a lo n g each axis d e p en d s on th e d ire c tio n o f the ray - d ete rm in e d by
the sign o f d rec . If d rtc is p o sitiv e, the ra y is said to trav erse in the p o sitiv e d irectio n , o th e rw ise it
is said to trav erse in the neg ativ e d ire c tio n . T he d ire c tio n o f the ray d e te rm in e s the o rd er in w h ich
the ray en ters an d ex its the box. N o rm ally in a ray tracer, th ese are p re -c o m p u te d an d sto re d in
a v ariab le w ith 1 in d ic atin g that the ray is trav ellin g in the p o sitiv e d ire c tio n and 0 in d ic a tin g a
negative d ire c tio n fo r the ray.

O th er m eth o d s o f ra y -b o x in te rse c tio n s have b een re se a rc h e d . O n e o f the m eth o d s is to co m p u te


the in tersec tio n by P lu c k e r c o o rd in ate s [M ah05] IM W 04J. T h e m e th o d has the ad v a n ta g e in th at
it d o es not need d iv isio n s w hen the in te rse c tio n p o in t is not n eed e d . M ost o f the c o m p u ta tio n s
o f th is m eth o d are v ecto r o p eratio n s that allow o p tim isa tio n u sin g S S E in stru ctio n s. T h e m eth o d
can be used efficien tly fo r trav ersin g B V H trees w h ere the in te rse c tio n d ista n c e is not n o rm a lly
necessary. W oo [W oo90] p ro p o se d a m eth o d th at id en tifies th e b a c k -fa cin g p lan es o f the box,
red u cin g the n u m b e r o f p la n e s to be tested to three. A n o th e r re c e n tly p ro p o sed m eth o d fo r ax is-
aligned b o u n d in g b o x es (A A B B s) p ro je cts the ray and the A A B B o n to the th ree a x is-alig n ed
p lan es and d e te rm in e s in te rse c tio n u sing the slo p es o f the p ro jec ted ra y [E G M M 07J. T h e m eth o d
red u ces the 3D p ro b lem to a 2D p ro b lem and is d iv isio n free. T h e a u th o rs claim a p e rfo rm a n c e
adv an tag e o f 18% o v e r the fastest m eth o d . H ow ever, the m e th o d d o e s not d ete rm in e the actu a l
in tersec tio n d ista n c e s that n eed to be d eterm in ed w ith a d d itio n a l c a lc u la tio n s if necessary.

In spite o f the sev eral m e th o d s availab le, the slabs m eth o d a p p e ars to be the m ost p o p u lar m eth o d
2.3 A ccelera tio n S tru ctu res 15

to intersect a ray with a box. This is mostly due to it being simple. It is also a method that can
be easily vectorised with SSE. In addition, the method provides the intersection distances without
additional calculations.

2.3 Acceleration Structures

Although very fast prim ilive-ray intersection methods exist, intersecting every primitive with ev­
ery ray to determine each pixel of the image is prohibitive. Thus, divide and conquer methods by
which it is possible to determine the primitive at each pixel by intersecting the corresponding ray
against a small subset of primitives are used. The structures that facilitate this - acceleration struc­
tures - are classified into two main classes: object subdivision structures and space subdivision
structures.

Acceleration structures work on a simple principle. The structure subdivides the scene - either
the space or the objects - into several groups of either uniform or varying granularity. Each
subdivision, called a node for tree structures, usually contains (either fully or partly) a few objects.
If a ray does not intersect the enclosing structure, then the objects enclosed are also not intersected
by the ray. On the other hand, if the ray intersects the subdivision, then the ray may intersect
one of the objects in the node. Thus, only objects in intersected subdivisions need to be tested.
Acceleration structures allow reduction of the number of primitives that the ray has to intersect to
a very small number.

However, tracing a ray through an acceleration structure adds additional computation to ray trac­
ing. The cost of ray tracing is given by Weghorst et al. [WHG84] as

Rf = N t * C t + N p i * C p i (2.11)

where R t - is the rendering time


N p - is the number of node (or bounding volume) traversals
Cp - is the computational cost of traversing a node (or bounding volume)
N p i - is the number of primitive intersections
C p i - is the computational cost of a primitive intersection

If acceleration structures are not used, the first term would be zero. However, the second term
would be very large resulting in an impractical cost for ray tracing any scene consisting of more
than a few primitives. Hence, acceleration structures for ray tracing - that increase the first term,
but reduce the second term significantly - are necessary.

One class of acceleration structures - space subdivision structures - divide the space in a scene
into sub-spaces and classify the primitives as being included in any of the subspaces. A few of the
space subdivision structures that are used to accelerate ray tracing are BSP trees, kd-trees, octrees
and grids. Some of the more popular methods to construct these structures along with the ray
tracing method using them will be described briefly in the following sections.
2.3 A c c e le ra tio n S tru ctu res 16

triangles

arbitrarily aligned
split planes

F ig u re 2.3: A 2D B S P T ree. T h e sp littin g p lan es (lin es in 2 D ) are se lected so th a t they are


alig n ed acco rd in g to the ed g es o f trian g les.

2.3.1 B in ary Space P artitionin g Trees (B SP Trees)

B S P trees - B inary S p ace P artitio n in g trees are h ie rarch ica l stru c tu re s - m o re w id ely used in
hidden su rfac e a lg o rith m s (F K N 80] [T hi87] [G C 9 1 ). B S P trees p a rtitio n space into tw o parts. In
th eir m ost gen eral fo rm , th ese trees can p artitio n sp ace alo n g any arb itra ry axis. T h e p artitio n in g
axis is n o rm ally selected from a m o n g st one o f the p la n e s o f the scen e. T he p o ten tial o f B S P trees
to effectively sep arate trian g les and e m p ty sp ace to c reate a stru c tu re that clo sely iits the scen e is
quite clear. F ig u re 2.3 show s a B S P tree in 2D.

O ne o f the o n ly know n early im p le m e n ta tio n s o f ray tracin g on B S P trees [T hi87] states that it
can p ro v id e im p ro v e m en ts o v er ray tracin g p e rfo rm ed w ith o u t p a rtitio n in g . It uses a m ed ian cut
schem e that g e n erates b alan c ed trees. H ow ever, the efficiency o f th e stru ctu re w ith resp ect to ray
tracing is in do u b t as a c o m p ariso n ag ain st m ore p o p u la r stru c tu res is not p ro v id ed . R ecently, Ize
et al. [IW P 08] show that the use o f a rb itra rily alig n ed p la n e s can be useful fo r ray tracin g . T h e ir
m ethod w ill be d escrib ed in fu rth e r detail in C h a p te r 3.

O th er than that. B S P trees in th eir g en eral form have not b een used for ray tracin g . M o st often,
the stated reaso n is the d ifficulty in co n stru c tin g a tree that is g o o d fo r ray tracin g . In ad d itio n , the
fact that the p lan es can be arb itra ry m ak es sto rag e and trav ersal m o re ex p en siv e |C h a0 1 ] |S F 9 0 ].
Instead, k d -tre e s - a class o f B S P trees that restricts the sp littin g p la n e s to be a x is-a lig n e d - are
used frequently. A n o th e r su b set o f the B S P tree, that in m an y w ay s is sim ila r to k d -tre es, is
R estricted B S P trees inv estig ated by us [K M 07]. T h is stru c tu re an d asso c ia te d m eth o d s to ray
trace w ith it w ill be d etailed fu rth er in C h a p te r 3.
2 .3 A ccelera tio n Structures 17

2 .3 .2 K d -trees

The most popular space subdivision structure for ray tracing has been the k dimensional tree,
more popularly known as the kd-tree. It was invented as a search structure by Bentley [Ben75].
It was adapted to ray tracing by Kaplan [Kap85] (calling them bintrees) and then by Fussel and
Subramanian [FS88] who referred to it by the current name. It was adapted using the existing
research on BSP trees. The first implementation preferred well balanced kd-trees to reduce the
depth of the tree. The implementation used splitting planes placed either at the space median - the
mid-point between the minimum and maximum points or the object median - a split with which
half of the objects are on either side of the splitting plane.

2.3.2.1 Construction

The main criteria for rendering performance with the kd-tree is the quality of the trees constructed.
A well constructed tree can be several times faster for ray tracing than a poorly constructed tree
and hence most research has concentrated on improving the quality of kd-trees.

A kd-tree is a recursive structure where every node can be considered as the root node of the sub­
tree below it. Naturally, construction of a kd-tree is undertaken in a recursive manner. It is a top
down process where initially the entire space is considered. A bounding box is created for all the
primitives lying in this space by taking the minimum and maximum coordinates of the primitives
along each axis. One among the X , Y and Z axes is selected and space is split along that axis by
placing a plane, that is normal to the axis, at the selected point. The primitives are then classified
as being on one of the sides of the plane. Some primitives could lie on both sides of the plane and
in this case, the triangles are generally included in both the space partitions. Another solution to
this problem could be to clip the triangles at the splitting plane. However, in our implementation,
the first approach is used. Subsequently, each partition of space is a node of the tree containing
the triangles lying in that part of space.

At each node, the space represented by the node is again partitioned into two by selecting an axis
and a point on this axis where the splitting plane is placed. The primitives are classified again and
the two space partitions created are made into child nodes. The process is recursively continued
until predetermined criteria - called the termination criteria - are met. At this point, the node is
made into a leaf node that is not divided further. The two termination criteria normally used are:
the depth of the tree of the node, and the number of primitives contained in the node.

The pseudocode below and Figure 2.4 show the recursive construction process.
constructKDTree(Node node, i n t []nodeTriangles)
(

//If depth of the tree reaches the termination depth


//or if the number of triangles in the node is less than
//a small predetermined number, make it into a leaf node
i f (depth==MAX_DEPTH
or nodeTriangles.size <= MIN_NODE_TRIANGLES )
{
//Make the node a leaf node.
n ode.leafNode = true;
n ode.noOfTriangles = nodeTriangles.length;
//The index where the triangles in this node start from
//is the last index before the triangles are added
n ode.trianglelndex = leafNodeTriangles.length;
2.3 A c c e le ra tio n S tru ctu res 18

//Add the triangles to the global leaf node triangle list


leafNodeTriangles.add(nodeTriangles);
return;
}
//Find the split axis and posit ion to split the node
n ode.splitAxis = findSplitAxis();
n ode.splitPosition = findSplitPosition {);

//Classify the triangles into left part and right part


i n t [] leftTriangles = findLeftTriangles(splitAxis, splitPosition);
i n t [] rightTriangles = findRightTriangles(splitAxis, splitPosition);
//Recursively construct left and right parts
constructKDTree (node.leftNode, leftTriangles);
constructKDTree (node.rightNode, rightTriangles);

L istin g 2.3: R ecursive k d -tree c o n stru c tio n alg o rith m .

< primitives

^ ^ - > S p lit pie

bounding b o x ^ ^

- '

* ►
F igure 2.4: K d-tree C o n stru ctio n w ith the S pace M ed ian h eu ristic and w ith te rm in a tio n c rite ria -
m a x im u m trian g les in le a f node = 2 .

T he m ain ch allen g e in the c o n stru c tio n o f a g o o d k d -tree fo r ray tra c in g is finding a g o o d sp littin g
plan e to p a rtitio n the sp ace in the given node. T he p o sitio n o f th is p lan e d e te rm in e s the q u a lity o f
the tree fo r ray tracing. S everal m e th o d s have been p ro p o sed to find this split p o sitio n .

T he sim p lest m ethod used is to p la c e the sp littin g plan e in the c e n tre o f the tw o b o u n d in g p lan es
o f a node alo n g o n e o f the th ree axes |K a p 8 5 ] |F S 8 8 ]. U sin g th e m id -p o in t1 as the sp littin g
p la n e 's p o sitio n is not very efficient fo r ray tracin g . S in ce a ray c a n sk ip trav ersin g em p ty n o d es,
it is p referab le to id en tify em p ty sp aces and c reate em p ty n o d es. In ad d itio n , ray tra cin g can be

M id-point here m eans the spatial m edian. Since, the splitting planes are indicated by a point and a norm al (axis),
a plane placed at the m id-point o f the extent along an axis divides the space into tw o equal halves.
2.3 A ccelera tio n S tructures 19

accelerated by effective separation and classification of triangles to create a kd-tree that closely fits
the scene. The Surface Area Heuristic (SAH) aims to do this and was introduced by MacDonald
and Booth [MB90] to kd-trees, adapted from the Bounding Volume Hierarchy construction method
by Goldsmith and Salmon [GS87].

The SAH is a heuristic to determine a locally optimal split position at a node. It takes advantage of
the property of kd-trees that the splitting plane can be placed arbitrarily between the minimum and
maximum point along an axis. Its conception propelled kd-trees to be the preferred data structure
for ray tracing. The SAH has been widely used [MB90] [SF90] [Sub91] [Wal04] [Wal05] [PH04]
to accelerate ray tracing.

The SAH is based on the probability of a ray hitting a node and the cost of computing the inter­
sections to the geometry within it [MB90]. The probability of an arbitrary ray intersecting a node
is dependent on the surface area of the node itself. This is quantised into a cost that indicates
the cost of ray tracing if the split in consideration is used. The cost takes into consideration the
probability that the split node is hit and the number of objects contained by it. The intersection
cost of each primitive in the node is considered to be a constant. Similarly, node traversal cost
is also considered a constant. This is not unreasonable for ray tracers that use solely triangles as
primitives. This cost to be calculated at each potential split point is given by:

m S A ( l e f t N o d c ) L T + S A ( r ic jh t N o d e ) R T „
C ° Si = Tc + h 1----------------------------- SA (node) '------------------------------------------- < 2 ' 1 2 )

where
T c - Cost of traversing a node
I c - Cost of intersecting a triangle
S A ( l c f t N o d c ) - Surface area of the left node formed by this split plane
S A (r ig h t,N o d e ) - Surface area of the right node formed by this split plane
S A ( N o d e ) - Surface area of the node being split
L T - Number of triangles in the left node
R T - Number of triangles in the right node
S A (n o d c ) - Surface area of the node
The cost is computed as per the above equation at each potential split point along all the axes and
the point with the minimum cost - the locally optimal split position - is selected2. The split plane
is placed at this point and the primitives are classified accordingly.

It may be observed that there can potentially be an infinite number of split points along an axis.
MacDonald and Booth derive the property that the split point with the minimum cost has to be
between the object median and the space median. In addition, they observe that the SAH cost can
only differ significantly at the limits of each primitive (Figure 2.5). Considering only these points
reduces the number of SAH costs to be calculated to a manageable number.

The most expensive part of computing the SAH cost is the process of classifying the primitives as
being on either side or both sides of the splitting planes and counting them. The naive method to do
this is to scan through the entire list of N primitives at each potential split point and classify them.
This is a very expensive method leading to a cost in the order of 0 ( N 2). An improved version of
this method is to sort the primitives along the splitting axis at each node SAH step [PH04] [Sze03].

T and I are constants in our im plem entation w here only triangles are used as prim itives. H ence, using them does
not affect the selection o f the split plane position using the SA H .
2.3 A cc e le ra tio n S tru ctu res 20

Node
Node
primitives Potential split points

SplitAxis
Potential split
positions on axis

F igure 2.5: SA H p o te n tial split p o sitio n s. T h e p o te n tia l split p o sitio n s are sh o w n as blue points.
For every trian g le , the tw o ex trem e tie s o f the tria n g le w ith resp ect to an ax is are c o n sid e re d as
p o ten tial SA H split p o sitio n s.

S ince the sortin g is d o n e at each step, and the n u m b e r o f p rim itiv es can be assu m ed to red u ce lo g a­
rith m ically w ith the d ep th o f the tree, the o rd e r o f th is c o m p u tatio n is, on av erag e, G ( N l o c j ~ ( N ) ) .
T his has been refined so that it can be achieved w ith a co m p lex ity o f 0 ( N l o g ( N ) ) [W H 0 6 ] by
so rtin g once at the b eg in n in g and m a in ta in in g this o rd e r th ro u g h the c o n stru c tio n p ro c ess.

C lassificatio n o f p rim itiv es is also co m p licated by the k d -tre e s’s p ro p erty that p rim itiv e s can lie
on both sides o f a split. T he p ro p erty cau ses p ro b lem s in co u n tin g the n u m b er o f p rim itiv es as
w ell as in the selectio n o f a split p o in t. U sin g the end p o in ts o f the p rim itiv es lead s to in co rrect
split points as F ig u re 2.6 show s. To allev iate this p ro b le m , c lip p in g th e p rim itiv es h as been p ro ­
posed [H B 02] |H K R S 0 2 ] so that o n ly p arts o f p rim itiv e s in sid e the n o d e arc in clu d ed - as show n
in F igure 2.6.

C lip p in g also e lim in a te s the co u n tin g p ro b lem - show n in F ig u re 2.7 - th at o c c u rs w h en the end
points o f p rim itiv es are used. As can be seen from the figure, if large tria n g le s are no t clip p e d , they
can be incorrectly in clu d ed in a node leading to erro n e o u s c o u n ts. C lip p in g en su re s th a t p rim itiv es
o u tsid e the node are ex clu d e d so that the SA H co st is m o re accu rate. T h e o p e ra tio n is sim plified
- as show n in F ig u re 2.7 - w h ere o n ly the p o sitio n o f the p o ten tial sp lit p o in ts w ith resp ect to the
split plane are co u n ted to get an accu rate co u n t o f the n u m b e r o f p rim itiv es. It is also p o ssib le to
ensure that the c o u n t is acc u ra te by u sing a fast tria n g le -A A B B o v erlap m eth o d (A M 01 ].

T he cost fu n ctio n can be also be used to favour certain co n d itio n s such as em p ty v oxels / n o d es
|H K R S 0 2 ] [W H 06]. E m p ty voxels are fav o u rab le as ray s in tersec tin g th e m can im m e d ia te ly stop
the traversal o f the em p ty node allo w in g it to skip large p o rtio n s o f sp ace. T h u s, if a c ertain split
resu lts in one o f the tw o child n o d es to be em pty, then the c o n d itio n is fav o u red in the SA H cost
by biasing it so that the cost is m o re p ro b ab le to be less than if the split re su lte d in tw o n o n -em p ty
2.3 A cc e le ra tio n S tru ctu res 21

Incorrect potential
Node lit points \
Node Potential split
primitives points after clipping

Potential split points

SplitAxis
Potential split Incorrect split
positions on axis positions on axis

F igure 2.6: SA H p o ten tial split p o sitio n s, sho w in g p o ssib le in c o rre c t split p o sitio n s (in red) if
trian g les are not c lip p ed .

nodes. In [H K R S 02] [W H 06] the co st is red u ced to 80% o f the o rig in a l cost to add in this bias,
resu ltin g in ju s t a 5% im p ro v em en t in p erfo rm an ce.

Term ination criteria - O ne o f the facto rs in the c o n stru c tio n o f a k d -tree is to d e te rm in e w hen a
node is m ade into a le a f node and is not split further. It is the p o in t at w h ich the c o n stru c tio n o f the
tree is stopped as fu rth e r su b d iv isio n w ou ld not be a d v an ta g eo u s. T h e tw o crite ria that are u su ally
used are the dep th o f the tree at a node and the n u m b e r o f tria n g les in the node at w h ich the n ode is
co n sid ered a le a f node. A p rim itiv e co u n t o f tw o or few er is a re a so n a b le po in t at w h ich the node
should be co n sid ered a le a f node. T h e dep th o f the tree at the n o d e in c o n sid e ra tio n is also used
in co n ju n ctio n . T h is c riterio n m akes ev ery node at this d ep th into a le a f n ode irre sp e c tiv e o f the
n u m b er o f p rim itiv es it co n tain s. T he dep th is n o rm ally related to th e n u m b e r o f p rim itiv es in the
scene. O ne h eu ristic su g g ested by H av ran IH B 02] is to use a te rm in a tio n d ep th o f 1.2/r>r/2A7 + 2.
T his h eu ristic is fo u n d to w o rk w ell w ith m ost scen es w ith o u r im p le m e n ta tio n as w ell.

Autom atic T erm ination C riteria - A ltho u g h these te rm in a tio n c rite ria w ork very w ell, they
req u ire som e p rio r k n o w led g e o f the m o d els and u se r input. U sin g the SA H co st to d e te rm in e an
au to m ated term in atio n poin t w as an a ly se d [S F 91] and the te rm in a tio n p o in t w as d e te rm in e d as
the point at w h ich the co st b eg in s to in crease. T he au to m a tic te rm in a tio n m eth o d su g g este d by
H avran [H B 02] ach ie ves this by using the SA H co st to d e te rm in e in stan ces w h ere fu rth e r splits
m ay not actu a lly help. T h e SA H co st o f the p aren t is c o m p a re d to th e S A H co st w h en the p aren t
node is split. W h en this co st is above a certain p e rcen tag e, the split is d e te rm in ed as not helpful
and thu s c o n stru c tio n stops here. A n o th e r c riterio n su g g e ste d is w h en th e cost in d ic ate s a very
sm all p ro b ab ility o f the n ode b ein g hit. At this p o in t, the p ro b a b ility o f th e node in c o n sid eratio n
being hit by a ray is very sm all and h en ce sp littin g the n ode w o u ld be p o in tless.
2.3 A cc e le ra tio n S tru ctu res 22

Split plane

Large primitives

Split axis
Clipped triangle boundaries Split plane
Position on axis

F igure 2.7: P o ten tial fo r in co rrect p rim itiv e cou n ts. I f the tria n g le s are not clip p e d , th en th ey can
be in co rrectly c o u n ted . T h e tw o trian g les w ou ld be co n sid e re d as b ein g on b o th sid es, if they are
not clip p ed .

Im proving C onstruction Perform ance - E ven tho u g h the c o n stru c tio n o f SA H b ased k d -trees
has been show n to be in the o rd er o f 0 ( N l o g ( N ) ) , the c o n sta n ts a sso cia te d w ith th e c o n stru c ­
tion are quite high, m ak in g the p ro cess quite slow. A c c ele ra tin g th e c o n stru c tio n h as thus been
an active area o f research . It has led to alg o rith m s that base th e m se lv e s on finding th e a p p ro x i­
m ate m inim al cost point instead o f the exact split poin t w ith the m in im u m S A H co st [P G S S 06]
[H M S 06] |S S K 0 7 ]. T h is is ach iev ed by sam p lin g the SA H cost at a few p o in ts and a p p ro x im a tin g
the m inim al cost by m ath e m a tic a lly in terp o la tin g th e co st b etw e e n the sam p led lo c a tio n s. H unt
e t al. [H M S 06) refine this by tak in g fu rth e r sam p les in the interv al in w h ich the m in im u m point
o ccu rs. On the o th e r hand, P opov et al. [P G S S 06] use a h ig h er n u m b e r o f sam p les to d ete rm in e
a b e tte r ap p ro x im atio n . U sin g the sam p lin g m eth o d alo n g w ith p a ra lle lisa tio n o f the p ro cess w as
also investigated by S h ev tso v et al. [S S K 07] to fu rth e r acc e lerate th e co n stru c tio n p ro cess.

O nce the tree is co n stru c te d , the scen e can be ray traced by trav e rsin g every ray th ro u g h the tree.

2.3.2.2 Traversal

K d-trees are an efficient stru ctu re to traverse. T he trav ersal o f k d -tre e s has been an a rea o f active
research [SS92] [H K B v 9 7 ] [A rv 8 8 ]. A kd -tree is a b in ary tree en a b lin g sim p le d e te rm in atio n o f
the next node to traverse. T he c h ild node to trav erse can be d e te rm in e d w ith ju s t one p la n e -ra y
intersection. T his is a m a jo r a d v an ta g e fo r a kd-tree lead in g to a c h e a p p er node trav ersal cost.

A n o th e r ad v an tag e o f k d -trees, as o f m ost space su b d iv isio n stru ctu re s, is th at they can be trav ersed
in a true fro n t-to -b a c k order. W h en such a trav ersal is used, the o b je c ts in th is n o d e are g u aran teed
to be intersec ted b efo re o b jects in n o d es that are in tersected at a la ter tim e (the e x ce p tio n b ein g
o b jects that sp an m ore th an one node. A s long as the / p a ra m e te r o f the o b ject in te rse c tio n is
b etw een the n o d e 's i entry and 1exit, the o b ject in tersec tio n is w ith in the n ode and h en ce co rrect).
2.3 A ccelera tio n S tru ctu res 23

prim itives

split plane

bounding box

>• <1

-4
F igure 2.8: K d -tree C o n stru c tio n w ith the S u rface A rea H e u ristic and term in a tio n c rite ria -
m ax im u m trian g les in le a f n o d e = 2 .

T his property o f sp ace su b d iv isio n stru ctu res th at allo w s ray s to stop th e ir trav ersal u pon finding an
intersectio n - c alled early ray term in a tio n - saves several trav ersal step s and o b je c t in te rse c tio n s
due to o cclu sio n in the scene.

Initially, it is to be d eterm in ed w h e th e r the ray in tersec ts the b o u n d in g box o f the scen e - or in


o th er w ords, the ro o t node o f the k d -tree. T his is d ete rm in e d by u sing the slabs m eth o d d escrib ed
in Section 2.2.4. T h e m eth o d also p ro vid es the v alu es fo r t entry an(J texit - the e n try an d exit
p aram eters o f the ray. If th ere is an in tersec tio n , then the ray in te rse c ts the ro o t n o d e and hen ce
has to be trav ersed dow n the tree. O th erw ise, the ray m isses the root n o d e an d h en ce the en tire
scene.

If a ray in tersec ts the root node, th en it has to trav erse the tree in a fro n t-to -b a c k order. A n o th e r
p ro p erty o f sp ace su b d iv isio n stru ctu re s is that if a ray in tersec ts the p a re n t n o d e, th en it has to
in tersect at least one o f the tw o ch ild nodes. T h ere m ay be cases w h en the ray trav erses only
one child node. T h e trav ersal o rd e r is d ete rm in e d by c alc u la tin g the in te rse c tio n p a ra m e te r at the
split plane, t spu f , and co m p a rin g it to the values o f t mur and t mnx ■ I split is c o m p u te d as giv en in
E q u atio n 2.8, i.e., by u sing the term below .

I'sp lrt = ( S p H ta x is O a x is )d r e c ( 2 . 13)

If I split > t-min* then the ray in tersec ts the left ch ild node. S im ilarly if 1sput < tmax* the right
node o f the tree is trav ersed . If b oth co n d itio n s are true, th en the k d -tre e first trav erses the left
child and then the right child node. A s F ig u re 2.10 and L istin g 2.4 sh o w s, th e th ree p o ssib le case s
- ray traverses o n ly the left n o d e, ray trav erses o nly the right node o r ray trav erses b o th n o d es -
arc handled by ju s t th ese tw o co n d itio n s.
2.3 A c c e le r a tio n S tru ctu res 24

(a) Space m edian kd-tree for Dragon (b) SA H kd-tree for Dragon

F igure 2.9: S pace m ed ian an d SA H k d -trees c o n stru c te d on the D rag o n m odel. T h e SA H kd -tree
m ore clo sely w rap s the m odel and red u ces th e void area.

^ s p in d m in ^ s p lt < ^m ax ^ s p M t< d m a x
Ray in te rsects left n od e Ray in tersects both left n od e Ray in te rsects left n od e
and right n od e

F ig u re 2.10: K d-tree ray traversal.

If a ray trav erse s the tree and re a c h e s a le a f node, then the ray h as a c h an ce o f h ittin g one o f the
p rim itiv es c o n tain ed by the le a f node. T he ray then h as to be in tersec te d w ith e ach p rim itiv e and
the clo se st o b ject - as d ete rm in e d by the o b ject w ith the sm allest t in te rse c tio n p a ra m e te r less than
tm ax ~ is the first o b ject that the ray in te rse c ts in the scene. T he o b ject can then be u sed to shade
the c o rre sp o n d in g pixel an d sp aw n ad d itio n al reflectio n , refra ctio n and sh ad o w ray s as necessary.
T he p seu d o c o d e below show s the k d -tree traversal.

int RecursiveRayTraversal(node, tmin, tmax)


{

i f (node is a leaf node)


(
i f (node is empty)
return -1;
return ProcessLeafNode(node);
)
currentAxis = GetAxis(node);
2.3 A ccelera tio n S tructures 25

splitPos = GetSplitPosition(node);
t_sp = (splitPos - rayOrigin[axis])*rayDirectionReciprocal[axis];
i f ((t_sp > tmin))
{

i = RecursiveRayTraversal(node->leftNode, tmin, min(tmax,t_sp));


if (i != -1)
return i;
}
i f ((t_sp < tmax))
(

i = RecursiveRayTraversal(node->rightNode, max(tmin, t_sp), tmax);


)
return i;
}

Listing 2.4: Recursive Ray traversal algorithm. Algorithm computes ray-split plane intersection
parameters of the ray and traverses only the left node, only the right node or both child nodes, as
shown in Figure 2.10.

The traversal of the kd-tree is simple and efficient - as shown by the pseudocode.

Research has concentrated on comparing several data structures with kd-trees and it was deter­
mined by Havran [HavOl] that statistically, at the time of writing his thesis, kd-trees were the best
structures available for ray tracing static scenes. Recently, for some applications like dynamic ren­
dering and incoherent rays, BVHs [WBS07] [DHK08] are said to be a better structure. However,
it is a fact that kd-trees are among the best structures for ray tracing, especially for static scenes.

A disadvantage of space partitioning techniques is that the primitives may occur in more than one
leaf node. This can lead to a ray being intersected against a primitive several times. One of the
proposed solutions is mailboxing [APB87] [KA91] [AW87] - a technique that maintains a list of
already intersected primitives by the ray and avoids duplicate intersections. It is debatable as to
the advantages provided by mailboxing. The overheads associated with maintaining and checking
the list leads to a situation where for scenes with largely simple primitives like triangles it may
actually be preferable to intersect with the primitive again [Hav02], Hunt [Hun08] suggests a
simple modification to the SAH cost when mailboxing is included. A significant reduction in
intersections is shown when using the method, but the performance gained is still in the range of
5% with a maximum improvement of around 10% for one particular scene, leaving the usefulness
of mailboxing in doubt.

2.3.3 O ctrees

Introduced by Glassner [Gla84], octrees were one of the first space subdivision structure used. An
octree splits space into eight regular sub-spaces and the primitives in the scene are classified into
these sub-spaces. Since space is proportionally divided, the primitives may span more than one
sub-space. Figure 2.11 shows an octree structure in 2D. In 2D, since space is divided only in two
dimensions, each node is divided twice and is divided into four sub spaces. The structure is thus
called a quadtree.

The root node of an octree is a cube, and hence all child nodes of the octree are also cubes. For
ray tracing, the fact that the splits are even and are not according to the primitives in the scene can
2 .3 A c c e le r a tio n S tru ctu res 26

>
>
4 |
F igure 2.11: 2D v e rsio n o f an O ctree (Q u ad tree). S p ace is sp lit at even lo c atio n s a n d along all
the axes (in 2D ).

he a serio u s d isad v an tag e. As show n by k d -trces, effectiv e sep aratio n o f p rim itiv es can m ake a b ig
differen ce in th e n u m b e r o f n o d es trav ersed as w ell as the n u m b er o f o b jects in tersec ted .

T he trav ersal o f o ctrees is slig h tly co m p lica te d by the fact that each n o n -le a f n ode o f the o ctree
has eig h t child n o d es. H en ce, to m ain tain the fro n t-to -b ack order, the trav ersal o rd e r o f these eig h t
child nod es h as to be c o n sid ered . T he u n ifo rm ity o f the su b d iv isio n also allow s the use o f Three
D im e n s io n a l D igital D ifferential A n a ly s e r like a lg o rith m s |S u n 9 1 ] m o re p o p u la r fo r grids. T h u s,
the trav ersal o f octrees has been re search ed ex ten siv ely and several m eth o d s o f trav ersal have been
pu b lish ed [H a v 9 9 |.

T he trav ersal o f o ctrees can be in one o f c ith e r fo rm s. It can be a to p do w n a p p ro ac h w here the


traversal starts from the root node and d escen d s to the ch ild n o d es in a p a rtic u la r o rd e r until the
in tersec tio n is found [A G L 91] [G A 93]. Or, it cou ld be a b o tto m up ap p ro a c h w h ere th e trav ersal
starts in the tirst node in tersected by the ray and trav erses the n eig h b o u rin g v oxels [S am 8 9 ] o f this
node until it finds an in te rse c te d object.

T he a lg o rith m for octree trav ersal by R evelles et al. [RULOO] is a to p do w n trav ersal m eth o d that
uses the p a ra m e tric fo rm o f the ray and by c o m p a riso n s o f th is p aram eter, d e cid e s th e n ext voxel
to traverse. E ach voxel o f the o ctree is in d icated by an in teg er from 0 to 7. E ach p lan e is in d icated
by a b it and by settin g o r c lea rin g th ese bits. T he bits are set o r cle a re d b ased on th e value o f the
t in tersec tio n param eter. T he in teg er o b tain ed by the bit o p era tio n s d e term in e s the o rd e r in w hich
the o ctree is trav ersed . T he re su lts p ro v id ed show that fo r d en sely p a ck ed u n ifo rm scen es, the
octree is a m o re efficient stru ctu re than a sp ace m ed ian k d -tree as an o c tre e is o f le sse r dep th and
hence there are few er v ertical tree trav ersals.

A p p ly in g the SA H to o ctrees h as resu lted in a stru ctu re c alled O ctree-R [W S C + 95]. In the O ctree-
R. a cost fu n ctio n based on M acD o n a ld and B o o th 's S A H fu n ctio n is u sed to d e term in e the split
planes alo n g the X , V and Z axis. T h is resu lts in a m ore co m p lex stru c tu re - o n e w ith b etter
sep aratio n o f p rim itiv es. T he au th o rs claim a sp e ed -u p o f 4 % to 4 7% c o m p a red to n o rm a l octrees.
A s e x p ected th e co n stru c tio n o f the O ctree-R is m o re involved lead in g to slo w e r c o n stru c tio n
2 .3 A c c e le ra tio n S tru ctu res 27

tim es. H ow ev er, they c o m p a re the O ctree-R o nly to a n orm al o ctree and not to o th e r stru ctu res
like an S A H k d -tree o r a B V H .

2.3.4 G rid s

G rid s p a rtitio n the sp ace into sm all su b d iv isio n s [F T I8 6 J LCW8 8 ]. T h ey w ere first ap p lied to
ac celerate ray tracin g by F u jiin o to et al. [F T I8 6 J. A T h ree D im e n sio n a l D ig ital D ifferen tial A n a l­
y se r (3 D D D A ) trav ersal a lg o rith m is d ev elo p e d fo r the trav ersal o f ray s th ro u g h the grid. A fa ste r
trav ersal - also a v arian t o f the D D D A a lg o rith m - h as also b een u sed to trav erse g rid s [A W 87],
In this alg o rith m , the next vox el to travel is d e cid e d b ased u p o n the v alu es o f the t p a ra m e te r o f
the ray. Initially , it c o m p u te s the three entry t p a ra m e ters and an in c re m e n t value fo r ea c h axis
th a t co rre sp o n d s to the len g th eq u al to the v o x e l’s d im e n sio n alo n g th at p a rtic u la r axis. It re ­
sults in a fairly sim p le a lg o rith m that d e cid e s the next voxel to trav erse w ith sim p le a d d itio n s and
co m p ariso n s.

<
J m 7
* *
^ 7

< 4 1 1 ►
F igure 2.12: 2D v ersion o f a u n ifo rm grid. T h e en tire b o u n d in g v o lu m e o f the scen e is d iv id ed
into m any sm aller even sp aces.

A s w ith k d -trees. g rid s arc a sp ace su b d iv isio n stru c tu re in w h ich o b je cts can ex ist in m o re than
one cell / voxel. To solve the p ro b lem o f m u ltip le in te rse c tio n s w ith th e sam e o b ject, A m a-
n atid es [A W 87] uses the m ailb o x in g tec h n iq u e d e sc rib ed earlier. E ach ray is given a ra y lD and
fo r each o b je c t, the ray that m o st recen tly in te rse c ted it is m a in ta in e d and ch eck ed . If the ra ylD
is the sam e, then the o b ject has alread y been in terse c te d w ith th e o b ject and does not n eed to be
in tersected again . O th e rw ise , the in tersec tio n is c arried out an d the r a y lD is u p d ated .

Ize et al. [IS P 07) a n aly ze g rid c reatio n strategies th e o re tic a lly and em p irically . U sin g sim plified
a ssu m p tio n s, a th eo ry is d ev elo p e d . It is then tested em p irically . T h e a ssu m p tio n s are that

• T h e re are N p rim itiv es that are all p oints.

• A ll ray s hit the b o u n d in g box o f the scen e - a cube.

• E ach ato m ic fu n c tio n ’s co st can be treated as a co n sta n t.

• M a th e m a tic a l o p e ra tio n s rath e r th an m em o ry p e rfo rm a n c e , etc., d o m in ate p e rfo rm a n ce .

• T h e grid has m '1 = M cells.

U sing these assu m p tio n s, fo r u n ifo rm grids, they d ed u ce th at the m in im u m tim e to trace a ray is
given w hen th e n u m b er o f cells, 3 / , in th e grid is given as
2.3 A ccelera tio n Structures 28

M = 2N r^ inttTSCction (2 14)
Tstep

where
M - is the number of grid cells to create
TV - is the number of primitives in the scene
Tintersection - is the cost of intersection with a primitive
Tstep - is the cost of traversing a grid cell

Ray tracing performance on a grid with M cells, calculated according to the equation above,
is shown to be very close to the performance with the optimal grids selected empirically. The
variation is in the range of 1% — 5% even for scenes like the conference room scene [GW09] for
which the assumptions are far off. They also calculate the optimal number of cells when the scene
is composed of triangles with poor aspect ratio and for multi-level grids that work reasonably
well. To conclude, they also state that empirically, M = 0 ( N 7/ 9) and M = 0 ( N 4/ 3) produced
the perfect results for manifold-like models with compact triangles and triangles with poor aspect
ratio respectively.

Uniform grids work well for scenes that are uniform with primitives distributed evenly. How­
ever, for scenes that are non-uniform, they are not particularly well suited. This has led to the
development o f hierarchical grids and adaptive grids.

Jevans and Wyvill [JW89] initially create a uniform grid over the scene. These are then examined
and in spaces where the object density is high, they subdivide those particular voxels into several
( i.e., TV3) sub-voxels. The structure is very similar to an octree with the main difference being
the number of subdivisions in the voxel. Cazals et al. [CDP95] create a hierarchy of uniform grids
by finding clusters in the scene and dividing space occupied by each cluster into further uniform
grids. Klimaszewski and Sederberg [KS97] create an adaptive grid structure by first constructing a
BVH using Goldsmith and Salmon’s algorithm. The boxes are then divided into voxels containing
uniform grids to create the adaptive grid structure. Although a comparison with grids is provided,
and the adaptive structure is shown to perform better, a comparison with BVHs is not provided.

Due to the simplicity, uniform grids are normally used instead of the adaptive and hierarchical ver­
sions. The simplicity reflects in the ease and speed of construction as well as traversal. Although
the average traversal time per pixel can be slower than a kd-tree, a uniform grid can be constructed
in a fraction of the cost leading to its use in rendering dynamic scenes as Section 2.6 will show.

The traversal of an individual grid cell by a ray is inexpensive and among the cheapest. How­
ever, the penalty for this is that there are a larger number of cell traversals if the cells are small
enough. If the cells are large, then the average number of objects in each cell may be very high.
The average complexity of ray tracing a scene with scattered compact primitives using grids is
O ( V N ) [CW88] [ISP07], if N is the number of triangles in the scene. As the scene sizes increase,
there is a greater increase in ray tracing times than in hierarchical structures like the kd-tree.

2 .3.5 O b ject S u b division

Object subdivision structures are a method of subdividing the scene based on the objects and prim­
itives. These primitives are enclosed by simple bounding volumes so that they are only intersected
2.3 A ccelera tio n S tructures 29

when their bounding volume is hit by the ray.

This type of structure was first described by Clark [Cla76], though not specifically for ray trac­
ing. A structure that grouped objects into hierarchies of bounding volumes was used by Whit-
ted [Whi80]. In his first implementation of ray tracing, spheres were used as bounding volumes
as they lead to very simple construction and traversal methods.

These structures can either be flat or more often hierarchies. In a flat object subdivision structure,
bounding volumes are not organised as a hierarchy. However, it is advantageous to organise the
bounding volumes into hierarchies. If a ray can skip bounding boxes higher up in the hierarchy,
larger parts of the scene do not have to be tested. Hierarchies also allow smaller subsets of the
primitives to be tested. Hence, most ray tracers use hierarchies of bounding volumes rather than
flat bounding volumes. The hierarchical structure is more commonly referred to as a bounding
volume hierarchy ( BVH). Several different types of bounding volumes have been used.

2.3.5.1 Shapes of Bounding volumes

Spheres were one of the first bounding volumes used as they are very cheap to intersect with a
ray [Whi80]. A method to create spheres that closely fit the volume was provided by Ritter [Rit90],
However, spheres do not bound most volumes very closely [WHG84]. Thus, they result in a
hierarchy with a large number of bounding volumes resulting in higher traversals and primitive
intersections.

Slabs of arbitrarily aligned planes were used as bounding volumes by Kay and Kajiya [KK86]. The
arbitrarily aligned planes provided a very close fitting structure. However, the cost of intersection
was quite high leading to higher ray tracing costs.

The most popular kind of bounding volumes are axis-aligned bounding boxes [WHG84] [Gut84]
[BCG+ 96]. Axis-aligned bounding boxes (AABB) are rectangular boxes that, as the name sug­
gests, are aligned along a coordinate axis (one of X , Y or Z axes). Trees consisting of AABBs
were called R-trees [Gut84] in the context of spatial searching. Due to their abundance, the name
bounding volume hierarchies normally refers to a hierarchy of AABBs. As Weghorst [WHG84]
demonstrates through the introduction of the concept of void area, AABBs have a relatively low
intersection cost and at the same time fit the model reasonably closely. This compromise results
in a very effective shape for bounding volumes in the context of ray tracing.

Oriented bounding boxes (OBBs) [GLM96] [GotOO] are another kind of bounding volumes. Their
alignment is decided by the alignment of their component objects so that they fit the scene more
closely than their axis-aligned counterparts [RW80]. However, they are expensive to construct,
traverse and store.

2.3.5.2 Construction

The first bounding volume hierarchies were constructed manually by user input [RW80]. This is
of limited use as human construction is both tedious and prone to selecting non optimal options,
especially when the scene consists of a large number of triangles. An automated approach was
hence necessary. Kay and Kajiya [KK86] propose a few automated object grouping criteria. One
of them is to group objects as they occur in the scene representation. This approach depends
largely on the way the scene is represented. If the scene is represented so that closer objects are
2.3 A cceleratio n S tructures 30

in succession, then the scheme results in good BVHs, otherwise the BVHs can be quite poor.
Another proposed scheme is the median cut scheme whereby the objects are sorted according to
proximity, either on one axis or according to all axis (through the use of an auxiliary structure like
an octree or a kd-tree). The objects are then grouped into bounding slabs according to the median.
This scheme results in reasonably good trees.

The SAH approach, proposed by Goldsmith and Salmon [GS87] aims to take the automated ap­
proach further by considering the probabilities of uniformly distributed rays hitting a bounding
volume. The objects are inserted into the tree similar to insertion of data into a search tree and
determined if the insertion was a good one or not by calculating the surface area of the new node.
The order in which the objects are inserted is important as the node created depends on this order.
They attempt inserting the objects in the order present in the model or in a randomised manner.
This does not result in better trees, mainly due to the bottom up approach rather than the SAH
itself [WMG+ ], Although, the SAH for BVHs was not widely used, it led to the SAH for kd-trees
that, as already discussed, was the major catalyst to accelerate ray tracing speeds to interactive /
real time performance.

The main challenge with SAH for BVHs is to find the locally optimal bounding box, as there are
0 ( 2 n ) ways to split a set of primitives into two subsets [WBS07]. Muller [MF00] suggests a top
down SAH construction method that is very similar to a kd-tree construction method. They use
an SAH heuristic that has potential split positions at the two end points of the triangle along an
axis and selects the box with the minimum cost. Similar to the kd-tree process, each split point
along each axis is tested and the box with the minimal cost is selected. Masso and Lopez [ML03]
use a different cost model and apply that using a Goldsmith and Salmon tree building approach
with objects selected using another BVH built earlier using the object median or the approach
suggested by Muller [MF00]. Ng and Trifonov [NT03] apply random perturbations to the split
points and also investigate evolutionary approaches to improve the original tree generated. The
method provides only marginal improvements on the original tree while adding computational
overhead.

Wald et al.’s approach to building BVHs [WBS07] using the SAH is to find a good instead of
optimal partition. They investigate evenly spaced candidate planes, the bounding sides of each
triangle as candidate planes and planes placed at the centroid of each triangle. Unexpectedly,
all approaches resulted in similar performance. The centroid based approach was ultimately pre­
ferred. The triangles were classified as being in either sub-node based on the position of the
triangle’s centroid.

As with kd-trees, the use of SAH makes the construction relatively slow. However, similar to the
kd-tree heuristics, approximate SAH builds [Wal07] have been suggested to accelerate the build
process. With better quality BVHs available interest has significantly increased, especially for
deformable and dynamic scenes [WBS07] [LYMT06] [WMG+ ]. There has also been interest in
BVHs to trace incoherent rays [DHK08] [WBB08] for which BVHs consisting of more than two
child nodes at each level are shown to be an excellent structure, making efficient use of SIMD
packets.

A problem with BVHs is that the bounding volume of large triangles can be very large, leading to
a large empty space, especially for non-axis-aligned triangles with a poor aspect ratio. A solution
to this problem is to split the triangles with one of the two approaches that have been proposed.
The first solution, proposed by Ernst and Greiner [EG07], is to clip the larger triangles (based
on the surface area) with axis-aligned planes to create smaller triangles prior to construction of
2.3 A c c e le ra tio n S tru ctu res 31

F igure 2.13: B o u n d in g V olum e H ierarch y . E ach lo w er level co n sists o f b o u n d in g v o lu m e s that


en clo se sm aller parts o f the m o d el until the le a f n ode level w h ere each b o u n d in g v o lu m e has a
single primitive.

the B V H . T his allo w s the b o u n d in g b o x es to be clo se ly alig n ed and red u c e s em p ty sp ace in these
boxes. T he resu lts show that it is effectiv e fo r n o n -h o m o g e n e o u s scen es like the P o w e r p l a n t scene.
A n o th er solutio n , p ro p o sed by D a m m e rtz and K eller [D K 08], is to split trian g les a lo n g th e longest
side to create sm aller tria n g le s. A th resh o ld that d e te rm in es the n u m b er o f tria n g les g en e ra te d
is d eterm in ed by u sing a term called edge vo lum e defined as the fractio n o f the v o lu m e o f the
b o u n d in g box. T he resu ltin g B V H show s sig n ifican tly im proved p e rfo rm a n c e fo r p ro b le m a tic
scenes w ith o u t adding fu rth e r trian g le s fo r scen es that d o not benefit from su b d iv isio n .

2.3.5.3 Traversal

D ep en d in g on the sh ap e o f the b o u n d in g vo lu m e, the trav ersal o f th e B V H stru ctu re c h a n g e s


accordingly. For B V H s w ith sp h eres, the trav ersal c o n sists m ain ly o f in tersec tin g th e ray w ith
the sphere fo r the p a rtic u la r node. S im ilary fo r A A B B s, O B B s and slab s, in tersec tio n m eth o d s
are used to d eterm in e w h e th e r a ray is to be trav ersed th ro u g h a b o u n d in g v olum e o r not. A A B B
based B V H s - the m ost p o p u la r B V H used - have a v ariety o f in tersec tio n m eth o d s av ailab le
as d escrib ed in S ection 2.2.4. O ne o f the m ost fre q u e n tly u sed in te rse c tio n m eth o d s is th e slab s
m ethod |K K 8 6 ] w h ich e sse n tia lly in tersec ts the ray w ith the c o m p o n e n t p lan es o f the v o lu m e. It
is a g en eral so lu tio n th at w o rk s irresp ectiv e o f the o rie n ta tio n o f the c o m p o n e n t p lan es. A s a p p lied
to A A B B s the six b o u n d in g p lan es o f the box are in tersected w ith the ray to d ete rm in e if th ere is
an in tersec tio n .

The slabs m eth o d has been o p tim ise d |W B M S ()5] [G M 03] [B P04] [B P 0 5 ] and is u sed by m o st
recen t ray tracers [D H K 08] u sing B V H s. A lth o u g h p e r node trav ersals are re a so n a b ly fast, it
2.3 A ccelera tio n S tructures 32

has the disadvantage of being slower than that of kd-trees. With the kd-tree, each traversal step
involves only one ray-plane intersection. For BVHs, all the six planes of the box have to be
intersected.

Another disadvantage of BVHs is that they cannot be traversed in a true front-to-back order. Con­
sequently, early ray termination - used when a ray hits a primitive in a kd-tree leaf node - cannot
be used. Since the child nodes are not ordered spatially, it is necessary to traverse all of them.
Hence, for static scenes, kd-trees are better structures for ray tracing.

The traversal of the BVH can be described by the pseudocode below:


RecursiveRayTraversal(node)
(
i f (node is leaf node)
{
return closest object intersecting the ray
}
if(ray intersects first child node)
{
closestTrInFirstChild = RecursiveRayTraversal(first child node)
}
i f (ray intersects second child node)
{
closestTrInSecondChild =RecursiveRayTraversal(second child node)
}
return ClosestObject(closestTrlnFirstChild, closestTrlnSecondChild)
}

Listing 2.5: BVH traversal. Ray has to descend down both branches (all branches if the tree has
more than two branches) until the leaf nodes and select the closest triangle.

2.3.5.4 Advantages of BVHs

Although each step of a BVH’s traversal is more expensive than that of a kd-tree, it has several
advantages that make it competitive.

One advantage is that in a BVH, the objects are not duplicated. Hence, it is ensured that a ray
intersects a particular object just once. In a kd-tree, an object may occur in several leaf nodes
and consequently, if mailboxing is not used, a ray may have to compute the intersection with
one particular object several times during the traversal process. Mailboxing, that eliminates these
duplicate ray-object intersections does it at the cost of further overheads, making the advantages
minimal. Hence, the property of a BVH that it inherently eliminates duplicate intersections is
significant. This property also brings about a memory requirement advantage. Since objects are
not duplicated in the leaf nodes, the memory footprint is much smaller than that of kd-trees.

Another advantage is that BVHs create a box at each sub-node, whereas kd-trees separate space
only in one dimension. BVHs can, in theory, identify and separate empty space faster. This leads
to trees that are shallower resulting in fewer traversal steps and lower memory requirements.

The major advantage of BVHs is realised when they are applied to dynamic rendering wherein
geometry changes from one frame to the next. As described by Wald [WBS07], updating a BVH
when a part of the geometry deforms is possible with relative ease. In extreme cases, a full rebuild
maybe necessary. However, a full rebuild of a BVH would be faster than a kd-tree rebuild due to
faster construction methods for BVHs.
2.4 Packet Ray Tracing 33

Acceleration structures like grids, kd-trees and BVHs have been responsible for significant in­
crease in ray tracing performance. Acceleration structures are one way of reducing the number of
traversals and intersections per pixel. Although tracing a single ray through a well built BVH or a
kd-tree performs reasonably well, in recent times, the observation that several rays follow a simi­
lar path through the tree has been used extensively to accelerate ray tracing. This method, called
packet ray tracing, has enabled ray tracing to be competitive with the fastest rendering methods
available for static scenes.

2.4 Packet Ray Tracing

Observation of the path a ray takes through an acceleration structure reveals that neighbouring
rays take similar paths. In some cases, the neighbouring rays even intersect the same primitive.
This property - called image coherence - has been used along with acceleration structures (that
utilise object coherence) to accelerate ray tracing. This idea of image coherence is demonstrated
well by Benthin [Ben06].

The use of image coherence to accelerate ray tracing has been attempted from very early on
during the development of ray tracing algorithms and structures. The idea is to group several
neighbouring rays into a packet and traverse them together through the structure.

Earlier attempts to utilise image coherence traced groups of rays with different boundary shapes.
Heckbert and Hanrahan [HH84] traced beams - groups of rays with the actual primitives as bound­
ary shapes - through the scene. When there was a high level of coherence in the scene, the beam
tracer was shown to be much faster than standard ray tracers. They also showed that beam tracers
could achieve anti-aliasing and further ray tracing effects like reflection and refraction - which can
be a major problem with packet ray tracing. Shinya et al. [STN87] trace a group - or pencil - of
rays using a paraxial ray and a system matrix to represent the group of rays. Shaft culling [HW91]
that classifies objects as being inside or outside a shaft is another way coherence has been used
to accelerate ray tracing. Pyramidal clipping by Zwaan et al. [ZRJ95] traces a pyramidal group
of rays through a kd-tree and a grid by intersecting a convex polyhedron with a solid as given by
Greene [Gre94].

It was determined that supercomputers with vector operations were well suited to ray trace pack­
ets of rays [PB85] as far back as 1985 when only advanced supercomputers of the time pro­
vided vector instructions. The idea has been very popular in recent times with the introduc­
tion of vector instructions for general purpose CPUs available in almost all current architectures.
Wald [WBWS01] popularised this concept of tracing packets of rays using SIMD instructions. In
addition to amortizing the calculations amongst the number of rays in the packet, Wald pointed
out that cache and memory efficiency also improved with packet ray tracing as fewer nodes were
accessed.

The SIMD instructions used were Intel’s SSE instructions [SSE09b] [SSE09a] introduced in the
Pentium III [Int08] processors. The introduction gave Pentium III and later processors eight SSE
registers and several new instructions. The SSE registers were 128 bits wide allowing four floating
point numbers to be in the register at a time. To complement this, the new instructions allowed
these four floating point numbers in the registers to participate in arithmetic operations. As can
be seen, this allows four floating point operations to be undertaken with a single instruction. Most
of the instructions have performance comparable to the respective single floating point operation.
Thus, the use of SSE instructions and registers are a good way to accelerate operations that can
2.4 Packet Ray Tracing 34

be parallelised - like ray tracing. Additionally, to ease development, the compilers provided
intrinsics [VCI09] [Int09] that allow access to SSE instructions without the use of assembly code.

The use o f SIMD instructions to trace packets is a brute force approach where all the rays are
traversed through the tree. However, several other methods where only a few rays or representative
boundary volumes of packets are traced have also been popular. The longest common traversal
sequence [HBOO] algorithm, stores the traversal history of a set of boundary rays and constructs
the longest common traversal sequence (LCTS) from these. This LCTS is a set of nodes that each
ray in the node traverses and thus the number of traversal steps can be reduced.

Beam tracing, introduced by Heckbert and Hanrahan [HH84], uses beams to intersect with prim­
itives. However the fact that they do not use a hierarchical structure had results in relatively slow
performance. Overbeck et al. [ORM07] recently published a similar algorithm that uses similar
beams but traverses a kd-tree to show greatly accelerated performance. They start off with the en­
tire viewing frustum as a beam and progressively sub-divide the beams according to the primitive
boundaries. These beams are then traversed through the kd-tree using a frustum proxy method
similar to the LCTS and the Multi Level Ray Tracing Algorithm [RSH05] ( MLRT, MLRTA). The
algorithm is extended for soft shadows and performance is shown to be much faster than previous
methods.

The Multi Level Ray Tracing Algorithm [RSH05], traces a hierarchical beam of rays through a
kd-tree constructed with parameters specialised to the algorithm to realise speeds that are almost
real-time making it probably the fastest method of ray tracing. The entire scene is considered as a
beam and by splitting these beams into tiles of various sizes depending on the nodes traversed, a
hierarchical beam tree is built. Using this hierarchical beam tree, entry points deep inside the tree
are found enabling a large part to be disregarded. An inverse frustum culling where the frustum is
culled by the axis-aligned planes of the kd-tree is used to determine if a group of rays intersects
an AABB or not.

Another important contribution of the paper is the application of interval arithmetic to perform
packet ray tracing. Packet ray traversal with interval arithmetic works so that the traversal for
the entire group can be determined using just one interval computation. The interval represents
overestimates for the entire group and is a conservative computation. Partly due to this fact and
partly due to reduced probability of the entire packet (all rays in a packet) hitting a node for larger
packets, they cannot consist of a large number of rays. Reshetov et al. [RSH05] mention optimal
packet sizes of 4 x 4. In our implementation, a packet size of 8 x 8 provided the best performance.

A very similar but slightly modified version of this traversal is used in our implementation. In
order to trace a packet of rays, the rays with the earliest entry point and the latest exit point for each
axis are selected. Using these six rays, the entire packet is traversed. If rEntry [ x ] , rEntry [ Y ] ,
rEntry [ z ] , rExit [ x ] , rExit [Y], rExit [z] are the six rays, then the packet is traversed as shown
by the following pseudocode:
2.4 Packet R a y Tracing 35

char RecursiveRayTraversaiInterval(node, trnin, tmax)


{

i f (node is a leaf node)


{

if(node is empty)
return 0 ;
Proc.essLeafNode (nodeindex) ;
i f (allRaysIntersected)
return true;
)

axisCur= GetAxis(node);
rayDirection = dtaxisCur] > 0;

temp = splitPoint[axis] - o[axisCur];


tSpMin = temp*dRec[rEntry[axisCur]];
tSpMax - temp*dRec[rExit[axisCur]];

if(tSpMax > tmin)


(
allRaysIntersected = RecursiveRayTraversallnterval(node->leftNode, tmini,
MIN(tmaxi,tSpMax));
if (allRaysIntersected)
return allRaysIntersected;

if (tSpMin < tmax)


{
allRaysIntersected = RecursiveRayTraversalInterval(node->rightNode, MAX(
tmini, tSpMin), tmaxi);
}

return allRaysIntersected;
)

L isting 2.6: R ecu rsiv e ray packet traversal a lg o rith m u sing in terval arith m etic. A lg o rith m
co m p u tes ray -sp lit plane in tersec tio n p a ram eters o f tw o b o u n d ary ray s (as show n in F ig u re 2.14)
an d d eterm in es if the en tire packet tra v e rse s the n ode o r not.

As the p se u d o c o d e show s, the trav ersal m eth o d is sim ila r to the sin g le ray traversal. T h e o nly
distin ctio n is in the calc u la tio n o f the t p a ra m e te r and the trav ersal term in atio n . W h e re as only
one ray is to be co n sid ered in the sin g le ray v ersio n , the p ack et v e rsio n uses the tw o rays that
en ter and exit the p lan es along the axis in c o n sid e ra tio n . It is to be no ted that these ray s are
p red eterm in e d fo r the p ack et and hen ce no a d d itio n al c o m p u ta tio n o r d ete rm in a tio n is n ec e ssa ry
d u ring the trav ersal. T he tw o ray s, rEntry[axis] and rExit [axis], are in tersec ted w ith the split
plane to get the tw o p aram eters, tSpMin and tSpMax. T h ese p ara m e te rs are su b stitu ted instead
o f the single t p ara m e te r used in the single ray v ersio n to d eterm in e the n o d es to be trav ersed .
Figure 2.14 sh o w s the node trav ersal w ith fu rth e r clarity.

O nce the p ack et is trav ersed th ro u g h the tree, at the le a f n o d e the ray s are d e co m p o se d in to th e ir
individual c o m p o n e n ts and tested ag ain st the p rim itiv es. H ow ever, to use S SE , th e co m p o n e n t
rays in the p ack et are d e co m p o se d into g ro u p s o f fo u r and in te rse c ted against the p rim itiv e s to
g en erate the final im age. S im ila r to single ray tra c in g , p a ck et ray tracin g can also use early ray
term in atio n w ith the differen ce bein g that all the c o m p o n e n t rays in the p ack et have to have fo u n d
an in tersectio n .

Packet ray tra cin g has brou g h t ab o u t a m ajo r p e rfo rm a n c e gain re su ltin g in in teractiv e to real
2.5 A n ti-A lia sin g a n d In co h eren t R ays 36

b o u n d a ry ra y s tS p litM in

tSplitM ax

split
tS p litM in
plane

tSplitMax > t b tSplitMax > tjrt))


=> traverse left node => traverse left node tSplitMin < tnu_
=> traverse right node
tSplitMin < to-i
=> traverse right node

F ig u re 2.14: K d -tree p a c k e t traversal.

tim e p e rfo rm an c e fo r ray tracin g . H ow ever, a m a jo r d isad v an tag e o f pack et ray tra c ers is that
w hen the rays are n o t c o h eren t, i.e., w h en the d irec tio n s are not sim ilar - freq u en tly o cc u rrin g
for seco n d ary rays like shadow , reflectio n and re fra ctio n rays, packet ray tracin g c an n o t y ield the
sam e p e rfo rm a n ce h en elits as it c an fo r p rim a ry ray s. T h is p ro b lem is sig n ifican t, as seco n d ary
effects are im p erativ e fo r g e n e ra tio n o f high q u ality im ag es.

2.5 Anti-Aliasing and Incoherent Rays

Since ray tracers trace o nly one ray p e r pixel, a liasin g c an o cc u r at the e d g es o f o b je cts. T h ere
have been several so lu tio n s p ro p o sed to allev iate th is p ro b lem . O ne o f the sim p lest m e th o d s is to
su p er-sam p le and trace m ore th an one ray (4 to 16 ray s) p er pixel. T h is in c rea se s the re so lu tio n o f
the re n d ered im age and sig n ifican tly red u ces aliasin g artifacts. O n the o th e r hand, tra cin g several
rays p e r pixel is co m p u ta tio n a lly exp en siv e. H en ce, o th e r m eth o d s have b een attem p ted .

A daptive sam p lin g [W hi80] w o rk s by c o n sid e rin g m ore sam p les at lo c a tio n s w'here alia sin g can
be m ost p ro m in en t. R ays are cast at the fo u r c o rn ers o f the pixel and if one or m o re in ten sities
d iffer sig n ifican tly fro m the o th ers, then m o re ray s are cast inside this area. O n ce the ray s are
traced, the w eig h te d in ten sities are fo u n d an d the p ix e l's c o lo u r is d eterm in ed . T h is m e th o d adds
fu rth er rays lead in g to ad d itio n al cost.

A m an atid es [A m a84] p ro p o se s re p re sen tin g a pix el as a re c ta n g u la r area lead in g to the ray s being
p y ram id al. H ow ever, fo r sim p le r calc u la tio n s, a p p ro x im a tin g the p y ra m id a l v o lu m e to a co n ical
volum e is p ro p o sed . D ue to the fact th a t the co n ic a l ray co v ers a la rg e r area o f the p ix el, the
hard ed g es are red u ced lead in g to a red u ctio n in a lia sin g artifacts. How'ever, tracin g c o n ic a l ray s
im p lies in terse c tin g the acceleratio n stru ctu re an d p rim itiv es w ith co n es, w h ich is ex p en siv e.

A n o th er m eth o d p ro p o sed is sto ch astic sam p lin g w h ich adds ran d o m p e rtu rb a tio n s / jitte r to the
ray lo catio n s in a pixel [C 0 0 8 6 ] to red u ce th e effects o f aliasing. Several ray s that p ass th ro u g h
2.6 D yn am ic R ay Tracing 37

non-uniformly jittered positions inside the pixel are traced. The colour of the pixel is then de­
termined by applying a resampling filter that calculates the value at the pixel. The idea borrows
heavily from the workings of the human eye. Even though the eye uses a limited number of pho­
toreceptors, it is not prone to aliasing. The method can be combined with super-sampling and
adaptive sampling to reduce the effects of aliasing. Cook et al. [CPC84] use stochastic sampling
for further physical effects such as motion blur and depth of field.

Although packet tracers with kd-trees are very fast, the coherence when tracing secondary rays
like shadow, reflection and refraction rays is significantly reduced. The solution proposed is to
return to tracing single rays, but optimizing the algorithm by making use of the SIMD instructions
to traverse multi branching bounding volume hierarchies. QBVHs [DHK08] are BVHs that have
four children at each node. These are constructed by collapsing a binary BVH to a quad BVH. This
reduces the memory storage and memory bandwidth requirements by requiring fewer nodes. At
the same time traversal is achieved using SIMD instructions that allow the same ray to be traced
through the four child nodes simultaneously. In effect, this is opposite to SIMD packet tracing
where four rays are traced through one node. The new Larrabee architecture from Intel [SCS+ 08]
proposes SIMD registers that are more than 4 wide, i.e., 16 wide, and hence multi-BVHs with
more than 4 children have also been investigated [WBB08].

2.6 Dynamic Ray Tracing

Ray tracing dynamic scenes - i.e., scenes that change between frames is an area that is currently
being heavily researched. With advanced methods available for ray tracing static scenes, the speed
of ray tracing has reached interactive to real time performance. However, as discussed, most of
this speed-up is due to the use of sophisticated data structures - mostly kd-trees built using the
SAH. Unfortunately, construction takes a significant amount of time. Even though the creation of
the SAH kd-tree has been achieved in 0 ( N l o y ( N )) time, it is still too slow if the tree needs to be
created before rendering each frame. In addition, it is extremely difficult to update a kd-tree when
triangles in a scene move.

Even then, there have been instances when kd-trees have been used for ray tracing dynamic
scenes. Algorithmically, it was shown that kd-trees could be built in 0 ( N l o g ( N )) [WH06].
However, the constants associated with it are too high for dynamic rendering. Thus, approxi­
mation techniques where a few samples are used to find a reasonably good SAH split have been
attempted [HMS06] [PGSS06]. In addition, due to increased popularity of multi-core proces­
sors, parallelizing the kd-tree building has also been attempted with two [Ben06] and four threads
[SSK07] so that dynamic rendering could be achieved using kd-trees. However, in recent times,
other structures have been used more frequently for dynamic ray tracing.

In a dynamic scene, either the entire structure has to be rebuilt from scratch at each frame or the
data structure is built once at the beginning and updated as and when the scene changes. The
second approach only works when the component triangles of the scenes do not change - i.e.,
triangles only move but triangles are not added or deleted from the scene.

If the data structure has to be rebuilt from scratch, the effective rendering time is the sum of the
structure construction time and rendering time. It is generally believed that when more time is
spent in creating a high quality structure, the rendering time decreases. Thus, an optimal structure
would be one that is relatively easy to create and at the same time efficient to traverse.
2 .7 O th er Visibility' M eth ods 38

One of the simplest data structures to create are grids and have been used for dynamic ray tracing
effectively [WIK+ 06], Since grid construction is very quick, they can be very effective when used
to ray trace dynamic scenes. An algorithm that is not based on the 3DDDA traversal algorithm
is developed for grids, due to the inability of the 3DDDA algorithm to be effective when used
with packets. The algorithm combines packet ray traversal, SIMD instruction usage and frustum
culling. Instead of testing against a single cell of the grid, it intersects a slice of a grid with
the packet to get all the intersected cells. In order to enable fast primitive intersection tests, the
triangles are culled against the frustum formed by the packet. Mailboxing is used to ensure that
intersection tests are not repeated. The combination of these methods enable grids to be used as
an effective method for dynamic ray tracing.

Another class of structures that combines kd-trees and bounding volume hierarchies has also been
investigated. These structures are called Bounding Interval Hierarchies (BIH) by Wachter and
Keller [WK06] and Spatial kd-trees (S-kd-tree) by Havran et al. [HHS06]. Both these structures
concentrate on creating kd-tree like structures but with faster construction. They use two split
planes that effectively form a bounding volume to create a partition. The split planes fully enclose
the primitives so that they are similar to bounding volume hierarchies. They are shown to be
fast to create and traverse leading to an effective structure for ray tracing dynamic scenes. The
differences between BIH and S-kd-tree are in that the BIH used a spatial median-like splitting
method whereas the S-kd-tree used an SAH-like splitting method to identify good partitions.

In recent times, BVHs are also used to achieve dynamic ray tracing [WBS07] [LYMT06] [WMG+ ].
Compared to kd-trees, BVHs are faster to create. In addition, it is relatively simple to update the
BVH when a part of the scene moves. By applying packet ray tracing concepts developed for
kd-trees, BVHs are shown to be an effective structure for dynamic ray tracing.

The vast research in ray tracing shows continued interest in new techniques to effectively under­
take ray tracing. As a visibility determination method, ray tracing can be one of the methods
used, and can be especially beneficial for very large scenes where it shows a logarithmic average
complexity per pixel. However, rasterisation based methods are normally faster than ray tracing
based methods, possibly due to simpler calculations and effective hardware implementations. A
few of the more widely used and relevant methods of visibility determination in rasterisation will
be described. However, since this thesis is mainly concerned with ray tracing like algorithms and
structures, not all rasterisation based methods will be described.

2.7 Other Visibility Methods

Rasterisation, as defined by Hill [HilOO], is the process of taking high-level information like posi­
tions and colours of vertices to determine the colours of many pixels in the frame buffer. Visibility
determination in rasterisation has been achieved using a variety of methods. In most scenes it is
highly likely that there are several objects that overlap a pixel. The first object along the pixel oc­
cludes other objects. Thus, visibility / occlusion determination is very important to obtain accurate
images. Since occluded objects do not contribute to the final image, they need not be rendered.
Thus, by accurate and fast occlusion detection, performance can be improved by processing only
the set of objects that are visible. A few methods to determine visibility [FvDFH90] [HB97] are
by using area subdivision, scanline algorithms, Z-buffer, depth sorting, BSP trees, octrees, occlu­
sion queries and Hierarchical Occlusion Maps. Some of these methods will be described in brief
in the following sections.
2 .7 O th er V isibility M eth ods 39

2.7.1 A rea S u b d iv isio n M eth od s

Area subdivision methods follow a divide-and-conquer strategy to determine visibility. Areas of


the image are considered and if the polygons projected onto this area can be determined unam­
biguously, then they are drawn. Otherwise, the area is divided into smaller areas until the polygons
can be unambiguously determined.

W arnock’s [War69] area subdivision method divides each area into four smaller areas. At each
stage, each area can be classified into one of the cases below.

• The area is surrounded by a single polygon - If there is a single polygon projection that
completely surrounds the area, then the area can be filled with this polygon’s colour. This
is the polygon that is visible from all the pixels in the area being considered.

• Area contains or intersects a single polygon - In either case, the area is first filled with the
background colour. If a single polygon intersects, then the intersecting part of the polygon
is filled with the polygon’s colour. If the polygon is contained by the area, the polygon is
rendered.

• The area is disjoint from the polygons - The polygons have no effect on the area and hence
the area is given the background colour.

• More than one polygon intersects, is contained by, or surrounds the area, but the closest
polygon is a polygon containing the area - In this case, the area is given the colour of the
closest surrounding polygon.

• More than one polygon intersects, is contained by, or surrounds the area, but a closest
surrounding polygon cannot be identified - In this case, the area is further subdivided into
four smaller areas and recursively tested for the above cases until either the determination
can be made or until the pixel level is reached. If, even at the pixel level, the polygon cannot
be identified, the Z values of the polygons at this point are calculated and the polygon with
the closest Z value is selected.

While W amock’s method handled only polygons, Catmull [Cat74] introduces a subdivision method
that handles curved surfaces. By subdividing the curved surfaces themselves into smaller patches,
until a patch only covers a single pixel, the pixels occupied by the surface were identified. A no­
ticeable difference between the two algorithms is that while W arnock’s method subdivides screen
space, Catm ull’s method subdivides the object.

While subdivision methods considered subdivided parts of the screen or objects, scanline algo­
rithms processed the objects one scanline at a time. These methods have been quite popular due
to their simplicity.

2.7.2 S ca n lin e A lg o rith m s

One of the earliest methods of visible surface determination was through the scanline method
[WREE67] [Bou70] [Wat70]. Scan conversion is the process of converting a polygon from the
world space to image space one scanline at a time. The methods described in the above papers
are similar and the method introduced by Wylie et al., who generate images of objects created
with triangles, will be described in brief. The view plane is considered as being made up of a
series of scanlines. Initially, the triangles in the scene are projected to get their screen coordinates.
2 .7 O th er Visibility>M eth ods 40

The vertices of each triangle are sorted according to the Y coordinate and V-entry and exit tables
are created. Using these, for every triangle, a Y occupied flag is maintained that indicates if a
given triangle occupies a Y coordinate. Once this is done, the scanlines are considered one by
one. For each scanline, the triangles intersected are obtained by checking the y-occupied flag.
The X -values of the intersection of the scanline with the projected triangle are calculated, sorted,
and X -entry and X -exit tables are created. The individual pixels along the scanline are considered
next. Similar to a Y -occupied table, an X-occupied table is created. If at a pixel, there is more than
one triangle, the distances between the viewpoint and the triangle in world space are calculated
and the closest triangle is selected to get the triangle for the pixel.

While the scanline method does not involve preprocessing, researchers have also investigated
visibility determination methods whereby the scene is preprocessed. Two of these methods are the
Depth Sorting method and the BSP tree method that essentially provide a priority to the objects in
which they are to be rendered.

2 .7.3 V isib ility D eterm in a tio n by D ep th Sortin g

The depth sorting method was developed by Newell, Newell and Sancha [NNS72], As described
by Foley et al. [FvDFH90], it is a simple method wherein the polygons are first sorted according to
farthest depth. When the depths overlap the polygons are split to resolve ambiguities. In the final
step, the polygons are drawn back-to-front, i.e., according to decreasing depth. This determines
accurate visibility by overwriting objects further away from the viewpoint with closer objects.

2.7.4 V isib ility D eterm in a tio n u sin g a B S P Tree

A BSP tree was used by Fuchs et al. [FKN80] to undertake visibility determination. They propose
a new solution to the approach followed by Schumaker et al. [SBGS69]. The use of the BSP tree
eliminates distance calculations to polygons. A scene containing polygons is taken as input and
a binary tree - called a Binary Space Partitioning Tree (BSP tree) - is built. A simple building
process is used whereby a splitting plane is selected and the polygons are classified as being on
one side or the other of the splitting plane. If a polygon lies on both sides, it is split by the splitting
plane. The first splitting plane is made as the root node. The process is recursively followed until
each node has just one polygon. In order to determine visibility using this tree, a back-to-front
traversal (achieved by an in-order traversal of the tree) determines the polygons that are written.
The order ensures that polygons of lower priority are written before higher priority ones. In other
words, polygons that are farther away are overwritten by polygons closer to the viewpoint to
ensure that the right polygon is written to the right pixel. To handle one of the drawbacks - that
there might be an increase in the number of polygons - a split plane that causes the least number
of splits is selected.

The back-to-front traversal in the above method means that polygons cannot be skipped even if
a closer polygon fully occludes farther polygons. To alleviate this, Gordon and Chen [GC91]
traverse the BSP tree in a front-to-back manner. A scanline method is used to render the polygons
and an auxiliary structure called the dynamic screen is used to identify areas of the screen that can
be rendered over. The structure represents unlit (not yet rendered) pixels of each scanline. The
pixels that the polygon can write to are identified by a merge process. This simple modification is
shown to make rendering more efficient for scenes with a higher number of polygons.
2 .7 O th er V isibility M eth o d s 41

2.7.5 Z -b u ffer

The Z-buffer method o f visibility determination is a simple technique. An additional buffer with
the closest Z value at each pixel is stored. As the rendering progresses, the Z-buffer is updated to
maintain the Z value o f the closest object at that point. When a new object is being rendered, if its
Z value on a certain pixel is determined as being closer than the existing value, this object projects
onto the pixel and the pixel’s Z value is updated. It was first described by Catmull [Cat74]. Due
to its simplicity, the Z-buffer has been one of the most popular methods to determine visibility.
The simplicity also means that it is easily implemented in graphics hardware.

The main problem with the Z-buffer is that it can only determine visibility of a polygon at one
pixel. This implies that each polygon that is being rasterised has to be processed to the single pixel
level. It also means that a polygon cannot be determined as being occluded until it reaches the
pixel level. This is a major disadvantage, as significantly larger number of polygons are processed
in cases where occlusion is high. Compared to ray tracing - that processes very few occluded poly­
gons - this approach to rasterisation appears very primitive. To alleviate some of the problems,
hierarchical visibility methods have been proposed

2.7 .6 H iera rch ica l M eth od s

H ierarchical Z -B uffer
Greene et al. [GKM93] introduce a technique called Hierarchical Z-buffer visibility. The tech­
nique described by them uses a ray tracing structure - an octree, and an adaptation of the tradi­
tional Z-buffer - a hierarchical Z pyramid. The Z pyramid is a hierarchical structure where the
lowest level consists of the normal Z-buffer. Each higher level represents the farthest Z value of
the four values it represents. Whenever a polygon is rasterised, the Z pyramid is updated to keep
the values current. To further improve the speed, the octree nodes are rasterised and checked for
visibility. If the octree node is not visible, then the polygons in it are also not visible and need not
be processed. When the polygons have to be rasterised, the polygon is tested with the appropriate
Z pyramid level by comparing the polygon’s closest Z to the value in the pyramid. If it cannot
be definitively answered, then the pyramid’s next level is checked, again with the closest Z of the
polygon. Although the closest Z of the polygon in the quadrant can be used, it is stated that this
Z value is expensive to compute and hence the simpler approach is used.

H ierarchical Polygon Tiling w ith C overage M asks


Another similar method is the use of coverage masks [Car84]. Coverage masks indicate areas of
the screen that are covered by a polygon. Greene [Gre96] uses a modified version of coverage
masks to accelerate visibility determination. The underlying idea of coverage masks is that for a
given edge, all possible tiling patterns crossing a grid of samples can be pre-computed and later
retrieved - indexed by the points of intersection. By andmg all the coverage masks of all edges
of a polygon, the coverage mask for the polygon can be found. Similarly, by compositing all
previously rendered polygons, the coverage mask for the image can be found. Greene modifies
the coverage mask so that it indicates three states of the edge. A given sample is either inside
the edge - indicated by a state of Covered in the mask, outside the edge - status of Vacant in the
mask, or intersecting the edge - status of Active in the mask. These new masks are called Triage
coverage masks.
2 .7 O th er Visibility’ M eth o d s 42

Using these triage coverage masks, a coverage pyramid for the entire image is built. In the finest
level, one bit coverage masks (indicating only Covered and Vacant states) are used. Thus, for a
512 x 512 image with 8 x 8 oversampling, the finest level of the pyramid consists of 512 x 512 one
bit coverage masks. The higher levels of the pyramid consist of triage masks containing 04 x 04,
8 x 8 and l x l values. Each of the masks represent the corresponding area of the image. When a
pixel is rendered, the lowest level of the mask is updated and the change is propagated upwards.

A BSP tree as described by Fuchs et al. [FKN80] is created and the polygons are processed in
front-to-back order. Each of the polygons are rendered using a Wamock style subdivision method
to make use of the logarithmic search properties of the subdivision method. At each step of the
subdivision, the coverage mask is tested to see if the area being investigated is covered. The cov­
erage mask of the polygon is found by using the pre-computed edge tables. By compositing the
polygon’s coverage mask and the corresponding mask in the coverage mask pyramid, it can be
determined if the polygon is entirely hidden, entirely visible or its visibility is uncertain. When
cells are entirely hidden, they can be ignored. If they are entirely visible, they can be displayed.
When their visibility is uncertain, then they are recursively subdivided. Whenever a pixel is dis­
played, the coverage mask and the pyramid are updated. When all the polygons are recursively
subdivided in a front-to-back manner, the image is generated.

To make the process faster, Greene also proposes to organise the scene into an octree of BSP trees.
An octree is built on the scene in the first stage. In the second stage, at each leaf node of the octree,
a BSP tree is built. First the octree is traversed in a front-to-back manner and the visible nodes are
tested against the coverage masks to determine if they are visible. At a leaf node, the BSP tree is
traversed in a front-to-back manner to render the polygons in the right order. By this process, only
polygons with a high possibility of being visible are tiled, making optimum usage of coverage
masks.

H ierarchical O cclusion M aps


Another hierarchical method to determine visibility is Hierarchical Occlusion Maps (HOMs)
[ZMHH97] [Zha98]. HOMs are adapted for two of the algorithms described in the thesis - Co­
herent Rendering and Row Tracing.

An occlusion map is a gray scale image of the parts of the scene rendered. Using this, a pyramid
or hierarchy of images are created. In this hierarchy, each pixel in each image represents a part
of the image. The pixels indicate whether the corresponding area is occluded. As described, the
benefit of HOMs over coverage masks is that HOMs are generated using graphics hardware.

Prior to rendering, the method selects a set of occluders that are an estimate of the objects that
are likely to be visible. For static scenes, they are selected based on a few criteria like the size of
the object, spatial locality (i.e., the bounding boxes are small compared to the size of the scene
and the bounding boxes should have small aspect ratios - bounding boxes that are too big or
ill-shaped would cause problems for the depth estimation), rendering complexity (simple objects
with low polygon count) and redundancy (objects attached to other objects are not considered).
For dynamic scenes, the occluders are selected at run time. One of the methods used is to select
objects based on their distance from the viewpoint and until a particular object count is reached.
Once occluders are selected, they are rendered and the occlusion maps are created.

To perform the occlusion test, the HOMs created earlier are used. There are two parts to the
occlusion test. The first part tests if an object overlaps an area occupied by the occluders. If the
first test is positive, the second part ensures that the depth of the object being rendered is greater
2 .8 Sum m ary 43

than the occluders, i.e., the occluders are fully in front of the object.

The overlap test checks if the region occupied by the object overlaps an area occupied by the
occluder objects. For this test, the area covered by the object is necessary. This can be achieved
by projecting the object onto the screen. However, this is very prohibitive as the object may be
quite complex. Hence, the bounding box’s eight vertices are projected and the extent is used as
the area to be tested. The traversal of the HOM is started at a level where the bounding box is
enclosed by a pixel (corresponding length and breadth represented by the pixel is just greater than
the bounding box’s). If the pixel identified is opaque then the bounding box and consequently the
object overlaps an area occupied by the occluders.

Once it has been identified that an object overlaps an area cumulatively overlapped by the occlud­
ers, it implies that the object may be occluded if it is completely behind the occluders. This is
tested by a depth estimation buffer. The image area is divided into smaller areas and for each area,
the farthest Z value is stored in the pixel. For each overlapped object, the buffer is tested. If the
object’s nearest depth is greater than the value in the depth buffer, it is occluded.

C oherent H ierarchical C ulling


The Coherent Hierarchical Culling [BWPP04] method makes use of occlusion queries supported
by recent graphics hardware. Hardware occlusion queries allow determination of whether a given
object is occluded or not. The query takes the given object as input and the GPU returns the
visible fragments of the object. The main problem, however, is that there is an associated latency.
One kind of query is an NV_occlusion_query, an OpenGL extension introduced by NVIDIA on
their Geforce 3 graphics cards. The NV_occlusion_query returns the number of visible pixels of
the object. In addition, the NV Query also allows queuing queries before asking for the results.
Bittner et al. aim to utilise this feature to improve visibility performance.

In the simplest method, a kd-tree is used and is traversed front-to-back. The bounding boxes of
the kd-tree nodes are sent to the GPU to test for occlusion. The result is awaited and depending on
the outcome, the tree is traversed. However, due to the latency of the query, this is not an efficient
method. To overcome this, temporal coherence is used to reduce the number of occlusion queries.
Also, the queries are issued and stored in a queue until done by the GPU. This allows interleaving
occlusion determination instructions with instructions to show visible polygons. A breadth first
traversal of the kd-tree based on a priority queue is also proposed to optimize utilization of oc­
clusion queries. The nodes are prioritised according to the inverse of the minimal distance of the
viewpoint to the node’s bounding box. The use of temporal coherence is shown to be an effective
method to accelerate visibility determination when a walk-through of a large model is attempted.

The above hierarchical methods are just a few of the methods that have been used to determine
visibility efficiently in rasterisation. The popularity of hierarchical methods imply that they are
useful for fast determination of visibility.

2.8 Summary

Due to the use of acceleration structures ray tracing has become a viable form of visibility de­
termination / rendering. Acceleration structures enable ray tracing by intersecting with only a
fraction of the primitives in a scene. Furthermore, intelligent structure creation strategies like the
SAH have further improved the performance of ray tracing. The SAH, introduced initially for
2 .8 Sum m ary 44

BVHs, when adapted to kd-trees produces the best structure for ray tracing static scenes. Further
performance benefits, especially for primary rays (relevant for visibility), have been brought about
by making efficient use of image coherence (tracing a group of rays together). Through the use
of SSE, four rays have been traced simultaneously. Groups of rays have also been traced using
frustums or by using interval arithmetic. The introduction and development of packet tracing has
culminated in ray tracing primary rays being considered almost a solved problem. Recent re­
search, focusing more on incoherent rays and dynamic rendering has shown that intelligently built
BVHs can be very competitive with kd-trees - even for static scenes. In addition, BVHs with their
ability to be quickly built and partially rebuilt are believed to be a better structure for ray tracing
dynamic scenes.

At the same time, several other visibility algorithms have also been developed. The Z-buffer
algorithm is widely used due to its availability on most graphics hardware. Hierarchical methods
combining ray tracing structures (kd-trees, octrees, etc) with hierarchical occlusion information
(hierarchical Z-buffer, HOMs, coverage masks, etc) - are also widely researched.

In addition to algorithmic advances, recent hardware developments like SSE and multiple cores
have enabled ray tracing - a highly parallelisable algorithm - to be a feasible alternative to raster­
isation. The investigation of ray tracing structures and algorithms for visibility is thus believed to
merit further consideration.
Chapter 3

RBSP Trees

C on ten ts________________________________________________________________________
3.1 M o tiv a tio n ................................................................................................................... 45
3.2 RBSP Trees C oncept................................................................................................... 47
3.3 Data S tru ctu re............................................................................................................ 49
3.4 C onstruction ................................................................................................................ 51
3.5 RBSP Tree Traversal ............................................................................................... 56
3.6 Data Structure Visualised for Various M odels..................................................... 63
3.7 R e s u lt s ......................................................................................................................... 65
3.8 Further Research on Structures with Non-Axis-Aligned Splitting Planes . . 72
3.9 Sum m ary...................................................................................................................... 75

This chapter introduces and studies a new structure based on kd-trees and BSP trees. The structure
uses several arbitrarily aligned splitting directions to create a space partitioning structure that more
closely wraps the scene. The structure is shown to reduce the number of node traversals as well
as the number of triangle intersections and results in a more efficient structure for ray tracing.
Part of the work in the chapter has been published as - Kammaje, R.P.; Mora, B., “A Study o f
Restricted BSP Trees fo r Ray Tracing,” IEEE Symposium on Interactive Ray Tracing, 2007. R T
2007., pp.55-62, 10-12 Sept. 2007. [KM07]

Ray tracing is one of the most researched areas of Computer Graphics and the use of innovative
data structures and algorithms has made it a feasible, even preferable, rendering method. How­
ever, due to the expensive nature of computations for each ray, single ray tracing is not very
frequently used as a visibility method. The most frequent use of ray tracing is when realistic op­
tical effects are desired. At the same time, it is also acknowledged that ray tracing is scalable,
with an average complexity of l og( N) per pixel (where N is the number of primitives in the
scene) [HBOO] [Hur05] [WSS05] [HHS06] [YLM06] [WBWS01] as compared to a much higher
N x s (where s is the average projection size (in pixels) per triangle) per pixel complexity of Z-
buffer algorithms. Wald et al [WBWS01] show the linear and logarithmic complexities of Z-buffer
based and ray tracing algorithms respectively. Due to this, and ray tracing’s built in occlusion de­
termination property - early ray termination, as the number of primitives in the scene increase, the
complexity advantage of ray tracing manifests itself. For very large models a significant perfor­
mance advantage over the brute force approach of hardware rasterisation based techniques may
be obtained, even for visibility determination.

45
5.7 M o tiva tio n 46

The complexity advantage of ray tracing is realised through data structures like kd-trees, octrees
and BVHs. Restricted Binary Space Partitioning trees, henceforth referred to as RBSP Trees, are
introduced as a new addition to this class of data structures. They are binary trees that subdivide
space into two partitions at each step and classify the primitives into one of the two partitions. The
structure attempts to achieve better ray tracing performance by combining advantageous concepts
of general BSP trees and kd-trees.

3.1 Motivation

The primary motivation for the existence of RBSP trees is rooted in the concept of void area. As
defined in [WHG84], void area is the difference in projected areas of the bounding volume and the
actual model. For ray tracing, the void area indicates the space in which the rays miss the scene,
but still have to be processed. Reducing the void area results in a decrease in the number of these
rays, thus improving the efficiency of ray tracing.

Although, the concept of void area is described for bounding volumes, it is equally applicable
to space subdivision structures like the kd-tree. Each node of a space subdivision structure can
be thought of as a bounding volume for the triangles enclosed by the node. Similar to bounding
volumes, the number of rays that miss the triangles in the node, but still hit the node is given by
the void area. Hence the concept, and the complimentary result pertaining to rays, is also believed
to apply to space subdivision structures like kd-trees, octrees, and BSP trees.

It is also one of the reasons for the efficacy of the Siuface Area Heuristic (SAH) for kd-trees.
Although the SAH for kd-trees very effectively reduces the void area of kd-trees, it is limited
by the kd-tree’s property that splits can only be along the X , Y or Z axes. For scenes that
predominantly consist of triangles that are axis-aligned (e.g. architectural scenes), the kd-tree’s
split axes are highly customised to the scene. Compared to general BSP trees, it also results in
the structure being relatively easy and quick to build. This is due to the fact that the selection of a
split plane requires significantly fewer SAH calculations.

Other kinds of scenes, in which the triangles are more arbitrarily aligned, expose the kd-tree’s
limitation. As Figure 3.1 shows, the difference between the projected areas of the kd-tree and the
actual scene (i.e., the void area) is quite considerable even at lower tree depths. Having potential
split axes that are arbitrary in alignment and number could potentially create a structure with a
smaller void area. This is one of the main motivations for the introduction and study of the RBSP
tree.

The RBSP tree can consist of split planes that are very close to the component triangles’ alignment.
As Figure 3.2 shows, compared to Figure 3.1, the RBSP tree has a considerably smaller void area
than kd-trees. Thus, a study is warranted to explore the promise of better rendering performance
provided by the use of RBSP trees.

Along with the concept of void area, Weghorst et al. [WHG84] also quantise the rendering time in
Equation 2.11 which indicates that the rendering time varies according to four variables. A con­
siderable increase in any of the four variables can significantly affect performance. On the other
hand, efficient structures that reduce the number of bounding volume and primitive intersections
can be very efficient. In addition to the number of intersections, the cost of this intersection is also
a prominent variable with the possibility of increasing the ray tracing cost significantly. Hence, a
3.1 M otivation 47

(a) Kd-tree for B unny (b) Kd-tree for A rm a d illo

F igure 3.1: K d -trees on scen es w ith p re d o m in a n tly n o n -a x is-a lig n e d trian g les. E v en th o u g h the
SA H kd-trees co n v erg e to the m odel quickly, th eir v oid a re a is still sig n ifican t. T h is is b e c a u se
k d -tre es are restricted to u sin g axis alig n ed sp littin g p lan es.

structure that red u ces the n u m b e r o f in te rse c tio n s w h ile b ein g c o m p u ta tio n a lly c h eap to trav erse
w ould be very efficient.

D eterm in in g a ra y -k d -tre e node in te rse c tio n is co m p u tatio n a lly v ery ch eap . H ow ever, p rio r to the
in tro d u ctio n o f the S A H , the kd -tree w as not w idely accep ted as o n e o f the fastest stru c tu re s for
ray tracing. T he n u m b er o f ra y -n o d e trav ersals and ray —p rim itiv e in te rse c tio n s w ith a n o n -S A H
kd-tree w as high. H ow ever, the SA H |M B 9 0 ] - a h eu ristic that effectiv ely re d u ces void a rea and
separates em pty sp aces - c h a n g e s this. T h e k d -tree thus built, re su lts in a m u ch lo w er n u m b e r o f
in tersec tio n s fo r ray tracin g . T h e red u ced n u m b e r o f in te rse c tio n s c o u p le d w ith the c h ea p trav ersal
cost realises the true p o ten tial o f k d -trees - resu ltin g in a sig n ific a n t p e rfo rm a n ce in crease.

B SP trees - in w hich the split axes can be alig n ed a rb itra rily - are the g en eral form o f k d -tre e s.
In its general fo rm , th e B S P tree has several d isa d v a n tag e s th a t m ak e it u n feasib le fo r ray tra c ­
ing [F S 8 8 ] . O ne am o n g th em is the co st o f in te rse c tin g a B S P tree n o d e. A t each trav ersal step,
the co m p u tatio n ally ex p en siv e p la n c -ra y in tersec tio n is n ecessary . H ow ever, if the o th e r d isa d ­
vantages w ere o v erco m e, th e a d v an ta g e a B SP tree p ro v id e s is th at its sp littin g p lan es are selected
from the scene itself. T he B S P tree can . in theory, he a stru ctu re w ith the low est void area p o s­
sible. T his w ould im ply c o n sid e ra b ly red u ced n u m b er o f in te rse c tio n s th at co uld , even w ith the
increased traversal co st o f B S P trees, be m ore efficien t than k d -trees.

T he high degree o f d ifficulty in c o n stru c tin g a good B S P tree is o n e o f the m ain d ra w b a ck s h in d e r­


ing it from bein g a feasib le stru c tu re fo r ray tracin g . T h e m ain ad v a n ta g e o f the B S P tree - th a t its
sp litting p lanes are d raw n from the scen e - tu rn s out be a m ajo r d isad v a n ta g e in p ractice. T h e fact
that there are as m any p o ten tial sp littin g axes as there are tria n g le s in the node b ein g sp lit m ean s
that the best sp littin g plan e can be alo n g any o f these axes. T h e larg e n u m b e r o f p o te n tia l split
po sitio n s that need to be ex a m in e d leads to very high c o n stru c tio n co sts. A n o th er d isa d v a n tag e o f
the B S P tree is that the sp littin g plan e - in d icated by fo u r floats - has to be stored in its e n tirety
at each node. T he in creased m em o ry re q u irem e n ts b ro u g h t ab o u t by this are sign ifican t - at least
3.2 R B S P Trees C o n cep t 48

(a) R BSP tree for B unny (b) R B S P tree for A rm a d illo

F ig u re 3.2: R B S P trees on scen es w ith p re d o m in a n tly n o n -a x is-a lig n e d trian g les. T h e u se o f


several ad d itio n a l sp littin g axes allo w s the R B S P tree to m ore c lo se ly w rap the m o d el and
m in im ise void area.

4 x that req u ired fo r a k d -tree node. It also lead s to d e g rad ed p erfo rm a n c e due to in c rea se d c a c h e
m isses. B ecause o f these d ra w b a c k s, the B S P tree is g en erally not c o n sid e re d as a viab le stru c tu re
fo r ray tracin g .

Irresp ectiv e o f these d isa d v an ta g es, the p ro m ise o f re d u ced void area, resu ltin g in a stru ctu re
w ith sig n ifican tly red u ced in tersec tio n n u m b ers, is hard to ignore. I f a stru ctu re can co m b in e the
re d u ced in tersec tio n co st o f a k d -tree w ith the re d u ced n u m b er o f in te rse c tio n s o f a g e n e ra l B S P
tree w h ile being sim p ler to c o n stru c t, it c o u ld be the ideal stru ctu re fo r ray tracin g . R B S P trees are
an attem p t at such a stru ctu re.

3.2 RBSP Trees Concept

R B S P trees are a sp ec ia lise d fo rm o f B S P trees in w h ich the sp littin g p lan es are se lec ted fro m a
re s tric te d set o f p red e te rm in e d p lan es th a t can be a rb itra rily alig n ed . T he re d u ce d set o f p lan es
pro v id es the o p p o rtu n ity to red u ce the co st o f in tersec tio n in ad d itio n to c o m p a c t re p resen ta tio n .
In co m p a riso n to k d -tre e s, R B S P trees can have a w id e r se le ctio n o f sp littin g p lan es av ailab le
re su ltin g in a red u ced n u m b er o f in tersec tio n s.

T he use o f a stru ctu re w ith n o n -a x is-a lig n e d p lan es is not new to R B S P trees. K ay an d K a-
jiy a [K K 8 6 ] investigate a stru ctu re th at uses slab s o f n o n -a x is-a lig n e d p lan es to build b o u n d in g
v olum e hierarc h ie s. H ow ever, b o u n d in g v olum e h ie ra rc h ie s are m o re ex p en siv e to trav erse than
space p artitio n in g tech n iq u es. In ad d itio n , the lim ita tio n o f h ard w are m e a n t that they c o u ld o n ly
test it w ith a lim ited n u m b e r o f axes. It is also stated th at ad d itio n a l axes w o u ld be very a d v a n ­
tag eo u s. T he R B S P tree ad d resses these by b ein g a sp ace p a rtitio n in g te ch n iq u e w ith a c h e a p e r
trav ersal cost and w ith the p o ssib ility o f a larg er set o f sp littin g axes.

T he sp littin g p lan es are p red e term in e d as a set o f axes, that are n o rm a ls to the actual sp littin g
3.2 R B S P Trees C o n cep t 49

planes. S ubsequently, each sp littin g plan e can be re p re se n ted as a po in t on one o f these axes.
Further, if the m in im u m and m ax im u m p o in ts alo n g an axis fo r the scen e are k n o w n , then any
po in t along the axis b etw een these tw o p o in ts can be re p re sen te d by a d isc re tisa tio n e n a b lin g an
easy and com p ac t rep re se n ta tio n o f the sp littin g plan e. If the X , Y and Z axes are u sed as th e set
o f axes fo r R B S P tree co n stru ctio n , a k d -tree is o b tain ed . T h us, an R B S P tree is a m o re g en eral
form o f a kd-tree.

Scene
Root Node boundaries' ajX+bjy+c.
F igure 3.3: A 2D R B S P tree: In 2D , the sp littin g p lan e s (lin es) can d irec tly be u sed in stead o f
the norm als (as done in 3D ). H ence, the p o ten tial set o f p a rtitio n in g p la n e s arc sh o w n (lab elled
1.2.3,4). U sin g this set o f axes, a tree th a t p a rtitio n s sp ace into tw o p arts at each step is built.

Figure 3.3 show s a 2D R B S P tree and F ig u res 3.1 and 3.2 show k d -trees and R B S P trees built on
the B unny and A rm a d illo m odels. T he figures reveal the p o ten tial o f R B S P trees. F o r sc e n e s that
consist o f p red o m in a n tly n o n -a x is-a lig n e d trian g les, the R B S P tree ap p e a rs to red u c e th e e m p ty
space im m ensely. T he sp littin g p lan es are m ore clo se ly alig n ed w ith th e plan es in the scen e.

T he use o f n o n -a x is-a lig n e d p la n e s is not u n iq u e to R B S P trees. A s d escrib ed in S ectio n 2 .3 .3 .1 ,


the structure d eveloped by K ay and K ajiy a [K K 86] is o n e o f the stru ctu re s that u ses n o n -ax is-
aligned planes. H ow ever, the stru ctu re p ro p o sed by th em is a b o u n d in g v olum e h ie ra rc h y in
contrast to the sp ace su b d iv isio n n atu re o f R B S P trees. T h is is an ad v an ta g e fo r the R B S P tree
as it needs ju s t a lin ear in te rp o latio n or, at w orst, a p la n e -ra y in te rse c tio n at each trav ersal step,
w hereas the B V H req u ires a ra y -b o u n d in g -v o lu m e in te rse c tio n at each trav ersal step. K lo so w sk i
et al. [K H M + 9 8 | d etail a sim ila r stru ctu re w ith m ore p lan es but in v estig ate its use fo r c o llisio n
d etection. K ay and K a jiy a lim it the n u m b er o f p lan e n o rm a ls used as th ey h ad lim ited h a rd w a re ,
how ever in creasin g the n u m b er o f plan e n o rm a ls is m e n tio n e d to be ben eficial.

T he n u m b er o f sp littin g p lan es that the R B S P tree can use is flexible, an d th e o re tic a lly not re ­
stricted to any num ber. H ow ever, as S ectio n 3.4 w ill show , in creasin g the n u m b e r o f split axes
significantly increases the c o n stru c tio n tim e to o b tain a g o o d R B S P tree and thus b e c o m e s a lim ­
iting factor. In p ractice, it is n e cessary to re stric t the n u m b e r o f split axes to aro u n d 24. E ven so,
3 .3 D a ta Structure 50

this is significantly more than what Kay and Kajiya used and allows a range of alignments to be
easily investigated.

In brief, the construction and use of the RBSP tree can be described as follows. Initially, a number
for the split axes is fixed. Depending on the number, the actual split axes are determined, either
manually or using some heuristic. Subsequently, the actual tree is constructed using planes along
these axes. The scene is now ready for ray tracing using the constructed RBSP tree. To ray trace,
the individual rays are traversed through the tree to find the closest intersection and the image is
rendered.

An important feature of RBSP trees is the flexibility provided. RBSP trees can be used to simulate
a kd-tree or a general BSP tree. They can be used both to accelerate ray tracing and also to study
BSP trees. Due to the difficulty of constructing good BSP trees, RBSP trees can be used as a good
substitute.

Geometrically, the RBSP tree has nodes formed by the intersection of a number of planes. Hence­
forth, since the number of splitting axes is variable, it will be represented by m. This implies that
the number of planes is 2m. Due to the planes being arbitrarily aligned, the nodes have a variable
number of faces with a variable number of edges. Since m is the number of splitting axes used,
the maximum number of faces that a node may possibly have is 2m. However, a node need not
have faces corresponding to all the planes. A given node may be missing either one or both the
faces corresponding to a particular plane. This is an important distinction from a kd-tree where
the nodes always consist of six rectangular faces and leads to several distinctions in the creation of
the tree. The mid-point calculation undertaken during the space median heuristic is more involved
to ensure that the mid-point of the two bounding points along the axis is inside the node. During
the SAH process, the calculation of the surface area and the counting of triangles in a node is more
complex. At the same time these geometric properties also ensure that the object is more closely
wrapped by the node.

Although RBSP trees are conceptually simple, sufficient thought must be put into the implemen­
tation for it to be competitive with the extremely well researched kd-tree structure. The main parts
are the data structure representation, construction heuristics and finally ray-structure traversal -
each o f which will be discussed in detail in the following sections.

3.3 Data Structure

Representing the RBSP tree in a compact and easy manner is important if RBSP trees are to be a
viable alternative to kd-trees. It was noted that one of the main drawbacks of the general form of a
BSP tree was its increased memory requirements due to the nodes’ data. In order for RBSP trees to
be an efficient structure for ray tracing, a compact representation is imperative. The RBSP tree has
three pieces of information - information that is common to all the nodes, the nodes themselves,
and a list of triangle pointers contained by all leaf nodes.

The header contains the common data for the tree and consists of the following information.

• List of Predetermined Split Axes - The axes determined prior to construction are stored in
the header as an array of axes. The axes are normal vectors and are represented by three
floating point values. Once the array is stored, the index of the axis in this array is used to
indicate every node’s split axis.
3.3 D a ta Structure 51

• Information needed only when loading or saving the tree.

- Number of Split Axes.

- Number of Nodes.

- Number of Triangles - The number of triangle pointers contained by all the leaf nodes
in the tree.

Compactness is an important attribute for representing nodes as it can affect the performance and
memory requirement of the tree. In order to obtain the most compact representation, a node is
represented by 8 bytes (64 bits). Table 3.1 indicates the exact structure of a node.

Represents Bits used


A unique pointer to children 32
Index to split axis 16
Quantised value of split position 14
Leaf node flag 1
Unused bit 1

Table 3.1: RBSP tree node structure

The tree is stored as an array of nodes with each node pointing to its children through an index.
Since the two child nodes are adjacent, only the index to the first node (ordered front-to-back along
the axis used) is stored. The second child node is implicitly addressed by adding one to the stored
index. Using the array form for representing the tree improves cache performance as the nodes and
its children are closer in memory with this representation than if the pointer / tree representation
were to be used. Also, it enables better memory efficiency through the use of implicit second child
addressing.

With the child node pointer taking up 32 bits, it is important that the remaining data of a node is
represented as efficiently as possible. Normally, the split position is stored using a single floating
point value. However, this requires a further 32 bits which is prohibitive. The split position is the
point along the axis selected where a plane that is normal to the axis splits the node into two parts.
It is a point that lies between the extents of the parent node along this axis. Thus, if this distance
between the end points of the node along the axis is discretised into a certain number of values,
it is possible to use fewer bits to indicate the split point. For an RBSP tree, a discretisation of the
distance into 214 — 1(16383) points provides a good result - both in terms of memory usage and
precision of the split point. Thus, the split point can be indicated in the node with just 14 bits.
Although, there is an unused bit available to increase the accuracy of the discretisation, the current
precision was found to produce good results and hence only 14 bits are used.

Since the header consists of all the potential split axes for the tree, the node’s split axis can be
indicated by just the index to the axis. Even though 16 bits are allocated for representing the
index, the practical limit for the number of split axes is about 32 which can be represented by 5
bits. However, the only other piece of information needed in the node is a flag - requiring just one
bit - to indicate if it is a leaf node. Hence, using 16 bits for the axes allows the data to be easily
accessible.
3 .4 C on stru ction 52

W hen the construction heuristic determines that a node is a leaf node, the structure represents
different information. The information stored in a leaf node is indicated by Table 3.2

Represents Bits used


A pointer to the start of the triangle list 32
The number of triangles in the leaf node 16
Leaf node flag 1
Unused bits 15

Table 3.2: RBSP tree leaf node structure

The leaf node consists of a certain number of triangles. The triangles contained by all the leaf
nodes are stored in a separate list. Each non empty leaf node has an index that indicates the first
triangle contained by it. Together with the number of triangles, the index is used to obtain the
triangles in this node. Figure 3.4 shows this diagramatically.

Tree node array

Leaf node triangle list

Figure 3.4: Leaf nodes and pointers to their component triangles. Nodes in green are leaf nodes
that point to an index in a global list of leaf node triangles.

As shown, the data for a node (both internal as well as a leaf node) requires just 8 bytes. This is
the same amount of memory required for a kd-tree node. Thus one of the claims of the RBSP tree
- that it is compact to represent - is fulfilled. Having described the compact data structure to store
the tree in memory, the construction process itself can be described.

3.4 Construction

The RBSP tree is a recursive structure in which every internal node can be considered as the
root node o f the corresponding subtree. A recursive tree construction process is thus the natural
method to construct the structure. The following pseudocode describes the high level RBSP tree
construction algorithm.
3 .4 C onstruction 53

constructRBSPTreeMain()
{

Vec t o r [] splitAxes = findSplitAxesForTree();


constructRBSPTree(root, alITriangles);
}

Listing 3.1: High level RBSP tree construction. First, the axes for the tree are selected and
subsequently the tree is constructed in a recursive manner.

Before the actual construction of the tree, it is necessary to determine the directions of the potential
split axes. One method to determine the axes automatically is to have them evenly distributed
across space. This is achieved by a heuristic that uses evenly spaced points on a sphere [PSA07],
Once the number of axes that the RBSP tree will use is selected, the same number of evenly
distributed points on a unit sphere with the origin as its center are found. The normalised vectors of
the lines connecting the origin to these points provide a set of vectors that are aligned evenly across
3D space. These vectors are chosen as the potential split axes. However, due to the proliferation
of scenes consisting of predominantly axis-aligned planes, it is desirable to include the X , Y and
Z axes as potential axes. The axes that are closest in alignment to the three coordinate axes are
replaced by the X , Y and Z axes.

The recursive RBSP tree construction algorithm, const ructRBSPTree, is the same as the construc­
tion of a kd-tree, shown in Listing 2.3. The methods utilised for findsplit Axis, findSpiitPosition,
findLe ft Triangles, f indRightTriangies is where it differs from the kd-tree construction al­
gorithm.

The methods findSpiitAxis and findSpl itPositdon select an axis and a split position along
this axis. These methods vary according to the heuristic used. On the other hand, counting and
classifying the number of triangles on each side of the split plane is independent of the heuristic
used. It is more involved than the process followed in the construction of the kd-tree, complicated
by the fact that each node can have a variable number of polygons that themselves have a variable
number of sides. A very straightforward method is used to achieve this. All the triangles in the
node are clipped by all the component planes of the node to find the parts of the triangles that are
contained by the node. Only triangles that have some part within the node are counted as being
inside the node. Clipping the triangles ensures that each node is attributed only those triangles
that have at least some part of themselves in the node.

The construction could be optimised by storing the clipped parts of the triangles for each node
as the construction progresses down the tree. However, since the main aims are to investigate
the advantages offered and to study the RBSP tree's applicability for ray tracing, the construction
process has not been optimised.

Initially, the extremal points of the root node are found for every split axis and stored as a list of
points. The split point at which a node is split becomes the minimum point for one of the child
nodes and the maximum point for the other. The list of node extremities are updated accordingly.
This ensures that the bounding planes for the current node being processed is accurate. These
bounding planes are the clipping planes for the triangles inside the node.

Two heuristics - Space Median and Surface Area Heuristic - to determine the split axis and split
position are investigated. Both have been adapted from their well researched kd-tree versions and
will be discussed in greater detail in Sections 3.4.1 and 3.4.2.

When the split plane for a node is found, the node is split and the triangles are clipped and clas­
3 .4 C on stru ction 54

sified. The process is continued down the tree until a termination criteria is met. When a node
satisfies a termination criteria, it is considered as a leaf node. For the RBSP tree, only two termi­
nation criteria are used - the depth of the node in the tree and the number of triangles in the node.
If the depth reaches a predetermined depth or if there are less than or equal to a certain number
of triangles in the node, the node is made into a leaf node and the recursion stops. The number
of triangles in a node at which it is made into a leaf node is fixed at 2. For the RBSP tree, the
maximum depth is calculated as a function of the number of triangles in the scene. The term given
by Havran and Bittner in [HB02] for kd-trees is used even for RBSP trees. It is:

dynax — k\ l og2( N ) + k 2 (3.1)

where d max ~ is the maximum depth of the tree


N - is the number of triangles in the scene
fci and k-2 - are constants chosen to achieve critical performance.

Havran and Bittner [HB02] use the values of kj = 1.2, k 2 = 2 that is also used for RBSP trees.

3.4.1 S p ace M ed ian C on stru ction

The simplest heuristic for building kd-trees is the space median, in which the axis is selected either
on a round robin basis or the longest axis basis and the mid-point along this axis is selected as the
split point. The same concept can be applied to RBSP trees. If there are m axes, then one of these
axes is selected on a round robin basis. The mid-point along this axis is selected to obtain the split
plane for the node.

Finding the mid-point along arbitrarily aligned axes is slightly more challenging. With kd-trees,
the mid-points along the A", Y or Z are easily found as they are half of the sum of the minimum and
the maximum value of the X , Y and Z coordinates of the node’s bounding box. In comparison,
to find the mid-point along the potential split axes, the bounding box’s eight vertices are projected
onto the axes to find the minimum and maximum projection point of the bounding box along each
axis. The center point between these two extremities gives the necessary point along the axis to
place the split plane.

Figure 3.5 shows that the space median construction results in poor trees, especially for the pur­
pose of ray tracing. The reasons for this are quite obvious. The splits are aligned and placed
arbitrarily. Due to the fact that a round robin scheme is being used for selecting the axes, there is
no relation between the particular node and the splits. The problem is compounded by the split
plane being placed at the mid-point of the axis with no consideration of the location of the trian­
gles in the node. The poor quality of the trees created necessitate a better heuristic to realise the
potential of multiple differently aligned planes.

3.4.2 S u rfa ce A rea H eu ristic

The Surface Area Heuristic (SAH) that results in kd-trees better suited for ray tracing is adapted
to construct RBSP trees. It is based on the hypothesis that the probability of a ray hitting a node’s
triangles is directly proportional to the surface area of the node and the number of triangles in
3 .4 C o n stru ctio n 53

(a) Sp ace m edian R B S P tree for B unny (b) Sp ace m edian R BSP tree for A rm a d illo

F ig u re 3.5: S pace m ed ian R B S P trees on scen es w ith p re d o m in a n tly n o n -a x is-a lig n e d trian g les.
T h e split p lan es are not in tellig en tly p lace d and h en ce the sp ace m ed ian m eth o d to co n stru ct
R B S P trees is not b en eficial.

the node. T h is p ro b a b ility p ro v id es a co st fu n ctio n , given by E q u a tio n 2.12, th a t in c o rp o ra te s the


p ro b ab ility o f a ray h ittin g the node as w ell as the co st o f in te rse c tin g the g e o m e try in it. T h e axis
and po in t at w h ich the cost is a m in im u m p ro v id es the lo cally o p tim u m split p o in t fo r the node.
M in im isin g this cost w h ile selectin g each split p la n e fo rm s th e esse n c e o f the S A H . T h e h e u ristic
fo r k d -trees is very effectiv e and in creases the p e rfo rm a n ce o f ray tracin g su b stan tially .

F or the R B S P tree, a h eu ristic like the SA H is even m ore im p o rta n t. It ad d s in te llig e n c e to the
se lectio n o f a split axis and the p la c e m e n t o f th e split plane. A s F ig u re 3.6 sh o w s, R B S P trees
b uilt w ith the in tellig en ce p ro v id ed by the S A H are m u ch b e tte r th an those b u ilt w ith the space
m ed ian h eu ristic. T h ey have sp lits that arc alig n e d and p lace d in rela tio n to th e c o n te n ts an d
p ro p e rtie s o f the n o d es.

A d a p tin g the SA H to the R B S P tree c o n stru ctio n is c o n c e p tu a lly stra ig h tfo rw a rd as E q u a tio n 2.12
can be used fo r the cost. H ow ever, calc u la tin g the co st is c o m p lic a te d by the n e c e ssity to ex a m in e
several axes in stead o f ju s t three.

T he m ain p u rp o se o f the S A H . w hen a p p lied to the R B S P tre e, is to select th e b est ax es am o n g


the n u m e ro u s axes, and the best split poin t alo n g that se lec ted axis. A ll p o ten tial axes have to be
c o n sid e re d as the o p tim u m co st can be alo n g any o f the p o ten tial axes. T h ere are an infinite n u m b e r
o f p o ten tial split p o in ts alo n g any axis. H o w ev er as the SA H fo r th e k d -tree d o es, th e n u m b e r o f
split p o in ts to be inv estig ated can be lim ited to a finite num ber. E v ery tria n g le is first c lip p e d by
the b o u n d in g box o f the n ode to o b tain an accu ra te list o f tria n g le s c o n ta in e d in the n o d e. S ince
the n u m b er o f trian g les can only d iffer at the tw o e x trem itie s o f th e clip p e d trian g le a lo n g an axis,
o nly these tw o po in ts are a d d ed to the list o f p o ten tial split p o in ts. T h e e x tre m itie s a lo n g an axis
are fou n d by p ro je c tin g the v ertices o f the clip p ed trian g le o n to the axis. B y fo llo w in g th is p ro cess
for every trian g le in the node, a list o f p o ssib le split p o sitio n s a lo n g the axis is fo u n d . F ig u re 3.7
3 .4 C onstruction 56

(a) R B SP tree for B unny (b) R B S P tree for A rm a d illo

F igure 3.6: SA H R B S P trees on scen es w ith p re d o m in a n tly n o n -a x is-a lig n e d tria n g le s. U sin g the
in tellig en ce p ro v id ed by the S A H , the R B S P tree c o n stru c te d is o f b e tte r q uality.

show s the process w ith m ore clarity.

O nce all the p o ten tial split p oin ts alo n g an axis are d e te rm in e d , the SA H co st at e a c h o f these
points is calcu lated . To calc u late the SA H co st, the su rface are a o f the left and rig h t n ode c re ate d
by a split plane placed at the p o in t in c o n sid e ra tio n an d the n u m b e r o f tria n g les in th ese n o d es are
to be d eterm ined . By u sin g the list o f p ro jected end p o in ts, it is sim p le to d e te rm in e the n u m b e r
o f trian g les on e ith er side o f a p o ten tial split point.

C o m puting the surface area o f the tw o n o d es created by the sp lit is not as stra ig h tfo rw a rd . T h is
is co m p o u n d ed bv the fact that the o n ly v alu es sto red fo r a n ode arc the ex tre m itie s a lo n g each
split axis. To calc u late the surface area the n o d e ’s a c tu a l p o ly h e d ro n - fo u n d by c lip p in g ev ery
bounding plane o f the n ode (given by the split axis and an e x tre m ity alo n g the a x is) w ith ev ery
oth er bo u n d in g p lan e - is d eterm in ed . T h e clip p in g p ro c e ss re su lts in a list o f p o ly g o n s - the
faces o f the n o d e ’s b o u n d in g volum e. T h ese faces are then c lip p e d w ith th e sp littin g p lan e in
co n sid eratio n to ob tain the faces o f the tw o sp lit n o d es th at w o u ld be created w ith th is split. A lso ,
the splitting plan e itself is c lip p ed w ith each o f the b o u n d in g p la n es to o b tain the sp littin g p la n e ’s
polygon. T he p o ly g o n s thus fo u n d can then be d e c o m p o se d in to trian g le s to c a lc u la te the su rface
areas o f the tw o nodes p o ten tia lly created by the split p lan e in c o n sid e ra tio n .

P lugging in the values for the n u m b e r o f tria n g les an d the su rfa c e area in E q u atio n 2.12, the SA H
cost at each p o ten tial split p o in t along each p o ten tial split axis is a sce rta in e d . T h e S A H also allo w s
biasing the cost. E m pty n o d es are b eneficial for ray tra c in g as they id en tify em p ty sp ace allo w in g
the rays hitting them to be sk ip p ed . T h is is c o n sid e re d in the h e u ristic for R B S P tree b u ild in g by
reducing the co st to 80% o f the o rig in al co st if a p o ten tial sp lit c re a te s an em p ty node. T h e axis
and the point w ith the m in im u m w eig h ted co st is selected as th e sp littin g p lan e fo r the node.

It is to be noted that w h ile the SA H p ro v id es a m eth o d to c reate fa irly g o o d tre e s, th e tree is not
optim al. T he co st calc u la te d fo r each split is the lo cal o p tim u m th at ap p lies o n ly to that node
irrespective o f e a rlie r and fu tu re splits.
3 .5 R B SP Tree T raversal 57

Node bounding volume

Triangle outside
bounding volume

Triangle end points

Potential split points


Potential split axis

Figure 3.7: Potential split points for SAH along an axis. Projecting the end points of the clipped
triangles onto the axis in consideration gives the potential split plane positions. The SAH cost at
these potential points are calculated and the minimum point is selected as the locally optimal
split position.

3.5 RBSP Tree Traversal

Constructing the tree prepares the scene for ray tracing. Each individual ray is traversed down
the tree in a front-to-back order until it is determined if the ray misses the scene completely or
intersects a primitive. When a ray hits a primitive, the pixel corresponding to the ray is shaded
using the primitive’s properties.

Using the parametric equation of the ray, given by Equation 2.1, the ray can be traversed through
the tree using different methods - each with its own advantages and disadvantages. Each of these
methods are described subsequently.

3.5.1 A lg o rith m 1.1 - Traversal by L in ear In terp olation o f R a y -P la n e In tersection


P a ra m eter

This method for ray traversal is based on the kd-tree traversal [SS92] adapted to the RBSP tree.
The high level algorithm can be given by the following two methods described in pseudocode.
3 .5 R B S P Tree T raversal 58

e x it

s p lit

Splitting plane
N od e boundaries
F igure 3.8: R B S P tree ray traversal (2D ). S im ila r to the k d -tree trav ersal - if t sput > t entry the
left node is trav ersed , and if 1spu t < t exit the rig h t n ode is trav ersed . If both co n d itio n s hold, as in
the figure, b oth n o d es are trav ersed .

int rayTraverse()
{
tMin = INFINITY;
tMax = -INFINITY;
//Compute the intersection parameters
//to the entry and exit planes
//along each axis
for (each split axis index, i)
{
tEntry[i] = findTEntryBoundingPlane(i);
tExit[i] = findTExitBoundingPlane(i);

tMin = maxVal(tEntry, tMin[i]);


tMax = minVal(tExit, tMax[i]);
}
if(tMin > tMax)
return -1; // There is no intersection
// with the bounding volume
// of the node

return recursiveRayTraversal(root, tEntry, tExit, tMin, tMax);

L istin g 3.2: H igh Level R B S P tree traversal alg o rith m . T he alg o rith m in tersects th e ray w ith each
plane o f the b o u n d in g volum e o f the tree. If th ere is no in te rse c tio n , it term in a tes. O th erw ise, the
ray is recu rsiv ely tra v e rse d dow n the tree.
3 .5 R B S P Tree T raversal 59

int. recursiveRayTraversal (RBSPNode node,


float [JtEntry, float [jtExit,
float tMin, float tMax)
{

if (tMin > tMax //ray does not intersect node


o r tMax < 0) //intersection is behind origin
return -1;
if (node is a leaf node)
{

if (any triangle intersects the ray)


return index of first triangle hit by the ray;
return -1;
}

tSplit = findSplitParameter(node, tEntry, tExit);


i f (tSplit > tMin)
<

//Recursively traverse the left node


triang]elndex=recursiveRayTraversa1 (node.left, tEntry, tExit,
rriin (tSplit, tMin), tMax);
i f (trianglelndex != 1 )
return trianglelndex;
}
i f (tSplit < tMax)
(

//Recursively traverse the right node


return recursiveRayTraversal(node.right, tEntry, LExit,
tMin, max(tSplit, tMax));

L istin g 3.3: R ecu rsiv e R B S P tree trav ersal alg o rith m .

A s the p seu d o c o d e show s, th ere are tw o m ain c o m p o n e n ts th at m ak e up the tree traversal. T he


first step is to ca lc u la te the in tersectio n p a ram eters for the m in im u m and m ax im u m b o u n d in g p lane
along each axis. Losing these, the ra y ’s en try and exit p ara m e te rs. t (:ntry and t CXit, w ith respect to
the b o u n d in g v o lu m e o f the tree are found. If it is d e term in e d th at the e n try p a ra m e te r is g re a ter
than the exit p aram eter, the ray d o es not in tersec t the tree an d h en ce need n o t be p ro cessed further.
O th erw ise the ray is trav ersed dow n the tree.

T he ray trav ersal begins at the root n ode and d esc e n d s dow n the tree in a fro n t-to -b ack order.
At each in tern al (n o n -leaf) node, the ra y 's in te rse c tio n p a ra m e te r w ith the split plane o f the node
is ca lc u la te d . It is im p o rtan t to c o n sid e r the d irectio n o f the ray - in d ic a te d by the sign o f the
dot p ro d u ct b etw ee n the ray and the split axis - d u rin g the ca lc u la tio n o f the ra y -sp lit-p la n e
in tersec tio n param eter, /.«,;>/*<. S ince the en try and exit p ara m e te rs alo n g each axis has alread y been
calc u lated . t spilf can be q u ic k ly calc u la te d by lin e a r in terp o latio n o f the c o rre sp o n d in g p aram eters.
It is also to be noted that the node stores a d iscre tisa tio n o f the d ista n c e b e tw e e n the en try and exit
p lan es and hen ce has to be co n v erted back to the actu al value. T he co n v ersio n back to actual
values and the lin e ar in terp o latio n can be c alc u lated as follo w s:
3 .5 R B S P Tree T raversal 60

float findSplitParameter(node, float []tmin, float []tmax)


{

tSplit = node.splitPosVal*(1/16383);
/ /signRDirRec[axis] - Indicates ray direction along the split axis
tEntty = signRDirRec[axis] ? tmin[i] : tmax[i];
tExit = signRDirRec[axis] ? tmax[i] : tmin[i];
return (tEntry + tSplit*(tExit-tEntry));
}

L istin g 3.4: L istin g show s the c alc u la tio n o f the ra y -s p lit-p la n e in te rse c tio n p a ra m e te r by lin e a r
in terp o latio n .

It is to be ensu red that the right values o f t nXis and t axiS p a ram ete rs are selected d ep en d in g
on the sign, o r d irectio n , o f the ray. A n o th er im p le m e n tatio n d etail is that the value o f (1 /1 6 3 8 3 )
( 2 14 — 1 as 14 bits are used to store the q u an tisatio n . Sec T able 3.1) is not calc u la te d at each step,
but is sto red as a co n stan t. T h u s, the en tire p ro ce ss o f fin d in g t spn t is ach ie v ed w ith tw o m u ltip lies,
one ad d itio n and one su b tractio n .

A s show n by F ig u re 3.8. the lo catio n o f the split plan e in re latio n to the node - m ath em atically
rep resen ted by the value o f t spn t in re la tio n to t llltry an d t cxlt - d ete rm in e s the n o d es traversed. If
the t Spnt p a ra m e te r is g rea te r than the t cntry p aram eter, it in d icates that the split p lane is located
at a point alon g the axis that is after the poin t w h ere the entry p lan e is located. In this case, the
first node along the axis is traversed. If the split plan e is lo cated at a p o in t befo re the exit p lane,
i.e., if the ray hits the seco n d node along the axis and the n ode is traversed. If both
the cases are true, as in F ig u re 3.8, then the ray first trav erses the first node. If it does not hit a
trian g le in the first node, the seco n d node is traversed.

T he tree traversal c o n tin u es dowrn the tree until a le a f n o d e is re a c h e d . At a le a f node, the ray m ay
hit one o f the trian g les co n tain ed in the node. A s the tria n g les in the ray are not sorted acco rd in g
to the ra y ’s path, it is n e cessary to test all the trian g les fo r in tersec tio n . T he in tersec tio n p aram ete r
is found for all trian g les in the node using the m ethod by S eg u ra and F eito fS F O l], If the ray hits
m ore than one trian g le, the trian g le w ith the low est / p a ra m e te r - the first trian g le in the ra y ’s path
- is selected. O n the o th er hand, the ray m ay m iss all the tria n g les in the le a f node in w'hich case,
the tree traversal co n tin u es.

T he m ethod, as d escrib ed , is a d irect ad ap ta tio n o f o n e o f the b est kd -tree trav ersal m eth o d s.
H ow ever, as the resu lts show , the p e rfo rm an ce o f th is m eth od d e p re c ia te s w'hen the n u m b er o f
sp littin g axes used is in creased . T h is is d espite the fact that, as ex p ected , the n u m b er o f node
trav ersals and n u m b e r o f trian g le in tersec tio n s show s a c o n siste n t d ecrease. T h is d ep re cia tio n o f
p erfo rm an ce is attrib u ted to the first m eth o d R a yT ra verse. S pecifically , it can be traced to the
c alc u latio n o f entry and exit p a ram eters fo r all the split axes. If in is the n u m b er o f split axes used,
then 2 m ra y -p la n e in te rse c tio n s are necessary. C alc u la tin g the en try and exit t p ara m e te rs fo r each
plan e req u ires three dot p ro d u c ts and one d ivide o p eratio n resu ltin g in a large co m p u tatio n a l cost.
A lso, in cases w h ere the d ep th o f the tree is less than 2in , the n u m b e r o f ra y -p la n e in tersectio n s
w ould be m ore than if the n ode w ere to be in tersec ted in dividually. T h e R a yT ra verse m eth o d thus
dep en d s on m as m u ch as it d o es on lo g ( N ) (the o rd e r o f the tre e 's d ep th ).

To o v erco m e this p ro b lem and realise the true p o ten tial o f R B S P trees, altern ate traversal m eth o d s
are thought o f and are d esc rib e d in the fo llo w in g sections.
3 .5 R B S P Tree T raversal 61

3.5.2 Algorithm 1.2 - Traversal using SSE

S S E in stru ctio n s allow fo u r co m p u ta tio n s to he p e rfo rm ed in p a ra lle l and are used to a d d ress the
p ro b lem o f d e creased p erfo rm an ce. T he p rev io u sly id en tified p ro b lem atic p art o f the trav ersal -
the initial entry and exit plan e in tersec tio n c a lc u latio n - is c o n v e rte d to S S E so that fo u r en try
and exit p aram e te rs arc co m p u ted sim u ltan eo u sly in one ite ra tio n , th u s red u cin g the cost o f this
process. T he relev an t part o f the code in R a yT ra verse c o n v e rte d to S S E is

for (each split axis index, i)


{

tMin[i] = findTEntryBcund in g Pl an e( i );
tMax[i] = fi nd TE x i t B o u n d i n g P l a n e (i );
}

L isting 3.5: Part o f code from R a y T rav erse th at is c o m p u te d using SSE .

It is ch an g ed so that every fo u r split axes are h an d led in o n e ite ra tio n o f the lo o p and can be w ritten
using SSE as show n below .

rayDirSSEX = load rayDirX into all 4 SSE components;


rayDirSSEY - lead rayDirY into all 4 SSE components;
rayDirSSEZ = lead rayDirZ into all 4 SSE components;

for (each 4 split axes starting at index, i)


{

splitAxesX = X coordinate of the 4 split axes


splitAxesY = Y coordinate of the 4 split axes
splitAxesZ = Z coordinate of the 4 split axes

tMinSSEfi/4] = findTEntryBoundingPlaneSSE();
tMaxSSE[i/4] - findTExitBoundingPlaneSSE();
}

L isting 3.6: U sing S S E to accelerate R B S P traversal. In stead o f c o m p u tin g ra y -s p lit plan e


intersectio n s indiv id u ally , S S E is used to co m p u te fo u r en try p lan e an d fo u r exit plan e
in tersec tio n s.

To ob tain the best p erfo rm an ce w ith S SE in stru c tio n s, it is p re fe ra b le to load the in d iv id u al c o m ­


p onent co o rd in a tes o f each ray and each split ax is in to se p a ra te S S E variab les. W ith th is, dot
p ro d u cts req u ire few er in stru ctio n s. U pon c o n v ersio n to S S E . the c a lc u la tio n s o f the e n try and
exit p aram eters are achieved w ith few er iteratio n s and red u ce the d e trim e n ta l effect o f in creasin g
the n u m b er o f plan es. R esu lts show that this m eth o d o f o p tim isin g th e traversal is very effectiv e
as long as the n u m b e r o f axes is relativ ely sm all (b etw e e n 12 an d 16). T h e ad v an ta g e o f S S E
can n o t m ask the co m p u ta tio n a l cost fo r larger n u m b e r o f axes. H ow ever, the resu lts do show that
that m ethod realise s the p o ten tial o f R B S P trees b e tte r and ach ie v e s im p ro v ed p e rfo rm an c e over
an SA H kd-tree.

3.5.3 Alternate Traversal - R ay-Plane Intersection at Each Node

T he p erfo rm an c e o f the tw o traversal m eth o d s d e sc rib e d e a rlie r is d e p e n d e n t on the n u m b er o f split


axes used. H ow ever, if p o ssib le, it is preferab le fo r the trav ersal p e rfo rm a n c e to be in d e p en d e n t o f
the n u m b e r o f axes. To ach iev e this, the ra y -sp lit-p la n e in te rse c tio n p a ra m e te r calc u latio n uses the
3.5 R B S P Tree T raversal 62

r a y - p la n e in terse c tio n , as show n in L istin g 3.8, in stead o f lin e a r in terp o la tio n used by the e a rlie r
m eth o d , as sho w n in L isting 3.4, to o b tain the in tersec tio n p aram eter. A lth o u g h the p e rfo rm an c e
is w o rse for R B S P trees co n stru c te d w ith a sm all n u m b er o f split axes, it m ak es the re n d e rin g tim e
a fu n ctio n o f the depth o f the tree ra th e r th an the n u m b e r o f split axes. T h is m e th o d w o u ld th u s
be v ery useful to study and u n d erstan d the effect o f n u m b er o f split axes on R B S P trees. T he new
high level a lg o rith m c an be given as:

int rayTraverse()
{
//Determine two planes of bounding volume
//at which the ray enters and exits
findEntryPiane (entryPlanePoint, entryAxis);
findExitPlane(exitPlanePoint, exitAxis);

//Find the entry and exit


//ray-plane intersection parameters
tEntry = findRayPlaneintersection(ray, entryPlanePoint, entryAxis);
tExit = findRayPlanelntersection(ray, exitPlanePoint, exitAxis);

if(tEntry > tExit)


return -1; //There is no intersection
//between the ray and the
//node's bounding volume

return recursiveRayTraversal(root, tMin, tMax, tEntry, tExit);


}

L istin g 3.7: H igh level alg o rith m fo r A ltern ate R B S P trav ersal. In th is m eth o d , the e n try and exit
p lan es arc d ete rm in e d using eith e r the O p e n G L m eth o d (d e sc rib e d in S ectio n 3 .5 .3 .1 ) o r the
recu rsiv e d iv id e m ethod (d escrib ed in S ectio n 3 .5 .3 .2 ). C o d e in b lue show s th e m o d ific a tio n s
co m p ared to the e a rlie r high level a lg o rith m (given by L istin g 3.2)

C o m p a re d to the high level alg o rith m d esc rib e d in S e c tio n 3.5.1, the m ain d iffe re n c e (sh o w n
in b lue in L istin g 3.7) is that altern ate m eth o d s are n e ce ssa ry to d e te rm in e the e n try and exit
planes, as w e do not w ant to co m p u te ra y -p la n e in te rse c tio n s fo r all 2 m p lan es. C o n seq u en tly ,
the c a lc u la tio n o f the ra y -sp lit plan e in tersec tio n param eter, tSplit c a n n o t be d one u sing lin ear
in terp o latio n and is now ach iev ed using a ra y -p la n e in tersec tio n as show n by the p se u d o c o d e
below :

float findSplitParameterPlaneRaylntersection(node)
(
//Get the split position of the node along the axis
tSplit = node.splitPosVal*(1/16383);
//Find actual split point using linear interpolation
splitPoint = bvPointFront[axis]
+ tSplit * (bvPointBack [axis] - bvPointFront[axis));
//Intersect the ray with the plane
/ / (plane is given by the normal and the splitpoint)
tSplit = findRayPlanelntersection(ray, splitPoint, planeNormals[axis]);
return tSplit;
)

L istin g 3.8: C a lcu latin g the ra y -s p lit p lan e in terse c tio n by using a full ra y - p la n e in te rse c tio n .
T his is n ecessary as lin e a r in terp o latio n o f the ray can n o t be p erfo rm e d as startin g and en d in g
po in ts have not been ca lc u la te d earlier.
3 .5 R B S P Tree T raversal 63

O th e r than the c o m p u ta tio n o f the tSplit p aram eter, the ray trav ersal is th e sam e as show n by
L istin g 3.3.

3.5.3.1 A lgorithm 2.1 - E ntry and Exit Plane D eterm ination w ith O penG L

It is not straig h tfo rw a rd to d eterm in e the en try and exit p la n e s o f th e root n o d es w ith o u t in te rse c t­
ing the ray w ith each b o u n d in g p lan e o f the root node. T h e first altern ativ e is to em p lo y O p e n G L
to find th ese p lan es. O p en G L o r raste risa tio n m e th o d s are v ery q u ic k if the n u m b e r o f ren d e re d
tria n g le s is very sm all. T he faces o f the ro o t n o d e are fo u n d by clip p in g ev ery b o u n d in g plan e
ag ain st every o th e r b o u n d in g plan e. E ach face is assig n ed a u n iq u e c o lo u r fo r id en tificatio n . T he
en try p lan es and exit p lan es are sep arately ren d ered w ith O p en G L . F or the p a rtic u la r p ix el b e ­
ing ray traced , the en try and exit p lan es can be d e te rm in e d by c h e c k in g th e p ix e l’s co lo u r in the
ren d e re d im ag es. F ig u re 3.9 show s the en try p lan es b ein g e asily id en tifiab le by th e ir co lo u rs.

F ig u re 3.9: R B S P tree ro o t node fo r the B u n n y w ith e a c h p la n e c o lo u re d d ifferently.

O nce the en try and exit p lan es are d eterm in ed fo r a p a rtic u la r ray (p ix el), the ra y -p la n e in tersec tio n
is used to c a lc u la te t (,ntry an d t ex,i for the ray. T he m eth o d is very q u ic k and the p ro je c tio n o f
the b o u n d in g p lan es is p e rfo rm ed ju s t once at the b eg in n in g o f ray tracin g . T he m eth o d w o rk s
very w ell fo r sc e n e s w h ere the v iew p o in t is alw ay s o u tsid e the b o u n d in g v o lu m e o f the root node.
W h en the v iew p o in t m oves inside the scene, the en try p lan es are b eh in d the v iew p o in t c au sin g
O p e n G L p ro je c tio n to fail. H ence, an altern ate m eth o d is n ecessary .
3 .5 R B S P Tree T raversal 64

3.5.3.2 A lgorithm 2.2 - E n try and Exit Plane D eterm ination by Recursive Divide

A recu rsiv e p ro c e ss is fo rm u la te d as a so lu tio n to the p ro b lem o f fin d in g the e n try an d ex it p lan es.
T h e m ethod uses the p ro p erty that in a convex p o ly h ed ro n , if fo u r b o u n d a ry p ix els o f a re c ta n g u la r
reg io n lie on the sam e face, then all the pix els that lie w ith in th ese fo u r pix els a lso lie on the sam e
face. T he pro cess to find the en try and exit p lan es is v ery sim ila r and h en ce o n ly the p ro c e ss to
d eterm in e the en try p lan e is d e scrib ed in g re a te r detail.

determineEntryPlane(RectImageRegion r)
{
if(r does not contain any pixels within)
return;
//Find the entry plane indices for
//the four boundary pixels of r
int entryPlanel, entryPlane2,
entryPlane3,entryPlane4;

entryPlanel = findEntryPlane(r.cornerl);
entryPlane2 = findEntryPlane(r .corner2);
entry?lane3 = findEntryPlane (r.corner!);
entryPlane4 = findEntryPlane(r.corner4);

if (entryPlanel == entryPlane2 and


entryPlar>e2 == entryPlane3 and
entryPlane3 == entryPlane4)
{
//Set the entry plane indices of
//all the pixels inside r
setAllEntryPlanes (r) ;
return;
}
else
{ /
rectImageRegion rl, r2, r3, r4;
spl it.Reg: on Int oFour (r, rl, r2, r3, r4) ;
determineEntryPlane(rl);
determineEnt ryPlane(r2);
determineEntryPlane(r3);
determineEnt ryPlane(r4);

L isting 3.9: D eterm in in g the en try p lan e fo r a gro u p o f ray s u sin g a re c u rsiv e p ro c e ss.

T he en try plan e o f the fo u r c o rn e r p ixels is c a lc u la te d by finding the in te rse c tio n o f th e ray w ith
all the b o u n d in g p lan es o f th e node. If they are the sam e, th en e ach ray that co rre sp o n d s to a pixel
w ithin this rec ta n g u la r region has the sam e entry plan e. W h en the entry p la n e s are d iffe re n t, the
region is split into fo u r sm a lle r re cta n g u la r re g io n s and a sim ilar p ro c e ss is re c u rsiv ely fo llo w ed
until the entry p lan es o f all the p ixels have b een found. It m ay a p p e a r that a sig n ifican t n u m b e r
o f en try and exit p lan es have to be fo und. H ow ever, the en try and exit p lan es o f very few p ixels
need to be d eterm in e d and this m eth o d is fo u n d to be v ery efficient. It is slig h tly slo w e r than the
O p en G L p ro jectio n m ethod, but is b e tter than c a lc u la tin g all the e n try p lan es fo r all the rays. It is
also accu rate fo r all v iew p o in ts o f all scen es and is th erefo re p referred . F ig u re 3 .1 0 sh o w s a few
steps o f the recu rsiv e m ethod.
3 .6 D ata S tru ctu re V isu a lised f o r Various M o d e ls 65

F ig u re 3.10: R B S P tree b o u n d in g volu m e / ro o t node o f the B u n n y w ith e a c h b o u n d in g plane


co lo u red d ifferen tly illu stratin g the recu rsiv e d ivide m eth o d . If the fo u r c o rn e r p ixels o f a
re c ta n g u la r reg io n have the sam e colour, then all the pix els in sid e th is reg io n have the sam e entry
plane. O th erw ise, the reg io n is su b d iv id ed into fo u r sm aller reg io n s and the fo u r c o rn e r p ixels are
tested again.

3.6 Data Structure Visualised for Various Models

To v isu alise the d ata stru ctu re as built on various m o d els, the n o n em p ty n o d es at a p articu lar
dep th are show n. L e a f n o d es th at are at this or h ig h er d ep th s are a lso show n. T h is v isu alisatio n
en ab le s iden tificatio n o f n o d es trav ersed by the rays at the given d ep th .

F igure 3.11 show s this v isu alisa tio n fo r R B S P trees built on the B u n n y. T he n o n em p ty no d es and
le a f nodes are show n at v ario u s d ep th s to show the trav ersed n odes. It is c le a r th at trees built w ith
m ore sp litting axes m o re clo sely w rap the m o d el, in d ic atin g that fe w e r ray s trav erse th ro u g h each
level as the n u m b e r o f sp littin g axes rise.

F or a few o th er m o d els, the sam e (non e m p ty and leaf) n o d es are sh o w n in F ig u re 3.12, but only at
a d ep th o f 16. T he figure show s a sim ilar resu lt to those seen in the B u n n y. A s th e n u m b er o f axes
in crease, the tree m o re clo sely w rap s the m odel resu ltin g in a re d u c tio n in th e n u m b e r o f rays that
trav erse th ro u g h the stru ctu re. T h is p o in ts to re d u c ed n ode trav e rsa ls an d tria n g le in te rse c tio n s to
ray trace the im age. T his fact is c o rro b o ra ted by the resu lts in S ectio n 3.7.
3 .7 R esu lts 66

N um ber o f
A xes

D ep th 0

D ep th 4

D ep th 8

D epth 12

D ep th 16

F ig u re 3.11: R B S P tree on the B u n n y u sin g 3 ,8 ,1 6 and 24 p la n e s at v a rio u s d ep th s. Im a g e s show


the n o n -em p ty n o d es at the given dep th and le a f n o d es at h ig h e r d ep th s. T h e v isu a lisa tio n sh o w s
th at R B S P trees b uilt w ith m ore axes converge to the m o d el m o re q u ic k ly th an th o se b uilt w ith
few er axes.

3.7 Results

T he aim o f the new d ata stru ctu re is to en ab le efficien t ren d erin g p e rfo rm an c e by red u c in g the
n u m b er o f node trav ersals and the n u m b er o f trian g le in te rse c tio n s. T h u s, th ese n u m b e rs in a d d i­
tion to the ren d erin g tim es u sing the v ario u s traversal m eth o d s d e sc rib e d w ill be p re se n te d in this
section.

To accu rately re p re se n t real w orld p erfo rm a n ce , several sc e n e s are ray traced u sin g th e R B S P
tree. For each scen e used, several v iew p o in ts are ch o sen and the im ag es are g en e ra te d fro m this
view p o in t. Table 3.3 show s the m o d els and the v iew p o in ts u sed .
3 .7 R esu lts 67

S cene S p o n za B unny A rm ad illo D rag o n H ap p y B u d d h a


V iew s

T able 3.3: S cen es and v iew p o in ts used to b en c h m a rk R B S R trees.


3 .7 R e su lts 68

B o u n d in g B ox

3 p la n e s

8 p lan es

16 plan es

24 plan es

F ig u re 3.12: Im ag es sh o w in g the b o u n d in g box an d the R B S P tree u sin g 3 ,8 ,1 6 an d 24 p lan e s fo r


few o th e r m o d e ls. Im ages show the non em p ty n o d es at d e p th 16 an d le a f n o d e s at h ig h e r d ep th s.
A s m ore axes are u sed to build the R B S P trees, th e ir q u a lity im p ro v es.

3.7.1 Node Traversals and Triangle Intersections

T he g rap h s in F ig u re s 3.13 and 3.14 show the n u m b er o f n o d e trav ersals and trian g le in tersec tio n s
re s p ectiv ely fo r v arious m o d els u sing R B S P trees b u ilt w ith v ary in g n u m b e rs o f sp lit axes. R B S P
trees w ith 3, 4, 8, 12, 16, 20 and 24 p lan es have been built to co m p a re the effect o f u sin g v ariab ly
alig n ed and n u m b ere d sp littin g p lan es. F o r n o n -a x is-alig n e d scen es, the a d v an ta g e p ro v id e d by
R B S P trees is v ery clear. A s the n u m b e r o f splitting axes rise, th ere is a re d u c tio n in th e n u m b e r o f
node trav ersals an d trian g le in tersec tio n s. S ection 3.6 v isu ally d e m o n strate d th e re d u c tio n in void
area o f R B S P trees as n u m b er o f axes used to build th em in crease. T h e figures p ro v id e statistical
su p p o rt by sh o w in g red u ced in tersec tio n and trav ersal n u m b ers. If th is red u ctio n is tra n sla te d to
p erfo rm a n c e term s, R B S P tree w ould be a very g o o d d ata stru ctu re fo r ray tracin g .

T hey also rev eal that it is n o t a very g ood stru ctu re fo r scen es c o n sistin g o f p re d o m in a n tly axis-
alig n ed trian g les. T he a v ailab ility o f d ifferen tly a lign ed p lan e s c a u ses selectio n o f sp littin g p lan es
that arc not clo se ly alig n ed w ith the trian g les. T h is asp ect reveals a p ro b le m w ith the split plan e
selectio n h e u ristic - the S A H . It also su g g ests the su itab ility o f k d -tre e s fo r su ch sc en es as the
3 .7 R esu lts 69

Armadillo
• Bunny
Dragon
H appyBuddha
— S phere
■* S p o n za

No of Split A xes

F ig u re 3.13: V ariation o f node trav ersals p er pixel o v er n u m b e r o f sp lit axes used to b uild the
R B S P tree for ray tra c in g several m o d els. T h e m ore the n u m b e r o f ax es, the lo w er the n u m b e r o f
node trav ersals (ex cep t for the S p o n za scen e).

— Armadillo
• Bunny
Dragon
H appyBudctia
— S phere
♦ S p o n z a _____

No of SpHf A xes

F igure 3.14: V ariation o f tria n g le in te rse c tio n s p er pixel o v er n u m b e r o f split ax es used to build
the R B S P tree for ray tracin g several m o d els. A s the n u m b e r o f ax es u sed in c re a se , the n u m b e r o f
trian g le in te rse c tio n s d ecrease.

av ailab ility o f several sp littin g a lig n m e n ts is not ad v a n ta g e o u s and c o u ld in fac t be d e trim e n ta l


due to m ore exp en siv e trav ersals.

O n the o th er h an d , the ad v a n ta g e s arc c le a rly rev ealed by the re su lts on the S p h ere m o d el. F o r this
m odel, the split axes arc p erfectly alig n ed , as they are ch o se n by u sin g ev en ly sp ac e d p o in ts on a
sphere. T h ese axes are alm o st c u sto m ised to a sp h ere an d h en ce as the n u m b e r o f axes in crease,
3 .7 R esu lts 70

the tree b eco m e s m ore clo sely w ra p p ed to the m odel. T h e R B S P tree for this m o d el is v ery c lo se
to a B SP tree built on the m odel as the sp littin g p lan es are ac tu a lly d raw n fro m the m o d el. T h is
also su g g ests th at if the sp littin g axes are ch o sen so that they are in clo se a lig n m e n t to the scen e
tria n g le s’ alig n m en t, the R B S P tree w o u ld be an even b etter stru c tu re fo r ray tracin g .

3.7.2 Rendering Times

Scene N am e R en d e rin g tim es w ith v ario u s m eth o d s and v ario u s n u m b e r o f axes


Avg Rendering Times D 3p D 4p B 8p D 12p B16p D 20p B 24p

2000

1000

Rendering
time in ms
1 1 Algorithm u sin g linear interp o latio n 2 1 A lgorithm usin g O p e n G L
1 2 - Algorithm u sin g S S E 2 2 - A lgorithm usin g R e c u rs iv e divide
A rm adillo
Avg Rendering Times D 3 p 0 4p ■ Sp 0 1 2 p B 1 6 p D 2 0 p B 2 4 p
2000

1000

Rendering
1.1 1.2 2.1 2.2
time in ms 1 1 Algorithm u sing linear interpolation 2 1 A lgorithm u sin g O penG L
1 .2 - Algorithm u sin g S S E 2 2 A lgorithm usin g R e c u rs iv e divide
B unny
3 .7 R esu lts 71

Avg Rendering Times D 3 p D 4p B 8 p D 12p B 1 6 p D 2 0 p B 2 4 p

2000

1000

Rendering
1.1 1.2 2.1 2.2
time in ms 1 1 • Algorithm u sin g linear interpolation 2 1 Algorithm u sin g O p e n G L
I 1 2 - A lgorithm u sin g S S E 2 2 Algorithm u sin g R e c u r s iv e divide
D rag o n I_______
Avg Rendering Times □ 3p □ 4p ■ 8p □ 12p ■ 16p □ 20p ■ 24p
2000

1000

Rendering
1.1 1.2 2.1 2.2
time in ms 1 1 A lgorithm usin g linear interpolation 2 1 Algorithm u s in g O p e n G L
1 2 - A lgorithm u sin g S S E 2 2 A lgorithm u s in g R e c u rs iv e divide
H appy B uddha L _ _ _
13 Avg Rendering Times ■ D 3 p D 4 p B 8 p □ 12p ■ 16p D 2 0 p B 2 4 p

3000

S phere
2000

1000

Rendering
time in ms
mm l 1.1 1.2
1 1 - A lgonthm usin g lin ear interpolation
1 2 Algorithm usin g S S E
2.1
2 1 A lgorithm u s in g O penG L
2.2
2 2 A lgorithm u s in g R e c u rs iv e divide
3 .7 R esu lts 72

Avg Rendering Times □ 3p □ 4p ■ 8p □ 12p ■ 16p □ 20p ■ 24p

3000

2000

1000

Rendering
1.1 1.2 2.1 2.2
time in ms 1 1 - A lgorithm u sin g lin ear interpolation 2 1 Algorithm u s in g O penG L
1 2 A lgorithm u sin g S S E 2 2 - Algorithm u s in g R e c u rs iv e divide
S ponza

T able 3.4: R ay tra c in g tim es w ith d ifferen t m eth o d s on R B S P trees w ith d ifferen t n u m b e r o f
sp littin g axes fo r v ario u s scen es. T h e various alg o rith m s are:
A lg o rith m 1 . 1 - T rav ersal by L in ea r In te rp o la tio n o f R a y -P la n e In te rse c tio n P aram eter.
A lg o rith m 1.2 - T rav ersal u sing SSE .
A lg o rith m 2.1 - E n try and E xit P lane D e term in a tio n w ith O p en G L .
A lg o rith m 2.2 - E n try and E x it P lan e D e term in atio n by R ecu rsiv e D iv id e.

S ectio n 3.5 d escrib ed four d ifferen t m eth o d s to trav erse the R B S P tree. In o rd e r to e v alu a te the
m erits o f each o f th ese m eth o d s fo r ray tracin g , the scen es have b een ren d e re d w ith the trees using
all the fo u r different traversal m eth o d s.

A lgorithm 1.1 - T rav ersal sim ila r to k d -tree trav ersal by u sin g a lin ear in terp o la tio n at e a c h step.
T he resu lts fo r this m eth o d reveal the p ro b lem d isc u sse d in the e a rlie r sectio n . In th is m eth o d ,
initially an in tersec tio n p a ra m e te r c a lc u latio n fo r each axis is n ecessary . T h is c a lc u la tio n is quite
expensive, involving th ree dot p ro d u c ts and a d ivide fo r each axis. A s the n u m b e r o f axes used
increase, these ca lc u la tio n s d o m in ate the re n d erin g tim es. T h u s, th e re n d e rin g tim es in c re a se as
the n u m b er o f axes used in crease even th o u g h there is a d ecrease in the n u m b e r o f tra v e rsa ls and
in tersectio n s.

A lgorithm 1.2 - T rav ersal u sin g S S E fo r initial c alc u latio n s. T h is m eth o d realise s th e p o ten tia l
o f R B S P trees. In this m eth o d , the initial c a lc u la tio n s are c o m p u te d in g ro u p s o f fo u r w ith S SE
in stru ctio n s. H en ce, as the n u m b er o f axes rise, in itially there is a n o tic e a b le im p ro v em e n t in
p erfo rm an ce. H ow ever, the p erfo rm an c e d ete rio rate s w hen the n u m b e r o f axes rise above a certain
num ber. T his is sim ilar to M eth o d 3.5.1, but it ju s t tak es m ore axes fo r these to d o m in ate . It is to
be noted that this m eth o d resu lts in a very fast ray tracin g m eth o d - even b e tte r than on k d -trees.
A s long as the n u m b e r o f axes used is b elow a c ertain th resh o ld - p o ssib ly 12 - th is ren d e rin g
m ethod resu lts in the best p e rfo rm an ce fo r ray tracin g w ith R B S P trees.

A lgorithm s 2.1 and 2.2 - T h ese are essen tia lly the sam e m eth o d w ith slig h tly d ifferen t ap ­
pro ach es to finding the e n try and exit plan es. H ow ever, the m eth o d s o f finding the e n try and exit
plan es are qu ite efficient fo r both these m eth o d s, w ith the recu rsiv e div id e m e th o d b ein g slig h tly
slow er, as show n by the resu lts. T h e g rap h s for th ese tw o m eth o d s are very sim ila r sh o w in g that
the p erfo rm an c e in c re a ses as the n u m b e r o f axes in crease. T he p e rfo rm an c e in crease d ire ctly re­
flects the im pro v ed n u m b ers o f in tersec tio n s and trav ersals. T h e d isa d v a n ta g e o f these m eth o d s
3 .7 R esu lts 73

is th at they are g en erally slo w er than e ith e r m eth o d 3.5.1 o r 3.5.2 due to in cre ase d n u m b e r o f
r a y -p la n e in tersectio n s.

A s w ith the in tersec tio n n u m b er resu lts, p e rfo rm a n ce resu lts sh o w that R B S P trees p ro v id e e x ­
cep tio n al p erfo rm an c e fo r the Sphere m odel w h ere the axes are p erfectly alig n e d to th e m o d el.
R esu lts fo r m eth o d 3.5 .2 show that sig n ifican t p e rfo rm a n ce in c re a se is p o ssib le w ith R B S P trees.
T h ey reiterate the fact, th at if p ro p erly a d ap te d to the scen e, R B S P trees p ro d u ce trees that are b e l­
te r th an kd -trees fo r ray tracin g scen es w ith n o n -a x is-a lig n e d trian g les. T h e re su lts fo r the S p o n za
scen e show that fo r scen es w ith p re d o m in a n tly ax is-a lig n e d tria n g le s, R B S P trees co n stru c te d w ith
the cu rren t SA H is w o rse th an k d -trees.

3 .7.3 C on stru ction T im es

6 0 ( 101) - Bunrn
•- A rm adillo
D ragon
50000
H appy Buddha
— Sphere
o
% 40000 Sp onza

= 30000

o
= 20000

o
° 10000

10 15 20

Number of splitting axes

F ig u re 3.15: V ariation o f c o n stru c tio n tim es o f R B S P trees o v er n u m b e r o f split axes fo r v a rio u s


m odels. F ro m the d ata g ra p h ed it w as d e d u ced that e m p iric al co m p le x ity o f R B S P tree
c o n stru c tio n is O ( m ] c>N I o g 2( N ) ) .

T he co n stru ctio n tim es fo r R B S P trees is show n to be h ig h ly d e p e n d e n t on the n u m b e r o f sp littin g


axes used to build the trees. T he g rap h s fo r every m o d el is id e n tic al and c lo se r in sp ec tio n rev eals
th a t the em pirical co m p lex ity o f tree co n stru ctio n ap p ears to be ap p ro x im a tely O (??71 6N l o g 2 ( N ) )
1 w here m is the n u m b e r o f d ire c tio n s used and N is th e n u m b e r o f trian g le s in the scen e. T h is
co m p lex ity m ak es it hig h ly im p ractical to use m ore th an 24 split axes as the tim e n e c e ssa ry fo r
co n stru ctio n beco m e s p ro h ib itiv e.
3 .8 F u rth er R esea rch on S tru ctu res w ith N o n -A x is-A lig n e d S p littin g P la n es 74

Armadillo
Bunny
— Dragon
-*■ Happy Buddha
— S p h ere

— S p on za
6 06

604

6 02

5 98

F ig u re 3.16: Faces p e r n o d e o f R B S P trees w ith v ario u s n u m b e r o f split axes fo r v ario u s m o d els.


T he n u m b er o f faces p e r n ode ap p ears to be very close to six. irre sp e c tiv e o f the n u m b e r o f
sp littin g axes used.

3.7.4 N u m b er o f F aces in N ode

E x am in in g the av erag e n u m b e r o f faces o f all the n o d es o f R B S P trees w ith d iffe re n t n u m b e r o f


sp littin g axes, reveals that it is very clo se to six irresp ectiv e o f th e n u m b er o f axes. T h is is an
in terestin g result. D u ring c o n stru c tio n o f the tree, it im p lies th at o n ly six p la n es are relev an t and
this result m ay be used to red u ce the co m p le x ity o f co n stru c tio n .

3.8 Further Research on Structures with Non-Axis-Aligned Splitting


Planes

T he in troduction o f the R B S P tree in [K M 07] has sparked in te re st in stru c tu re s that use n o n -ax is-
aligned splitting p lan es [B C N J0 8 ] [IW P 0 8 ]. B udge et al. |B C N J()8 ] attem p t to ad d re ss so m e o f
the p ro b lem s alread y m en tio n ed in this c h a p te r like slow c o n stru c tio n tim es. H av in g n o tice d the
p o ten tial o f using n u m e ro u s n o n -a x is-a lig n e d splitting p lan es in R B S P trees, Ize et al. [IW P 0 8 |
attem p t to use the even m o re g en eral fo rm o f B S P trees th at allo w s a rb itra rily alig n e d p la n es. T he
m ain co n trib u tio n s o f th ese tw o p u b lic a tio n s and c o m p a riso n s to o u r w o rk w ill be d e sc rib e d in
b rie f in the section s below .

3.8.1 A ccelerated B u ild in g and R ay T racing o f R estricted B SP Trees - B u d ge et al.

B udge et al. [B C N J08] p resen t a lg o rith m s fo r build in g and ray tracin g w ith R B S P trees. O u r p a p er
h ig h lig h ted the slow' c o n stru c tio n o f R B S P trees. In ad d itio n , in [K M 0 7 ], w'e p resen t o n ly trav ersal
m ethod 3.5.3.2 - that, in th is ch ap te r, has been show n as the slo w est trav ersal m eth o d . B u d g e el
al. attem pt to ad d ress the p ro b le m s o f R B S P trees as p resen ted in [K M 0 7 ],

T he em pirical co m p lex ity has been calculated using the observed construction tim es, graphed in Figure 3.16.
3 .8 F urther R esearch on Structures w ith N on -A xis-A lign ed S p littin g P la n es 75

In [KM07] it was said that an empirical complexity of 0 ( m 16N lo g 2( N ■)) was observed for con­
structing RBSP trees. Budge et al. reduce this to 0 (7 n 3 + (m N lo g ( N ))). The main reason for the
high complexity was that the polyhedron of the node was found by clipping each plane with every
other non parallel plane. This resulted in the observed high complexity. Budge et al. identify that
the SAH is a problem that can be solved with dynamic programming to reduce this complexity.

The construction process in [BCNJ08] is an SAH heuristic with changed intersection and traversal
costs of 500 and 1 respectively. An 0 { N lo g 2( N )) approach, in which the triangles are sorted at
each SAH step, is used. However, instead of using a normal 0 ( N l o g ( N )) sorting method at each
construction step, a radix sort is used to achieve an 0 ( N l c^^ { ' T)) construction complexity (for
kd-trees). Each k-DOP (the term used for a node’s polyhedron along the lines of Klosowski et
al. [KHM+ 98]) is represented by an edge-soup - made of line segments. Each line segment also
has a pair of indices identifying the two faces to which the edge belongs. This representation is
said to be compact, efficient to maintain during splits and is sufficient to compute the surface area
of a k-DOP. No other information is stored for the k-DOPs.

Dynamic programming is a method used to solve problems with overlapping sub-problems. It


works by first dividing the problem into several sub-problems that are again split until the sub­
problems are simple to solve. The solutions from these are then combined to obtain the solution
to the problem. In the surface area heuristic, along an axis, if the split points are sorted, then the
surface areas of the first potential sub-node (of the two potential sub-nodes created due to the split)
at these points are increasing. The surface area at the first point is found. The surface area at the
second point is found by using the first solution and so on to reduce the calculation time. Using
this approach, the expression for the areas is given as below.

A re a ie ftO fS p lit — ^ 2ii^splii ~~ ^ lii^ s p lit t-i) <Sj'


A re a .rightO fSplit — A rc o .t0tai ~ A r e a le ftO fS p lit
A re a .Spin — A"2i^.sp/?7 "P - ^ 1?^split K oi
1-i <' = 1-split ^ 1-i+l

where C 2i, C 1-;, K 2 U K u and K q, are coefficients initialised for each face. The details of calcu­
lating these are provided in [BCNJ08],

Budge et al. state that the traversal method in [KM07], Section 3.5.3.2 is not the best method. A
method similar to the standard slabs method is proposed. Initially, the bounding box rather than
the bounding volume is intersected by the ray. The ray origin and the ray direction reciprocal are
pre-computed and the traversal is continued. SSE is used for accelerating the pre-computations.

If the traversal is performed without pre-computation, a general decrease in rendering times is


noticed. However, they state that in this case, the trends are not clear. As stated earlier, we believe
that the rendering times depend on the number of axes used. In case the number of axes are
greater than the depth of the tree, not pre-computing can lead to a cheaper traversal, owing to
fewer ray-plane intersections.

The traversal method is shown to be faster than the presented (unoptimised) times by a factor of
10. When the bounding box is used instead of the bounding volume, fewer triangle intersections
but greater number of node traversals are seen. This is a disadvantage of using boxes in cases when
the viewpoint is outside the model. A closer fitting bounding volume can mean that numerous rays
can be terminated at the root node level itself.
3 .8 F u rth e r R esea rch on S tru ctu res w ith N o n -A x is-A lig n e d S p littin g P la n e s 76

They observed problems with the RBSP tree constructed for scenes like the Sibenik scene, and
attributed the problems to the fact that due to increased empty space culling, the tree is deeper.
However, we believe that the problem lies in the selection of planes to build the tree. The problem
is similar to the problems with the Sponza scene which is dominated by axis-aligned triangles. A
heuristic in which the directions are customised to the scene for which the tree is being built is
suggested as one of the solutions. An approach used by Coming and Staadt [CS08] to achieve this
customisation is suggested. This is an important benefit of RBSP trees - that the directions can be
arbitrarily chosen. However, to take maximum advantage of RBSP trees, the directions have to be
carefully selected. When that is done, RBSP trees may significantly outperform kd-trees.

Comparison of Traversal Methods The traversal methods described by Budge et al. are similar
to methods 3.5.1 and 3.5.2 detailed in this chapter. The main difference is the use of the bounding
box by Budge et al. rather than the bounding volume. In addition, since the ray’s intersection
parameters are pre-computed, a linear interpolation approach can be used to compute the t sput
parameters. These are believed to increase efficiency, especially in cases where the viewpoint is
outside the model. Also, while they state that their heuristic is not faster than the kd-tree, our
approach - detailed in Section 3.5.2 - for tracing single rays is faster for RBSP trees than for
kd-trees.

3 .8 .2 R ay T racing w ith the B S P Tree - Ize et al.

Ize et al. [IWP08] take a more generalised approach. So far, it was believed that using arbitrary
non-axis-aligned planes does not lead to a good structure for ray tracing. While a restricted set
was used to make the problem more solvable, Ize et al. use a general BSP tree using popular ray
tracing concepts - like the SAH - shown to be applicable for non-axis-aligned planes by the study
on RBSP trees.

One of the main problems of using arbitrarily aligned planes is the construction of an effective
structure for ray tracing. Ize et al. attempt this by reducing the number of split planes used at
each split step. The SAH, with a polytope area calculation similar to ours - by clipping the node’s
polytope with the splitting plane and then computing the area by summing the areas of the faces,
is used to select the best split position from among a set of planes. Ize et al. limit the number of
planes at each triangle by using the triangle’s properties. For each triangle the split planes used
are

• The plane that defines the triangle face.

• The three planes that lie on the edges of the triangle and orthogonal to the triangle face.

• The same six axis-aligned planes used by the SAH for the kd-tree.

Restricting the planes as above limits the number of planes to be tested to O (N ). At each split
plane, since triangles are not sorted along the axis, a helper structure - a bounding sphere hierarchy
- is used to count the triangles at either side of the potential plane leading to a lo g (N ) cost for the
search and an 0 ( N lo g 2( N )) construction heuristic.

Another adjustment is made during the construction process to the SAH intersection cost. The cost
for intersecting a ray with a plane aligned along one of the coordinate axes is less than intersecting
with an arbitrarily aligned plane. Due to this, two costs, C\>sp and Ckd-tree ~ indicating the cost
of intersecting with an arbitrary plane and an axis-aligned plane respectively, are used during the
3 .8 F u rth er R esearch on Structures w ith N on -A xis-A lign ed S p littin g P la n es 77

SAH process. For arbitrary planes, it is deduced that the cost linearly varies with the number of
triangles in the node. Thus, the values of C\jSp is given as

Cbsp ~ CsCii^N 1) + Ckd—tree

where a - is a user tunable parameter (a value of 0.1 was used in the paper for a).

If, after investigating all split planes, it is determined that the cost of splitting is not better than
making this a leaf node, then a fixed cost is used for Cbsp and the SAH is process is run again.

The two modifications to the SAH cost computation - using two different costs and considering
the cost of intersecting the non-axis-aligned planes as being linearly dependent on the number of
triangles in the node - are interesting. The application of these ideas to the RBSP tree construction
process is worth investigating. It may ensure that good trees are created for problematic scenes.

The traversal is a modified kd-tree algorithm. A bounding box is used to ensure that the ray has a
good probability of hitting the scene. Each split plane is then intersected using the standard ray -
plane intersection test - involving two dot products and a floating point division. However, due
to limited floating point precision, an epsilon value is used and rays with intersection distances
within an epsilon value of the split plane traverse both nodes. The BSP node traversal is roughly
1.75x slower than the kd-tree node traversal. Hence, a standard kd-tree plane intersection test is
used when the planes are axis-aligned. Another idea used is that in the nodes, the splitting plane
may actually be the triangle itself. In these cases, the intersection need not be recalculated.

A disadvantage of using arbitrary planes instead of a restricted set is that each node requires 20
bytes, instead of 8 bytes for a kd-tree or an RBSP tree node. This leads to a significant increase in
memory usage.

Ize et al. observe good results with the BSP tree. When single ray tracing with secondary rays
are used, the structure outperforms the kd-tree. The single ray tracer is faster on the BSP tree than
on the kd-tree. With SIMD, ray tracing on the BSP tree is as fast or faster than on the kd-tree.
Due to increased traversal costs, a pure BSP tree is not as effective if axis-aligned planes with
faster intersection methods are not used. In contrast to the RBSP tree where there is a decrease
in both traversals and intersections, the BSP tree only brings about a decrease in the number of
intersections.

When the viewpoint is outside, the RBSP tree would be able to terminate more inactive rays due
to a better fitting bounding volume. This is supported by Ize et al. when an RBSP built for the
top level is suggested. The number of planes are limited and investigating more split points -
in the order of 0 ( N 3) - is believed to make the BSP tree more effective. However, using such
a large number of split points is not viable. The memory requirement of general BSP trees is
also quite high. When single rays are traced, they state that their structure is the fastest structure.
This corroborates our belief that the use of more than three non-axis-aligned axes leads to a better
structure for ray tracing than kd-trees, especially for single ray tracing. A point to be noted though
is that Ize el al. have not tested the performance of their structure against RBSP trees, which could
be faster due to faster traversal methods and cheaper memory requirements.
3 .9 S w n m a iy 78

3.9 Summary

RBSP trees are an attempt to combine the best features of kd-trees and BSP trees as applicable to
ray tracing. It can also be thought of as an experiment to determine the usefulness of arbitrarily
aligned splitting planes. It has been assumed that the general form of BSP trees would not be a
very effective structure for ray tracing. However, it is hard to ignore the promise o f BSP trees being
a structure that very closely wraps the scene being rendered to reduce the number of intersections
and traversals. RBSP trees are a subset of BSP trees that can be built to be very similar to BSP
trees. Thus, RBSP trees, in addition to being an excellent structure themselves, also enable the
study of BSP trees for ray tracing.

RBSP trees are compact to represent, have relatively simple construction methods, and have ef­
ficient traversal methods. The efficiency advantage of having a greater choice in number and
alignment of splitting axes is realised for scenes that pre-dominantly consist of non-axis-aligned
triangles. The results show that for these scenes, as predicted, RBSP trees do reduce the number of
intersections and traversals significantly, compared to kd-trees. This advantage is translated to a
performance advantage when the right traversal method is used so that RBSP trees can be a faster
method to ray trace a scene. The best traversal method is an SSE version similar to the kd-tree
traversal method.

The introduction of RBSP trees has sparked interest in the use of space subdivision structures with
non-axis-aligned splitting planes. Budge et al. [BCNJ08] attempt to address the problem of slow
construction times through the use of dynamic programming to obtain a faster method to calculate
the surface areas. Ize et al. [IWP08] apply some of the results from the RBSP trees to the more
general BSP trees with arbitrary planes. Both of them show very good results confirming the belief
that the use of non-axis-aligned planes is worthy of further study.

The study also reveals that for scenes that are dominated by axis-aligned triangles, the construction
heuristic results in trees that are not as good as kd-trees. On the other hand, for scenes like the
Sphere scene, where the splitting axes are essentially drawn from the scene, the heuristic constructs
trees that are significantly better than kd-trees. Another disadvantage is that the construction times
are significantly higher than that of a kd-tree and depend highly on the number of splitting planes
used. Both these problems can be solved by future work enabling intelligent selection of available
planes for the SAH upon investigation of the scene’s component triangles’ alignment. Using a
customised set of directions would allow reducing the number of splitting axes to a manageable
number. The optimum number of planes is found to be between 8-12.

As a method for visibility determination, the use of RBSP trees with single rays is probably not
practical. However, when large number of incoherent rays are to be traced, as in global illumina­
tion methods, the RBSP trees are thought to be especially useful. This is also an area of research
worth pursuing as part of the future work.
Chapter 4

Coherent Rendering

C o n ten ts
4.1 M o tiv a tio n ................................................................................. 78
4.2 Coherent Rendering - Concept and High Level Algorithm 83
4.3 Tree T r a v e r sa l.......................................................................... 85
4.4 Occlusion Detection - Hierarchical Occlusion Maps . . . 89
4.5 Leaf Node Processing - Recursive R a sterisa tio n ............. 93
4.6 R e s u lt s ....................................................................................... 97
4.7 Discussion ................................................................................. 102
4.8 S u m m ary.................................................................................... 104

This chapter discusses a visibility / rendering algorithm that attempts to demonstrate that trian­
gle mesh based scenes can be rendered at a lower complexity per pixel. The volume rendering
algorithm that shows a constant complexity per pixel motivates the investigation. It is research
undertaken as a part of the EPSRC grant that funded the investigation of lower complexity algo­
rithms.

This chapter expands and describes in detail the methods and results reported in the technical re­
port - Benjamin Mora, Ravi Kanunaje and Mark W. Jones, “On the Lower Complexity' o f Coherent
Renderings,” Swansea University, Technical Report, 2008. [MKJ08].

The main algorithm, due to its similarity to his earlier work [MJC02] [ME05], was implemented
by Dr. Benjamin Mora. However, a lot of the background work (e.g., the kd-tree construction for
the algorithm, ray tracing implementations against which Coherent Rendering is compared with,
etc.) as well as refinement to ensure that its application to triangle meshes is accurate has been
undertaken by me. The technical report, due to the fact that it was a submission to a conference
was limited in length. Hence, it is mainly concerned with the analysis of the complexity results
and does not delve into the details of the workings of the algorithm. This chapter details the
algorithm in full detail, using which the algorithm can be reproduced. Producing the benchmarks
for the complexity results has been another of my contributions. Finally, I am responsible for the
comparison to ray tracing and packet ray tracing in Section 4.6.2 and an analysis of the absolute
performance using profilers, as seen in Section 4.7.2. The analysis has led to the Row Tracing
algorithm, detailed in Chapter 5 which has resulted in a highly accelerated rendering method.
Thus, though Dr. Mora was the first author of the report, I have been responsible for a significant
part o f the project, leading it to be an integral part of my PhD, and consequently the thesis.

79
4.1 M o tiva tio n 80

4.1 Motivation

RBSP trees proved to be an effective method to reduce the number of node traversals and triangle
intersections necessary for ray tracing by utilising the spatial coherence of objects in the scene.
Another method of achieving fewer traversals and intersections per ray is to use image coherence
- i.e., tracing the rays in groups and amortising the number of intersections and traversals over the
group. The method, called packet ray tracing / packet tracing, has been very effective to accelerate
ray tracing. As described in Chapter 2, packet tracing traverses groups of rays through the accel­
eration structure and intersects the component rays with the primitives. Several different forms of
packets like frustum based, pyramidal and rectangular packets have been used to accelerate ray
tracing.

The best performance for ray tracing has been observed by the MLRTA [RSH05] - a form of
frustum based packet ray tracing that found entry points deep inside the tree to begin tracing the
individual rays. They also detail a packet traversal method based on interval arithmetic that is
relatively easy to implement. This method has been used to intersect packets through a variety of
structures and primitives [Ben06] [BWS06] [WBS07].

As the packet tracer traverses through the tree, several rays in the packet may not intersect the
node. Even in these cases, the nodes need to be traversed since a few rays intersect the node.
This reduces the coherence and the amortisation1 provided by traversing the tree with packets.
Consequently, the packet cannot consist of a large number of rays. At the same time, larger sized
packets lead to maximum amortisation of traversal costs, particularly when a large majority of
the rays traverse the nodes. Thus, a packet size that is optimal reduces the number of inactive
rays in the packet. In our implementations, a packet size of 8 x 8 was found to provide the best
performance2.

A packet size of 8 x 8 implies that a 1024 x 1024 image needs to traverse 16384 packets which
is still quite a large number of packets. Obviously, the number of packets increases as the image
size increases. Thus, an alternate method that uses larger packets could be even more effective.

The largest packet that can be used is one with rays through all the pixels of the image. The
method introduced in this chapter, Coherent Rendering, is an algorithm that considers all the
pixels of the image. The concept is adapted from object order ray casting [MJC02] - a high
performance volume rendering method that produces excellent images. Adapting this to triangle
meshes, providing a good implementation and subsequently studying the properties of such an
algorithm is the motivation for the rendering method called Coherent Rendering. The use of
the entire image as the packet is expected to produce an algorithm that potentially reduces the
complexity of rendering. The reasons for this belief will be described in the following section.

4.1.1 A verage C om p lexity

The average complexity of rendering is important as scene sizes increase. Z-buffer based rasteri­
sation has an average complexity of (N x s ) where N is the number of triangles in the scene and

a m o r tis a tio n in the context o f packet ray tracing refers to the cost o f node traversals b ein g shared by the num ber
o f rays in the packet. For m ost scen es, it can be observed that several neighbouring rays traverse the sam e path dow n a
tree. Thus, traversing the sam e nodes for each o f the rays is w asteful. W hen packets are traversed, the node is traversed
just once and the cost is divided, or am ortised, am ongst the com ponent rays, as long as they intersect the ray.
In our im p lem entations, packet sizes o f 4 x 4, 8 x 8. 16 x 16 and 32 x 3 2 w ere used. O f these, the perform ance
was the best w hen packet sizes w ere 8 x 8 , fo llo w ed c lo se ly by packet sizes o f 16 x 16.
4.1 M o tiva tio n 81

s is the av erag e p ro je ctio n size o f the trian g les. A b e tte r lo g a rith m ic c o m p le x ity p e r pix el u sing
g rap h ics h ard w a re has h o w ev er b een show n by W and et al. [W F P f 0 1]. O n av era g e, ray tra c in g -
thro u g h the use o f h ie ra rc h ic a l d ata stru c tu res like the k d -tree o r the B V H - h as a c o m p le x ity o f
(){luy{N)) p e r pix el [H B 00] [H ur05] [W F M S 0 5 ] [W S S 05] [H H S 0 6 ] [Y L M 0 6 ]. A sim p le r d ata
stru ctu re like the grid is show n to have a relativ ely slo w er c o m p lex ity o f 0 ( N 1/A) [C W 8 8 ], It has
also been show n that w o rst case c o m p lex ity o f ray tracin g is 0(log(N)) p er p ix el at 0 ( N 1 + e)
m em o ry and p re p ro c e ssin g cost (w h ere e > 0) |B H O + 9 4 j. H o w ev er, this m e m o ry an d p re p ro ­
cessing c o st is p ro h ib itiv e and av erag e case c o m p lex ity is lent m o re im p o rta n ce .

F inding the clo sest trian g le at a pixel (i.e., the clo sest trian g le in te rse c tin g the ray c o rre sp o n d in g
to a pixel) can be c o n sid e re d as a special case o f a search in g a lg o rith m . T h is im p lie s th at resu lts
o b tain ed for search a lg o rith m s sh o u ld h old fo r ray tracin g . H en ce, it m ay be d ifficu lt to im p ro v e
upon the lo g arith m ic co m p le x ity p e r pixel w hen search in g fo r a sin g le elem en t in th e tree. B e n t­
ley |B en 7 9 ] show s that if in stead o f search in g fo r one ele m e n t in the tree, k n e ig h b o u rin g e le m e n ts
are to be fo u n d , it can be ach ie v ed in 0(log(N) + k ) tim e, w h e re TV is the n u m b e r o f e le m e n ts in
the tree. T h u s, fo r sufficien tly large values o f k , the search c o m p le x ity w o u ld be p ro p o rtio n a l to k
instead o f log(N).

Nodes •
trav ersed bv 1 Nodes tiaverscd
a single ray j b> mulliPle ra> s
i1] Traversed N odes

2nd quadtree level

I'1quadtree level

(a) Dataset (b) (c)

F igure 4.1: N aive recu rsiv e ray tra c in g ex am p le on a 4 x 1 v o x e liz e d grid. T h e n u m b er o f


distin ctly trav ersed n o d es is a g eo m etric series (1 + 2 + .... + N / 2 + TV = 27V - 1).

Packet ray tracin g is one m eth o d th at u tilises th is resu lt to sig n ifica n tly a c c e le ra te ray tracin g .
Packet ray tracin g attem p ts to find the c lo se st in te rse c tio n fo r a sm all set o f p ix els. T h e fa c t th at
the n u m b er o f rays in the p ack et ca n n o t be a rb itra rily large (due to rea so n s o f c o h e re n c e m en tio n e d
before) im p lies that k is sm all en o u g h to not m ak e a sign ifican t d iffe re n c e to the co m p lex ity .
R en d erin g tim es fo r p a ck et ray tracin g , h en ce, still a p p ea r to be lo g a rith m ic [W F M S 0 5 ] [H u r0 5 ].
H ow ever, if k can be m ade sig n ifican tly larg er - by in c rea sin g p a c k et sizes to v ery large sizes
(entire im ag e), then it w o u ld o v ersh ad o w the term lo g ( N ) an d as k a p p ro a c h e s v ery larg e n u m b ers,
the co m p lex ity w ou ld d ep en d en tire ly on k p ro v id in g a co n sta n t c o m p le x ity p e r p ix el. T h is resu lt
has been used by Jen se n [JenO l] to search for p h o to n s in a p h o to n m ap . U tilisin g th is re su lt to
investigate if a b e tte r co m p le x ity fo r re n d erin g co u ld be ach ie v ed is th e b asis fo r the C o h eren t
R en d erin g .

T he ap p licatio n o f B e n tle y 's search resu lt to 3D ren d e rin g is o b v io u sly not trivial at this p o in t, even
4.1 M o tiva tio n 82

SAH Median
— — Nodes SAH . . . . N odes Median
Intersections Nodes

60 -

40 -

20-

d e p th
10 20 30 40 50

(a) Sp o n za scene - Left side rendered w ith C oherent (b) A verage number o f intersections and nodes traversed
R en d erin g and right side show ing the w irefram e by a ray tracer for the Spon za scen e w ith space m edian
and SA H kd-trees. A verage node traversals appears to be
logarithm ic w hereas the number o f triangle intersections
appears to converge to a constant.

F ig u re 4.2: R ay tracin g the S p o n za scen e

if it is h ig h ly p ro b ab le th at the in tersec tio n p o in ts o f the d ifferen t ra y s c o m p o sin g the im age are


co h e re n tly lo cated in the search space. F ig u re 4.1 illu strates the a p p lic a tio n to the d o m ain b etter
by sh o w casin g a 4 x 4 v o x elised 2D w orld c o n ta in in g a ID flat su rface that w ill be ray traced. In
the figure, rays and su rfaces are a x is-alig n ed and the g e n e ralisa tio n o f the sam e case to a N x N
v o x elisatio n w ill be d isc u sse d ( N bein g a p o w er o f 2). To sp e e d -u p re n d e rin g , tw o quad tree
levels (lo g 2( N ) levels in the g en eral ca se) are c o n stru cte d from th e v o x elised scene, in d icatin g
no n -em p ty spaces. F igure 4 .1 (a ) in d icates the no d es (n u m b ered by trav ersal o rd er) trav ersed by
a sim ple to p -d o w n recu rsiv e ray tra c e r fo r a sin g le ray. In this e x am p le, th ree node trav ersals
(lo g 2( N ) + 1) are n eed ed b efo re h ittin g the surface, w h ich h ap p en s im m e d ia te ly after re ach in g
the first leaf node. If the sam e sim p le recu rsiv e ray tra c er is ca lle d fo r fo u r (i.e, r ) d ifferen t but
sim ilarly alig n ed rays as d e p icted in F ig u re 4 .1 (b), a recu rsiv e ra y tra c e r w o u ld trav erse 4 x 3
n o d es (i.e, r ( lo g 2 ( N ) + 1)). H ow ever, it is n o tic e ab le th at som e o f th em are trav ersed several
tim es (nodes 1. 2 and 5 in F ig u re 4 .1 (b )). H en ce, if o n ly d istin ct n o d e s trav ersed by the alg o rith m
are c o u n ted , o n ly seven n o d es (2 N — 1 = 1 + 2 + 4 -1- ... + N / 2 + N ) are trav ersed by fo u r (i.e,
r ) rays.

T h e re fo re , if an a lg o rith m m an ag e s to trav erse ev ery n o d e ju s t o n ce, the c o m p le x ity o f tree trav er­
sal w ould be re d u ce d fro m lo g arith m ic p e r ray to a c o n stan t p e r ray. T h is is the p rin cip le b ehind
C oherent R en d erin g .

T he assu m p tio n is that the lo g a rith m ic c o m p lex ity o f a re g u la r rec u rsiv e ra y tra c e r is due to the
tree traversal itself, and not due to the in te rse c tio n tests. T h e av erag e n u m b e r o f in tersec tio n s
p e r ray by a reg u lar recu rsiv e ray tracer ten d s to a co n stan t, p ro v id ed th e tree is w ell co n stru cted .
R ay tracin g the S p o n za scen e 4 .2 (a) w ith an SA H k d -tree, show s th a t the av erag e n u m b e r o f
in tersec tio n s p e r pixel ten d s to 2. At the sam e tim e, the n u m b er o f trav ersal step s increases w ith
m a x im u m depth . T h is stro n g ly su p p o rts the b e lie f that the per-p ix el lo g a rith m ic co m p le x ity o f a
re g u la r ray trac er is m ain ly due to the tree traversal, and the in tersec tio n co m p le x ity is 0 ( 1 ) and
in d ic a te s that effo rts sh o u ld c o n ce n tra te on lo w erin g the n u m b e r o f tree trav ersals.

T h e volum e re n d erin g m eth o d - O b ject O rd er R a y C a stin g (O O R C ) by M o ra et al. [M JC 02] -


4.1 M o tiva tio n 83

demonstrates an 0 (1 ) complexity per pixel. This method and its main results will thus be detailed
in brief before the Coherent Rendering algorithm is detailed.

4.1.2 O b ject O rd er R ay C astin g

Object Order Ray Casting by Mora et al. [MJC02] is an object order method for volume ren­
derings. However, instead of tracing a ray through the volume, cells of the volume / voxels were
projected onto the screen. Since only orthogonal projections were considered, the voxels projected
onto a similar area (hexagon) on the screen. By just displacing the mid-point of a pre-computed
projection, a voxel’s projection is found very efficiently. In addition, a min-max octree was used
to identify and skip transparent regions of the volume. Using the octree also allows the front-to-
back visibility order determination. Combined with Hierarchical Occlusion Maps, the visibility
of a voxel / octree cell is determined. The combination of these structures for rendering is very
efficient. A very similar Maximum Intensity Projection (MIP) algorithm [ME05] has been shown,
both mathematically and empirically, to have an average complexity of 0 ( N 2) for an N 2 image
of an N 3 volume leading to an 0 (1 ) complexity per pixel. It is to be noted that since the algorithm
renders only orthogonal projections, one dimension of the image and the volume are the same ( N)
leading to the aforementioned complexity result.

The important results from the OORC algorithm will be briefly described in order to demonstrate
the 0 (1 ) complexity that motivates us to adapt it to a mesh rendering context.

Four widely used datasets have been resampled from 64'* to 7003 voxels. The datasets are Aneurism
(originally 256'*), UNC head (256 x 256 x 225), Bonsai (2563) and Neghip (643). All renderings
produce a 1024 x 1024 image using orthogonal projection (current code does not allow perspec­
tive projection). For all renderings other than the Bonsai dataset, the zoom value has been fixed
to 1, which means that an axis-aligned projection of a voxel will have a 1 x 1 pixel footprint. The
zoom has been reduced to 0.8 for the Bonsai dataset because the 7003 dataset’s isosurface would
otherwise not fit the screen space.
4 .] M o tiva tio n 84

ms
600 ♦ Aneurism
Rendering times (ms) i
500 ■ Head face
i
400 ▲ Head skull A
M
A
300 X Bonsai i
A
U * ♦
200 X Neghip
■ X ♦


■ * ♦
100 ♦
%
«

1 I ♦
One dimension of resampled volume
7 ---------- T---------- 1--- - — ------- 1----------1----- ------ r~- ---- T ----- T~ T "-- - —1---

64 128 192 256 320 384 448 512 576 640

(a) Isosurface rendering tim es according to one dim en sion o f the volum e size N

R en d e rin g tim e s p e r pixel (ps/pixei) 9 PS


R e n d e rin g ti m e s p e r n o n -b la c k pix el (p s/p ix e l)
8 ♦
7
6
5 ‘

06
4
3
i i II t I t • •
04 2 Aneurism ■ Head face a Head skull
0.2 Aneurism Head face Head skull 1 Bonsai * Neghip
Bonsai Neghip Log Curve O n e d m e n s io n of r e s a m p le d v o lu m e
0
O n e dim ension of re sam p led volume
64 128 192 256 320 384 448 512 576 640 64 128 192 256 320 384 448 512 576 640

(b) Per pixel rendering tim es. Flat curves indicate that (c) Rendering tim e per non black pixel.
the com plexity is proportional to N (and therefore 0 ( 1 )
per p ix el)

F ig u re 4.3: O bject O rder R ay C a stin g ren d erin g tim es

R en d erin g tim es rep resen t the av erag e c o n trib u tio n o f 30 differen t v iew p o in ts. F inally, sig n ifi­
can t effort has been put into o p tim isin g the O O R C alg o rith m , u sin g a p ro filer and S IM D S S E
in stru ctio n s as w ell.

F igure 4 .3 (a) show s the ren d e rin g tim es acco rd in g to the volum e size. In o rd e r to d e m o n stra te
the co m p lex ity better, the p rep ro c essin g tim e is su b tra cte d from th ese re n d e rin g tim es b efo re
d iv id in g it by the n u m b er o f p ix els in the im ag e (F ig u re 4 .3 (b )). T h e p re p ro c e ssin g tim e is the tim e
req u ired to in itialise a few ren d erin g p a ram eters and the 1024 x 1024 im ag e (w h ich is o v ersized
w hen ren d erin g sm all vo lu m es). T hus, o nly the tree trav ersal and the access to the relev an t p ix els
o f the im age are taken into acco u n t, w hich d em o n stra te s the co n v erg en ce slig h tly sooner.

A fter an aly sin g F igure 4 .3 (b ), it ap p ears that the cu rv es are flat en o u g h in the j 196'* - 700 ' J v oxels
4.2 C o h eren t R e n d erin g - C o n cep t a n d H ig h L evel A lg o rith m 85

W ald et al. M o ra et al.


P ro c e sso r D ual O p tero n 1.8 G H z, 1MB A th lo n 64 X 2 2G H z, 5 1 2 K B
C o res used 2 1
B onsai 5.2 fps 8.2 fps
A n eu rism 6.2 fps 20 fps

T able 4.1: C o m p ariso n w ith ray trac in g a p p ro ach

range to v alidate the 0 ( N 2) co m p lex ity , esp e c ia lly w hen c o m p a red to a th eo retica l lo g a rith m ic
curve in F igure 4 .3 (b ). It is th o u g h t th at the low efficien cy o f the alg o rith m fo r m o d e ls sm a lle r
than 1 9 6 1 m ay be due to the o v e re stim a tio n used d u rin g the v isib ility p ro c ess. A m o re a ccu rate
e stim atio n o f v isib ility co u ld w ell lead to an e arlie r co n v erg en ce.

R endering tim es in F ig u re 4 .3 (b ) are d iv id ed by the p e rc en tag e o f n o n -b la c k p ix e ls in e a c h re n d e r­


ing to obtain ren d e rin g tim es p e r non b lack pixels (show n in F ig u re 4 .3 (c)), sin ce so m e v o lu m e s
like the A n eu rism are ren d e re d m uch fa ste r as they have few er affected p ixels. O n e can see th a t
ren d erin g tim es p e r n o n -b lac k pixel are now m u ch closer. O nly th e B o n sa i d ataset h as a slig h tly
higher cost p er n o n -b lack pixel, p ro b ab ly due to the d ifferen t v o x el/p ix el asp ect ra tio used.

Finally, the ren d e rin g tim es fo r the A n e u rism and B o n sa i d atasets are c o m p a red w ith the very
in terestin g ray tracer by W ald et al. (W F M S 0 5 ). T his alg o rith m - sh o w n to be lo g a rith m ic - h as
been o p tim ised u sing S IM D in stru ctio n s as w ell, and u nlike O O R C . is able to p e rfo rm p e rsp e c tiv e
p rojection. T he sam e iso su rface, the sam e p ro c e sso r fam ily, and the sam e im ag e size (5 1 2 “ ) has
been used. W hile it is n e c essary to be very carefu l w h ile su ch c o m p ariso n s are m ad e sin ce m an y
param eters like sh ad in g and v iew p o in ts m ay not be the sam e, this c o m p a riso n w ill give us a g ood
idea o f the o rd e r o f efficien cy o f the O O R C alg o rith m . R esu lts su m m arise d in T able 4.1 sh o w that
the level o f p erfo rm a n c e o b ta in e d w ith just single th read by the O O R C a lg o rith m is m u ch b e tte r
even fo r sm all volu m es. It is likely th at the p erfo rm a n c e ad v an tag e is m ore p ro n o u n c e d as the
volum es get larg er due to the b etter com p lex ity.

The m eth o d is a very efficien t m eth o d to u n d ertak e volum e ren d erin g , sh o w in g sig n ific a n tly h ig h e r
p erfo rm an ce than prev io u s m eth o d s. A n aly z in g the c o m p lex ity o f th is m eth o d rev eals th at it has
an average co m p lex ity o f 0 ( 1 ) p e r pixel. T his led to the b e lie f that if a sim ila r a lg o rith m c o u ld
be im p lem en ted fo r tria n g u la r m esh es, sim ila r p erfo rm an ce an d co m p u tatio n a l c o m p le x ity co u ld
be achieved. H ow ever, th e alg o rith m fo r v olum es w as to be a d a p te d fo r use w ith m esh es. In
addition, p ersp ectiv e p ro je c tio n s - m o re p o p u la r in tria n g u la r m esh re n d erin g s - w as n e c e ss a ry to
be co n sid ered . C oherent R en d e rin g is the ad ap ta tio n o f this o b ject ray ca stin g m e th o d to re n d e r
tria n g u la r m eshes.

4.2 Coherent Rendering - Concept and High Level Algorithm

A s m en tio n ed . C o h eren t R en d e rin g is an ad ap ta tio n o f the O b ject O rd er R a y C a stin g m eth o d . It


is an attem pt to c o n sid e r the en tire v iew ing fru stu m as a packet o f rays. H ow ever, in o rd e r to
achieve this, co n c ep ts from b oth ra ste risa tio n and ray tracing are used. In ad d itio n , to p e rfo rm the
o cclu sio n testin g - ach ie v ed easily lo r sin g le rays th ro u g h early ray te rm in a tio n - the c o n c e p t o f
H ierarchical O cclu sio n M a p s [Z M H H 9 7 ] [Z ha98] is adapted.

T he high level alg o rith m o f C o h eren t R en d erin g first c o n sid ers the ro o t node o f the k d -tre e and
4.2 C oherent R en derin g - C o n cep t a n d H igh L evel A lgorithm 86

determines if it is visible. If it is, then its children are considered in a front-to-back order with
respect to the viewpoint. This is continued until either the node is determined as being not visible
or a until the leaf node is reached. A node can be determined as being not visible if it is either
outside the frustum or if it is occluded by already rendered parts of the scene. Once the traversal
reaches the leaf node, it implies that some of the geometry in the leaf node may be visible. Thus,
each triangle is tested for visibility and rasterised if it is visible. In this manner, the entire scene is
considered as a packet and at the end of the process, each pixel is shaded accordingly.

The high level algorithm can be written as shown below.


CoherentRender()
{
InitialiseConstants() ;
TraverseTreeCoherent(rootNode) ;
Shade Pixels
}

Listing 4.1: High Level Coherent Rendering algorithm.

One of the differences between rasterisation and Coherent Rendering is the use of a ray tracing
structure like the kd-tree to achieve a better complexity for rendering. The front-to-back traversal
of the structure ensures that geometry that is closer is processed prior to geometry that is further
away from the viewpoint. Using this property, occlusion is detected for nodes (including the
geometry in them) by using Hierarchical Occlusion Maps that are adapted to the algorithm.

The fact that in this method each node is traversed just once at most, is important to attain an
improved rendering complexity. In addition, the triangles are not shared by too many nodes as
long as a good tree can be built. Thus, except in cases where a good tree cannot be built, the
number of triangles to rasterise is minimal. Minimising the number of these two operations -
responsible for almost all of the computational time in a rendering system - is expected to lead to
a very efficient method.

While the main traversal algorithm is well-known, the primary goal is mainly to ensure correct
implementation of the entire pipeline and to observe the complexity improvement. Many papers
/ software are already using similar algorithms for rendering [Gre96] [ZMHH97]. For example,
Bittner et al. [BWPP04] described a similar traversal using occlusion queries to test for occluded
cells. However, the visibility function must actually perform a constant (0 (1 )) number of opera­
tions to ensure that the global average 0 (1 ) complexity holds. One way to perform the test would
be to query all rays intersecting the node and to check whether they are already opaque. However,
even this would not be an 0 ( 1 ) algorithm. The use of HOMs, that allow occlusion determina­
tion in constant time per pixel, is the main component in the attempt to obtain an 0 (1 ) rendering
method.

In the high level algorithm, there is an initialisation process whereby several important variables
and constants necessary for the rendering process are initialised. Subsequently, the tree is traversed
to determine the triangle that contributes to each pixel of the image. Finally, using the triangle and
the position of the viewpoint and the lights, the pixel is shaded accordingly.

Of the three steps, the tree traversal including the leaf node processing constitutes the main com ­
putations in the algorithm. These methods including a few important auxiliary methods will be
described in detail in the following sections.
4.3 Tree Traversal 87

4.3 Tree Traversal

O nce the several v ariab les and c o n stan ts fo r th is v iew p o in t have b ee n co m p u te d , th e tree is to
be trav ersed to re n d e r the im age. T h e trav ersal co n sid ers th e no d es o f the k d -tree stru c tu re in a
fro n t-to -b ack v isib ility o rd e r startin g fro m the root node. It can be w ritten u sin g the p se u d o co d e
below .

TreeTraverseCoherent(node)
(
Project node onto image
i f ( node not within frustum or
isOccluded(node) or
node is fully between 4 pixels)
return;
i f (isLeaf(node) )
(

FrocessLeafNode(node) ;
return;
)
side - viewpoint[axis] > splitpos;
if (side > 0)
{
frontNode = node.leftNode;
back.Node = node.rightNode;
}
else
{
frontNode = node.rightNode;
backNode = node.leftNode;
}
TreeTraverseCoherent(frontNode);
TreeTraverseCoherent(backNode);

L isting 4.2: T ree traversal using the en tire fru stum . If th e fru stu m in tersec ts the root node, the
tree is traversed in a fro n t-to -b a ck o rd e r until the le a f n o d es w h ere th e tria n g le s c o n trib u tin g to
the im age are d eterm in ed .

C oherent R e n d e rin g , at the h ighest level, as show n by the p seu d o c o d e above, is a re cu rsiv e a l­
gorithm that c o n sid ers each node in a fro n t-to -b a c k o rd er w ith resp ect to the v iew p o in t. At each
node, the algorith m d ete rm in e s the n o d e 's visibility. If the node is v isib le , its c h ild n o d e s are
con sid ered . T he p ro cess c o n tin u es until a leaf node is re a c h e d o r until the n ode is d e te rm in e d as
being occluded .

T he m ain steps to d ete rm in e if a n ode is visib le o r not are as follow s:

• P ro ject node o n to the im age.

• D eterm in e fru stu m visibility.

• Test node fo r O cclu sio n .

• Test to see if node is to o sm all.

E xcept the o cclu sio n test - that w ill be d escrib ed in m uch g re a te r detail in S ectio n 4 .4 - th e o th e r
o p eratio n s w ill be d e tailed in the fo llow ing su b sectio n s.
4.3 Tree Traversal 88

4.3.1 N od e P rojection

P ro jectin g a node o n to the im ag e plane im p lie s d e term in in g the b o u n d in g rec ta n g le o f the n o d e.


T his is achieved by tra n sfo rm in g the eig h t v ertices o f th e n ode to im ag e sp ace. T he v e rtic e s are
m u ltip lied by the glo b al tra n sfo rm atio n m atrix to first tran sfo rm th e m (w ith p ix els in th e ra n g e o f
-1 .0 to 1.0) and then to get the actu al p ix els o c cu p ie d by them by m u ltip ly in g an d a d d in g th e h a lf
im age w idth to the X c o m p o n e n ts and h a lf im age h eig h t to the Y co m p o n e n ts. O n ce the p ix els
o f the n o d e ’s v ertices are fo u n d , the m ax im u m and m in im u m A' and Y c o o rd in a te s are se le c te d to
get the b o u n d in g rectan g le o f the node. F ig u re 4 .4 show s the node p ro je cte d o n to the im ag e sp ace.

im age plane

view point*
n o d e projection

Figure 4.4: N ode p ro jectio n . C o h eren t R en d e rin g p ro je c ts all the eig h t v ertices o f the n o d e o n to
the im ag e plan e to o b tain the node p ro jec tio n for the root nodes.

T he co m p u tatio n s to co n v ert a v ertex fro m m odel space to im ag e sp ace can be giv en as:

p = pi * M
Px = (P x / P w ) *h a l f W i d t h + h a lf W id th
Py — ( P y / P w ) *h a l f H e i g h t + h a l f H e i g h t

Pz = (.P z / P w ) (4 .1 )

w here p i - the vertex to be p ro jected


M - the global tra n sfo rm a tio n m atrix
p - the vertex after p ro jectio n
Px ? Py j Pz ~ AT, Y , and Z c o o rd in a te s o f the point on the im age
p w - h o m o g en eo u s co o rd in a te
h a l f W i d t h , h a l f H e i g h t - H a lf o f the im a g e ’s w idth an d h eig h t resp ectiv ely

T he above term s p ro v id e the c a lc u la tio n s n ecessary to co n v ert one p o in t / vertex fro m m o d el sp ace
to im age space. H en ce, to find the p ro jectio n o f the n o d e on the im ag e space, the eig h t v ertices
arc tran sfo rm ed u sing the above c alc u latio n s.

The above term s show that p ro je ctin g a sin g le poin t req u ires several ex p en siv e c a lc u la tio n s - one
m atrix m u ltip lic atio n , one d iv isio n and several ad d itio n s and m u ltip lic a tio n s. T h u s, the en tire
4.3 Tree Traversal 89

pro cess o f p ro je c tin g a n o d e o n to the im age is a very ex p en siv e o p e ra tio n .

In o rd er to redu ce a few c a lc u la tio n s, the o b serv atio n that w h en a k d -tree n o d e is split, o n ly fo u r


new v ertices are created , is used. T h u s, w hen a node is split into tw o , the eig h t v ertices are d iv id ed
into tw o sets o f fo u r v e rtic e s - o n e set fo r ea ch child node. T h e o th e r fo u r v ertic es fo r the tw o
child nod es are sh ared and are defined by the fo u r v ertices fo rm ed by in te rse c tin g the split p lan e
w ith the node. F ig u re 4.5 show s this. T h u s, in stead o f p ro je c tin g all eig h t v e rtic e s o f a n o d e, fo u r
vertices o f the split plan e are p ro jected and the list o f p ro jected v ertic es is m ain ta in e d th ro u g h the
traversal process.

im a g e plane

v iew p oin t*
n ode projection node

Figure 4.5: N ode p ro je ctio n - p ro je c tin g only the split p lan e. F o r n o n -ro o t n o d es. C o h eren t
R en d e rin g p ro je c ts o n ly the split plane to red u ce c o m p u ta tio n s.

O nce the node h a s b een p ro je cte d o n to the screen , it can be su b je c te d to the v isib ility tests.

4.3.2 F ru stu m V isib ility and N ode Size Test

A n u n o cclu d e d n ode is d e term in e d as n o t v isib le if it lies o u tsid e the fru stu m o r if its p ro je c tio n
falls b etw een fo u r n e ig h b o u rin g p ix els (2 vertical and 2 h o riz o n tal pixels). T h e se tw o tests are
p erfo rm ed fo r every node at each o f the traversal steps.

4.3.2.1 F ru stu m Visibility

W ith resp ect to the fru stu m , a n ode can e ith e r be partly in sid e th e fru stu m , c o m p le tely in sid e the
frustum o r co m p le tely o u tsid e the fru stu m . It is n ecessary to d e term in e the case a n ode b e lo n g s to
in ord er to d eterm in e the v isib ility o f the node and its child no d es in c o n sid e ra tio n .

Node partially inside the fru stu m - To d eterm in e if a n o d e is p artly w ith in the fru stu m , it is
tested for in tersec tio n w ith the p lan es fo rm in g the frustum . S in ce the im ag e is re c tan g u la r, the
frustum is fo rm ed by fo u r p lan es. T h e v iew p o in t form s the c o m m o n p o in t a m o n g all the p lan es.
T he o th er tw o p lan es are d e term in e d by the tw o en d po in ts o f the ed g e s o f the im ag e. T h u s, a n ode
is partly inside the fru stu m if the n ode in tersec ts o n e o r m ore p lan es o f the fru stu m an d a p art o f it
is inside the fru stu m .
4.3 Tree Traversal 90

In the im p lem en ta tio n , a c lip p in g a lg o rith m is used to d ete rm in e if a n ode is p a rtly w ith in th e fru s­
tum . T he node is c lip p ed w ith all the p lan es o f the fru stu m . I f the c lip p in g a lg o rith m d e te rm in e s
that the p o ly h ed ro n fo rm e d is not an em p ty p o ly h ed ro n - d etec ted w hen the n u m b e r o f v e rtice s it
co m p rises o f is not zero - th en a p a rt o f th e n ode is w ith in the fru stu m . W hen a n o d e is d e te rm in e d
as being partly inside the fru stu m , its ch ild nodes have a p o ssib ility that they are fu lly o u tsid e /
inside the node. H en ce, c h ild n o d es o f p artly inside n o d es need to be tested fo r fru stu m v isib ility
at each recursive step until they are e ith e r fully o u tsid e the fru stu m o r fully in sid e th e fru stu m .
T his is show n in case 1 in F ig u re 4 .6

node

frustum
viewpoint

Case 1 - Node partially inside frustum

node

frustum
viewpoint

Case 2 - Node fully outside frustum

frustum
viewpoint

Case 3 - Node fully inside frustum

Figure 4.6: F ru stu m v isib ility o f a node. N ode can e ith e r be p a rtially inside th e fru stu m , fu lly
outsid e the fru stu m o r fully inside the fru stu m , as show n in the d iag ram .

Node fully inside or outside the frustum - W hen a node is d e term in e d as not b ein g p a rtly
inside the frustu m , it im p lie s that the n ode is e ith e r fu lly inside o r fu lly o u tsid e th e fru stu m . S in ce
the state o f all the v ertices o f the node w ith respect to the fru stu m is the sam e (e ith e r in sid e o r
outside), it is sufficient to d ete rm in e if one o f the v ertices is in sid e the fru stu m o r not. If th e sig n ed
d istan ces o f the point to all the fo u r fru stu m p lan es are p o sitiv e, it im p lie s that the n o d e is fu lly
inside the frustu m . O th erw ise, it im p lies that the v ertex and co n se q u e n tly the n o d e is fu lly o u tsid e
the frustum . C ases 2 and 3 in F ig u re 4 .6 show these n o des that are fu lly in sid e and fu lly o u tsid e
the frustum .

W hen a node is fully o u tsid e , the node can be skipped. W h en a n o d e is fully insid e, all c h ild
nodes are also fully inside. H en ce, fru stu m v isib ility tests are not n e cessary fo r ch ild n o d es o f
4 .3 Tree T raversal 91

fully inside nodes.

4.3.2.2 Node Size Test

In the C oherent R en d erin g alg o rith m , the p ix els are co n sid ered as p o in ts ra th e r th an as sq u a res.
T h u s, w hen a node b eco m e s too sm all su ch that it lies b e tw e e n fo u r p ix els (tw o ad ja c e n t v ertical
p ix els and tw o ad jac en t h o rizo n ta l p ix els), the o b jects in this n o d e (o r any ch ild n o d es / su b tree s
it m ay have) can n o t c o n trib u te to the im ag e. T h is o ccu rs w hen th e trian g le d e n sity is h ig h (a large
n u m b e r o f sm all trian g les). T h ese n o d es do not need to be p ro ce sse d .

T h is c o n d itio n is easily tested . O nce the node is p ro jec ted o n to the im a g e p lan e, the v e rtic e s at the
e x trem ities along the X a n d Y axes o f the im ag e are d eterm in e d . T h ese v alu es - the m in im u m
and m ax im u m values a lo n g a p a rtic u la r axis are u sed to d e te rm in e w h e th e r th e n o d e is to o sm all.
W h en the in teg er part o f the m in im u m and m ax im u m v alu es are the sam e a lo n g e ith e r axes, it is
d eterm in ed that the n ode c a n n o t co n trib u te to the im ag e. F ig u re 4 .7 sh o w s a few ex a m p le s o f how
this m ay happen. T he p se u d o c o d e belo w d esc rib es the p ro cess o f d e te c tin g if a n o d e is to o sm all.

Figure 4.7: E x am p les o f n o d es o c cu rrin g b etw een fo u r p ixels. S in ce C o h eren t R e n d e rin g


co n sid ers the p ix els as p o in ts, g eo m e try o c c u rin g b e tw e en p ix els are ig n o red .

pMinX =min(X values of eight vertices)


pMaxX =max(X values of eight vertices)

pMinY =min(Y values of eight vertices)


pMaxY =max(Y values of eight vertices)

nodeTooSmall = ((int)pMinX = (int)pMaxX or


(int.) pMinY = = (int)pMaxY)

L istin g 4.3: P seu d o co d e in d ica tin g how very sm all n o d es, that are sm all en o u g h to not c o n trib u te
to the im ag e, are d eterm in ed .

If a node has been d ete rm in e d as b ein g w ith in the fru stu m and b ig e n o u g h , it can c o n sist o f
trian g les that are a p art o f the im age. H ow ever, the n o d es (and trian g le s) m a y be o c c lu d e d by
trian g les that have alread y b een ren d ered . O c clu sio n d ete c tio n to en su re acc u ra te v isib ility is
achieved thro u g h the use o f H iera rc h ic a l O cclu sio n M a p s.
4 .4 O cclusion D etectio n - H iera rch ica l O cclu sion M a p s 92

4.4 Occlusion Detection - Hierarchical Occlusion Maps

Occlusion detection is built into ray tracing when a space subdivision structure is traversed in a
front-to-back manner. However, as packet sizes increase, the number of rays in the packet make it
difficult to skip nodes due to the fact that the active rays may intersect some geometry in the node.

Similarly, with Coherent Rendering - which could be considered as a form of packet tracing with
the entire image as the packet - it is difficult to use this property directly to determine occlusion.
At the same time, it is important that occlusion is detected as early as possible so that nodes higher
up in the tree can be skipped.

Since Coherent Rendering cannot use early ray termination directly, Hierarchical Occlusion Maps
[ZMHH97] [Zha98] are adapted to serve a similar purpose. Hierarchical Occlusion Maps are con­
ceptually simple and determine if regions of the image are occluded. They are adapted for use with
Coherent Rendering. The usage mainly consists of two parts - updating them so that they indicate
the current state of occlusion, and testing the corresponding HOM pixels to determine occlusion.
The concept of HOMs as used in Coherent Rendering, as well as their usage is described in further
sections.

With HOMs, occlusion is ascertained by testing just one pixel value. In addition, the update
process is also proportional to the number of pixels in the image and is independent of the number
of triangles in the scene. This is an important component in the efforts to investigate if an 0 (1 )
complexity rendering method is possible.

4.4.1 C on cep t o f H O M s

As the name suggests, HOMs consists of hierarchically organised pixels. HOM pixels in the lower
levels of the map indicate smaller areas with the lowest level indicating four pixels of the actual
image. Each upper level HOM pixel combines four pixels from the corresponding lower level.
At the highest level, a single HOM pixel represents the entire image. Figure 4.8 shows the HOM
generated for the Armadillo model.

In order to minimise the number of HOM pixels accessed during traversal, the structure is modified
slightly so that a given extent can be tested by examining just one HOM pixel. A region of
influence of a pixel is determined as the four pixels closest to it (shown in Figure 4.9). i.e., when
a pixel’s triangle is determined, the closest four pixels are considered to have changed and hence
the value in these four pixels are incremented. Due to this, a HOM pixel (covering four image
pixels at the lowest level) can be incremented 16 times, with a value of 16 indicating that the
corresponding area in the image is occluded.

The HOM pixels can have values between 0 and 16 - a value of 0 indicating that the pixels
represented have not yet been rendered and a value of 16 indicating that the pixels are fully opaque
and hence do not need to be processed again. A value between 0 and 16 indicates that a few pixels
in the region of influence have been rendered, but not all of them are opaque yet. These values
indicate that processing has to continue for areas that correspond to the pixel.

The HOM must be updated whenever a pixel is rendered so that it always maintains the current
occlusion state for the rendering process. This is performed in the recursive rasterisation function
described in Section 4.5.
4.4 O cclusion D e tectio n - H iera rch ica l O cclu sio n M a p s 93

Final Rendering

Figure 4.8: H O M fo r the A rm a d illo m o d el. T he im ag e w ith the g reen b o rd e r is the actu al im age
o f size 1024 x 1024. T h e im ag e lab eled 0 is the low est level o f th e H O M an d so on. L evel 0 o f
the H O M co n sists o f 256 x 256 pix els. T h e h ig h est level (5) o f the H O M has ju s t o n e pixel.

T he updating o f H O M s - so that it in d ic ate s the c u rren t state o f o c c lu sio n , and testin g fo r o c c lu sio n
using it will be d e sc rib e d in the fo llo w in g sectio n s.

4.4.2 HOM Update

W hen a pixel is ren d ered , it is set to the a p p ro p riate colour. A t th e sam e tim e, th e pix el affected
and three o f its n earest pix els (as show n in F igure 4 .9 ) are c o n sid e re d as b ein g ch an g ed . H en ce,
the low est level H O M pixel c o rre sp o n d in g to th ese fo u r p ixels are in cre m en ted by 1. E a c h H O M
pixel can be influ en ced by 16 lo w er level pix els and th u s can be in c re m e n te d 16 tim es. A value
o f 16 for the pixel value in d ic a te s th a t the reg io n o f the im ag e re p re se n te d by th e pixel is o p aq u e.
T his change is p ro p a g ate d u p w a rd s by in crem e n tin g the fo u r clo sest p ix els until e ith e r the to p m o st
level is 16 or the value o f a pixel is less th an 16.

T he co ncept o f reg io n o f in flu en ce is an o p tim isa tio n to the a lg o rith m th at allo w s o nly o n e H O M
pixel to be tested. A s w ill be d isc u sse d in S ectio n 4 .4 .3 , o nly the H O M pix el c o rre sp o n d in g to the
4.4 O cclu sio n D etectio n - H iera rch ica l O cclu sio n M a p s 94

Region of influence of
the central pixel
Lower level’s pixel
becoming opaque

+1 +1
1T

+1
hi

+1 is added to the 4
Bounding Box of the
closest pixels of the
primitive to be tested
current map
( c f \ < d 2 com pulsory)
(a) (b)

F igure 4.9: R egion O f Influence in H O M . W hen a pixel is d e te rm in ed , the fo u r clo sest pixels
(in clu d in g itself) are d eem e d to be in the reg io n o f in flu en ce. T h ese fo u r p ix els can c a u se an
in crem en t in u p to fo u r u p p e r level H O M s.

m id -p o in t o f p ro je c tio n ’s b o u n d in g box n eed s to be tested. If the c o n c ep t o f reg io n o f influence


w ere not used, fo u r pix els co rre sp o n d in g to the ex tre m ities o f the b o u n d in g box w o u ld need to be
tested.

Since the m axim u m n u m b e r o f u p d a te s to a H O M pixel is 16 and since the n u m b e r o f H O M p ixels


is a third o f that o f the o rig in al im age ( 1 / 4 + 1 / 1 6 + 1 /6 4 + ... = 1 /3 ), the u p d atin g p ro cess
requires at m ost 16p2/ 3 step s, w h ere p 2 is the im ag e size, lead in g to an 0 ( 1) c o m p lex ity p e r pixel
for u pdating the H O M . It is to be noted th at the c o m p le x ity is 0 ( 1 ) w ith re sp ec t to n u m b er o f
triangles and not n u m b er o f p ixels.

4.4.3 Occlusion Testing using HOM s

To p erfo rm the o c clu sio n test, th e ex act H O M pixel co rre sp o n d in g to the pix el ex ten t is tested.
O nce the pixel has been iden tified , the actu a l o cclu sio n d eterm in a tio n is ju s t a sin g le c o m p ariso n
against 16. W hen the H O M p ix e l’s v alue is equal to 16, the space c o rre sp o n d in g to the H O M
pixel and resu ltan tly the pixel ex ten t is fully o cclu d e d , o th erw ise it is not. O c c lu d e d n o d es and
occlu d ed parts o f tria n g les d o not need to be p ro cessed .

T he m ost co m p lex p art o f the test is d ete rm in in g the ex ac t pixel to test. T w o bits o f in fo rm a tio n
arc needed - the level o f the H O M and the bit o f the H O M to ch eck in the level. B oth o f these are
calcu lated using the v ertices o f the n o d e ’s b o u n d in g box.

The b o u n d in g box is d ete rm in e d by u sin g the p ro jec tio n o f the node / tria n g le b e in g tested . T he
m inim um and m ax im u m v alu es o f the X and Y c o o rd in a te s o f the p ro je c tio n giv es the n ecessary
b o unding box.
4 .5 L e a f N ode P ro cessin g - R ecu rsive R a sterisa tio n 95

Finding the level of the HOM The level of the HOM is dependent on the maximum of the
two lengths of the bounding box. The structure of the HOM is such that in the first level, each
pixel indicates two X coordinates and two Y coordinates. At the second level, it indicates four
coordinates each and so on. Hence, the level is given by the log 2 of the greater of the two extents.
The formula used for determining the correct level is simply given by I = Integer(log- 2 {di)),
where d\ is the longest edge (in pixels) of the box and I is the map level. I must also be clamped to
[0Jog2{N ) — 2 ] since the bounding box may either smaller than a pixel or larger than the image
size N . The log 2 of a floating point number can be determined efficiently by using the IEEE
floating point representation’s exponent part that gives the power of two just below the floating
point number being represented. The expressions below provide the implementation to determine
the HOM level to be tested.
dl = max(xExtentLength, yExtentLength)
HOMLevel = Exponent(dl)+1

Listing 4.4: Determining the HOM level. Exponent function uses the IEEE representation of a
floating point number to easily determine the exponent.

Finding the Exact Pixel to Test at a Level Once the level has been determined, the pixel at that
level is necessary. For this, the mid-point of the node projection’s bounding square is used as the
representative pixel. At each level, each coordinate is halved - for eg., if the pixel to be tested is
(100. 200), then at the first level the pixel to be tested would be (50.100). At the next levels, it
would be (25, 50), (12, 25) and so on. It can be noticed that the pixel at the required level is the
original pixel divided by 2HOMLevel. Since integer divisions by powers of 2 are sufficient, it can
be achieved with just a bitwise right shift operation. The X and Y coordinates of the pixel to be
tested at the required level are computed in the implementation as shown below.
HOMPixelX = x >> HOMLevel
HOMPixelY = y >> HOMLevel

Listing 4.5: Determining the HOM pixel to check using bit shift operators.

Once both the level and the pixel in the map are determined, it is simple to detect occlusion. If
the pixel thus found has a value of 16, then the node or triangle part is occluded. Otherwise it is
considered as not occluded and hence processing continues. It may be observed that the occlusion
test involves very few operations.

HOMs are an integral part of the tree traversal process. Occluded nodes are easily detected using
them and these nodes can be skipped. The traversal continues down the tree, skipping occluded
and invisible nodes to finally reach the leaf nodes. At the leaf nodes, the triangles in them may be
a part of the final image. Determining the parts of the triangle that are part of the final image is
undertaken by the leaf node processing part of the algorithm.

4.5 Leaf Node Processing - Recursive Rasterisation

The tree nodes are tested for visibility, as described in the preceding sections, until either they are
determined as being occluded or until a leaf node is reached. When a leaf node is reached, the
node is fully or partly visible from the viewpoint. Consequently, some or all of the geometry it
4 .5 L e a f N o d e P ro cessin g - R ecu rsive R a ste risa tio n 96

co n tain s m ay also be visib le fro m the v iew p o in t. T h e parts o f th e trian g les in the le a f n ode are
tested fo r o c clu sio n until the pixel level and if they are not o c c lu d e d , then the p ix els they o ccu p y
are d eterm in ed , co lo u re d and sh ad ed acco rd in g ly .

A t the hig h est level the alg o rith m can be d e sc rib ed w ith the p se u d o c o d e below .

ProcessLeafNodeO
1
tList = triangles in leaf node
for(each triangle t in tList)
{
polygon = ClipWithBoundingBox (t, nodeBB);
polyTrs = polygon.SplitIntoComponentTriangles ();
for (each component triangle cTr in polyTrs)
RecursiveRasteriseTriangle(cTr.pi, cTr.p2, cTr.p3);

L istin g 4.6: L e a f node p ro c e ssin g in C o h eren t R en d e rin g a lg o rith m . W h en a le a f n o d e is


reach ed , each tria n g le 's part th at is in the fru stu m is re cu rsiv ely rasterised .

T he p seu d o co d e show s the m ain c o m p o n e n ts in d e te rm in in g the co n te n ts o f the final im ag e. It is


to be recalled that o nly sc en es co n sistin g ex clu siv ely o f trian g les are c o n sid ered . E v ery trian g le
identified and accessed (as w ill be show n in A p p en d ix A .2.2) m ust be clip p ed w ith the b o u n d in g
box o f the node. T h is is to e n su re that the parts o f the trian g le th at are o u tsid e the n ode are
not co n sid ered . T h e clip p in g is p erfo rm e d u sing a m o d ified v ersio n o f the S u th e rlan d -H o d g m a n
clip p in g alg o rith m [S H 74].

leaf n o d e triangles

com m on
vertices
node

Parts of triangles clipped with the bounding box


(triangulated if n e c e ssa r y )

Figure 4.10: L e a f node trian g le s n eed in g clip p in g due to partly en c lo se d trian g les. To e n su re
accu rate visibility, o nly parts o f tria n g les that are fully c o n ta in e d by the node are to be
co n sid ered . If o th er p arts are c o n sid ered , th ey resu lt in ren d erin g artifacts.

A s Figure 4 .1 0 show s, clip p in g th ese trian g les leads to p o ly g o n s th at m ay not be trian g les. T h ese
p o ly g o n s are then b ro k en up into th e ir co m p o n e n t tria n g les u sin g a very sim p le m eth o d . O ne o f
the v ertices o f the p o ly g o n is c o n sid ered as the co m m o n vertex and a fan o f trian g les is c reated
w ith this vertex as the co m m o n v ertex o f all the trian g les. T he tria n g u la tio n m eth o d is illu stra te d in
F igure 4.10. T his m eth o d , how'ever, is n o t the best m eth o d for tria n g u la tio n and leads to tria n g le s
o f p o o r asp ect ratio. O th er m eth o d s w h erein a po in t is p laced in the c en tre o f the p o ly g o n and
co n nected to each vertex w o u ld resu lt in b etter q u ality trian g les. B ut. fo r the p u rp o ses o f o u r
alg o rith m , the trian g u la tio n m eth o d used by us p ro v id es satisfacto ry resu lts.

S u bsequent to clip p in g and tria n g u la tin g , the c o m p o n en t trian g les are raste rise d . T he first step is to
pro ject the three v ertices o f the trian g le to find the pix els that the tria n g le v ertices o ccu p y in im ag e
space. T he p ro je c tio n m eth o d is c o m p u ted as d escrib ed in S ectio n 4.3.1 - i.e., by m u ltip ly in g the
4 .5 L e a f N ode P ro cessin g - R e cu rsive R a sterisa tio n 97

po in t by the global tra n sfo rm a tio n m atrix and co n v ertin g the c o o rd in ate s from O p e n G L sp ace to
im age space. T he p ixels a ffected by the trian g le are then d e term in e d w ith a su b d iv isio n m eth o d
as d escrib ed by the p seu d o c o d e below .

RecursiveRasteriseTriangle(pi, p2, p3)


t
minX = min(pi[x], p2[x], p3[x])
maxX = max(pi[x], p2[x], p 3 [x ])

minY = min(pi[y], p2[y], p3[y])


maxY = max(pl[y], p2[y], p3[y])

minZ = min(pi[z], p2[z], p 3 [z ])


maxZ = max(pl[z], p2[z], p3[z])

if(maxX < 0 II maxY < 0 |I


minX > imageWidth || minY > imageWidth
I| maxZ < -1 || minZ > 1)
I
//if triangle is outside image bounds
return;

if (triangle is fully between 4 pixels)


return;
visible = CheckVisibilityHOM((minX + maxX)/2, (minY+maxY)/2);
iff!visible)
return;
if (maxX-minX < 1 && maxY-rninY < 1)
{
setPixel(maxX, maxY);
upaate HOM;
return;
}
1 = LongestEdge ();
p4 = PointOfTriangleNotIn(pi, p2, p3, 1);
middle - Midpoint(longestEdge);
RecursiveRasteriseTriangle(longestEdge.pl,
longestEdge.p2, middle);
RecursiveRasteriseTriangle(middle, p 4 , longestEdge.pl);

L isting 4.7: R ecursive ra ste risatio n o f a trian g le. T his is a su b d iv isio n a lg o rith m th at d iv id es the
triangle recursively until it casts a p ro je ctio n on o n ly o n e pixel, at w h ic h p o in t the tria n g le c a n be
attrib u te d to the pixel.

T he first step o f the recu rsiv e p ro c ess is to find the b o u n d in g box o f the trian g le b ein g c o n sid ered .
T his is d eterm in ed as the m in im u m and m ax im u m c o o rd in ate s alo n g each axis.

O nce this is d eterm in ed , if the b o u n d in g box is o utside the frustum , th e trian g le c a n n o t be in the
final im age. A test is u n d e rtak e n to d ete rm in e if the b o u n d in g box is w ith in the b o u n d s o f the
im age. If the b o u n d in g b o x ’s m ax im u m co o rd in a te is less than zero, in w h ich case the en tire
b o unding box and c o n se q u e n tly the trian g le is o u tsid e the im age. S im ilarly , if the m in im u m
co o rd in ate o f the b o u n d in g box is g re a ter than the im age w id th / im age h eig h t, then the trian g le is
outside the im age. T he b o u n d s fo r the Z c o o rd in a te are -1 and 1 w h ich are the d e p th s o f the n ear
plane and the far plane. P arts o f the scen e n o t betw een these plan es, in d icated by Z v alu es > 1
and < —1. are discard ed .
4.5 L e a f N o d e P ro cessin g - R ecu rsive R a sterisa tio n 98

If the tria n g le is d e te rm in e d as b ein g w ith in the im ag e b o u n d a rie s, the next test it h as to u n d erg o
is the v isib ility / o cc lu sio n test. H iera rch ica l O c clu sio n M a p s - d esc rib e d in S ectio n 4.4 are again
used to verify the tria n g le ’s v isibility. T h e H O M s in d icate if the p ixels o ccu p ied by the trian g le
have already been ren d ered . D ue to the fro n t-to -b a c k trav ersal o rd er o f n odes, if a pixel has
alread y been re n d e re d , it im plies that fo r this p ix el, the first tria n g le alo n g the ray (c o n c e p tu ally )
has been d eterm in e d . H ence, no o th e r trian g le can be visib le at th is pixel. T he H O M s allow easy
d e te rm in atio n o f th is co n d itio n .

W hen a trian g le is not sm all e n o u g h to d ete rm in e ju s t one pixel, it is split into tw o sm aller tria n ­
gles. T he trian g le is d iv id ed into tw o sm a lle r tria n g les at the m id -p o in t o f the lo n g e st edge o f the
trian g le. T he fu n ctio n is then recu rsiv ely called fo r th e tw o split trian g les until the ex ten t o f the
trian g le is less than one pixel w ide and one pixel high. T h ro u g h ea ch su b d iv isio n , the H O M s are
tested to see if the sp lit trian g le is o cclu d e d . W h en the su b d iv id e d tria n g le ’s ex ten ts are less than
than one, the pixel o v erlap p ed by the trian g le ex ten t is given the c o lo u r o f the trian g le.

F ig u re 4.11 sho w s the raste risa tio n p ro cess fo r a few d ifferen t trian g les.

pixels recursn e
subdivi: ions
triangles

very small triangle

pixels
shaded pixels

F igu re 4.11 : R a ste risin g trian g les. A su b d iv isio n m eth o d is fo llo w ed w h ere the trian g le is
su bdivided re cu rsiv e ly until th ey span a single pixel. A t th is pixel, the trian g le is d e te rm in ed as
bein g visib le.

S im p licity is fav o u red w hile im p le m en tin g this alg o rith m . S im ilar w ays to p erfo rm h ie rarch ical
rasterisatio n (W ar69J [G re96] IG G W 98J c o u ld have b een c o n sid ered , but the sim p le r m eth o d d e ­
scribed w as fav o u red . A cru cial p ro p e rty o f h ie rarch ical raste risa tio n is that its c o m p le x ity (i.e.,
the n u m b er o f recu rsiv e calls) is p ro p o rtio n a l to the n u m b e r o f p ix els o f the p ro jec ted trian g le.
F igure 4.12 show s the square root o f the n u m b e r o f recu rsiv e calls fo r a sin g le tria n g le p ro je ctio n
acco rd in g to one d im e n sio n (im ag e w id th / im ag e h eig h t) o f differen t im a g e sizes, d e m o n stra tin g
a p erfect linearity o f the alg o rith m . T he sq u are ro o t is needed since a sin g le d im e n sio n o f the
im ag e is plotted on the X axis.

T he recursive fu n ctio n is, on av erag e, c a lle d a p p ro x im ate ly 8 tim es p e r pixel. T h e re cu rsio n


4 .6 R e su lts 99

7000
W 6000
* 5000
£ 4000
w 3000
X 2000
a w
0 1024 2048 3072 4096

Im a g e S i z e ( S Q R T (# p i x e l s ) )

(a) (b)

F ig u re 4 . 12: L in e a rity o f the h iera rc h ic al ra ste risa tio n alg o rith m . T h e n u m b er o f recu riv e
ra ste risa tio n calls to ra ste rise the tria n g le (rig h t) at d ifferen t re so lu tio n s (fro m 8 2 to 4 0 9 6 2) has
been show n . It can be seen fro m the g rap h (left) that the n u m b e r o f calls in crease lin early
a c co rd in g to im ag e si/e .

sto p s o n ly w hen the size o f the b o u n d in g box o f the trian g le is less than a pixel w ide, but m any
su b d iv id ed tria n g le s en d up b etw een p ix els, m a k in g sh ad in g im p o ssib le and in creasin g the p er
pixel cost.

T h e p ro cess id en tifies th e tria n g le p ro je c tin g o n to e ach pixel. W hen all the trian g les fo r all the
p ix els have b een id en tified , th e v isib ility at ev ery pixel is d e term in ed . T he im age can be ren d ered
by sh ad in g the pixel d e p e n d in g on the p o sitio n o f the light so u rce. A v ery sim p le sch em e o f
sh ad in g is used w h e re the d o t p ro d u ct o f the tria n g le n o rm al and the ray is taken and the pixel is
sh ad ed acco rd in g ly . T h u s, u sin g the m eth o d d e sc rib ed in the above sectio n s, the v isib ility at each
pixel is d e term in ed and u sin g a b asic sh ad in g p ro c e ss, an im age is g en erated .

4.6 Results

A lth o u g h the a lg o rith m 's m ain m o tiv atio n is to in v estig ate the p o ssib ility o f a b etter co m p lex ity , it
w as e x p ected th at the a lg o rith m w o u ld be c o m p e titiv e w ith p acket ray tracers. H en ce, in ad d itio n
to co m p lex ity resu lts, th e ab so lu te p e rfo rm a n c e re su lts in co m p a riso n to p a ck et ray tracers are
provided.

H ow ever, it is to be n o ted th at, so far, no p a rtic u la r effo rt has been m ad e to o p tim ise the differen t
parts o f the alg o rith m . T h e m ain g oal o f C o h eren t R en d e rin g is to in v estig ate p erfo rm a n ce as
scene sizes an d im a g e s sizes in crease. A lso, k d -tre e s built u sing the space m ed ian h eu ristic have
been ch o sen , sin ce it a p p e a re d to be fa ste r in m ost c a ses th an the su rface area h eu ristic.

4.6.1 E m p irica l C om p lexity

T h e a lg o rith m h as b een e m p iric a lly tested w ith d ifferen t d a tasets on an A M D A thlon X 2 3 8 0 0 +


(2 G H z, 2 co re s, 512 KB ca ch e p er co re) wdth 2 G B o f m em o ry available. A ll o u r a lg o rith m s are
sin g le -th re a d e d and ru n n in g on a 32 -b it o p e ra tin g sy stem .
4 .6 R esu lts 100

Synthetic B enchm ark F o r the first test, a sy n th etic plan e (F ig u re 4 .1 3 (a )) w as su b d iv id e d to


see the evolutio n o f the re n d e rin g tim es acco rd in g to the n u m b e r o f tria n g le s. A p lan e w ith 2
trian g les is used as the base m odel and is su b d iv id ed to o b tain scen es w ith 2 S~ 1 trian g les, w here
s is the n u m b er o f su b d iv isio n s. In a d d itio n , 15 c o p ie s o f the sam e su b d iv id e d p la n e w ere ad d ed
beh in d the orig in al one and the p er pixel ren d erin g tim e s fo r b o th the sin g le an d m u lti p lan e scenes
w ere studied. A u n iq u e v iew p o in t - show n in F ig u re 4 .1 3 (c) - w as used.

R e n d e r i n g t i m e p e r p ix e l R e n d e r in g tim e p e r p ix e l

—♦— 3 2 x 3 2 32x32
— 64x54 —■ — 64x64
128x128 128x128
256x256 256x256
512x512 —* — 512x512
—• — 1 0 2 4 x 1 0 2 4 —• — 1024x1024
— l— 2 0 4 8 x 2 0 4 8 — t— 2048x2048
— 4096x4096 4096x4096

5 10 15 20

L o g ( # tr ia n g le s ) L o g ( # tr ia n g le s p e r p la n e )

F igure 4.13: S y n th etic B e n ch m ark s, (a) o rig in al single p lan e m esh (no su b d iv isio n ), (b)
16-plane scene su b d iv id e d 6 tim es, (c) is the final im ag e o b ta in e d fro m all sy n th e tic scen es
rendered using a u n iq u e v iew p o in t, (d) and (e) are the v aria tio n o f re n d erin g tim e s p e r pix el (//s)
according to the scene c o m p le x ity (i.e, n u m b e r o f su b d iv isio n s) and the im a g e size fo r single
plane and m u lti-p lan e respectively. R esu lts show that ren d e rin g tim es are c o n sta n t u n til the scene
c o m p lex ity b eco m e s g re a ter th an the im ag e co m p lex ity . At th is p o in t, re n d erin g tim es b eco m e
lo garithm ic. T he sim ilarity o f (d) and (e) in d icates th a t o n ly the v isib le tria n g le s are relevant.

R esults show that fo r a fixed im ag e size, the p er-p ix el re n d e rin g tim e is c o n sta n t until a given
point after w hich a lo g arith m ic co m p lex ity tak es over. C u rv es arc also se p a ra ted by a h o rizo n tal
distance o f tw o. in d icatin g that every tim e the im ag e size is d o u b le d , th is p o in t m o v es 2 po in ts
further. Finally, the cu rv es fo r both sets o f scen es p erfectly m atch es, w h ich p ro v es that the o c­
cluded trian g les added to the scen e do not affect re n d erin g tim es and c o m p le x ity at all, reflectin g
the efficiency o f H O M s.

T he per-pixel average co m p le x ity o b serv ed h ere can be su m m arise d as:


0 ( m a x ( 1. lo g 2 { v is ib l e tr i a n g le s ) — c .lo g 2 { im a g e s i z e ) ) ) w h ere c - i s a fixed c o n sta n t d e p en d in g
on the im plem en tatio n .

T his result is im p o rtan t since it p ro v id es ev id en c e th at c o m p lex ity o f C o h eren t R e n d e rin g is now


expressed as a fu n ctio n o f tw o variab les. In co m p a riso n , a re g u la r ray tra c e r (not u sin g c o h e ren ce)
is know n to achieve 0 (lo g 2 ( T o t.a l n u m b e r o f t r i a n g l e s ) ) here.

F or real-w orld d atasets, re su lts are u n fo rtu n ate ly not so relia b le , but it can be sh o w n that in creasin g
4 .6 R esults 101

the im age size still ten d s co m p le x ity to 0 ( 1 ) .

B enchm arking Com m on 3D Scenes T he test scen es are su m m a rise d in F ig u re 4 .1 4 . w ith q u a n ­


titative results co m p iled in F ig u re 4.1 5 . R en d e rin g tim es are ex p re sse d fo r th e given v ie w p o in ts,
w h ich include (very) basic sh ad in g . To d e m o n stra te the null c o n trib u tio n o f hidden o b je c ts, the
S p o n za scene has been re n d ered from inside and o u tsid e , and th ree S ta n fo rd m o d els have been
add ed to this scene as w ell.

F igure 4.14: M esh -b a sed C o h eren t R e n d erin g s. F ro m left to rig h t: S in g le tria n g le , S p o n za ,


Spo n za with S ta n fo rd M o d e ls . S p o n za from o u tsid e , D a v id and P o w erp la n t. A ll can be ren d e red
at ap p ro x im a te ly the sam e sp eed , p ro v id ed that the re su ltin g im age is large en o u g h .

T he m ost im po rtan t resu lt is v isib le in F ig u re 4 . 15(b), w h ere re n d e rin g tim es have been d iv id ed by
the n um ber o f non- b lack p ix els to c o m p e n sate fo r the fo o tp rin t size o f e ach m o d el. All re n d e rin g
tim es converge to a re feren ce co st C , irresp ectiv e o f th e n u m b e r o f tria n g le s in the scen e. F o r
instance, the P ow erp la n t (12 m illio n trian g les) and the S in g le T riangle scen e re n d e rin g tim es
(after fo otprin t n o rm a lisa tio n ) are less than 30% ap art w h en re n d e rin g an 81f)2J im ag e. If the
algorithm w as p urely lo g arith m ic, this ra tio w oultl have been m o re in the o rd e r o f 23.

To better un d erstan d this pro p erty , n u m b ers o f calls to tw o critic a l fu n c tio n s - the tree trav ersal
and the rasterisa tio n fu n ctio n s - have been m easu red as w ell. S im ila r to re n d erin g tim es, the
n u m b er o f rasterisatio n steps (F ig u re 4 .1 5 (d )) co n v erg e to an ap p ro x im a te co n stan t o f 8 step s
p er pixel, w hich d e m o n stra tes the average 0 ( 1 ) ra ste risa tio n c o m p le x ity p e r pixel. H ow ever, the
equivalent o p eratio n in ray tra c in g (i.e.. in tersec tio n test) is lik ely to be 0 ( 1 ) if the right tree is
used. T h erefo re, the n u m b e r o f traversal calls m ade by the a lg o rith m m u st be an aly sed as w ell.
W ith regular ray tracin g , this n u m b e r o b v io u sly in creases lin early w ith the n u m b er o f rays, as
v isible in [W F M S 0 5 ]. In C o h eren t R endering's, case (F ig u re 4 .1 5 (c )), the n u m b er o f trav ersal
steps is m ore o r less in d ep en d e n t o f the im age size, but o b v io u sly d ep en d s on the sc en e itself.
T his is w here the co m p le x ity im p ro v em en t c o m es fro m . It m u st be ad d ed th at for all re n d e rin g s
except the D a v id scen e, the n u m b e r o f traversal step s are less th an a m illio n , w hich is a very
low w hen co m p a red to the n u m b ers o f cells trav ersed by o th e r te c h n iq u e s su ch as those rep o rted
in [R SH 05] [W 1K + 0 6 | fo r sim ilarly sized scenes.

To get a b e tte r p ictu re o f the re d u n d an t no d es trav e rse d by a reg u la r ray tracer, a c o m p a riso n
betw een both m eth o d s is sh o w n in T able 4.2. F or every sce n e, tw o trees are co n stru c te d w ith
different splitting h eu ristics. T h ese trees are used by b oth ren d e rin g a lg o rith m s. A sin g le ray
tracer w as used, but a 4 x 4 p ack et, even in the ideal case, w o u ld o n ly div id e th ese n u m b ers by 16
and cannot reach the efficien cy o f C o h eren t R en d e rin g in te rm s o f n o d es traversed. T h is clea rly
suggests that th ere is still so m e room fo r o p tim isin g n ode trav ersal in ray tracin g .

I LIBRA RY
4 .6 R esu lts 102

160 seconds _ . . . .
t e n d e r i n g T im e s ( s )
(a) 35 t e n d e r i n g T im e s p e r n o n -b la c k p ix e l (b)
140 30 (p s/p ix e l)

120 Single Triangle


25
PowerPlant
100 David
20
80 Sponza inside
15 Sponza&Objects inside
60 Sponza outside
40 10 Sponza&Objects outside

20 5
p ix els
0 0
0 1024' 2048' 30722 4096' 51202 6144' 71682 8192 0 10242 20482 30722 40962 51202 61442 71682 8192

nu m b er of trav ersed no d e s j(c ) 45 number of rasterisation steps <d)


2500000 Rasterisation steps per non-black pixel
T ra v e rse d N odes
2000000

1500000

1000000

500000
pixels
0
0 1024- 2048- 30722 40962 51202 61442 7168J8192; 0 1024? 2048' 30722 4096' 51202 6144: 71682 8192

Figure 4.15: A n aly sis acc o rd in g to the im age size, (a) R e n d erin g tim es fo r the d ifferen t scenes,
(b) R en d erin g tim es p er n o n -b lack pixel sh o w in g co n v erg en ce to the sam e ren d e rin g tim e, (c)
N u m b er o f nod es trav ersed - in d ep en d en t o f th e im age size, (d ) N u m b e r o f ra ste risa tio n steps p er
no n -b lack pixel - c o n v erg en ce to the sam e ra ste risa tio n cost p er pixel. N ote that the tw o S p o n za
o u tsid e cu rv es o v erlap on all g rap h s.

RT (S pace RT (S A H C o h eren t R e n ­ C o h eren t R e n ­


M ed ian T ree) T ree) d e rin g (S p ace d erin g (SA H
M e d ia n T ree) T ree)
S ingle T riangle 1 1 #1 #1
P ow erP lant 75 59 0 .3 8 2 1.41
D avid 68.5 51 2.5 4.67
S po n za 87 61 0 .1 7 6 0 .169
S po n za and m o d els 109 72 0 .6 4 2 0.23
S po n za out 5.2 5.5 0.01 0.01
S ponza and m o d els out 5.35 5.5 0.01 0.01

T able 4.2: N u m b er o f no d es (in m illio n s) trav ersed by a recu rsiv e ray tra c e r (w ith o u t p ack ets)
and the C o h eren t R e n d erin g alg o rith m fo r a 1024 x 1024 im age.

T he S p o n za case re q u ires p a rtic u la r atten tio n . It is c le a r th at a d d in g several o b jects to the scen e


d ecreases the co n v erg en ce rate significantly, as lo n g as those o b jects are visib le in the final im age.
By using a view p oin t o u tsid e the S p o n za atriu m (th o u g h the S ta n fo rd m o d els are still in sid e the
frustum ), the visible p art o f the m esh is c o n sid e ra b ly sim plified. C o n v erg en ce is then very early,
and ad d in g or not a d d in g the S ta n fo rd m o d els ( 1.5 m illio n tria n g les) in sid e the atriu m d o es not
m ake m uch differen ce (cu rv es from b oth d a ta se ts o v erlap on the g rap h ). T h e n u m b e r o f n o d es
traversed for the D a v id scen e is large w hen c o m p a re d to the P o w erp la n t in w h ich there is a high
degree o f o cclu sio n . A ll in all the ex p erim en ts show that the c o n v erg en ce sp eed is directly linked
4 .6 R esults 103

to the co m p lex ity o f the v isib le part of the m esh. T h is d o es not c h an g e the c o m p lex ity resu lt itself,
but m ay ap p ear to be an issue in p ractice. T h e re fo re , the use o f o n -th e-fly sim p lifica tio n a lg o rith m s
such as |W F P ~ 0 1 ] |W W Z + 0 6 | c o u ld p o ssib ly be an in te re stin g e x ten sio n to the alg o rith m w hen
the visible m esh d en sity is g re ate r th an the pixel density.

F inally, the cu rren t re n d erin g tim es are ab o u t 10-15 tim es slo w e r than the sta te-o f-th e -a rt M L R T
alg o rith m , how ev er th ere is p len ty o f ro o m fo r o p tim isa tio n . T h e m ain po in t is th at fo r reaso n a b le
ren d erin g sizes (1 0 2 4 x 1024 to 2048 x 2 0 48), all the p er-p ix el re n d e rin g tim e s but D a v id s are o n ly
b etw een one and th ree tim es slo w er than the ren d erin g tim e o f a sin g le trian g le. T h is d e m o n strate s
that the co m p lex ity a ssu m p tio n is valid and scalable.

4.6.2 A b solu te P erform an ce

S cene ERW 6 S p o n za A rm ad illo S o d ah all P o w erp lan t

kd -tree 12 24 26 27 25
m a x i­
m um
depth
kd-tree
le a f node
triangles
6 fps

C oherent
R e n d er­
I 't
■ - .I - J
ing vs
RT and
packet
R T (8 x 8 )
L egend ■ Coherent rendering □ Ray tracing single □ Ray tracing packet 8x8

T able 4.3: P e rfo rm a n ce s o f C o h eren t R en d e rin g vs sin g le ray tracing an d p ack et ray tracin g
( 8 x 8 rays) in fram es p er second. It is to be no ted th a t the p a c k e t ray tracin g im p lem en tatio n has
been o p tim ised th ro u g h the use o f S SE in stru c tio n s. A b so lu te p erfo rm an ce o f C oherent
R en d e rin g is o b serv e d to be slo w er than th at o f packet ray tracing.
4 .7 D iscussio n 104

Although the main experiments were performed to verify complexity assumptions, the absolute
performance in comparison to packet ray tracers is also considered. To measure the effectiveness
of the algorithm, several models have been considered. These models are rendered using the Co­
herent Rendering algorithm described. To determine the fastest performance, the implementation
was run on a PC with a Core 2 Quad processor, but single threaded. The algorithm was developed
and compiled as a 64 bit application with the Visual Studio 2005 C++ compiler to run on Windows
XP 64.

To measure the absolute performance, the scenes shown in Table 4.3 are rendered with ray tracing,
packet ray tracing and Coherent Rendering. Kd-trees with parameters that differ according to
scene sizes are used as the underlying data structure. Details of the kd-tree thus built are also
provided. An image size of 1024 x 1024 is used to benchmark the algorithm.

Table 4.3 provides the results of rendering using this algorithm. The results show the performance
of Coherent Rendering in comparison to single ray tracing and packet ray tracing with 8 x 8 rays.
They indicate that Coherent Rendering is faster than single ray tracing but cannot be competitive
with packet ray tracing.

While the absolute performance of the algorithm is not competitive with a packet ray tracers
performance, the main point of the algorithm is to investigate its complexity per pixel. The reasons
for the slow performance are discussed in Section 4.7.2.

4.7 Discussion

4.7.1 C om p lexity

As the results demonstrate, Coherent Rendering allows convergence to constant complexity per
pixel. While Coherent Rendering is already very efficient for isosurfaces, more work remains to be
done for the triangle mesh version since the algorithm currently suffers from a high constant and
many questions are still open. For instance, the motivation for such a technique when fast packet
ray tracers [RSH05] exist could be questioned. First, this research is a proof of concept, which
verifies that lower average complexities exist for rendering, which has never been demonstrated
before.

Using a similar methodology , researchers can investigate empirically whether their technique al­
lows lower complexities or not. A very simple way to do so is to compare rendering times between
various scenes and a single triangle scene. The per-pixel times would indicate whether the new
algorithm is of lower complexity. This would allow differentiation between pure optimisations
and complexity advances and testing whether techniques are optimal and scalable or not. This
methodology could, for instance, be applied to the open problem of secondary rays.

There are many areas where the application of the rendering pipeline is less trivial, like direct illu­
mination of the scene. An approximation of direct illumination is possible at the same complexity
if a shadow mapping method - that requires rendering an image from the light source(s) - is used.
Similarly, this algorithm can be used for global illumination as instant radiosity [Kel97] demon­
strates it. However, shadow mapping and instant radiosity just estimate visibility. A fully correct
solution would need to consider an irregular grid of pixels for the secondary rays. While it may
be possible, it will be scientifically very challenging to develop such an algorithm and to ensure
optimal complexity at the same time. Another question is whether this algorithm is portable to
4 .7 D iscussion 105

g rap h ics h ard w are o r not. F o r ex am p le, som e p arts o f the a lg o rith m like trian g le / box c lip p in g
are already p o ssib le on the latest g rap h ics card g en eratio n . In tru th , the w h o le a lg o rith m is a lo t
c lo s e r to h ard w are-b a se d alg o rith m s and p ip elin e s th an ray tra c in g alg o rith m s. C u rre n t g ra p h ­
ics hardw are m a n u fa c tu re rs alread y im p lem en t lim ited h ie ra rc h ic a l ra ste risatio n w ith h iera rc h ic al
Z -buffers. F or in stan ce. 3-level h ie rarch ical Z- b u ffer is c u rre n tly im p le m e n te d on ATI g ra p h ic s
card s. A full p y ram id c o u ld actu ally be m o re su itab le. F ast a c ce ss to a p ix e l’s Z h ierarch y by the
C P U s w oufd be n ecessary to d ete rm in e v isibility. T h is w o u ld req u ire a fast c o m m u n ic a tio n c h a n ­
nel w ith low laten cies b etw een the G P U and the C P U and a few c o m p a n ie s are alread y w o rk in g
in that directio n . It is b eliev ed th at this w ould be m uch m o re e fficien t than o cclu sio n q u erie s, and
w ould th u s be a great featu re in G P U s.

Finally, a w o rst-c ase ren d e rin g sc en ario has not been d isc u sse d since real-w o rld d a tasets that
w ould not w ork have not been found. It is c le a r that if the H O M pix els c a n n o t b eco m e o p a q u e
(e.g., the final im age is m ad e o f m an y w id ely sp read black p ix e ls), the v isib ility test w ill alw ay s
pass and m ore n o d es w ill be trav ersed . A ctu ally , this is th e sam e w o rst case as w ith reg u la r
ray tracers becau se the C o h eren t R en d erin g a lg o rith m th e o re tic a lly trav erses the sam e n o d e s as a
reg u lar ray tracer.

4.7.2 A b so lu te P e rfo rm a n c e

T h o u g h , the im p o rtan t m o tiv atio n o f the alg o rith m is to in v estig ate and d em o n strate the b e tte r
com plexity, a usable a lg o rith m n eed s to be c o m p etitiv e w ith p a ck et ray tracers. In th is area,
the C oherent R en d erin g alg o rith m is slo w er than packet ray tra c e rs w a rran tin g in v estig atio n s to
identify the cau ses fo r the relativ ely p o o r p erfo rm an c e. T h is w o u ld also assist in o p tim isin g the
algorithm .

F igure 4.16 show s the alg o rith m profiled (by ren d erin g the S p o n za scen e n u m ero u s tim es). T he
profiling results show th a t the recu rsiv e p ro c ess o f sp littin g tria n g le s and id en tify in g the p ix els is
the m ain b o ttlen eck . T h e next m ajo r b o ttlen eck is id en tify in g th e b its in the H O M and p e rfo rm in g
the visibility test, fo llo w ed by the v ecto r su b tract, dot p ro d u ct and add o p eratio n s.

50.00%
40.00%
30.00%
20.00% n

10.00% fl
=3

<N

y
/ /
y y # ,/ a /
y y
*
y / f y >y yj r y / y
y y
/
y y y
e
y
r
<&
nsT
<&
>
r
o'0
y ^o y / yr ^
ElCPU_CLK_UNHALTED.CORE %

F ig ure 4.16: P rofiling the C oh eren t R e n d e rin g alg o rith m .


4 .8 S u m m a ry 106

The number of recursive rasterisation steps - see Section 4.5 - is quite high as each triangle has
to be split several times until the pixel level is reached. Since the rasterisation is done in 2D, the
steps are expensive. Since the triangles are split until they span less than a pixel, many triangles
- especially large triangles occupying several pixels are split numerous times before the sub-pixel
level is reached, at which point, it may be determined that the pixel is not shaded by the triangle
in consideration. This causes the recursive rasterisation process to be performed a large number
of times, sometimes without effect. The fact that the rasterisation algorithm does not identify
multiple pixels in one iteration makes it expensive.

The large number of recursive rasterisation calls also means that each time a triangle is split, the
mid-point of the vector has to be found leading to several calls to vector adds. In addition, at every
recursive step, the triangle is tested for visibility leading to numerous occlusion determination
calls. Due to the fact that the operations are in 2D, they are expensive leading to an expensive
rendering algorithm.

One method to alleviate this is to use a simple ray tracing approach at the leaf node level, i.e., when
the Coherent Rendering algorithm’s tree traversal reaches the leaf node, the ray corresponding to
the leaf node projection can be intersected with the leaf node’s triangles to obtain the right triangle
for the pixel. While the algorithm could be faster, it would lose the occlusion test done during the
rasterisation process which may be detrimental. This method, though, has not been investigated
and could merit further study.

However, the advantage of Coherent Rendering is that the number of tree traversals are minimised
as evidenced by the profiler results where the tree traversal method does not appear even in the
top 11 most time consuming methods. In a ray tracing algorithm, the complexity of triangle
intersections in a ray tracer can be assumed to be a constant. However, each ray has to traverse the
tree to get to the leaf node, performing an average of lo g (N ) traversals leading to an 0 ( lo g ( N ))
complexity per pixel. Since in Coherent Rendering, the number of nodes traversed is minimal and
almost the same for all image sizes, its complexity approaches 0 ( 1 ) when image sizes increase.

4.8 Summary

The Coherent Rendering algorithm discussed in the chapter combines concepts from rasterisation
and ray tracing to introduce a new rendering algorithm. In a ray tracer, the complexity per pixel
is determined as being due to the tree traversals and not the triangle intersections themselves.
Packet ray tracers amortise this by traversing the nodes with groups of rays together. However,
due to coherence issues, these packets are limited to a certain size - optimal size of 8 x 8 in our
implementation. Coherent Rendering takes this approach further by proposing an algorithm where
the entire image’s pixels are considered as a packet. The nodes visible from the viewpoint are
determined. Further, using the concept of Hierarchical Occlusion Maps, the early ray termination
property of ray tracers is applied to Coherent Rendering to perform occlusion detection.

The main reason for investigating the Coherent Rendering algorithm is to show that it may result
in an 0 (1 ) renderer. The results show that as image sizes increase, the rendering times appear
to converge to times needed to render a single triangle. On an 8192 x 8192 pixel image, the
rendering time for the Powerplant scene (12 million triangles) is around 30% more than the time
to render the Single Triangle scene. This shows that the method is definitely not logarithmic. It is
expected that if the image size is increased further, the difference in this rendering time would be
even smaller.
4 .8 S u m m o n ' 107

At the same time, the algorithm was expected to be very competitive with ray tracers. However,
due to expensive projection and rasterisation operations performed in 2D, the algorithm’s perfor­
mance suffers. Due to this, though the algorithm manages to be faster than single ray tracers for
most cases, it fails to compete with packet ray tracers. Through optimisations like the use of SSE,
the algorithm can be implemented in a more efficient manner. However, the fact that its perfor­
mance is not competitive with modern packet ray tracing methods is currently a drawback. Since,
the 2D nature of the algorithm has been identified as being the cause of the poor performance, it is
believed that an algorithm that performs these operations in ID would be much faster. The result
is the algorithm called Row Tracing - discussed in chapter 5.
Chapter 5

Row Tracing

C o n ten ts_________________________________________________________________
5.1 M o tiv a tio n ................................................................................................................... 106
5.2 C oncep t..........................................................................................................................107
5.3 High Level A lg o rith m ................................................................................................109
5.4 Datastructures for Row T r a c in g ............................................................................ 110
5.5 Tree T r a v er sa l.............................................................................................................I l l
5.6 Leaf Node P r o c e ssin g ................................................................................................ 117
5.7 Final Image G eneration .............................................................................................122
5.8 ID Hierarchical Occlusion Maps (HOMs) ......................................................... 123
5.9 Packet Row T r a cin g ................................................................................................... 129
5.10 Low Level O p tim isation s.................. 133
5.11 Results ............................................................................................................. 133
5.12 S um m ary .......................................................................................................................142

Row Tracing is introduced as a new visibility / rendering method that is based on Coherent Ren­
dering. At the same time, it aims to improve the absolute performance so that it is competitive
with packet ray tracers. This chapter explains in detail the method described by us in [KM08]
- Kammaje, Ravi R; Mora, Benjamin, “Row tracing using Hierarchical Occlusion M aps’’, IEEE
Symposium on Interactive Ray Tracing, 2008. R T 2008., pp.27-34, 9-10 Aug. 2008.

5.1 Motivation

Recent research has popularised the use of packet ray tracing as a rendering method through the
use of groups of rays that have a variety of shapes and sizes [PKGH97] [WBWS01] [RSH05]
[Wal04] [WBS07] [ORM07]. The acceleration is brought about by traversing the entire packet
of rays, thus amortising the computational costs resulting in a much lower cost per ray. Recent
research [RSH05] [WBS07] [BWS06] has enabled the use of larger packets through interval arith­
metic.

New methods of data acquisition and sophisticated scanning technologies have resulted in in­
creased scene sizes. For larger scene sizes, the computational complexity of packet ray tracing -

108
5 .2 C o n cep t 109

shown lo be logarithmic [HB00] [Hur05] [WSS05] [HHS06] [YLM06] - makes it the preferred
method of rendering. Packet ray tracing is also trivially parallelisable, allowing it to take max­
imal advantage of the recent trend towards multi-core processors. This scalability of packet ray
tracers over scene sizes as well as over multiple cores has resulted in it being touted as a potential
alternative to rasterisation.

Packet ray tracing algorithms work exceptionally when the size of the packets are relatively small.
On the other hand, using a larger group of rays that traverse a similar path would maximise cache
coherence and reduce memory bandwidth requirements. However, as the number of rays are
increased, the coherence reduces, leading to a performance penalty, as the component rays traverse
different paths. Thus, to achieve the best performance, an optimal (but relatively small) number
of rays per packet is necessary.

Another disadvantage of larger packets is that the early ray termination property of ray tracers
that allows in-built occlusion testing can be used only if all the component rays / pixels have
already found intersections. For smaller packets, the first hit object for most rays in the packet are
determined at almost the same traversal step and traversal can be stopped at this point. However,
for larger packets, due to the fact that the rays are far apart the object intersections may occur at
widely separated nodes. Thus, the early termination property cannot be effectively used.

During intersection tests, at most four rays can be tested against a single triangle through the use
of SIMD / SSE instructions (on current architectures). Packets with a greater number of rays have
to be split into smaller groups of four each and intersected with the triangle.

These inefficiencies due to the use of larger rectangular packets of rays point to the possibility
that rectangular packets may not be the most efficient grouping of rays to maximally utilise the
coherence provided by the data structure.

In Chapter 4, a new algorithm that aims to achieve a lower empirical computational complexity
was introduced. However, due to the use of 2D structures, it suffers from expensive intersection,
projection and occlusion detection costs making it an unfeasible algorithm to use in its current
form.

In both Coherent Rendering and packet ray tracing, the main inefficiencies exist due to the fact that
rays are grouped into 2D packets instead of a simpler structure. Hence, the idea of using an algo­
rithm similar lo Coherent Rendering, but with packets spanning a row of the image is considered.
The algorithm, called Row Tracing, is an attempt at an algorithm that preserves the advantages of
both packet ray tracing and Coherent Rendering while minimising the disadvantages.

5.2 Concept

Conceptually, Row Tracing is a very simple algorithm and is very similar to ray tracing. Instead
of tracing rays through a structure like a kd-tree or octree constructed on the scene, entire lines
/ rows of the image are traced through the structure. At the leaf nodes of the structure, the row
is intersected with the triangles of the leaf node to obtain the intersected objects for the affected
pixels of the row.

The use of an entire row of the image allows amortisation of traversal and intersection costs. It also
allows the algorithm to use the dual property of a row of pixels - that it can be considered either
as a 2D plane of rays (as shown in Figure 5.1), or as a ID line of pixels - at different points of
5.2 C oncept

Row plane

Row being traced

Im age plane

Viewpoint

Figure 5.1: A row of the image as a 2D plane. Tracing this 2D plane through the tree is the basic
idea of the R o w Tracing algorithm.

the algorithm. This enables the algorithm to select from several efficient methods for the integral
parts of the algorithm.

The row is considered as a 2D plane of rays when traversing through the tree or when intersecting
with triangles. Due to plane-box intersections and plane-triangle intersections being cheap, the
coherence used does not incur additional costs. Intersecting a single ray with a box or a triangle
is of almost similar computational cost to that of intersecting a plane with a box or a triangle. The
traversal and intersection cost - already quite low - is amortised across the number of active rays.
Further, plane-node intersections and plane-triangle intersections are achieved through the use of
simple dot products - vectorisable with SIMD instructions.

At other points of the algorithm, considering the row as a ID line allows simplification and optimi­
sation through the use of ID versions of several operations. Primitive clipping, occlusion testing
and frustum bounds testing arc a few key operations benefiting due to the ID nature of the row.

A disadvantage of tracing rows as compared to single rays is that it loses the early ray termination
property used very effectively for occlusion testing. When a ray is traversed through a space
subdivision structure in front-to-back order, the ray traversal is stopped if it has intersected an
object, thus eliminating a large number of unnecessary operations. R o w Tracing loses this ability
due to the large span of the packet. To overcome this disadvantage, a ID version of H ie ra rch ic a l
O cclusion M a p s (HOMs) introduced by Zhang et al. [ZMHH97] is used. The HOM for a row
allows skipping of nodes that overlap already rendered parts of the row - in effect, an early ray
termination like test for nodes.

R ow Tracing combines features and abilities of ray tracing and rasterisation. The ability of using
spatial subdivision data structures is borrowed from ray tracers. Generating the image by scan-
lincs, projecting points, etc are concepts adapted from rasterisation. This combination, in addition
to the use of H iera rch ica l O cclu sio n M a p s, is expected to enable R o w T racing to demonstrate per­
formance advantage over current packet ray tracers. As scene sizes increase, R o w T racing - due
5 .3 H igh L evel A lgorith m 111

to its (speculated) logarithmic complexity - should show better performance than Z-buffer based
visibility (OpenGL) when the scene consists of a large number of triangles. Table 5.1 provides a
comparison between rasterisation, ray tracing and Row Tracing and it is clear that Row Tracing is
a hybrid algorithm inheriting properties of both algorithms.

Rasterisation Ray Tracing Row Tracing

Logarithmic Complexity X / /

Cheaper per-pixel scanline X


/ /
algorithm

Multi-core / Multi-CPU X / /
Parallel is ation

Shadows
/ / /

Table 5.1: Comparison between Rasterisation, Ray Tracing and Row Tracing

For the purposes of the algorithm, it has been implemented with kd-trees and octrees as the under­
lying data structures.

5.3 High Level Algorithm

The high level Row Tracing algorithm is very similar to ray tracing. The image’s rows are indi­
vidually traced through a hierarchical structure. When the traversal ends - the triangle occupying
each pixel of the row is determined. When all the rows of the image are similarly traversed, the
triangles for each pixel are identified to determine visibility at each pixel. These pixels can then
be shaded according to the triangle and the lights.

The high level algorithm is very similar to the high level Coherent Rendering algorithm (List­
ing 4.1). Listing 5.1 shows the algorithm with the major differences between the Coherent Ren­
dering and the Row Tracing algorithm highlighted in blue.
5 .4 D a ta stru c tu re s f o r R o w Tracing 112

RowTrace()
{
for(each image row)
{
InitialiseRowConstants() ;
TraverseTree(row, rootNode);
Shade row;
}
}

Listing 5.1: High Level Row Tracing algorithm. Each row of the image is initialised, traversed
down the tree and finally the pixels determined are coloured appropriately. At the high level, the
algorithm is very similar to the Coherent Rendering algorithm (Listing 4.1). Differences between
the two are highlighted (in blue).

The row of the image to be traced is iteratively selected and by an initialisation process the row
plane equation and other attributes associated with a row are initialised. Subsequent to this pro­
cess, the row is ready to be traversed through the data structure. The row traverses the tree in
a front-to-back order until it is determined that the node cannot contribute to the row ’s pixels or
until a leaf node is reached. At the leaf node, the row is tested against the leaf node’s triangles to
ascertain the pixels occupied. Once the triangles have been determined for each pixel of the row,
the visibility at each pixel of the row is determined. It is to be noted that in the pseudocode the
leaf node processing method is not explicitly called as it is called by the TraverseTree method
upon reaching a leaf node.

The considerations for the underlying data structure along with the three parts of Row Tracing -
Initialisation, Tree traversal and Leaf node processing - will be detailed in the following sections.

5.4 Datastructures for Row Tracing

As with ray tracing, rendering times for Row Tracing depend on the number of nodes traversed,
the number of intersections and cost per traversal and intersection. The total rendering time is
given by Weghorst et al. [WHG84] in Equation 2.11. Since the only primitives being considered
are triangles, cost of primitive intersection is a constant. Hence, a structure that is well suited for
Row Tracing minimises the number of nodes traversed with a low per node intersection that at the
same time effectively separates the triangles so that the number of triangles to be intersected are
low.

Row-triangle intersections are very cheap and are amortised across several pixels and hence, the
second term of the cost, N p j x C p j, will be low when compared to the first term. Due to this,
Row Tracing prefers data structures that minimise the first term, N p x C p, even at a slight penalty
to the second term. Data structures with axis-aligned boxes can be intersected against a row plane
at a low cost [Hof96] and hence are particularly suited for Row Tracing. A kd-tree built with
the surface area heuristic, with its effective separation of triangles is a good structure for Row
Tracing. Another structure considered is the octree - due to properties that enable a cheaper cost
for intersection, Cp. These two data structures are thus considered.
5.5 Tree T raversal 113

5.4.1 K d-tree

A kd-tree is one of the most researched structure for ray tracing. Due to this, there are several
well known algorithms and heuristics for the construction and ray traversal of a kd-tree. The
most popular heuristic is the Surface Area Heuristic(SAH) generally accepted to produce the best
kd-trees for ray tracing. However, Row Tracing is a different algorithm - the best kd-tree for
ray tracing is not necessarily the best for Row Tracing. The advantage with an SAH kd-tree is
that it quickly separates empty space and selects a locally optimal split position at each step. This
partitioning scheme results in a very good tree that should be suitable for use with the Row Tracing
algorithm.

5.4.2 O ctree

The octree is normally not discussed with respect to ray tracing, but it is a very simple structure
that is easy to create. In addition, it is a very efficient structure for Row Tracing due to its property
that each node at a particular depth is a cube of the same size. This property enables the use of
a simple optimisation that reduces the cost of row-node intersection, tN o d e , making it a good
structure to investigate for Row Tracing.

Thus, both the kd-tree and the octree have advantages that make it suitable for Row Tracing. In
addition, by implementing and demonstrating the effectivity of Row Tracing on more than one
data structure with axis-aligned bounding boxes, the easy adaptability of Row Tracing over such
data structures is shown.

The details of the algorithm starting with the tree traversal by the row will be discussed in further
sections.

5.5 Tree Traversal

For Row Tracing on both the kd-tree and the octree, the row is traversed down the tree in a front-to-
back order. Several attributes required for the traversal computations are constants - either for all
the rows or for a single row. An initialisation process handles the computation of these attributes.

5.5.1 P la n e -B o x in tersectio n

Both the kd-tree and the octree have nodes are axis aligned boxes. Hence, each traversal consists
of a row plane-box intersection. The method described by Hoff [Hof96], that allows several
optimisations will be described in brief before the tree traversal algorithm is described.

A box intersects a plane if its eight vertices lie on both sides of a plane. Naively, this can be found
by calculating the signed distances of the eight vertices to the plane. However, as Figure 5.2 shows,
it is sufficient to test two vertices. These two vertices are the extremities of the box in relation to
the plane. Figure 5.2 shows that these two vertices are the vertices that have the minimum and
maximum projection onto the normal of the plane. Thus, it is sufficient to test if these two points
are on the same side or different sides of the plane.
5 .5 Tree Traversal 114

Hoff also explains the determination of these two points. If N is the normal of plane, the vertex at
the extremity in the direction of the normal is given by the pseudocode below:
D e t e r m i n e M a x V e r t e x ()
{
if( N.x > 0 ) //RIGHT
if(N.y > 0 ) //RIGHT, TOP
if (N .z>0 ) //RIGHT, TOP, FRONT
vl = RIGHT, TOP, FRONT
else
vl = RIGHT,TOP,BACK
else //RIGHT, BOTTOM
if(N. z>0)
vl = RIGHT,BOTTOM,FRONT
else
vl = RIGHT,BOTTOM,BACK
else //LEFT
if(N.y > 0 ) //LEFT, TOP
if(N.z>0)
vl - LEFT,TOP,FRONT
else
vl = LEFT,TOP,EACK
else //LEFT, BOTTOM
if(N.z>0)
vl = LEFT,BOTTOM,FRONT
else
vl = LEFT,BOTTOM,BACK
)

Listing 5.2: Determining the vertex at the extremity in the direction of the normal. N - is the
normal to the row-plane and v l - is the vertex at the extremity.

The vertex, V2 , at the extremity in the negative direction of the normal is just the negative of v i.
i.e., if v i is determined as the l e f t , bottom , back vertex, then V2 is the r i g h t , to p , f r o n t vertex.

If v i and V2 are the two vertices with the minimum and maximum projection onto the plane’s
normal and if the plane’s equation is A x + B y + C z + D = 0, then the intersection is determined
by using the set of equations below.

dv = A x v -f- B y r C z v -I- D
dy = A xv B y v + C zv + D
in tersection = (dv = — 0) or (d.v = = 0) or ((dv > 0)! = (dv > 0)) (5.1)

Using this method, it is sufficient to calculate the plane-vertex distance for just two vertices. In
addition, in any structure with a hierarchy of bounding boxes, like kd-trees, octrees or BVHs,
the nodes are all oriented similarly. Hence, these two vertices can be determined during an ini­
tialisation process and the appropriate vertex can be accessed when necessary, allowing further
optimisations.
5 .5 Tree Traversal

'lane normal

Plane
Projection of vertex onto,
plane's normal *

0 Vertices casting maximum projection


onto plane's normal

F ig u re 5.2: In tersectio n o f a box and a p lan e. A s show n, it is sufficient to d ete rm in e if the tw o


e x tre m e p o in ts o f the box in rela tio n to the norm al o f the p lan e arc on the sam e side o r e ith e r side
o f the plane.

5.5.2 Initialisation

T h ere are tw o parts to the in itia lisa tio n p ro cess - one that o ccu rs p rio r to the tree trav e rsa l and one
that o ccu rs p rio r to every ro w ’s trav ersal. T he first p ro cess co m p u tes the g lo b al v alu es th a t rem ain
the sam e for all the row s. T h e seco n d p ro cess - p erfo rm ed fo r every row - c o m p u te s th e v alu es
relev an t to the row b ein g traced .

• G lo b al V alues - T h ese are the a ttrib u tes th at do not ch an g e acro ss the row s and c an hen ce be
c o n sid ere d as g lo b al acro ss all the row s. In itialisin g th ese ju s t once b efo re the tree trav ersal
starts is a d v an ta g eo u s. T h e a ttrib u tes c o n sid ered as g lo b al are:

- G lo b al tra n sfo rm a tio n m atrix - T h is is the tra n sfo rm a tio n m atrix that is u sed to convert
a point fro m m o d el sp ace to im ag e space.

- N e a r plan e - T h is is the plan e upon w h ich the im age is ren d ered . It is v arian t upo n the
view p o in t used and is co n stan t fo r all the row s. T he four v e rtice s o f th is p la n e are d e ­
term in ed by ap p ly in g the g lo b al tran sfo rm a tio n to the four v e rtice s o f the O p e n G L neat-
plan e c o o rd in a te s. S u b seq u en tly , the e q u atio n o f the n e a r plan e is fo u n d by fin d in g the
n o rm al to the n e a r plan e. If the n ear plan e is giv en by A npx + B npy + C lipz + D np = 0,
the n o rm al v e cto r p ro v id es the values o f A np, B np and Cnp. P lu g g in g th ese and the
v alu es o f x, y, an d 2 from one o f the n ear p la n e ’s v ertices in to the eq u a tio n , th e value
o f D np is c o m p u te d and is stored.

- In d ices o f the n o d e ’s vertices c astin g m ax im u m p ro jec tio n on n ear p la n e n o rm al - In


5 .5 Tree T raversal 116

order to find if a node is in front of the near plane, it is more efficient to find the two
vertices of the node that casts the maximum projection onto the normal of the near
plane, as described in Section 5.5.1. The same two vertices of every node casts this
maximum projection. The indices of these vertices, in d e x np and in d e x np , are found
and saved.

- Indices of node’s vertices casting maximum projection on rows 0 and 3 of matrix - As


will be shown in the following sections, finding the node projection’s overestimate is
an essential part of Row Tracing. To find this with minimal cost, row 0 and 3 of the
global transform matrix are considered as vectors and the node’s vertices at the end
point of the diagonal casting the maximum projection onto these rows are determined
as given by Section 5.5.1. Indices of these vertices are stored.

• Per row constants - The attributes that vary for every node need to be initialised at the
beginning of that row’s traversal. The attributes initialised at this time are:

- Row Plane Equation - This is a plane of rays with the viewpoint and the two end
points of the row defining the plane. The plane equation is determined by calculating
the normal that provides the corresponding A r , B r , C r values and then calculating
the value of D r for the row.

- Indices of node’s vertices casting maximum projection on row plane normal - As


an intersection is to be calculated for each node and the row during traversal, the
indices of the diagonal casting the maximum projection onto the row plane’s normal
is determined and stored.

5.5.3 Tree Traversal A lgorith m

The various variables corresponding to a row are pre-computed by the initialisation process so that
the tree can be efficiently traversed in a front-to-back order. Traversal is similar to ray traversal
- it starts from the root node of the tree and continues in a front-to-back order. It can be written
with the following pseudocode.
TraverseTree(row, node)
(

if (node is empty or
node does not intersect row)
return;
proj = FindNodeProjection() ;
if (proj is outside frustum or
proj is occluded)
return;
i f (node is a leaf node)
(

ProcessLeafNode (node);
return;
)

f o r (each childNode sorted in


front-to-back order)
TraverseTree(row, childNode);

Listing 5.3: Single Row traversal algorithm.


5 . 5 Tree T raversal 117

From the pseudocode, it can be observed that traversal stops if the node does not intersect the row,
or if the node projection falls outside the frustum or if the node projection is determined as being
occluded. Otherwise, traversal continues down the tree until it reaches a node that meets the exit
criteria, or until a leaf node is reached. At the leaf node, the triangles contained by the leaf node
are rasterised onto the pixels of the row. At the end of the traversal process, the closest triangle
corresponding to each pixel of the row is determined.

As mentioned, the traversal process tests the node for several criteria. In order to perform these
tests as efficiently as possible, fast methods for row-plane-node intersection and node projection
are determined. These methods are described in the following sections. Before that however, to
better explain the node projection operation, the well known process of projecting a point onto the
screen / row in the context of Row Tracing is provided.

5.5 .4 P rojection o f a P oin t on to the R ow

Projecting a point onto the row - i.e, converting the point from the object space to the image space
to find the actual pixel occupied by the point on the row - is achieved by multiplying the point by
the transformation matrix and converting these co-ordinates from OpenGL space to screen space.
However, in Row Tracing, since the Y coordinate is fixed (by the row being traversed), only the
X and Z coordinates are necessary.

If p is the point to be projected, the value of A" and Z coordinates can thus be found using

x = p .m 0
2 = p. Ill 2

W = p .m3

x = (x/w) * half Width + half Width


z = ( z / w ) * h a l f Wi dt h (5.2)

where m o, m 2 and m 3 - are the first, third and fourth row of the global transformation matrix
respectively h a l f W i d t h - is half of the image’s width x, z, w - are the values of the X coordinate,
the Z coordinate and the homogenous coordinate respectively The values of x and 2 thus found
are the coordinates of p on the screen.

5.5.5 K d -tree N od e P rojection

The naive and most obvious method of finding a node’s projection is to project the eight vertices
onto the row. Projecting a point was described by Equations 5.2. For conducting the node oc­
clusion test, only the X coordinate is necessary. Hence, the naive projection calculation can be
shown as:

x\ = v 1 mo
x2 = v 2.m 0
5 .5 Tree Traversal 118

x8 = v 8 .m 0

W\ = V 2.1113
V ’2 = V2 m 3

Wg = v 8.m 3

X\ = (x 1/ w\ ) * h a l f Wi d th + h a l f W i dt h
X'2 = (X2/W2) * h a l f W i d t h + h a l f W i d t h

xs = (xg/w$) * hal f Wi dt h + h a l f Wi dt h

x-min = min(xi,X2, ^s)


X fnax = n i a a ( x i . X o < ■■, x x )

(5.3)

V] ,.v 8 are the eig h t v ertices o f the node.


are the Ar c o o rd in ate s o f the eig h t v ertices o f the node.
w \..w x are the h o m o g e n o u s c o o rd in a tes o f the eig h t v ertices o f the node.
x mm and x'max are the e x trem eties o f the actu a l p ro je ctio n o f the node,
m o and m 3 are the h rst row and the fo u rth row o f the g lobal tra n sfo rm a tio n m atrix.

H ow ever, as E q u atio n s 5.3 show , this m eth o d is co m p u tatio n ally ex o rb ita n t, req u irin g 16 dot p ro d ­
ucts, eig h t d iv id e s and several fu rth e r o p eratio n s. It can be o b serv e d that to p erfo rm an o cclu sio n
test o r a fru stu m b o u n d s test, the e x act node p ro jectio n is not necessary. A s show n in F igure 5.3,
a slig h t o v erestim a te o f the n o d e 's p ro jectio n w o rk s alm ost as w ell. U sin g the o v erestim ate re ­
sults in a few m ore trav ersals than o th e rw ise , but if the o v erestim ate is sm all en o u g h , the tra d e-o ff
resu lts in h u g e ly re d u ced cost fo r traversal. T h e m eth o d to co m p u te an o v erestim ate is d iscu ssed .

We first define ,r j, .r2, u 'i and u'2 as follow s:

x , - m i n ( \ 2 . m 0l v 2 . m o . ... v 8 .n io )
x -2 = m u x ( v i .n i y . v 2 . n i o . ... v 8 . n i o )
u 'i = m m ( v 1 .m 3 , v 2 .m 3 . . . , v 8 .m 3 )
u '2 = m a x ( v 2 . m 3 , v 2 . m 3 . ... v 8 . m 3 ) (5.4)

vx and v x c an be defined as the v ertices ca stin g the m ax im u m p ro je ctio n s on m o and can be


d eterm in ed using the m eth o d given in S ectio n 5.5.1. H en ce. v x .n io and v x .n io pro v id e values
o f .1] and .r2. S im ilarly. v w and v w can be d ete rm in e d as the v ertices o f the node c a stin g
the m ax im u m p ro je ctio n o n to m 3 . U sin g these, the m in im u m a n d m ax im u m values o f x can be
d eterm in ed as follow s:
5.5 Tree Traversal 119

Node

Node vertex projections

Image row

Accurate Projection

Projection overestimate —•

Viewpoint

Figure 5.3: N ode P ro je ctio n o n to a Row. x imn and x max in d ic ate the n o d e ’s o v e restim ate . If the
ov erestim ate is o c clu d e d , then the actu a l p ro jec tio n is also o cc lu d e d . H en ce, this o v e re stim a te
ca n be u se d fo r o c clu sio n d e tec tio n .

x\ = vx m 0
x2 = v x .m 0
w\ = l / ( v w m 3)
w2 = l / ( v w .1113)
( 5 .5 )

Since, w e are u n c le a r if x \ > x 2 an d if w \ > w 2. and also since x \ , x 2, w \ and w 2 m ay b e n eg ativ e


o r less than 1, w e ca lc u la te the m in im u m and m ax im u m v alu es of x as follow s:

Xrnin = a l l ' l l ( x ] W] , X \ W 2 , X 2 W \ , X 2 W 2 )

Xmax = m a x ( x 1 Wi , X\W2. X2 W1 , X2w2)


x mi n — x min * h a l f Width- + h a l f W id th
Xmax = x max * h a l f W i d t h + h a l f W i d t h ( 5 .6 )

w here x nvin and x mux re p resen t the ex tre m itie s o f the node p ro je c tio n ’s o v erestim ate.
5.5 Tree T raversal 120

x rnm and x max are used to find th e level and the ex act bits n e ed ed to ch eck the H O M fo r o cclu sio n .
T he details o f using th e H O M to detect o c clu sio n is p ro v id ed in S ectio n 5.8. A lso, x mrn and x max
are used to test the n ode fo r fru stu m inclusion. W h en x min and x mnx do not sp an any pixel
b etw een zero and the w id th o f th e im ag e, the node is o u tsid e th e fru stu m .

5.5.6 Octree Node Projection

T h e m ethod used to find the p ro je ctio n o f an octree n ode o n to the row is essen tially the sam e
as that d escrib ed fo r the k d -tree. H ow ever, the p ro p erty o f an octree that every n o d e at a given
d ep th is a cube o f the sam e size can be used to o p tim ise the vertex d e te rm in a tio n calc u latio n . A s
F igure 5.4 show s, if the m id -p o in t o f the o ctree n ode is know n, th en each o f its v ertices can be
fou n d by adding (o r su b tra ctin g ) a v ecto r to the m id p o in t o f th e o ctree. To use th is p roperty, the
unit vectors to be ad d ed to the m id p o in t to o btain the n e cessary v ertices are p re -c o m p u te d b efore
the tree traversal. T h e h a lf len g th s o f a n o d e ’s d iag o n al at every d ep th o f the o ctree are also p re ­
com p u ted . At traversal tim e, u sing the unit v ecto r and the h a lf len g th , the c o rre sp o n d in g v ertices
are easily foun d by v ecto r ad d itio n s. W h en S1MD / S S E code is u sed , a v ecto r a d d itio n is ach iev ed
in one instructio n and is c h e a p e r co m p a red to the ex p en siv e in d ex in g o p eratio n s p erfo rm e d by the
k d -tree version.

F igure 5.4: C a lc u la tin g the o ctree n o d e ’s v ertices, o is the m id -p o in t o f th e octree, d i and d 2


are v ecto rs to be ad d ed to o to ob tain the vertices.

5.5.7 R ow -K d-tree Node Intersection

A ro w -k d -tre e node in te rse c tio n is co m p u ted at ev ery trav ersal step and is one o f the m o st fre ­
q u en tly p erfo rm ed o p e ra tio n s o f the alg o rith m . H en ce, it is im p o rtan t th at it is c o m p u ta tio n a lly as
ch eap as possible.

If all eight v ertices o f a k d -tree node lie on the sam e side o f th e row p lan e, the node does not
in tersect the row plane. U sin g the m eth o d in [H o f9 6 ], in te rse c tio n can be d eterm in ed by c o m ­
puting the signed d ista n c e s o f ju s t tw o vertices (end p o in ts o f a d iag o n al th a t casts the m ax im u m
projection onto the row p la n e ’s n o rm al). T he sam e d iag o n al o f every node casts this m ax im u m
p ro jectio n o nto the row. H en ce, the tw o a p p ro p riate v ertices can be p re -d c te rm in ed at the b e g in ­
ning o f traversal o f each ro w - in the p e r ro w in itialisatio n p ro c e ss. T h is re d u c es the n u m b er o f
o p eratio n s significantly. F ig u re 5.5 sh o w s a d iag ram th at assists in u n d e rstan d in g this better.
5.6 L e a f N ode P ro cessin g 121

Node vertices
to be tested y

Row node
.intersection'

Row plane

Row being traced

Image plane

Viewpoint

Figure 5.5: R o w -N o d e in tersec tio n . If the tw o ex trem e v ertices o f the n ode w ith respect to the
row p lan e's n o rm al (sh o w n in b lu e) are on o p p o site sides o f the row p lan e, then it in tersects the
node.

U sin g the eq u atio n o f the ro w -p lan e, A r X + B , Y + C r Z + D r - 0, the sig n ed d istan ce is


calcu lated by su b stitu tin g the A \ V' and Z v alu es o f the v ertex into the eq u atio n . If d \ and d 2 are
the signed d istan ces o f the tw o v ertices, then there is an in tersectio n o nly if s ig n (d \ )! - sig n a l}).
T he operation is very c h e a p and can be ach ie v ed using ju s t tw o dot p ro d u cts - efficiently c a lc u late d
w ith SSE instru ctio n s.

5.5.8 Row -O ctree Node Intersection

A m ethod that is sim ila r to the R o w -k d -tre e node in tersec tio n (S ectio n 5 .5 .7 ) is used to find the
intersection betw een the octree n ode and the row plan e. H ow ever, as an o p tim isa tio n , the p ro p e rty
o f o ctrees d escrib ed in S ectio n 5 .5 .6 and as show n by F ig u re 5.4 is again used. U sin g th is, the tw o
v ertices are calc u lated u sin g ju s t a v ecto r a d d itio n each in stead o f an ex p en siv e in d ex in g o p eratio n .

If a node is d eterm in ed as in tersec tin g the row. trav ersal c o n tin u e s dow n the tree until a le a f n ode
is reach ed , unless a n o th e r exit criteria is e n co u n te red .

5.6 Leaf Node Processing

At the leaf node, the tria n g le s in it can be a p art o f the final im age. D eterm in in g the ap p ro p ria te
parts o f these trian g les is im p le m e n te d by the fu n c tio n ProcessLeafNode show n below .
5.6 L e a f N ode P ro cessin g 122

ProcessLeafNode(node)
I
f o r (each Triangle t in node)
{
1 = Intersect (t, rowPlane);
1 = ClipWithBoundingBox (1);
p = ProjectOntoImageLine(1);
pList.add(p);
)

extent = MinMax(pList);
RecursiveRender(extent, pList);
}

L istin g 5.4: P ro cess the le a f node. W h en the le a f node is re a c h e d d u rin g the traversal p ro cess, the
triangles in it have to be teste d ag ain st the ro w to d e te rm in e w h ic h o f th e m c o n trib u te to the
p ix els in the row.

A s the pseud o co d e d e sc rib e s, p ro ce ssin g the le a f node in v o lv es a few step s th at w arran t fu rth er
discussion.

5.6.1 R ow-Triangles Intersection

Pi

R o w -p la n e -T r ia n g le
in te r se ctio n s e g m e n t

R o w -p la n e
PinM
Pmt2

Figure 5.6: R o w -p la n e -tria n g le in tersec tio n . If the v ertices o f the trian g le lie on o p p o site sides
o f th e row' p lan e, th ere is an in tersec tio n .

T he intersectio n b e tw e e n a row and a tria n g le is e ssen tia lly a p la n e -tria n g le in tersec tio n . It is
easily com puted using the e q u atio n o f the row plane. T he c a lc u la tio n s are b e tte r d e scrib ed by the
eq u atio n s below and F ig u re 5.6. In the e q u a tio n s below , the v aria b le s (like p \ , p-2 , etc) co rre sp o n d
to the g eo m etric en tities sh o w n in F ig u re 5.6.
5 .6 L e a f N ode P rocessing 123

row E q u a tio n = > A x + B y + C z + 1) = 0


d\ = A x ] T B x \ -\~ C Z \ -1- D
d ‘2 — A x 2 3" B x 2 C z2 D
d'j = 2 I.T3 T- B x $ + C 23 + Z2 (5.7)

If (Zi, d2 and ^ 3 have the same sign, it implies that the entire triangle is on one side of the row plane
and hence does not intersect the plane. However, if any one distance is differently signed than the
other two, then there are parts of the triangle on both sides of the row plane and henee there is an
intersection. If there is an intersection, the end points, pint and pint , of the intersection segment
are given by:

dint = d i /( d i + d2)
dint = d \/{ d \ + ^ 3 )
Pint = P i + dint (P 2 - P i )
Pint = P i + dint (P 3 - P i ) (5 .8 )

Intersect R ow -plane with leaf n od e triangles

L ea f n o d e

R ow p lan e

R ow b ein g traced

Im age p lan e

V iewpoint

F igure 5.7: R o w -le a f-n o d e -tria n g le s in te rse c tio n . In tersectin g the ro w -p la n e w ith the trian g le s
in a n ode gives several line seg m en ts.

T he intersectio n is found fo r ev ery trian g le in the n ode to ob tain a list o f line seg m en ts. T h e list is
m ain tain ed as a list o f start and end p o in ts o f th e seg m en ts. If the list is not em pty, th e a lg o rith m
p ro ceed s to the next step. F ig u re 5.7 sh o w s th e in tersec tio n se g m en ts fo rm ed by in te rse c tin g a
row w ith all the trian g les in a le a f node.
5.6 L e a f N ode P ro cessin g 124

5.6.2 S egm en ts P artially in F ron t

The algorithm should render only parts of the geometry that are in front of the viewpoint / near
plane. Relevant parts of the triangles can be on both sides of the near plane only if the node
consisting them lies on both sides of the near plane. This is easily determined by finding the
signed distances of the two corresponding pre-determined vertices of the node to the near plane.
If the signed distances of these points have different signs, the node lies on either side of the near
plane and can contain triangles partially on both sides. Only triangle segments in these nodes are
to be tested for being partly in front of the near plane.

Segments that are fully in front of the near plane are rendered directly without any additional
processing. Similarly, segments that are fully behind the near plane are discarded as they are not
part of the final image. However, for the few triangles that have parts in front of and behind the
near plane, additional processing is necessary to ensure that only parts that are in front of the near
plane are rendered.

Testing whether a triangle lies on both sides of the near plane essentially reduces to a problem of
intersection between the near plane and intersection segment. If it is determined that a segment
intersects the near plane, the segment is clipped by the near plane and the part that is in front is
selected for rendering. If the segment is fully in front or fully behind, then they are rendered as it
is or discarded respectively.

At the end of this step, a modified list of segments, containing only the clipped parts that are fully
in front of the near plane, is generated.

5.6.3 C lipp ing In tersectio n S egm en ts

In a leaf node, parts of the triangle can be outside the node. This can result in intersection segments
that may be partly or completely outside the leaf node as seen in the example in Figure 5.7. If
these segments are rendered as they are, the resulting image would be incorrect. To ensure that
the image consists of the right parts of the right triangles, these segments are clipped against the
leaf node’s bounding box so that only parts that are inside the node are rendered.

The operation is achieved by using the line equation of the segment and by intersecting the line
with the three entry faces and three exit faces of the node. The set of equations below detail the
process.

tx = {bb[xentry\ — )/{ p 2 - P \ )

tx = {bb[xexit] - p i )/{ p 2 - p i )
t y
= (bb[yentry\ - Pi ) / (P2 ~ P \ )
t y = (bb[yexit\ - p i ) /( p 2 ~ p \ )
tz = {bb[zentry\ - pi ) /( p 2 - p i )
tz = (bb[zexit\ - p i )/{ p 2 - p i )
5.6 L e a f N ode P ro cessin g 125

t entry — n i a x ( t :r 1t y ,tz )
texit = min(tj: 1t y ,tz )
P in t = P i + t,z n t r y (P2 - P i )
P in t = P i + Lexit (P 2 — P i ) (5.10)

ty , tz , tz - arc the c lip p ed e n try and exit p a ram eters o f the


in tersec tio n segm ent.
bb - is the array re p re se n tin g the h o u n d in g box o f the node. It c o n sists o f 6 values - the m in im u m
and m axim um po in t a lo n g each axis.
P int » Pint ~ are the ex tre m ities o f the clip p ed seg m en t.

Clip intersection segments

Leaf node

R ow p la n e

R ow b ein g tra c e d

Im a g e p la n e

V iew point

F igure 5.8: In tersec tio n se g m e n t c lip p in g . In tersectio n line seg m en ts n eed to be clip p ed to
en su re th at o n ly parts o f tria n g le inside the n ode are co n sid ered .

A ll segm ents are clip p ed ag ain st the b o u n d in g p lan es o f the node an d o nly parts o f these seg m en ts
that are fully c o n tain ed by the node are retain ed . T h e o th e r p arts are d iscard ed . T he list o f
seg m en ts is m odified acco rd in g ly . F ig u re 5.8 show s the in tersec tio n seg m en ts clip p ed by the
b o u n d in g box o f the node.

5 .6.4 P rojection o f C lipp ed S egm en ts

T he list o f clip p ed seg m en ts co n ta in s the p arts o f the le a f n ode tria n g les that can be a p art o f
the final im age. By p ro jectin g each c lip p ed seg m en t o n to the row and d e te rm in in g the co rrect
v isibility, the trian g les o cc u p y in g each affected pixel is fo und. A s sh o w n by F igu re 5.9, each
seg m e n t's tw o end po in ts are p ro je c te d o n to the row u sin g the m eth o d d e scrib ed in S ection 5.5 .4
to obtain the req u ired p ro jectio n .
5.6 L e a f N ode P rocessing 126

O n ce the X and Z c o o rd in a te s o f the tw o end p o in ts o f the seg m en t are fo u n d , any point on the
line segm ent can be fo u n d using lin ea r in te rp o la tio n . T he X c o o rd in a te is c alc u lated by lin e a r
in terp o latio n . F o r p e rsp ectiv e p ro je ctio n s, the Z c o o rd in a te is c o m p u ted by linearly in te rp o la tin g
1 / Z . To sim plify the co m p u ta tio n , the relatio n o f 1 / Z w ith re sp e c t to the X co o rd in a te is fo u n d
as follow s.

b = 1 /Z ,

a = ( Z 1 - Z 2) / ( Z i * Z 2 * ( X 2 - * i ) )
1 /Z * = aX + b (5 .1 1 )

w h ere a and b - are the n ecessary coefficien ts.


X ] , Z i, Ar2, Z 2 - are Ar and Z c o o rd in a te s o f the tw o e n d p o in ts o f the c lip p ed in tersec tio n
segm ent.
1 j Z x - is the c o rre sp o n d in g 1 / Z value at any X .

T he values o f a , b and 1/ Z are calc u la te d fo r each segm ent. T h ese values are sto red in a list fo r
use by later parts o f the alg o rith m .

Project clipped segments

Leaf node

Row plane

Row being traced

Image plane

Viewpoint

F igure 5.9: C lip p ed se g m en ts p ro jectio n . T he c lip p ed in te rse c tio n se g m en ts are p ro jected o n to


the row to o b tain the pix els affected by the trian g les in the node.

5.6.5 R asterisin g the Segm ents

T he X , a and b values are used to ra sterise the se g m en ts - i.e., to d ete rm in e and shad e the p ix els
affected by the seg m en t. A n im p o rta n t co n sid e ra tio n is to attrib u te the right trian g le to the pixel
by ensuring accu rate v isib ility if several trian g le s p ro ject o n to it.

H O M s are used to d eterm in e w h eth e r pixels u n d e r c o n sid eratio n have a lread y been rasterised and
5 .7 Final Im age G en eration 111

are hence occluded by triangles in other leaf nodes. By recursively subdividing the extent of the
node’s triangles and testing the subdivided extent against the HOMs, the visibility of each subdi­
vided extent is determined. The subdivision and occlusion tests are continued until the subdivided
extent is contained fully by an eight pixel group determined by the HOM. Subdivision is stopped
at this point as it is more efficient to use SIMD instructions for further processing. Visibility
determination using HOMs will be discussed in detail in Section 5.8.4.

At the end of subdivision, the intra-node visibility in addition to the inter-node visibility is under­
taken. Inter-node visibility order - the visibility order of triangles in different nodes - is deter­
mined using the lower levels of the HOM. However, intra-node visibility - accurate visibility of
triangles within the node - is determined by using a Z-buffer like algorithm. For the eight pixels,
the maximum values of 1j Z coordinates is maintained. When a clipped line segment projects
onto a pixel, the value of l / Z of this line segment is compared against the existing value. If it
is greater, then the pixel’s existing triangle is replaced by this triangle and the Z-buffer’s value is
updated to reflect this.

An observation is that if a leaf node has triangles that do not overlap, intra-node visibility is not
necessary and hence the Z-buffer algorithm is also not necessary. However, to simplify the im­
plementation, this is implemented only if a leaf node has a single triangle or two non-overlapping
triangles.

5.7 Final Image Generation

The process so far details the method to find the corresponding triangle for each pixel on a partic­
ular row. Once the triangle has been determined for the pixel, shading can be computed using one
of the popular methods. Since visibility is the main concern, the shading used by Row Tracing is
a simplified method that aims to emphasise speed.

5.7.1 S im plified S h ad in g for R ow T racing

The simplified shading used for Ro\\> Tracing is an adaptation of the method used in [MJCOO]
[MJC02] that uses normal coding [Gla90]. Directions are quantised and the normals of the trian­
gles are then classified into one of the quantised directions for which shading is calculated.

The advantage of this process is that for a viewpoint, there are a fixed number of quantised di­
rections for which shading can be pre-computed and stored in an array prior to tree traversal. For
Row Tracing, a quantised set of 65535 directions are used. Once the tree traversal is completed, a
pixel’s shading is found by just an array indexing operation.

For every pixel in each row, if a triangle projects onto it, shading is computed and the pixel’s
colour is set accordingly. When all the rows in the image are similarly shaded after completing
the tree traversal and leaf node processing, all the pixels in the image are appropriately shaded to
obtain the final image.

The algorithm can be used without using HOMs, in which case visibility is always determined by
the Z-buffer algorithm. However, the use of HOMs is an important optimisation that accelerates
the rendering process significantly by incorporating an occlusion detection method - one that
mimics the early ray termination used by ray tracers - into Row Tracing.
5 .8 ID H iera rch ica l O cclu sio n M a p s ( H OMs ) 128

5.8 ID Hierarchical Occlusion Maps (HOMs)

H ierarch ical O cclu sio n M ap s (H O M s), as in tro d u ced by Z h an g et al. [Z M H H 9 7 ], is a v ery efficient
m eth o d to d eterm in e o cclu d e d areas o f an im ag e. W h en c o m b in e d w ith a fro n t-to -b a c k traversal
o f a structure like a k d -tree, the areas th at have a lread y b een ren d e re d o c clu d e o th e r trian g les
p ro jectin g o nto th ese pix els. H O M s w ere also used in C o h eren t R e n d e rin g d isc u sse d in ch a p te r 4.
T h ey are adap ted fo r use w ith R o w T racing - the ID n atu re m a k in g H O M s sim p ler and m ore
efficient.

T h ro u g h the use o f ID H O M s, it is p o ssib le to m im ic the e arly ray te rm in a tio n feature o f ray


tracin g w hereb y o n ce a ray is d e term in e d to have hit an o b je ct, it can sto p the traversal. H ow ever,
due to the fact th at a ro w e n c a p su late s a large n u m b er o f ray s, th e early ray te rm in a tio n pro p erty
can n o t be used d irectly. H O M s are one m eth o d to ad ap t th is p ro p erty o f ray tracin g to R o w
Tracing.

T he H O M is an array o f bits w ith ea ch bit in d icatin g if the p art o f th e row c o rre sp o n d in g to that
bit is occluded. T h e H O M is a h ie ra rc h ic a l stru ctu re. E ach bit at the lo w est level o f the H O M
co rresp o n d s to a pixel in the row being traced . F ig u re 5.1 0 sh o w s a H O M in its in itial state w hen
the entire row is u n o cclu d e d . In the figure, the row is in d icated by the c o lo u re d line w ith each red
or green part rep rese n tin g a pixel. A bit at each h ig h e r level c o m b in e s th e o cclu sio n statu s from
the tw o co rresp o n d in g bits at the level im m ed iately belo w it - i.e., if the tw o co rre sp o n d in g b its
are set. then the u p p er level bit is also set, o th erw ise the bit is n o t set. E ssen tially , the u p p er level
bit is a bitw ise a n d o f the tw o c o rre sp o n d in g low er b its. T h e bits b elo w are c o m b in e d to o b tain the
h ig h er level bits an d w hen the hig h est level bit is set. it in d ica tes th at the e n tire row is o cclu d ed .

To rep resen t a H O M w ith m in im al w astag e, an a rray o f chars is used in the im p lem en tatio n as
the structure for ID H O M s. As a char is just eight b its, this re d u c e s any w a sta g e that m ay o c c u r
by using a b ig g er d a ta type. F o r an im ag e that is 1024 pixels w id e, the lo w est level o f the H O M
uses ju s t 128 chars. T he u p p er levels use 64, 32... 1 chars allo w in g the H O M for the row to be
rep resen ted by ju s t 256 chars - a n eg lig ib le am o u n t o f m em ory. T he red u c e d m em o ry req u irem en t
for the H O M also im p ro v es the c a ch e co h eren ce.

P rio r to traversal o f a row. the en tire row is em pty, i.e.. n one o f its p ixels are ren d ered . At th is
point, the H O M s - that at all p o in ts in d icate the o c c lu sio n state o f th e row - is in itialised so that
all bits are zero, as F ig u re 5 .1 0 show s.

HOM level

16 pixels / bit 0 4

8 pixels / bit 0 I 0 3

4 pixels / bit 0 0 0 0 2

2 pixels / bit 0 0 0 0 0 0 0 1
0
o
o
o
o

o
o
o

1 pixel / bit 0 0 0 0 0 0 0 0

F ig u re 5.10: Initial state o f the H O M .

T he use o f H O M s involve - u p d atin g it w h en a trian g le is ra sterise d so that it rep re sen ts the cu rren t
o cclusion state o f the row at all tim es, testin g it d u rin g tree trav ersal tim e. T h e H O M s are also
5 .8 I D H iera rch ica l O cclu sio n M a p s ( H O Ms ) 129

HOM level

16 pixels / bit 0 4

8 pixels / bit 0 0 3

4 pixels / bit 0 I 0 0 0 2

2 pixels / bit oT o o T oT o 0 0 0 1

o
o
o
o
o
o
1 pixel / bit 0 0 0 0 1 11 1 0 0 0 0

Row

(a) P ixels and H O M bits updated (in blue). If the blue part on the row is rasterised,
then the first lev el H O M is first updated.

HOM level

16 pixels / bit 0 4

8 pixels / bit 0 0 3

4 pixels / bit 0 0 0 0 2

2 pixels / bit 0 0 0 0 0 1 0 0
o

o
o

o
o
1 pixel / bit 0 0 0 0 0 0 1 1 1 0

Row
(b) U pper level bit updated (in blue)

F igure 5.11: U p d atin g the H O M .

ch ecked w hen trian g le seg m e n ts are rasterised .

5.8.1 H O M U pd ate

T he H O M need s to be u p d ated w hen tria n g les in the le a f n ode are ra ste rise d so that the H O M s
indicate the cu rren t o cc lu sio n state o f the p arts o f the row. A t the le a f node, w h en a trian g le is
determ ined ov erlap a p a rt o f the row, the bits c o rre sp o n d in g to the affected pix els o f the row are
set to one. T hese bits in d icate th at any fu rth e r attem p ts to re n d e r a n o th e r trian g le o n to these pix els
should fail. E ach bit at the lo w est level and its b in ary a d ja c e n t bit are ancled to o b tain the u p p er
level bit. A sim ila r o p eratio n is u n d ertak en at ev ery c o n se cu tiv e level until the u p p er level bit is
zero or until the h ig h est level bit is reach ed . F ig u re 5.11 sh o w s the p ro cess.

E ach H O M co n sists o f a series o f chars m ak in g it p refe ra b le to set an d u p d ate the H O M u p w ard s


in groups o f 8 bits o r I char at a tim e. A cco rd in g ly, d u rin g the re n d erin g p art (S ectio n 5.6 .5 ), the
up d ate is p e rfo rm ed w h en the pixel w id th is eig h t p ix els c o rre sp o n d in g to th e 8 bits in a char o f
the H O M at the low est level. H en ce, the state o f these 8 p ix els is c o m b in e d into a char in w hich
a bit w ith a value o f 1 in d icates that the c o rre sp o n d in g pixel has b een ren d ered and a bit w ith a
value o f 0 indicates th at it has not yet been ren d ered . T h e char th u s c re a te d is b itw ise a n d e d w ith
the co rresp o n d in g H O M char’s cu rre n t value to get the new o cc lu sio n statu s fo r th ese p ixels.

F in d in g the u p p er level bits is slig h tly c o m p lic ate d by the lack o f sim p le h o rizo n tal b itw ise a n d
m ethods. T he m o st o b v io u s m eth o d is to m ake a copy o f the char, u se b itw ise sh ifts on the copy,
5 .8 I D H iera rch ica l O cclu sio n M a p s ( H O Ms ) 130

and finally b itw ise a n d the o rig in al char and the sh ifted c o p y as sh o w n in F ig u re 5.12. T h is
pro cess is exp en siv e - esp ec ia lly as this o ccu rs d u rin g the re n d e rin g p ro cess. To o p tim ise this,
for each o f the 256 v alu es that the char can take, the h o riz o n ta lly b itw ise a tid ed values are p re ­
co m p u ted and stored. T h is sim p lifies the ex p en siv e co m b in a tio n o p eratio n to one o f indexing an
array - a relativ ely less ex p en siv e o p eratio n .

1 1 0 0 1 0 1 1 1 1 0 0 1 0 1 1 —►a

» 1 0 1 1 0 0 1 0 1 — b

□ 0 0 0 0 1 0 0 0 0 0 1

□ 0 0 0
HOM - obtaining the higher HOM - obtaining the higher
level by 'horizontal and' level using
'shift right and
bitwise and'

F ig u re 5.12: H O M u p d ate - c o m b in in g bits h o rizo n tally .

T he HOM thus u p d ated w ill alw ay s reflect the cu rren t o cc lu sio n state o f the row being co n sid ered .
It is used to d eterm in e if su b seq u en t re n d erin g a ttem p ts c o rre sp o n d to alread y ren d ered parts o f
the row.

5.8.2 O cclu sion Testing usin g H O M s

T he real value o f H O M s is realise d at traversal tim e, w h en they en ab le d etec tio n o f o cclu d ed p arts
o f the row to allow large parts o f the tree to be sk ipp ed . G iv en a seg m en t o f p ix els - a startin g
pixel and an en d in g pixel - the o c c lu sio n testin g p ro cess uses the H O M s to d eterm in e w h e th e r this
segm ent is o cc lu d e d o r not. To c o n d u ct the test, the ex act level and bits o f the H O M that hold this
inform ation is needed. If the bits thus d ete rm in e d are all set. then the span o f p ixels are o cclu d e d .
O therw ise, the span o f p ix els are not o c clu d e d and p ro c e ssin g w o u ld have to co n tin u e further. T he
pro cess is show n in F ig u re 5.13

T he m ost ch alle n g in g p art o f the o cc lu sio n test is to d e te rm in e the n ecessary level and bits o f th e
HOM .

D eterm ining the level - T he level o f the H O M to be ch e ck e d d e p e n d s on the length o f the pixel
span lo be tested. T h e level is o b tain ed by tak in g the lo g -2 o f th is len g th and ro u n d in g up to the next
integer. T he lo g -2 o f a floating point n u m b er is easily fo u n d - it is th e e x p o n en t part in the IE E E
floating point rep re sen ta tio n . S in ce the ex p o n en t p ro v id es the lo g 2 o f a p o w er o f 2 lo w er than the
float in co n sid eratio n , it is in cre m en ted to get a pow 'er o f 2 g re a te r than the float. F or pixel
lengths that are exact p o w ers o f tw o the ex p o n en t is accu rate. A H O M bit at this level in d icates
the occlusion statu s o f pixels. It is p o ssib le that the pixel ex ten t m ay sp an e ith er one o r tw o
bits at this level.
5 .8 I D H iera rch ica l O cclu sio n M a p s { HOMs ) 131

HOM level

16 pixels / bit 4
0
8 pixels / bit 0 1 3
HOM level
4 pixels / bit 0 0 m m m m m 2 and bits to
check
2 pixels / bit 1 0 1 0 1 1 1 1 1

1 pixel / bit 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0

Row

Length = 3 pixels
Log2(3) = 1.58
Level of HOM to check = 2

F ig u re 5.13: O cc lu sio n testin g . II' the b lue part on the row is to be tested fo r o cc lu sio n , 2 bits
(sh o w n in b lu e) at level 2 o f the H O M is tested .

D eterm ining the bits of the HOM at the level - At the level o f the H O M . d ete rm in in g the
c o rresp o n d in g bits o f the H O M is d o n e using the actual pixel v alu es o f the start and e n d p ix els o f
the pixel extent. T h e resp ectiv e pixel values are d iv id ed by 2,evtl - ach ie v ed easily by sh iftin g the
pix el value by le v e l bits to the right - to get the b it(s) to be tested . T h is m ay be o n e o r tw o bits
d ep en d in g on the start an d en d pixels.

U sing the level and bits th u s co m p u ted , o cc lu sio n is e a sily d e te rm in e d by testin g if the bits are set.
I f they are set. the pixel ex ten t is o c clu d e d an d h en ce p ro cessin g can stop.

F ig u re 5.14: F in al H O M on the D ra g o n m o d el. O ran g e lin e re p re se n ts the actu al pixel. E ach


h o rizo n tal line above the o ran g e line is a level o f the H O M . In the H O M , each single c o lo u red
blo ck in d icates a bit o f the H O M at that level.

T he o cclu sio n testin g is used d u rin g the tree trav ersal to test and sk ip o c c lu d e d no d es. It is also
u sed in the raste risatio n step s to d ete rm in e accu rate v isib ility o f tria n g le s and to skip pixel ex ten ts
that m aybe o cclu d ed .
5 .8 I D H iera rc h ica l O cclu sio n M a p s ( H OMs ) 132

5.8.3 N o d e O cclusion Testing

C h e c k in g the H O M is the sim p le r part o f the node o c c lu sio n test. T he m ore c o m p lex p art is to
d e te rm in e an efficient m eth o d to find the p ix els th at the n o d e o cc u p y w h en p ro jected o n to the
row. A s d etailed in S e ctio n s 5 .5 .5 and 5 .5 .6 a n d F ig u re 5.3, fin d in g an o v ere stim ate is p referred
to finding the exact p ro jectio n due to the co m p u ta tio n a l co st. A lth o u g h , it w o u ld resu lt in a
few false negativ es - i.e., a few nodes m ay be d e te rm in e d as u n o c clu d e d w hen in fact they are
o cclu d e d , u sing the o v erestim a te is p re ferred to the m ore ex p e n siv e m eth o d o f finding the ex act
n ode p ro jectio n .

5.8.4 R ecursive R asterisation o f L e a f N ode T riangles

Leaf node

Row plane

Row being traced

Use for I
~occlusion —
Image plane

Viewpoint

F igure 5.15: L e a f n ode in tersec tio n seg m en ts - C lip p ed and p ro je cte d o n to the row. T h ese
p ro jectio n s on the im age row are tested fo r o cc lu sio n ag ain st the H O M s to d eterm in e in ter-n o d e
visibility.

In S ectio n 5.6.5 , it w as stated th a t a recu rsiv e p ro cess is fo llo w ed to rasterise the trian g le seg m en ts.
H O M s are used even in this recu rsiv e p ro cess to efficien tly d e term in e in ter-n o d e visibility. T he
pixel ex ten t to be tested fo r o cclu sio n is given by th e m in im u m and m ax im u m p ro jected p o in ts
alo n g the row as show n in F ig u re 5.15. If th is e x ten t is o c c lu d e d , as d ete rm in e d by ch e c k in g the
H O M , th en rasterisa tio n c an stop. H ow ever, if the ex ten t is n o t o cc lu d e d , it is su b d iv id ed in to tw o
su b -ex ten ts - each sp a n n in g h a lf o f the o rig in al ex ten t - th at are te ste d fo r o cclu sio n . T h e p ro cess
is c o n tin u ed until e ith e r th e su b -ex ten t is o cclu d e d o r until the su b -e x te n t is in d icated by o n e bit
in the H O M at a level w h ere each bit in d icates 8 p ix els - i.e., a bit in level 3 o f the H O M show n
in F igure 5.13. T he p ro cess is e x p lain ed by the p se u d o c o d e below .
5 .9 Packet Row Tracing 133

RerursiveRender(extent, pList)
{
i f (extent is occluded)
return;
i f (extent is contained
inside a block of 8 pixels)
{
Rasterise relevant parts of
each projection in pList
using a Z-buffer algorithm;
Update Occlusion map
with rasterised pixels;

else
1
RecursiveRender(first half of extent);
RecursiveRender(second half of extent);
}
)

L isting 5.5: R ecu rsiv ely re n d er p arts o f the trian g le that p ro ject o n to this row. T he m eth o d
div id es the p art in to tw o until the parts span eig h t pixels. At this point a Z -b u ffer a lg o rith m is
used to d ete rm in e accu rate visib ility fo r th ese eig h t pixels.

If the extent - fully in d icated by a bit in the level 3 o f the H O M (F ig u re 5 .1 5 ) - is not o cclu d ed ,
then the Z -b u ffer like stru ctu re is u sed to d ete rm in e o cclu sio n . H ow ever, in o rd e r to test w h e th e r
a pixel has alread y been raste rise d by trian g les in o th e r n odes, the c o rre sp o n d in g low est level bits
(in d icatin g o c clu sio n statu s fo r a single p ixel) o f the H O M is tested. T he pix el is rasterised only if
the H O M s in d icate that it is u n o cclu d e d .

In this m anner H O M s are used both to o p tim ise Row Tracing and also to en su re accu rate visibility.
It is an im p o rtan t part o f Row Tracing that e n h an ces the p e rfo rm a n ce - esp e c ially fo r hig h ly
occlu d ed scen es - to be very co m p etitiv e w ith packet ray tracers and ra ste risa tio n m eth o d s like
O penG L .

5.9 Packet Row Tracing

R ow Tracing m akes effectiv e use o f the co h e re n ce p ro v id e d by the u n d e rly in g d atastru c tu re to a c ­


celerate the speed o f ren d erin g . H ow ever, upon clo ser o b serv atio n , it is n o tic e d th at the a lg o rith m
can further utilise the c o h e re n c e p ro v id ed by the data stru ctu re. Several a d ja c e n t row s trav erse a
sim ilar path do w n the tree and p o ssib ly hit the sam e o b ject. T h is in tu rn p o in ts to the p o ssib ility
o f in creased ren d erin g p erfo rm a n c e by tracin g g ro u p s o r p ack ets o f row s th ro u g h the tree. A new
algorithm - Packet Row T racing - w as im p lem en ted to fullil this p rom ise.

5.9.1 H igh Level A lgorith m

The high level alg o rith m is v ery sim ilar to R o w T racing, as the p se u d o co d e belo w show s. H o w ­
ever. due to the use o f g ro u p s / p ack ets o f row s instead o f a sin g le row, the alg o rith m needs m in o r
m odifications. T he first ch an g e is that each c o m p o n en t row has its ow n a ttrib u tes that have to be
co m p u ted p rio r to traversal o f the p acket. H ow ever, the m ajo r ch an g e is the tree traversal o f the
5 .9 P acket Row Tracing 134

pack et. A lth o u g h it is v ery sim ila r to the single row v arian t, it h as to be a d ap ted to h an d le p ack ets
o f ro w s in stead o f a sin g le row. It w ill be d isc u sse d in g rea te r d etail in S ectio n 5.9.2. F inally, once
the tree traversal co m p lete s, the trian g les o ccu p y in g all the p ix els o f all the c o m p o n en t row s have
b een determ in ed . T h is h igh level a lg o rith m is given by the p se u d o c o d e below .

RowPacketTrace()
{
for (each row packet in image)
{
for(each row r in row packet)
InitialiseRowConstants();
TraverseTreePacket(rootNode);
for(each row r in row packet)
Shade r;
)
}

L istin g 5.6: H igh L evel P acket Row>Tracing alg o rith m . Instead o f in itialisin g a sin g le row and
tracin g it (as show n in L istin g 5.1), row s are p ro cessed in g ro u p s, i.e., they are in itialised ,
trav ersed and finally th eir p ix els are sh ad ed in g ro u p s. T he a lg o rith m is v ery sim ilar to the sin g le
row v ersion and m a jo r d ev iatio n s from L istin g 5.1 are h ig h lig h e d in blue.

T h e in itialisatio n and sh ad in g are ju s t iteratin g o v er all the row s u sing the sam e m eth o d s d e scrib ed
in the single row' v ersions. T h e tree trav ersal, th o u g h , attem p ts trav ersin g several row s d o w n the
tree m in im isin g the n u m b er o f node trav ersals n ece ssary - re d u cin g the c a lc u la tio n s and im p ro v in g
th e cach e co h e re n c e and m em o ry b an d w id th usage o f the alg o rith m .

5.9.2 Tree T raversal

T h e tree traversal is ad ap te d so that the tree is trav ersed by a g ro u p o f a d ja c e n t row s instead o f a


sin g le row. T he p seu d o co d e below' show s the p ro cess as im p lem en ted .
TraverseTreePacket(node)
(

if (node is empty or
entire RowPacket misses node)
return
proj = FindNodeProjection();
if (proj is occluded in all rows or
proj is outside frustum)
return;
i f (node is leaf node)
{
for(each row in packet)
ProcessLeafNode(row, node);
return;
>
if(entire RowPacket intersects node)
i
f o r (each childNode sorted in
front-to-back order)
TraverseTreePacket(childNode);
}

else if(RowPacket partially


intersects node)
{
5 .9 P acket R o w Tracing 135

for (each row in packet)


TraverseTree(row, node);

Listing 5.7: Row-packet-tree traversal algorithm. This is very similar to the single row traversal
shown in Listing 5.3 and the major differences are highlighted in blue. The differences enable the
algorithm to process a packet of rows instead of a single row.

The early termination criteria for the tree traversal are very similar to the ones for the single row
variant. The first condition is if the node is empty, then traversal can stop. This condition does not
vary for the packet version.

The next condition is packet-node intersection. As shown in Figure 5.16, the intersection is
determined by using just the two outer boundary rows - the top and bottom rows of the packet.
The signed distances between the node and these two are computed, similar to the single row
version - described in Sections 5.5.7 and 5.5.8.

It may be recalled from Section 5.5.1 that a plane-node intersection can be achieved efficiently by
predetermining the two vertices of the node casting the maximum projection onto the row-plane’s
normal. If v i and V2 are the two vertices thus determined then we can define:
dtop and dtop - as the signed distances of the top row to v i and V2 respectively and
dbottom and dbottom ~ as the signed distances of the bottom row to v i and V2 respectively.
Using these signed distances, the cases are determined and handled as follows:

Case 1 - Entire packet intersects Node - The case occurs when:

sign(dtop ) ! = s ig n ( d top ) a n d
sign(db0ttorn ) != sign^dbottom )

This indicates that both the boundary rows intersect the node, as shown in Figure 5.16(a). Conse­
quently, all the component rows of the packet also intersect the node, thus necessitating traversal
down the tree.

Case 2 - Entire packet misses Node - Both the rows miss the node and are on the same side
of the node, as shown in Figure 5.16(b). The case is determined when:

s ig n ( d top ) sign(dtoP ) a n d
s ig n ( d bottom
lottorn ) s ig n { d bottom ) a n d
s ig n ( d top ) sign(dbottom )

During the calculation of the distances, it is to be ensured that the normals to the rows are not
directed towards each other. In this case, none of the rows in the packet intersect the node and
hence the traversal can stop at this node.

Case 3 - Part of packet intersects Node - This can occur when:


5 .9 Packet Row Tracing 136

• O ne o f the b o u n d ary ro w s in tersec t the node and the o th e r does not. T h is is d e tec ted if:

§ ign(dtop ) ! -- sig n (d top ) and


sign [di)0ttoni ) sign {d[>ott<nn )

s ig n (d top ) == sig n (d top ) and


sig n (dbottnm ) ! s ig n (dbon om )

T h is case is illu strated by F ig u re 5 .1 6 (c).

• T he node span is sm a lle r than the span o f the pack et, as sh o w n in F ig u re 5.1 6 (d ). T h is is
d etec ted w hen:

sig n (d top ) = sig n (d top ) a nd


sign(d(,ottom ) = sig n (d bottom ) and
sig n (d top ) ! = sig n {d bottom )

W hen o nly a p art o f the p a c k e t in tersec ts the n ode (F ig u re 5 .1 6 ( c ) o r F ig u re 5 .1 6 ( d )), the alg o rith m
co n tin u es traversal u sin g the sin g le row variant w ith each o f the c o m p o n e n t row s u sin g this node
as the startin g node for sin g le row traversal.

Node Occlusion testing - W h en a g ro u p o f row s are bein g trav ersed , if the c o h eren ce is g ood
en o u g h , the o cclu sio n o f one o f the row s m ay ind icate that the en tire g ro u p is o cclu d ed . H ow ever,
to test this accu rately , the m eth o d used is to sim p ly test the o cc lu sio n m ap s o f all the co m p o n en t
row s individually.

S ince the tran sfo rm a tio n m atrix is used to p ro ject the node o n to the screen , the n o d e ’s m ax i­
m um extent - given by the m in im u m and m ax im u m A' c o o rd in a te o c c u p ie d by it on the screen
- applies to all the row s. It is sufficient to find the n ode p ro je c tio n 's o v erestim a te as d e ta ile d in
S ectio n s 5.5.5 an d 5 .5 .6 ju st once for the en tire packet o f row s to get the pixel ex tent fo r the
o cclu sio n test.

It m ay be re c a lle d that du rin g the in itia lisa tio n p ro cess, c o n stan ts fo r each row are bein g created
and m ain tain ed . D uring this p ro cess. H O M s fo r each o f the c o m p o n e n t row s are also created and
initialised. T h ese H O M s are u p d ated w hen tria n g le s are raste rise d and k ept up to date w ith the
o cclu sio n statu s o f the row. D u rin g tree traversal, upon finding the n ode p ro jec tio n o v erestim ate,
it is tested ag ain st every sin g le ro w ’s H O M . If any o n e o f the H O M s in d ic a te s that the node is not
o cclu d ed , the o c clu sio n test can stop as the trav ersal needs to c o n tin u e in this case. H ow ever, if
all the H O M s in d icate that the node is o cclu d e d , tree trav ersal c an sto p fo r the en tire p acket.

A s w ith the sin g le row v ersio n , if all the tests in d icate that trav ersal sh o u ld co n tin u e, trav ersal
co n tin u es until e ith e r the node is o c c lu d e d o r a leaf n o d e is reach ed .

L eaf node processing - If trav ersal c o n tin u e s dow n the tree in p ac k et m ode until a le a f node
is reached, then all the c o m p o n e n t row s in tersec t the le a f node and can have so m e o f th eir pix els
5.9 P acket R o w Tracing 137

Node

;fe&SBoas
____

(a)Case 1 (b) Case 2


Both rows intersect node Both rows miss node

(c) Case 3.1 (d) Case 3.2


One row intersects and Node span is smaller
the other misses t h a n packet span

Top row of packet


Bottom row of packet
Part of packet hitting node

Part of packet missing node


F ig u re 5.16: R o w -P a c k e t-N o d e in terse c tio n .

determ in ed at this le a f node. In this im p le m e n tatio n o f P acket R o w T ra cin g , all th e co m p o n e n t


row s at the le a f n o d es are treated in d iv id u ally and are sep a ra te ly p ro c e sse d u sin g the sam e m ethod
used by the single row v arian t (sectio n 5.6).

Final image generation - At the end o f the tree traversal by a p a ck e t o f row s, th e trian g les at
every pixel o f each o f the c o m p o n e n t row s is d eterm in ed . T h ese tria n g le s are used to shade the
pixels co rresp o n d in g to each pix el o f each row. By tracin g all the ro w s o f th e im ag e, by g ro u p in g
them into pack ets, the final im a g e is g en erated and d isp lay ed .

An im portant c o n sid era tio n fo r Packet R o w Tracing is the n u m b e r o f row s in e ach p ack et. T he
p erfo rm an ce o f the alg o rith m is d ep e n d e n t on this num ber. A sm all n u m b e r d o es n o t m ax im ise
the co h eren ce e n o u g h , w h ereas w ith a very large n u m b er the av ailab le co h eren c e is not sufficient.
In o u r im p lem e n ta tio n s, packet sizes o f 8. 16 o r 32 w ere fo u n d to be very g o o d wfith a pack et size
o f 16 p ro v id in g the best resu lts on m ost scen es.
5 .1 0 L ow L e v e l O p tim isa tio n s 138

5.10 Low Level Optimisations

T he alg o rith m s - R o w Tracing and P acket R o w T racing - as im p lem e n ted w ith o u t any low level
o p tim isa tio n s are re a so n a b ly fast. H ow ever, in o rd e r to m a x im ise the p erfo rm a n c e , a few im p le ­
m en tatio n o p tim isa tio n s are used. T he tw o m ain o p tim isa tio n s are the use o f S IM D in stru c tio n s
(see A p p en d ix B ) and m u lti-th read in g .

5.10.1 M u lti-T h rea d in g

R o w Tracing is hig h ly p a ra lle lisa b le as each row is in d e p en d e n t o f o th e r ro w s. T h u s, p ro cessin g


o f each row can o ccu r sim u lta n eo u sly w ith o u t in terfe rin g w ith the p ro c e ssin g o f o th e r row s. T h is
p ro p erly o f R o w Tracing is in h erited fro m ray tracin g that is sim ilarly triv ially p arallelisab le.

M u lti-th read in g over the n u m b er o f co res av ailab le acc ele ra te s R o w Tracing a n d the packet variant
im m ensely. A s the resu lts sectio n - S ectio n 5.11 - w ill show , th e sp e e d -u p s ach ieved is alm o st
p erfect w ith a sp eed -u p o f 3.8 x on av erag e (fo r the fastest v arian t o f R o w T racing) on a q uad core
p rocessor.

T h e key to ach iev e the best p erfo rm an c e w hen an alg o rith m is m u lti-th re a d e d o v er m u ltip le c o re s
is to ensure th a t all the th re a d s have as sim ila r a w o rk lo ad as p o ssib le . T h is en su res that all the
th read s finish the w ork a llo cated alm o st sim u ltan eo u sly . To ach iev e the best d istrib u tio n o f w'ork
acro ss threads, a ro u n d ro b in a llo catio n o f the row s / p ack ets o f ro w s to p ro ce ss is im p lem en ted .
A s Figure 5.17 show s, this m ean s th at the first row' / first packet o f row s is allo c a te d to the first
thread, the seco n d row' / p ack et to the se co n d th re a d and so on. A s a d ja c en t row s follow a very
sim ila r path d o w n the tree, this m eth o d o f load b ala n c in g e n su re s th at the th re a d s have w o rk lo a d s
as clo se to each o th e r as p o ssib le.

Single Row Tracing Packet Row Tracing

Work allocated to CPU-1


Work allocated to CPU-2
Work allocated to CPU-3
Work allocated to CPU-4

F igure 5 .17: L oad b a lan c in g o f Row' Tracing and P acket R o w T racing. E ach c o lo u red b lo ck is
the w ork a llo cated to a thread.
5.77 R esu lts 139

5.11 Results

The algorithm has been implemented in C++ using the Visual C++ IDE. The results are tabulated
by running the algorithm on a computer with the Intel Core 2 Quad 2.4 GHz processor, 4 GB of
RAM and a Geforce 8800 GTX with 768 MB of video memory, running Windows XP64. A range
of models are used to study the algorithm.

Figure 5.18 shows the performance of Row Tracing and its packet variant in comparison to the
performance of packet ray tracing and OpenGL on the same scenes. It can be noticed that the
performances of Row Tracing and Packet Row Tracing are excellent. It is also noticeable that
Row Tracing and Packet Row Tracing are extremely amenable to multi-threading. The other point
of interest from the graph is the adaptability of Row Tracing over different data structures like
kd-trees and octrees.

In order to observe these results better, subsets of Figure 5.18, that isolate particular results are
shown in the sections that follow.

5.11.1 R ow T racing vs P ack et R ay T racing

In order to better understand the performance difference between Row Tracing and packet ray
tracing, the best results for these two rendering methods are isolated in Figure 5.19 and Table 5.2.
The figure and the table show that the fastest version of Row Tracing is significantly faster than
the fastest version of ray tracing. The advantage is particularly noticeable when the scene consists
of large triangles like in ERW6, Sponza and Sodahall scene with advantages of 7.41 x , 1.93 x and
3.31 x. This reveals the advantage of cheap triangle intersections amortised over a large number
of pixels. For other scenes - the Powerplant and Armadillo scenes - that have a large number of
(possibly small) triangles visible. Row Tracing shows advantages of 1.93 x and 1.09 x . It can thus
be inferred that Packet Row Tracing is a very viable alternative to packet ray tracing as a visibility
method.

Scene ERW 6 Sponza Armadillo Sodahall Powerplant


Row Tracing (kd-tree) 128 40 2 0 .6 49.2 6.67
Ray Tracing 8 x 8 17.3 2 0 .6 14.2 14.8 6.09
OpenGL 1000 500 333 8.40 1.49
Row Tracing speed-up vs 7.41 x 1.93 x 1.45 x 3.31 x 1.09x
Packet Ray Tracing
Row Tracing speed-up vs 0.13x 0.08 x 0.06 x 5.86x 4.48 x
OpenGL

Table 5.2: Best performances of packet ray tracing and Row Tracing in fram es p e r second. 8 x8
ray packets and Packet Row Tracing's, kd-tree versions have been used respectively.
5.11 R esu lts 140

ER W 6 Sponza Arm adillo Soda Hall Pow crPlant

804 Triangles 664 5 4 T riangles 345944 Triangles 2.1 M illion T riangles 12.7 M illion T riangles

■□ID Dll .□ID nllfLnln


Fram es per second - Single 1hreadcd version.
Please note that OpenGL results for the first three models arc clipped to the maximum on the graph

120 120 120 120

loo 100 100 100

80 80 80

d d k i
rU
II
Fram es per second - M ulti-Threaded version (4 threads).
Please note that OpenGL results arc not improved by additional threads.

3.9
3.8
3.7
3.6
3.5
3.4
3.3
3.2
3.1
3 In □
Speed-up provided by M ulti-Threading

O ctree Octree Kd I rcc Row KdTree Packet Ray Packet Ray Packet Ray Packet Ray OpenGL.
Tracing Packet Row Tracing Packet Row T racing 2x2 T racing 4x4 T racing 8x8 T racing Display Lists
Tracing Tracing 16x16

C o lo r m apping Index. First 4 colum ns: Row -Tracing. Next 4 C olum ns: R ay-tracing. Last C olum n: OpenGL.

Figure 5.18: Performance of Row Tracing vs Packet Ray Tracing and OpenGL.
5.11 R esults 141

KRYV6 S p onza A rm ad illo Soda H all P o w erP lan t

804 T ria n g le s 664 5 4 T riangles 345944 Triangles 2 . 1 M illion T riangles 12.7 M illion T rian g les

30 30 30
1.5
25 25 25

20 20 20
1
15 15 15

10 10 10
0.5

Ji : i oi : 1 0

F ram es per seco n d - Single T hreaded version.

D■
F ram es per second • M u h i-T hreaded version (4 threads).

O ctree Row O ctree K dT rcc Row K dT rec Packet Ray Packet Ray Packet Ray Packet Ray O penG L
T racing Packet Row T racing Packet Row T racing 2x2 T racing 4x4 T racing 8x8 T racing D isplay Lists
T racing T racing 16x16

F ig u re 5.19: P e rfo rm an c e o f Packet R o w T racing vs P acket R ay T racin g - S in g le th read ed and


M u lti-th read e d v e rsio n s resp ectiv ely . F o r sm a lle r sized m o d els P acket R o w T racing is m uch
faster than p ack et ray tracin g . F o r larg er m o d els, th ough the p erfo rm a n c e d ifferen ce is not that
sig n ifican t. Packet R o w T racing is still faster.

5 .11.2 R ow T racing vs O p en G L

S ince R ow Tracing can he th o u g h t o f as an alg o rith m that c o m b in e s ray tra cin g and O p en G L . a
c o m p ariso n ag ain st O p e n G L is necessary. T able 5.2 and F ig u re 5 .2 0 co m p a re s the fastest R o w
T racing variant vs O p e n G L im p lem en ted w ith d isp lay lists. It is c le a r that fo r sm a lle r m o d els -
w ith less than a m illio n tria n g le s - O p en G L is sig n ifican tly fa ste r th an R o w Tracing. H ow ever,
w ith an increase in the scen e sizes. O p en G L p e rfo rm a n c e d ecreases d ra m a tic a lly as ev id en c ed by
the p erfo rm an ce o f O p en G L on the larg er m o d els - S o d a h a ll (2.1 m illio n tria n g le s) and Power-
p la n t (12.7 m illion trian g les). F o r th ese scen es, R ow T racing is 5 .8 6 x and 4 .4 8 x fa ste r re sp e c ­
tively.

T h is show s that R o w Tracing in h erits the co m p lex ity ad v an ta g e o f ray tra c in g m ak in g it very
scalab le over scen e sizes. It also in d icates the d ep en d en ce o f O p e n G L on av ailab le v id eo m em ory.
T he display lists fo r the larg er sc en es p ro b ab ly do not fit w ith in the 768 M B o f v id eo m em o ry
available. T his also co n trib u te s to the p e rfo rm an ce red u ctio n as p arts o f the m o d el w o u ld have to
be paged. R ow T ra cin g , w ith its low m em o ry usage can h an d le such scen es efficiently.
5.11 R esu lts 142

ERW 6 Sp o n za A rm ad illo So d a H all P o w erP lan t

8 04 T ria n g le s 6 6 4 5 4 T riangles 345944 T rian g les 2 . 1 M illion T rian g les 12.7 M illio n T riangles

35 35 35 2

30 30 30 -

25 25 1.5
25

20 20 20

15 15 15

10 10 10
_ 0.5
5

0 -
5 n
0
5

0 - -

Fram es p e r second - Single T hread ed version


Please noic thai OpcnCiL results for ihc first three m odels arc clipped to the m axim um c i the graph

F ram es per seco n d - M ulti-T hreaded version (4 threads).


Please note that O penGL results arc not im proved by additional threads

O ctree R ow O ctree K dTrcc R ons K dT rce Packet R un Packet Ray Packet Ray Packet Ray O penG L
T racing Packet Row T racing Packet Row T racing 2x2 T ra c in g 4 x 4 T racing Kx8 T racing D isplay Lists
T racing T racing 16x16

F igure 5.20: P e rfo rm an c e o f R o w Tracing (S in g le th read ed and M u lti-th re a d e d v e rsio n s) vs


O penG L . O p e n G L p e rfo rm an ce fo r first th ree m o d els clip p ed to 35 fps and 120 fps in both
charts. F or sm aller m o d els. O p e n G L is m uch faster. H ow ever, fo r larg er m o d e ls, P acket Row-
T ra cin g . e sp e c ia lly w hen m u lti-th read ed , is faster than O p en G L .

5.11.3 R ow Tracing P erform ance on K d-trees vs O ctrees

T he p erfo rm an ce o f R o w Tracing on o ctrees is very goo d - close to its p erfo rm a n c e on an SA H


kd-tree. F igure 5.21 show s th at the d ifferen ce in p erfo rm a n c e is sm all fo r m o st scen es. E sp ecia lly
fo r d en sely p ack ed scen es like the A rm a d illo scen e, w h en the S A H c an n o t effectiv ely seg reg ate
the trian g les, the octree a c tu a lly show s a slig h t ad v an ta g e o v er the k d -tree. T h is is due to the
co m p u tatio n al sim p licity o f in tersec tin g an o ctree n ode as ag ain st the slig h tly m o re exp en siv e
kd -tree node in tersectio n .

T his result is sig n ifican t as o ctre e s are a very sim p le stru ctu re th at are very e asy and fast to create.
B y using fast c o n stru c tio n m eth o d s to g e th e r w ith fast ren d erin g m eth o d s, R o w Tracing w ith the
o ctree can be used as an effectiv e alg o rith m to p erfo rm d y n am ic ren d erin g . T h e o ctree resu lts also
show the ad ap ta b ility o f R o w T racing on stru c tu res that are m ad e up o f c u b o id s / A x is-A lig n e d
B o u n d in g B oxes (A A B B s). W h en c o m b in e d w ith rec e n t d ata stru c tu res like B IH (B o u n d in g In ­
terval H ierarch ies) [W K 06], B V H (B o u n d in g V olum e H iera rch ie s) [W B S 0 7 ], etc that h av e fast
5.11 R esults 143

co n stru ctio n m e th o d s av ailab le, it po in ts to the use o f R o w T racing fo r d y n am ic ren d erin g .

ERW 6 Sponza A rm adillo Soda H all P o w erP lan t

804 T rian g les 6 64 5 4 T riangles 345944 T rian g les 2 . 1 M illion T ria n g le s 12.7 M illion T ria n g le s

F ram es p er second - Single T hrea d ed version.

BO
F ram es per seco n d - M ulli-T hreaded version (4 threads)

O ctree R ow O ctree K dT ree Row K dTrec Packet R ay Packet Ray Packet R ay P acket Ray O penG L
T racing Packet Row T racing Packet Row Tracing 2x2 T racing 4*4 T racing 8x8 T racing D isplay Lists
T racing T racing 16*16

F igure 5.21: P erfo rm a n ce o f R o w Tracing on k d -trees an d o c tre e s - S in g le th read ed and


M u lti-th read ed v ersio n s resp ectiv ely . S h o w s that R o w tra cin g w o rk s w ell on b oth k d -trees and
o ctrees.

5.11.4 P erform an ce o f Row' T racing vs P acket Row T racing

F igure 5.22 and T able 5.3 show that P acket R aw Tracing is c o n sid e ra b ly faster th an R o w Tracing.
T he use o f m u ltip le row resu lts in a 1.05 x to 1.46 x ac c e leratio n in ren d e rin g p erfo rm an ce. W h ile
this is c o n sid erab le, it is n o tic a b le that the a cceleratio n due to the use o f m u ltip le row s is not in
the sam e league as ac c e le ra tio n o b ta in e d by packet ray tracin g vs sin g le ray tracin g . R o w Tracing
already m akes uses o f a lot o f th e c o h eren ce p ro v id ed by the scene. T he use o f p ack ets can thus
only slightly im prove c o h e re n c e u tilisatio n . In ad d itio n , the use o f p a c k e ts im p lie s that som e o f
the co h eren ce is lo st as the c o m p o n e n t row s m ay not traverse the sam e n o d es, at w h ich tim e P acket
R ow Tracing reverts to the sin g le row version. H aving said that, the u se o f p ack ets do es resu lt in
a co n sid erab le im p ro v em en t in re n d erin g lim e and m ak es Packet R o w T racing very useful.
5.11 R esu lts 144

Scene ERW 6 S p o n za A rm ad illo S o d ah all P o w erp lan t


R o w Tracing (k d -tree) 106.38 30.48 16.42 33.78 5.93
Packet R ow T racin g (kd- 128.21 4 0 .0 0 20.66 4 9 .2 6 6.67
tree)
Packet R ow T racing sp e e d ­ 1.21 x 1.31 x 1.26 x 1 .4 6 x 1 .1 2 x
up vs R o w Tracing
R ow T racin g (o cttre e) 9 1 .7 4 24.63 17.79 32.05 4 .5 4
P acket R ow T racin g (o c t­ 107.53 33.67 21.37 45.87 4.78
tree)
P acket R o w T racing sp e e d ­ 1.17 x 1.20 x 1.45 x 1.43 x 1 .0 5 x
up vs R o w Tracing

T able 5.3: P e rfo rm a n c e c o m p a riso n b etw een R o w Tracing and P acket R o w Tracing.

ERVV6 Sponza A rm adillo Soda H all P o w erP lan t

66 4 5 4 T riangles 3 45944 T riangles 2 I M illion T rian g les 12.7 M illion T riangles

F ram es per second - Single T hreaded version.

Fram es p e r second - M ulti-T hreaded version (4 threuds).

O ctree Row O ctree K dT ree Row K dT ree Packet Ray Pac ket R ay Packet Ray Packet Ray O penG L
T racing Packet Row T racing Packet Row' T racing 2x2 T racing 4x4 T racing 8x8 Tracing D isplay L ists
T racing T racing 16x16

Figure 5.22: P erfo rm a n ce c o m p a riso n b etw een R o w Tracing and P acket R o w T racing on both
octrees and k d -trees. It is c le a r that Packet R o w T racing is c o n sid e ra b ly faster than single R ow
T racing.
5.11 R esults 145

5.11.5 M ulti-core Perform ance

T he p e rfo rm an ce o f R o w T racing, in theo ry , sh o u ld scale w h en th e n u m b e r o f co res are in creased .


F igure 5.23 show s the sp e e d -u p s ach ie v e d by R o w Tracing w h en m u lti-th re a d e d and run on a
quad core processo r. It can be in ferred th at the alg o rith m is e x c e p tio n a lly su ited to be m u lti­
threaded. T he sp eed -u p s fo r the best v arian t o f R o w Tracing - P acket R o w T racing w ith k d -trees -
is about 3 .8 x , th u s, co n firm in g the claim s. In ad d itio n , it also sh o w s the low m em o ry b an d w id th
req u irem en ts o f R o w T ra cin g , since fo r a m u lti-th re a d e d ap p lic atio n , the m ajo r b o ttle n e c k s are
load b alan c in g and m e m o ry b a n d w id th issues.

ERW 6 S p o n /a A rm adillo Soda Hall Pow er Plan!

804 T riangles 664 5 4 T riangles 345944 T riangles 2.1 M illion T ria n g le s 12.7 M illion T riangles

S p e ed -u p pro v id ed by M u lti-T h rea d in g

O ctree R ow O ctree K dT ree Row K dT ree Packet Ray Packet Ray Packet Ray Packet Ray O penG L
T ra cin g Packet Row T rac in g P acket Row T racing 2x2 T ra cin g 4x4 T racing 8x8 T racing D isplay Lists
T racing T racing 16x16

F igure 5.23: S p ee d -u p s a ch ie v ed by R o w Tracing by u sing fo u r th re a d s a q u ad core C P U .


A cce le ra tio n s o f c lo se to 4 x su g g est that R o w T racing is h ig h ly p arallelisab le.

5.11.6 Perform ance vs Tree Size

F igure 5.24 show s the p e rfo rm a n c e o f R o w Tracing and p a ck et ray tra cin g acro ss k d -trees o f
d ifferent sizes. T he p e rfo rm a n c e o f R o w Tracing is very goo d fo r sm all tree sizes. It im p ro v es
significantly w hen the tree sizes are m o d erately increased . T h e o p tim al tree size fo r R o w T racing
is very sm all and p e rfo rm a n c e d e g rad es slig h tly w hen tree sizes are in c rea se d b eyond this. In
co m p ariso n , the p e rfo rm an c e o f p ack et ray tracin g is p o o r w hen tree sizes are very sm all. It
im proves only w hen the tree size is in cre ase d sig n ifican tly . T h e o p tim al tree size fo r p ack et ray
tracing is sig n ifican tly g re a te r than th at o f R o w Tracing.

T his is an o th er im p o rtan t resu lt w hen c o n sid e rin g R o w T racing fo r d y n a m ic ren d erin g . In a d y ­


nam ic context, the stru ctu re has to be c o n stru cte d p rio r to ren d erin g each fram e. A sm a lle r tree
size im plies lo w er c o n stru c tio n tim es lead in g to b e tte r d y n a m ic ren d erin g p erfo rm an ce.
5.12 S u m m a ry 146

i -i<to
—Row Tracing — Row Tracing
Packet Row Tracing - Packet Row T racing

— Packet R a\ tracin g 8 \8 ■ Packet R a\ T racing 8x8


= 4IHI

800
i
c

§ 2(H>
-400

100 -

KBs KBs
o
0 1.000 2.1HM) 3.000 4.000 5.000 6.000 20.(100 40.000 60.0(H)
Tree Size in KBs Tree size in KBs

(a) Sponza scen e (b) Sodahall scen e

F igure 5.24: R e n d erin g tim es u sin g R o w Tracing, R acket R o w Tracing and P ack et R ay T racin g
over tree sizes show th at w h ile p a c k e t ray tra cin g n eed s la rg e r trees, R o w T racing an d Packet
R ow Tracing w o rk very w ell w ith sm a lle r trees p o in tin g to the p o ssib ility o f u sin g R o w Tracing
fo r d y n a m ic scenes.

5.11.7 Row Tracing with and w ithout HOMs

A s m en tioned, R o w Tracing can be used w ith o u t H O M s. In th is case, the v isib ility is a c cu rately
d e term in ed using a p artial Z -buffer. To m e a su re the e ffectiv en ess o f H O M s as im p le m e n ted fo r
R ow T racing, the p e rfo rm a n c e is d e term in e d w ith and w ith o u t the use o f H O M s. T he resu lts are
show n in F igure 5.25.

T he tim es in d icate that H O M s are very effective. For the fa ste st v ersio n o f R o w T racing - m u lti­
threaded packet R o w T racing on SA H k d -trees - the H O M s are resp o n sib le sp eed -u p s o f 1.2 x ,
1.5 x , 2.3 x , 2 1 .7 x an d 3G x fo r the five sc en es - E R W 6. S p o n za , A rm a d illo , S o d a h a ll, and P ow er-
p la n t scenes - resp ectiv ely . T h e benefit o f H O M s is seen m ore for larg er sce n e s like the S o d a h a ll
and P ow erplan t scen es. O nly fo r the very sm all E R W 6 scen e, the H O M ’s effect is n eg lig ib le.
T his show s that the H O M s, and in a sim ila r vein - the e arly ray term in a tio n u sed by ray trac e rs is
highly effective in im p ro v in g th e re n d erin g p e rfo rm an ce. It is also an in d ic a tio n th at H O M s are a
h ighly effective m eth o d to tra n sfe r the e arly ray term in atio n p ro p e rty to R o w Tracing.

5.12 Summary

T he results show the effe c tiv en e ss o f the alg o rith m as a v isib ility m eth o d . F o r scen es w ith p re­
d o m in an tly large tria n g le s v isib le , R o w Tracing show s ex cep tio n a l p e rfo rm a n c e in c o m p ariso n to
pack et ray tracing. T h is is a ttrib u ted to the am o rtisatio n o f in tersec tio n c alc u la tio n s o v er a large
n um ber o f pixels. T he trav ersal calc u la tio n s are also av erag ed o v er a n u m b e r o f p ixels, but the
sm aller tree sizes - im p ly in g few er trav ersals - w ork in fav o u r o f R o w T racing m ak in g it m uch
m ore efficient. F inally, the a d a p ta tio n o f the e arly ray te rm in atio n p ro p e rty - used ex ten siv ely in
5.12 S u m m a ry 147

ERW 6 Sponxa Armadillo Soda Hall Pow erPIant

804 TYimglcs 66454 Triangle* 345944 Tnangle* 2.1 Million Triangles 12.7 M S on Triangles

Frame* per second * Single llircaded version.

Fram es per seco n d - Multi- I'hreaded version (4 threads)

Octree Row Octree Packet K dTieeR ow KcfTiee Octree Row Octiec Racket KDTree Row KdTree
Tiacais Row Tiacaia Tiacais Packet Row Tiacais WO Row'Tiacaig Tracm aW O Packet Row
Tiacaia HOM* WO HOMs HOMs Tracing WO

Figure 5.25: P e rfo rm an c e o f Row' T racing on k d -trees and o ctrees w ith and w ith o u t H O M s.
H O M s are sh o w n to be hig h ly effectiv e as an o cclu sio n d etec tio n m ethod.

ray tracers - to R o w Tracing th ro u g h a ID versio n o f H O M s proves ex tre m e ly b en eficial, lead in g


to a very efficient alg o rith m .

In co m p ariso n to O p en G L . R o w Tracing show s an ad v an tag e w h en the sizes o f the scen es are


large. A s the scen e sizes in crea se , O p en G L - u sing a brute force ap p ro a c h - show s d e terio ra ted
p erfo rm an ce, w ith ray tra c in g an d R o w Tracing o u tp e rfo rm in g it fo r these scen es. A t the sam e
lim e, the p erfo rm a n ce o f R o w T racing fo r sm a lle r scen es is very g o o d . It can thus be in fe rred that
R ow Tracing has a b e tte r c o m p le x ity o v er scen e sizes than basic O p e n G L w ith d isp lay lists. T he
results thus show that R o w T racing scales v ery effectiv ely b oth w ays acc o rd in g to the scen e size.

A n o th er in feren ce from the resu lts is the p o ten tial su ita b ility o f R o w Tracing fo r d y n am ic re n ­
dering. R ow Tracing w o rk s v ery w ell w ith a very sim p le d ata stru c tu re like the octree. O ctrees
- being sim p le r to c reate - are b etter suited fo r a d y n am ic ren d e rin g context. In ad d itio n . R o w
Tracing w orks w ell w hen tree sizes are sm all. A gain, sm aller trees are faster to c reate - p o in tin g to
very good p o ten tial fo r d y n a m ic ren d erin g . T he u tilisa tio n o f R o w T racing in a d y n am ic re n d erin g
context is one o f the o b v io u s are as fo r fu rth e r investig atio n .

R ow Tracing is also very easily p arallelised - a p ro p erty that is in h e rite d from ray tracers. T h is
is a very valuable ad v an ta g e w hen th ere is a trend to w ard s in c re a se d n u m b e r o f C P U cores. T he
5 .1 2 S u m m aiy 148

results show an almost perfect speed-up with the number of cores. Thus, when the number of
cores increase, Row Tracing performance would increase without any modifications to the imple­
mentation.

Other future work include using techniques used in rasterisation - shadow, reflection and refraction
mapping, shadow volumes, etc - to add secondary ray effects. Row Tracing inherits the inability of
rasterisation based methods to physically model secondary rays and these have to be implemented
with methods similar to rasterisation. There is also scope for further optimisations like testing
entire packets for occlusion, faster leaf node processing for packets by intersecting groups of rows
with the leaf node triangles, and possibly splitting packets into smaller packets at divergence nodes
during traversal to further accelerate Packet Row Tracing.

It has been seen that the optimal trees for Row Tracing are much different than trees for ray
tracing. For a cube, the probability of a plane intersecting it is proportional to the sum of the edge
lengths [Hai07]. Instead of using the SAH, a modified version that uses the sum of edges measure
could be used instead of the surface area to create a tree with better properties for Row Tracing.

Row Tracing is introduced as a novel algorithm that aims to improve visibility determination /
rendering performance. It can be considered as a hybrid algorithm combining rasterisation and
ray tracing concepts - fast scanline rasterisation at the leaf node level and fast front-to-back tree
traversal of an octree or a kd-tree. The use of HOMs to determine accurate visibility transforms it
into a very effective algorithm, especially in cases where there is significant occlusion - as in most
computer graphics scenes. The fact that rows are traced independently of each other implies that
it is highly parallelisable. In addition to these factors. Row Tracing and Packet Row Tracing are
simple algorithms to implement. These factors enable Row Tracing to be a very viable alternative
to either packet ray tracing or rasterisation based visibility determination / rendering methods.
Chapter 6

Conclusions and Future Work

Conclusions

This research has made several contributions while investigating the use of ray tracing data struc­
tures and algorithms in the context of visibility. The aim of the research was to investigate the
effectiveness of ray tracing structures like kd-trees, octrees, etc and associated algorithms from a
purely visibility context - i.e., when only the first intersected object is to be found without con­
sidering additional optical effects like reflections, refractions and indirect lighting. Three new
methods to achieve this have been developed and studied.

• RBSP trees - A new structure based on kd-trees, but more general, RBSP trees provide
ability to have more than three splitting axes. The axes may be in any arbitrary direction as
long as they are predetermined prior to tree construction. They allow the tree to fit closely
to the scene being rendered, thus reducing the number of node traversals and primitive
intersections.

Construction and use of RBSP trees to generate images has been discussed extensively in
Chapter 3. Due to their similarity to kd-trees, several well known algorithms and heuristics
for both construction and traversal can be adapted. Through the use of the Surface Area
Heuristic to construct them, it has been possible to construct good trees for most scenes.

Results show that, for scenes with predominantly non-axis-aligned triangles, RBSP trees
are responsible for a reduction in the number of node traversals and triangle intersections.
Unoptimised rendering times also show that RBSP trees are more efficient than kd-trees for
these scenes. At the same time, RBSP trees are slower than kd-trees when the scene consists
of predominantly axis-aligned triangles like the Sponza scene. In addition, another problem
is the fact that construction times, as described in this thesis, are dependent on the number
of axes used and can be quite long.

During the axes selection phase of the construction, the axes can be drawn from the scene
itself so that the RBSP tree provides an even closer fitting structure. By this, fewer axes
would be sufficient. The results showed that when ray tracing on RBSP trees, for most
scenes, the best results were obtained with 8 - 1 2 axes and the rendering times went up with
trees constructed using more than 8-12 axes. In addition, for the Sphere scene, for which
the axes were, in effect, customised, RBSP trees produced the best acceleration compared
to kd-trees.

149
150

It is notable that the introduction of RBSP trees has drawn more attention to the use of
structures with non-axis-aligned splitting axes. Budge et al. address some of the problems
of RBSP trees as described in our publication. Budge et al.’s improvements make RBSP
trees faster to construct and traverse. Ize et al. use the more general form of BSP trees and
show that they are quite effective for ray tracing.

• Coherent Rendering - This is an algorithm aiming to demonstrate a reduced complexity for


primary visibility using essentially ray tracing structures and concepts like the front-to-back
traversal of a kd-tree. The method is an adaptation of the volume rendering algorithm by
Mora et al. that was very efficient and showed that per pixel rendering complexities are
constant.

The algorithm uses several concepts from both rasterisation and ray tracing based visibility
methods like Hierarchical Occlusion M aps, polygon subdivision and front-to-back traversal
of kd-trees. By combining these, an algorithm that achieves better coherence is developed.
It is shown that as image sizes increase, the time per pixel converges to a constant. This time
per pixel is only about 30% greater for the 12+ million Powerplant scene than for the Single
Triangle scene when large image sizes are used. This shows that complexity is definitely
less than logarithmic as image sizes increase.

• Row Tracing and Packet Row Tracing - Row Tracing is an algorithm similar to scanline
rendering in that it processes one row of the image at time. However, in contrast to scanline
Tenderers, the intersections between the row and the objects are done in object space. In
addition, ray tracing structures like kd-trees and octrees are used. An adaptation of H ier­
archical Occlusion Maps are used to determine already occluded areas of the image. The
combination of concepts from both rasterisation and ray tracing produces a very effective
and parallelisable method of determining visibility.

In addition, upon closer inspection, it is observed that neighbouring rows traverse a similar
path down the tree. To maximise this coherence, groups of rows are traversed down the tree
with a small change to the traversal algorithm. Tracing groups of 16 neighbouring rows
through the tree is observed to produce the best results.

Results show that Row Tracing and Packet Row Tracing are very effective algorithms. When
scene sizes are small. Row Tracing and Packet Row Tracing are much faster than packet ray
tracing. Comparing the algorithms with OpenGL (that uses a normal Z-buffer algorithm)
shows that Row Tracing and Packet Row Tracing are much faster when scene sizes are
large. For these scenes, packet ray tracing is also faster than OpenGL owing to its better
complexity. However, Packet Row Tracing is faster than packet ray tracing for these scenes
too.

The methods are new techniques, two of which have been published. However, as with any re­
search, each method has given rise to new ideas that could further improve the techniques. These
will be briefly described.

Future Work

For each technique, some areas of improvement and extensions are easily identifiable.

• RBSP trees - The flexibility of RBSP trees to have several axes that may be in arbitrary
151

directions means that RBSP trees can adapt very well to the scene. It is believed that by
using 8-12 axes customised to the scene, RBSP trees could be the best structure for ray
tracing single rays.

Another method to customise the tree to the scene is to have a large number of axes -
for eg. 1 0 0 directions - and at each step a subset of this used so that the construction
process is simplified. The subset would be determined based on geometric properties of
the geometry in the scene. This would alleviate the problem of slow construction times
seen when the number of axes increases. The constructed tree may sufficiently reduce the
num ber of traversals and intersections to overshadow the increased traversal cost.

Although in this thesis only primary rays are considered to determine primary visibility, it
is believed that the real value of RBSP trees would be when several incoherent rays are to be
traced through the scene like in global illumination algorithms. In these methods, the ability
of RBSP trees to reduce the number of traversals and intersections would be truly valuable.

Although some effort has been put into optimising the traversal methods, the effort has not
been significant. Budge et al. [BCNJ08] state that their optimised methods are 10x faster
than ours. This shows that there is considerable margin for optimisation.

• Coherent Rendering - The Coherent Rendering algorithm described is unoptimised and the
main aim was to demonstrate that a lower complexity was possible. However, the abso­
lute performance is not very competitive. To address this, the algorithm can be optimised
through the use of SSE instructions and by a cycle of optimising and profiling. The perfor­
mance after optimisation may reach competitive levels.

In rasterisation, shadows are generated through the use of shadow maps. Similar techniques
are used for refraction and reflection. The generation of additional images / maps could also
be achieved with a lower complexity leading to a full fledged low complexity renderer.

• Row Tracing and Packet Row Tracing - As described in the future work for Coherent Ren­
dering, shading, reflection and refraction could be simulated through shadow maps, etc.

A major advantage of Row Tracing is that it works very well with simple structures. This
was shown by the efficiency of Row Tracing on octrees. The difference in rendering times
on an SAH kd-tree and an octree is not very significant. Simple structures like the octree are
simpler and much faster to build leading to applications for rendering dynamic scenes. In
addition, the fact that Row Tracing works well on kd-trees and octrees imply that it would
work well with any structure consisting of axis-aligned bounding boxes. Hence, they can be
used with structures that are fast to build like grids or BVHs for use in a dynamic context.

The best results for Row Tracing have been obtained on kd-trees built using the SAH. How­
ever, when a plane is traversed, the probability of a plane intersecting a box should be used
instead of the surface area. This probability - of a random plane intersecting for a box -
is mentioned by Haines [Hai07] as the sum of the edges of the box. Replacing the surface
area with the sum of edges would thus be more accurate and could create kd-trees that are
better for Row Tracing.
Bibliography

[AC97] John Amanatides and Kin Choi. Ray Tracing Triangular Meshes. In Proceedings o f
the Eighth Western Computer Graphics Symposium, pages 43-52, 1997.

[AGL91] Mark Agate, Richard L. Grimsdale, and Paul F. Lister. The HERO Algorithm for
Ray-Tracing Octrees. In Advances in Computer Graphics Hardware IV (Eurograph-
ics'89 Workshop), pages 61-73, London, UK, 1991. Springer-Verlag.

[AM01] Tomas Akenine-Mller. Fast 3D Triangle-Box Overlap Testing. Journal o f Graphics


Tools, 6C1):29—33, 2001.

[Ama84] John Amanatides. Ray Tracing with Cones. SIGGRAPH Computer Graphics,
18(3): 129—135, 1984.

[AMH02] Tomas Akenine-Moller and Eric Haines. Real-Time Rendering. A. K. Peters, Ltd.,
Natick, MA, USA, 2002.

[APB87] Bruno Amaldi, Thierry Priol, and Kadi Bouatouch. A New Space Subdivision
Method for Ray Tracing CSG Modelled Scenes. The Visual Computer, 3(2):98—108,
August 1987.

[App 6 8 ] Arthur Appel. Some techniques for shading machine renderings of solids. In AFIPS
’68 (Spring): Proceedings o f the April 30-M ay 2, 1968, spring joint computer con­
ference, pages 37—45, New York, NY, USA, 1968. ACM.

[Are 8 8 ] Jeff Arenberg. Ray/Triangle Intersection with Barycentric Coordinates. Ray Tracing
News, 1(11), 1988.

[Arv 8 8 ] J Arvo. Linear-time Voxel Walking for Octrees. Ray Tracing News, 1(5), 1988.

[AW87] John Amanatides and Andrew Woo. A Fast Voxel Traversal Algorithm for Ray
Tracing. In Proceedings o f Eurographics ’87, pages 3-10, Amsterdam, August 1987.
North-Holland.

[Bad90] Didier Badouel. An Efficient Ray-Polygon Intersection. Graphics Gems, pages 390-
393, 1990.

[BCG+96] Gill Barequet, Bernard Chazelle, Leonidas J. Guibas, Joseph S. B. Mitchell, and
Ayellet Tal. BOXTREE: A Hierarchical Representation for Surfaces in 3D. Com­
puter Graphics Forum, 15(3):387—396, 1996.

[BCNJ08] Brian C. Budge, Daniel Coming, Derek Norpchen, and Kenneth I. Joy. Accelerated
Building and Ray Tracing of Restricted BSP Trees. IEEE Symposium on Interactive
Ray Tracing, 2008. R T 2008., pages 167-174, Aug. 2008.

152
B IB L IO G R A P H Y 153

[Ben75] Jon Louis Bentley. Multidimensional Binary Search Trees used for Associative
Searching. Communications o f the ACM , 18(9):509—517, 1975.

[Ben79] J.L. Bentley. Multidimensional Binary Search Trees in Database Applications. IEEE
Transactions on Software Engineering, SE-5(4):333-340, July 1979.

[Ben06] Carsten Benthin. Realtime Ray Tracing on Current CPU Architectures. PhD thesis,
Saarland University, Saarbriicken, Germany, 2006.

[BHO+94] M. De Berg, D. Halperin, M. Overmars, J. Snoeyink, and M. Van Kreveldt. Efficient


Ray Shooting and Hidden Surface Removal. In Algorithmica, pages 21-30, 1994.

[Bou70] W. Jack Bouknight. A Procedure for Generation of Three-Dimensional Half-Toned


Computer Graphics Presentations. Communications o f the ACM, 13(9):527—536,
1970.

[BP04] Thierry Berger-Perrin. SSE Ray/Box Intersection Test, available at http:


//w w w .flipcode.com/archives/SSE_RayBox_Intersection_
T e s t .shtml, 2004.

[BP05] Thierry Berger-Perrin. Branchless Ray/Box intersections, available at http: //


ompf .org/ray/ray_box .html, 2005.

[BWPP04] Jin Bittner, Michael Wimmer, Harald Piringer, and Werner Purgathofer. Coherent
Hierarchical Culling: Hardware Occlusion Queries Made Useful. Computer Graph­
ics Forum, Proceedings o f Eurographics, 2004, 23(3):615—624, September 2004.

[BWS06] Solomon Boulos, Ingo Wald, and Peter Shirley. Geometric and Arithmetic Culling
Methods for Entire Ray Packets. Technical report, School of Computing, University
of Utah., 2006.

[Car84] Loren Carpenter. The A-buffer, An Antialiased Hidden Surface Method. SIGGRAPH
Computer Graphics, 18(3): 103-108, 1984.

[Cat74] Edwin Earl Catmull. A Subdivision Algorithm fo r Computer Display o f Curved Sur­
faces. PhD thesis, The University of Utah, 1974.

[CDP95] F. Cazals, G. Drettakis, and C. Puech. Filtering, Clustering and Hierarchy Construc­
tion: A New Solution for Ray-Tracing Complex Scenes. Computer Graphics Forum,
14(3):371-382, September 1995.

[ChaOl] Allen Y. Chang. A Survey of Geometric Data Structures for Ray Tracing. Technical
report, Polytechnic University, Brooklyn, Long Island, Westchester., 2001.

[Cla76] James H. Clark. Hierarchical Geometric Models for Visible Surface Algorithms.
Communications o f the ACM, 19 (10):547—554, 1976.

[C 0 0 8 6 ] Robert L. Cook. Stochastic Sampling in Computer Graphics. ACM Transactions on


Graphics, 5(1):51—72, 1986.

[CPC84] Robert L. Cook, Thomas Porter, and Loren Carpenter. Distributed Ray Tracing.
SIGGRAPH Computer Graphics, 18(3): 137-145, 1984.

[C S08] Daniel S. Coming and Oliver G. Staadt. Velocity-Aligned Discrete Oriented Poly­
topes for Dynamic Collision Detection. IEEE Transactions on Visualization and
Computer Graphics, 14( 1): 1—12, 2008.
B IB LIO G R A P H Y 154

[CW 8 8 ] J G Cleary and G Wyvill. Analysis of an Algorithm for Fast Ray Tracing using
Uniform Space Subdivision. The Visual Computer, (4):65—83, 1988.

[DHK08] H. Dammertz, J. Hanika, and A. Keller. Shallow Bounding Volume Hierarchies


for Fast SIMD Ray Tracing of Incoherent Rays. Computer Graphics Forum,
27(4): 1225-1233, June 2008.

[DK08] Holger Dammertz and Alexander Keller. The Edge Volume Heuristic - Robust Tri­
angle Subdivision for Improved BVH Performance. IEEE Symposium on Interactive
Ray Tracing, 2008. R T 2008., pages 155-158, Aug. 2008.

[EG07] M. Ernst and G. Greiner. Early Split Clipping for Bounding Volume Hierarchies.
IEEE Symposium on Interactive Ray Tracing, 2007. R T ’07., pages 73-78, Sept.
2007.

[EGMM07] Martin Eisemann, Thorsten Grosch, Marcus Magnor, and Stefan Mueller. Fast
Ray/Axis-Aligned Bounding Box Overlap Tests using Ray Slopes. Journal o f
Graphic Tools, 12(4), 12 2007.

[Eri97] Jeff Erickson. Pliicker Coordinates. Ray Tracing News, 10(3), 1997.

[Eri07] Christer Ericson. Plucker coordinates considered harmful! available at http:


//realtimecollisiondetection.net/blog/?p=13, 2007.
[FKN80] Henry Fuchs, Zvi M. Kedem, and Bruce F. Naylor. On Visible Surface Generation by
A Priori Tree Structures. Computer Graphics, (Proceedings o f SIGGRAPH 1980),
pages 124-133, 1980.

[FLT08] Fast light Toolkit, available at http : / / w w w . fltk .org, 2008.

[FS 8 8 ] Donald S. Fussed and K. R. Subramanian. Fast Ray Tracing using K-d Trees. Tech­
nical Report CS-TR-88-07, University of Texas at Austin, 1, 1988.

[FTI8 6 ] A. Fujimoto, T. Tanaka, and K. Iwata. ARTS: Accelerated Ray-Tracing System.


Computer Graphics and Applications, IEEE, 6(4): 16-26, April 1986.

[FvDFH90] James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes. Computer
Graphics: Principles and Practice (2nd ed.). Addison-Wesley Longman Publishing
Co., Inc., Boston, MA, USA, 1990.

[GA93] I. Gargantini and H. H. Atkinson. Ray Tracing an Octree: Numerical Evaluation of


the First Intersection. Computer Graphics Forum, 12(4): 199-210, October 1993.

[GC91] Dan Gordon and Shuhong Chen. Front-to-Back Display of BSP Trees. IEEE Com­
puter Graphics and Applications, 11 (5):79—85, 1991.

[GGW98] Jon Genetti, Dan Gordon, and Glen Williams. Adaptive Supersampling in Object
Space using Pyramidal Rays. Computer Graphics Forum, 17(1):29—54, 1998.

[GKM93] Ned Greene, Michael Kass, and Gavin Miller. Hierarchical Z-buffer Visibility. In
SIGGRAPH ’93: Proceedings o f the 20th annual conference on Computer graphics
and interactive techniques, pages 231-238, New York, NY, USA, 1993. ACM.

[Gla84] Andrew S. Glassner. Space Subdivision for Fast Ray Tracing. IEEE Computer
Graphics and Applications, 4(10): 15-22, 1984.

[Gla90] Andrew S. Glassner. Normal coding. Graphics Gems, pages 257-264, 1990.
B IB L IO G R A P H Y 155

[GLM96] S. Gottschalk, M. C. Lin, and D. Manocha. OBBTree: A Hierarchical Structure for


Rapid Interference Detection. In SIGGRAPH ’96: Proceedings o f the 23rd annual
conference on Computer graphics and interactive techniques, pages 171-180, New
York, NY, USA, 1996. ACM.

[GM03] Markus Geimer and Stefan Muller. A Cross-Platform Framework for Interactive Ray
Tracing. In Proc. o fG I Graphiktag, pages 25-34, 2003.

[GotOO] Stefan Aric Gottschalk. Collision Queries using Oriented Bounding Boxes. PhD
thesis, The University of North Carolina at Chapel Hill, 2000.

[Gre94] Ned Greene. Detecting Intersection of a Rectangular Solid and a Convex Polyhedron.
Graphics gems /V, pages 74-82, 1994.

[Gre96] Ned Greene. Hierarchical Polygon Tiling with Coverage Masks. In SIGGRAPH ’96:
Proceedings o f the 23rd annual conference on Computer graphics and interactive
techniques, pages 65-74, New York, NY, USA, 1996. ACM.

[GS87] Jeffrey Goldsmith and John Salmon. Automatic Creation of Object Hierarchies for
Ray Tracing. IEEE Computer Graphics and Applications, 7(5): 14-20, 1987.

[GTGB84] Cindy M. Goral, Kenneth E. Torrance, Donald P. Greenberg, and Bennett Battaile.
Modeling the Interaction of Light between Diffuse Surfaces. SIGGRAPH Computer
Graphics, 18(3):213-222, 1984.

[Gut84] Antonin Guttman. R-Trees: A Dynamic Index Structure for Spatial Searching. In
SIGMOD '84: Proceedings o f the 1984 ACM SIGMOD international conference on
Management o f data, pages 47-57, New York, NY, USA, 1984. ACM.

[GW09] Anat Grynberg and Greg Ward. Conference Room, available at http://
radsite.l b l .gov/mgf/scenes.html, 2009.

[Hai89] Eric Haines. Essential Ray Tracing Algorithms. An Introduction to Ray Tracing,
pages 33-77, 1989.

[Hai94] Eric Haines. Point in Polygon Strategies, pages 24-46. Academic Press Professional,
Inc., San Diego, CA, USA, 1994.

[Hai07] Eric Haines. Puzzle: Plane Intersection with Spheres vs. Boxes. Ray Tracing News,
20(1), 2007.

[Hav99] Vlastimil Havran. A Summary of Octree Ray Traversal Algorithms. Ray Tracing
News, 12(2), 1999.

[HavOl] V. Havran. Heuristic Ray Shooting Algorithms. PhD thesis, Czech Technical Uni­
versity in Prague, Czech Republic., 2001.

[Hav02] Vlastimil Havran. Mailboxing, yea or nay? Ray Tracing News, 15(1), 2002.

[HB97] Donald Hearn and M. Pauline Baker. Computer Graphics (2nd ed.): C version.
Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1997.

[HBOO] Vlastimil Havran and Jin Bittner. LCTS: Ray Shooting using Longest Common
Traversal Sequences. In Proceedings o f Eurographics (E G ’00), pages 59-70, Inter­
laken, Switzerland, 2000.
B IB LIO G R A P H Y 156

[HB02] V. Havran and J. Bittner. On Improving Kd-trees for Ray Shooting. In Proceedings
ofW SCG , pages 209-216, 2002., 2002.

IHH84] Paul S. Heckbert and Pat Hanrahan. Beam Tracing Polygonal Objects. In SIG ­
GRAPH ’84: Proceedings o f the 11th Annual Conference on Computer Graphics
and Interactive Techniques, pages 119-127, New York, NY, USA, 1984. ACM.

[HHS06] Vlastimil Havran, Robert Herzog, and Hans-Peter Seidel. On the Fast Construction
of Spatial Data Structures for Ray Tracing. In Proceedings o f IEEE Symposium on
Interactive Ray Tracing 2006, pages 71-80, September 2006.

[HilOO] Francis J. Hill. Computer Graphics Using OpenGL. Prentice Hall PTR, Upper
Saddle River, NJ, USA, 2000.

[HKBv97] Vlastimil Havran, Tomas Kopal, Jin Bittner, and Jin Zara. Fast Robust BSP Tree
Traversal Algorithm for Ray Tracing. Journal o f Graphics Tools, 2(4): 15-23, 1997.

[HKRS02] Jim Hurley, Alexander Kapustin, Alexander Reshetov, and Alexei Soupikov. Fast
Ray Tracing for Modern General Purpose CPU. In Proceedings o f Graphicon, 2002.

[HMS06] W. Hunt, W.R. Mark, and G. Stoll. Fast Kd-tree Construction with an Adaptive
Error-Bounded Heuristic. IEEE Symposium on Interactive Ray Tracing, 2006., pages
81-88, Sept. 2006.

[Hof96] Kenny Hoff. A Faster Overlap Test for a Plane and a Bounding Box.
available at http://www.cs.unc.edu/~hoff/research/vfculler/
boxplane.html, 1996.
[HunOS] Warren A. Hunt. Corrections to the Surface Area Metric with Respect to Mail-
Boxing. IEEE Symposium on Interactive Ray Tracing, 2008. R T 2008., pages 77-80,
Aug. 2008.

[Hur05] J. Hurley. Ray Tracing goes Mainstream. Intel Technology Journal 9, (2):99—108,
2005.

[HW91] Eric Haines and John Wallace. Shaft Culling for Efficient Ray-Traced Radiosity. In
Proceedings o f the Eurographics Workshop on Rendering, pages 122-138. Springer
Verlag, 1991.

[Int08] Intel. Pentium III Processors Technical Documents. available at


http:I I www.intel.com/design/intarch/pentiumiii/docs_
pent iumiii_pga370 .htm, 2008.
[Int09] Intel C++ Compiler User and Reference Guides. 2009.

[ISP07] T. Ize, P. Shirley, and S. Parker. Grid Creation Strategies for Efficient Ray Tracing.
IEEE Symposium on Interactive Ray Tracing, 2007. R T ’07., pages 27-32, Sept.
2007.

[IWP08] Thiago Ize, Ingo Wald, and Steven G. Parker. Ray Tracing with the BSP Tree. IEEE
Symposium on Interactive Ray Tracing, 2008. R T 2008., pages 159-166, Aug. 2008.

[JenOl] Henrik Wann Jensen. Realistic Image Synthesis using Photon Mapping. A. K. Peters,
Ltd., Natick, MA, USA, 2001.
B IB LIO G R A P H Y 157

[JonOO] Ray Jones. Intersecting a Ray and a Triangle with Plucker Coordinates. Ray Tracing
News, 13(1), 2000.

[JW89] D Jevans and B Wyvill. Adaptive Voxel Subdivision for Ray Tracing. In Proceedings
o f Graphics Interface 89, pages 164-172, 1989.

[KA91] D. Kirk and J. Arvo. Improved Ray Tagging for Voxel-Based Ray Tracing. In
Graphics Gems II, pages 264—266. Academic Press, 1991.

[Kap85] M Kaplan. Space-Tracing: A Constant Time Ray-Tracer. In SIGGRAPH ’85 State


o f the A rt in Image Synthesis seminar notes, pages 149-158. Addison Wesley, 1985.

[Kel97] Alexander Keller. Instant Radiosity. In SIGGRAPH ’97: Proceedings o f the 24th
annual conference on Computer graphics and interactive techniques, pages 49-56,
New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[KHM+98] J. T. Klosowski, M. Held, J. S. B. Mitchell, H. Sowizral, and K. Zikan. Efficient


Collision Detection Using Bounding Volume Hierarchies of /c-DOPs. IEEE Trans­
actions on Visualization and Computer Graphics, 4( 1):21—36, Mar. 1998.

[KK 8 6 ] Timothy L. Kay and James T. Kajiya. Ray tracing complex scenes. In SIGGRAPH
’86: Proceedings o f the 13th annual conference on Computer graphics and interac­
tive techniques, pages 269-278, New York, NY, USA, 1986. ACM.

[KM07J Ravi P. Kammaje and B. Mora. A Study of Restricted BSP Trees for Ray Tracing.
IEEE Symposium on Interactive Ray Tracing, 2007. RT ’07., pages 55-62, Sept.
2007.

[KM08] Ravi P. Kammaje and Benjamin Mora. Row Tracing using Hierarchical Occlusion
Maps. IEEE Symposium on Interactive Ray Tracing, 2008. R T 2008., pages 27-34,
Aug. 2008.

[KS97] Krzysztof S. Klimaszewski and Thomas W. Sederberg. Faster Ray Tracing Using
Adaptive Grids. IEEE Computer Graphics and Applications, 17( 1):42—51, 1997.

[KS06] A. Kensler and P. Shirley. Optimizing Ray-Triangle Intersection via Automated


Search. IEEE Symposium on Interactive Ray Tracing, 2006., pages 33-38, Sept.
2006.

[LYMT06] C. Lauterbach, Sung-Eui Yoon, Dinesh Manocha, and D. Tuft. RT-DEFORM: Inter­
active Ray Tracing of Dynamic Scenes using BVHs. IEEE Symposium on Interactive
Ray Tracing, 2006., pages 39-46, Sept. 2006.

[Mah05] Jeffrey A. Mahovsky. Ray Tracing with Reduced-Precision Bounding Volume Hier­
archies. PhD thesis, University of Calgary, Calgary, Canada, 2005.

[MB90] David J. MacDonald and Kellogg S. Booth. Heuristics for Ray Tracing Using Space
Subdivision. The Visual Computer, 6(3): 153-166, 1990.

[ME05] Benjamin Mora and David S. Ebert. Low-Complexity Maximum Intensity Projec­
tion. ACM Transactions on Graphics, 24(4): 1392-1416, 2005.

[MFOO] Gordon Muller and Dieter W. Fellner. Hybrid Scene Structuring with Application
to Ray Tracing. In Proceedings o f International Conference on Visual Computing
(1999), 2000.
B IB LIO G R A P H Y 158

[MJCOO] Benjamin Mora, Jean-Pierre Jessel, and Rene Caubet. Accelerating Volume Render­
ing with Quantized Voxels. In VVS ’00: Proceedings o f the 2000 IEEE symposium
on Volume visualization, pages 63-70, New York, NY, USA, 2000. ACM.

[MJC02] Benjamin Mora, Jean Pierre Jessel, and Rene Caubet. A New Object-Order Ray-
Casting Algorithm. In VIS ’02: Proceedings o f the conference on Visualization ’02,
pages 203-210, Washington, DC, USA, 2002. IEEE Computer Society.

[MKJ08] Benjamin Mora, Ravi P. Kammaje, and Mark W. Jones. On the Lower Complexity of
Coherent Renderings. Technical report, Department of Computer Science, Swansea
University, 2008.

[ML03] Jos Pascual Molina Masso and Pascual Gonzalez Lopez. Automatic Hybrid Hierar­
chy Creation: a Cost-Model Based Approach. Computer Graphics Forum, 22(1):5—
14, 2003.

[MT97] Tomas Moller and Ben Trumbore. Fast, Minimum Storage Ray-Triangle Intersec­
tion. Journal o f Graphics Tools, 2( 1):21—28, 1997.

[MW04] Jeffrey Mahovsky and Brian Wyvill. Fast Ray-Axis Aligned Bounding Box Overlap
Tests with Plucker Coordinates. Journal o f Graphics Tools, 9( 1):35—46, 2004.

[NNS72] M E Newell, R G Newell, and T L Sancha. A New Approach to the Shaded Picture
Problem. In Proceedings o f the ACM National Conference, pages 443-450, 1972.

[NT03] K. Ng and B. Trifonov. Automatic Bounding Volume Hierarchy Generation using


Stochastic Search Methods. In CPSC532D Mini-Workshop ’’Stochastic Search A l­
gorithm s”, April 2003.

[ 0 ’R98] Joseph O ’Rourke. Computational Geometry in C. Cambridge University Press, New


York, NY, USA, 1998.

[ORM07] Ryan Overbeck, Ravi Ramamoorthi, and William R. Mark. A Real-time Beam
Tracer with Application to Exact Soft Shadows. In Proceedings o f Eurographics
Symposium on Rendering, pages 85-98, Jun 2007.

[PB85] D.J. Plunkett and M.J. Bailey. The Vectorization of a Ray-Tracing Algorithm for
Improved Execution Speed. IEEE Computer Graphics and Applications, 5(8):52-
60, Aug. 1985.

[PGSS06] Stefan Popov, Johannes Gunther, Hans-Peter Seidel, and Philipp Slusallek. Experi­
ences with Streaming Construction of SAH KD-Trees. IEEE Symposium on Inter­
active Ray Tracing, 2006., pages 89-94, Sept. 2006.

[PH04] Matt Pharr and Greg Humphreys. Physically Based Rendering: From Theory to
Implementation. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2004.

[PKGH97] M att Pharr, Craig Kolb, Reid Gershbein, and Pat Hanrahan. Rendering Complex
Scenes with Memory-Coherent Ray Tracing. In SIGGRAPH ’97: Proceedings o f
the 24th annual conference on Computer graphics and interactive techniques, pages
101-108, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[PSA07] Matt Phair, Ken Shoemake, and Other Anonymous Authors. Evenly Distributed
Points on Sphere, available at h t t p :/ /www .cgaf a q . inf o/wiki/Evenly_
distributed_points_on_sphere, 2007.
B IB LIO G R A P H Y 159

[ray] Ray-Polygon Intersection. available at http://www.siggraph.org/


education/materials/HyperGraph/raytrace/raypolygon_
intersection.htm.

[Rit90] Jack Ritter. An Efficient Bounding Sphere. Graphics gems, pages 301-303, 1990.

[RSH05] Alexander Reshetov, Alexei Soupikov, and Jim Hurley. Multi-Level Ray Tracing
Algorithm. ACM Transactions on Graphics (Proceedings o f SIGGRAPH 2005),
24(3): 1176-1185, 2005.

[RULOO] J. Revelles, C. Urea, and M. Lastra. An Efficient Parametric Algorithm for Octree
Traversal. In Proceedings ofW SCG, 2000, pages 212-219, 2000.

[RW80] Steven M. Rubin and Turner Whitted. A 3-Dimensional Representation for Fast
Rendering of Complex Scenes. Computer Graphics, (Proceedings o f SIGGRAPH
1980), pages 110-116, 1980.

[Sam89] Hanan Samet. Implementing Ray Tracing with Octrees and Neighbor Finding. Com­
puters And Graphics, 13:445-460, 1989.

[SB87] John M. Snyder and Alan H. Barr. Ray Tracing Complex Models Containing Surface
Tessellations. SIGGRAPH Computer Graphics, 21 (4): 119—128, 1987.

[SBGS69] R. A. Schumacker, R. Brand, M. Gilliland, and W. Sharp. Study for Applying Com­
puter Generated Images to Visual Simulation. Technical report, General Electric,
1969.

[SCS+08] Larry Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep
Dubey, Stephen Junkins, Adam Lake, Jeremy Sugerman, Robert Cavin, Roger Es-
pasa, Ed Grochowski, Toni Juan, and Pat Hanrahan. Larrabee: A Many-Core X86
Architecture for Visual Computing. In SIGGRAPH ’08: ACM SIGGRAPH 2008
papers, pages 1-15, New York, NY, USA, 2008. ACM.

[SF90] K. R. Subramanian and Donald S. Fussell. Factors Affecting Performance of Ray


Tracing Hierarchies. Technical report, Austin, TX, USA, 1990.

[SF91] K. R. Subramanian and Donald S. Fussell. Automatic Termination Criteria for Ray
Tracing Hierarchies. In Proceedings o f Graphics Interface 91, pages 93-100, 1991.

[SFOl] R. J. Segura and F. R. Feito. Algorithms To Test Ray-Triangle Intersection. In


Proceedings ofW SC G ., pages 200-205, 2001.

[SH74] Ivan E. Sutherland and Gary W. Hodgman. Reentrant Polygon Clipping. Communi­
cations o f the ACM, 17( 1):32—42, 1974.

[Sho98] Ken Shoemake. Pliicker Coordinate Tutorial. Ray Tracing News, 11(1), 1998.

[SL96] Peter Soderquist and Miriam Leeser. Area and Performance Tradeoffs in Floating-
Point Divide and Square-Root Implementations. ACM Computing Surveys,
28(3):518-564, 1996.

[SS92] Kelvin Sung and Peter Shirley. Ray tracing with the BSP tree. Graphics Gems III,
pages 271-274,1992.

[SSE09a] SSE2. Wikipedia, 2009.

[SSE09b] Streaming SIMD Extensions. Wikipedia, 2009.


B IB LIO G R A P H Y 160

[SSK07] M. Shevisov, A. Soupikov, and A. Kapustin. Highly Parallel Fast KD-tree Construc­
tion for Interactive Ray Tracing of Dynamic Scenes. Computer Graphics Forum
(Proceedings o f Eurographics)., 26(3), 2007.

[STN87] Mikio Shinya, T. Takahashi, and Seiichiro Naito. Principles and Applications of
Pencil Tracing. SIGGRAPH Computer Graphics, 21(4):45-54, 1987.

[Sub91] Kalpathi Raman Subramanian. Adapting Search Structures to Scene Characteristics


fo r Ray Tracing. PhD thesis, University of Texas at Austin, Austin, TX, USA, 1991.

[Sun91] Kelvin Sung. A DDA octree traversal algorithm for ray tracing. In Proceedings o f
Eurographics ’91, pages 73-85, 1991.

[Sze03] Laszlo Szecsi. Graphics Programming Methods, chapter An Effective Implemen­


tation of the K-D tree, pages 315-326. Charles River Media, Inc., Rockland, MA,
USA, 2003.

[TH99] Seth Teller and Michael Hohmeyer. Determining the Lines Through Four Lines.
Journal o f Graphics Tools, 4(3): 11—22, 1999.

[Thi87] William Charles Thibault. Application o f Binary Space Partitioning Trees to Ge­
ometric Modeling and Ray-Tracing. PhD thesis, Georgia Institute of Technology,
Atlanta, GA, USA, 1987. Director-Bruce F. Naylor.

[VCI09] Visual C++ Language Reference, Compiler Intrinsics. available at http :/ /msdn .
microsoft.com/en-us/library/ 2 6 td 2 Ids.aspx, 2009.

[Wal04] Ingo Wald. Realtime Ray Tracing and Interactive Global Illumination. PhD thesis,
Computer Graphics Group, Saarland University, 2004.

[Wal05] Ingo Wald. The RTRT core. In SIGGRAPH ’05: ACM SIGGRAPH 2005 Courses,
page 5, New York, NY, USA, 2005. ACM.

[Wal07] I. Wald. On Fast Construction of SAH-based Bounding Volume Hierarchies. IEEE


Symposium on Interactive Ray Tracing, 2007. RT ’07., pages 33-40, Sept. 2007.

[War69] John Edward Warnock. A Hidden Surface Algorithm fo r Computer Generated


Halftone Pictures. PhD thesis, The University of Utah, 1969.

[Wat70] Gary Scott Watkins. A Real Time Visible Surface Algorithm. PhD thesis, The Uni­
versity of Utah, 1970.

[WBB08] Ingo Wald, Carsten Benthin, and Solomon Boulos. Getting Rid of Packets - Efficient
SIMD Single-Ray Traversal using Multi-Branching BVHs. IEEE Symposium on
Interactive Ray Tracing, 2008. R T 2008., pages 49-57, Aug. 2008.

[WBMS05] Amy Williams, Steve Barrus, R. Keith Morley, and Peter Shirley. An Efficient and
Robust Ray-Box Intersection Algorithm. Journal o f Graphics Tools, 10(l):49-54,
2005.

[WBS07] Ingo Wald, Solomon Boulos, and Peter Shirley. Ray Tracing Deformable Scenes
using Dynamic Bounding Volume Hierarchies. ACM Transactions on Graphics,
26(l):485-493, 2007.

[WBWS01 ] Ingo Wald, Carsten Benthin, Markus Wagner, and Philipp Slusallek. Interactive Ren­
dering with Coherent Ray Tracing. In Computer Graphics Forum (Proceedings o f
B IB L IO G R A P H Y 161

EUROGRAPHICS 2001, volume 20, pages 153-164. Blackwell Publishers, Oxford,


2001 .

[WFMS05] Ingo Wald, Heiko Friedrich, Gerd Marmitt, and Hans-Peter Seidel. Faster Isosur­
face Ray Tracing Using Implicit KD-Trees. IEEE Transactions on Visualization and
Computer Graphics, 11 (5):562—572, 2005.

[WFP+01] Michael Wand, Matthias Fischer, Ingmar Peter, Friedhelm Meyer auf der Heide, and
Wolfgang StraBer. The Randomized Z-buffer Algorithm: Interactive Rendering of
Highly Complex Scenes. In SIGGRAPH 2001, Computer Graphics Proceedings,
pages 361-370. ACM Press / ACM SIGGRAPH, 2001.

[WH06] I. Wald and V. Havran. On Building Fast Kd-Trees for Ray Tracing, and on Doing
That in 0 (N log N). IEEE Symposium on Interactive Ray Tracing, 2006., pages
61-69, Sept. 2006.

[WHG84] Hank Weghorst, Gary Hooper, and Donald P. Greenberg. Improved Computational
Methods for Ray Tracing. ACM Transactions on Graphics, 3(1 ):52—69, 1984.

|Whi80] Turner Whitted. An Improved Illumination Model for Shaded Display. Communica­
tions o f the ACM, 23(6):343-349, 1980.

[WIK+06] Ingo Wald, Thiago Ize, Andrew Kensler, Aaron Knoll, and Steven G Parker. Ray
Tracing Animated Scenes using Coherent Grid Traversal. ACM Transactions on
Graphics (Proceedings o f ACM SIGGRAPH 2006), pages 485-493, 2006.

[WK06J C Wachter and A Keller. Instant Ray Tracing: The Bounding Interval Hierarchy.
In Proceedings o f the 17th Eurographics Symposium on Rendering, pages 139-149,
2006.

[WMG+] Ingo Wald, William R Mark, Johannes Gunther, Solomon Boulos, Thiago Ize, War­
ren Hunt, Steven G Parker, and Peter Shirley. State of the Art in Ray Tracing
Animated Scenes. In Eurographics 2007 State o f the Art Reports, pages 89-116.

[Woo90] Andrew Woo. Fast Ray-Box Intersection. Graphics gems, pages 395-396, 1990.

[WREE67] Chris Wylie, Gordon Romney, David Evans, and Alan Erdahl. Half-Tone Perspective
Drawings by Computer. In AFIPS '67 (Fall): Proceedings o f the November Id-
16, 1967, fa ll joint computer conference, pages 49-58, New York, NY, USA, 1967.
ACM.

[WSC+95] Kyu-Young Whang, Ju-Won Song, Ji-Woong Chang, Ji-Yun Kim, Wan-Sup Cho,
Chong-Mok Park, and Il-Yeol Song. Octree-R: An Adaptive Octree for Efficient Ray
Tracing. IEEE Transactions on Visualization and Computer Graphics, l(4):343-349,
1995.

[WSS05] Sven Woop, Jorg Schmittler, and Philipp Slusallek. RPU: A Programmable Ray Pro­
cessing Unit for Realtime Ray Tracing. ACM Transactions on Graphics, 24(3):434-
444, 2005.

[WWZ+06] Peter Wonka, Michael Wimmer, Kaichi Zhou, Stefan Maierhofer, Gerd Hesina, and
Alexander Reshetov. Guided Visibility Sampling. In SIGGRAPH ’06: ACM SIG ­
GRAPH 2006 Papers, pages 494-502, New York, NY, USA, 2006. ACM.
B IB LIO G R A PH Y 162

[YLM06] Sung-Eui Yoon, Christian Lauterbach, and Dinesh Manocha. R-LODs: Fast LOD-
based Ray Tracing of Massive Models. The Visual Computer, 22(9):772-784, 2006.

[Zha98] Hansong Zhang. Effective Occlusion Culling fo r the Interactive Display o f Arbitrary
Models. PhD thesis, University of North Carolina at Chapel Hill, Chapel Hill, NC,
USA, 1998.

[ZMHH97] Hansong Zhang, Dinesh Manocha, Tom Hudson, and Kenneth E. Hoff. Visibility
Culling using Hierarchical Occlusion Maps. In SIGGRAPH ’97: Proceedings o f
the 24th annual conference on Computer graphics and interactive techniques, pages
77-88, New York, NY, USA, 1997. ACM Press/Addison-Wesley Publishing Co.

[ZRJ95] Maurice Van Der Zwaan, Erik Reinhard, and Frederik W. Jansen. Pyramid Clipping
for Efficient Ray Traversal. In Proceedings o f Eurographics Workshop on Rendering
95, pages 1-10. Springer, 1995.
Appendices

163
Appendix A

Software Design

A software system that uses several different underlying data structures and algorithms such as
ours must have an extensible design in which it is easy to add new rendering methods. The main
aim of the software is to assist and ease research. Thus, the emphasis is on the software being a
platform through which several algorithms can be easily implemented and compared. An object
oriented approach is used to develop the system, with the majority of rendering methods having
their own classes. The implementation follows a modular design that should be easy to understand
for a new developer in addition to being new components.

The system was implemented in C++, using Visual Studio 2005 / 2008 as the IDE. For a majority
of the application and benchmarks, the inbuilt compiler has been used. However, for a few al­
gorithms, like the Row Tracing algorithm, where maximal performance was desired, Intel’s C++
compiler was used to compile the application. In addition, profiling was undertaken with the
Intel’s VTune profiling application to identify the bottlenecks in the application.

A renderer, especially one that should be able to use several interchangeable data structures and
algorithms, is an intricate system with several classes interacting with each other. The aim of
the software is to provide the user with an easy to use system whereby s/he can easily select the
viewpoint. The application allows the selection of the camera parameters like perspective (angle
of view), zoom, camera location, etc through mouse based interaction with the scene. The system
should also allow the user to import models / scenes into the native format of the system. So far,
the application is able to import the Stanford ply format, obj format, 3DS format and a raw triangle
format. A separate importer for VRML models, currently not integrated into the system, has also
been implemented using the OpenVRML library. Importers for several formats were necessary
to have a set of commonly used models so that comparisons could be made against published
algorithms using the same scenes.

This chapter will detail the design issues, including the software and user interface design.

A .l Software Architecture

The software was developed using an object oriented methodology. The main algorithms and data
structures were encapsulated into their own separate classes. The class diagram in Figure A .l
shows the class diagram for the system.

164
A. 1 Softw are A rch itectu re 165

w in d o w
m an
Ray T racing

Figure A. 1: Software Class diagram.


A .l Softw are A rch itectu re 166

The diagram shows thal the architecture is a very simple one that follows object oriented concepts.
The data - the scene to be rendered - is abstracted from the auxiliary structures (the trees), the
user interface and the rendering algorithms.

The user interface classes - not shown in their entirety, but represented by classes ‘Ray Tracing
main window’ and ‘OpenGL Fram e’ in the diagram - are responsible for displaying the user
interface and receiving input from the user. The user inputs are received by the interface class
and passed on to the back end of the application. The class SceneManager is responsible for the
action of passing these commands from the front end to the back end. The scene to be rendered
is represented by the ‘Scene’ class that enables it to be in memory and its components to be
accessible to the renderer. The SceneManager creates the scene to be rendered by either loading
one that is stored in the hard disk or by importing one that exists in a different format. Once the
scene is loaded, most algorithms work with a tree as the underlying structure. Hence, a tree with
a user selected set of parameters is either loaded (if already created and stored onto the hard disk)
or created. Finally, when a rendering type is selected, the SceneManager creates the appropriate
renderer to render the scene. The SceneManager is thus a very important class.

A .1.1 R end erers and D ata S tru ctu res

The application was designed to allow integrating several data structures and algorithms for ren­
dering. The first step was to declare a set of abstract super classes to define the main functionali­
ties.

As the application’s main task is to render scenes, an abstract class - Renderer - was created that
defined the main functions to be implemented by any rendering method.

R en d erer - This class is the super class for all software rendering methods. It is an abstract
class defining several abstract methods that its subclasses have to implement. The main abstract
method it defines is:

• render - This method has to be implemented by each subclass, i.e., each new rendering
method. The subclasses use the specific algorithm to generate the image and the bitmap to
be rendered is stored in an object of the FrcimeBuffer class. This object is then passed onto
the OpenGL Frame object to display the contents of the current FrameBuffer.

As an abstract super-class, it is a placeholder for all rendering methods. Each rendering method is
defined as a subclass that implements this render method. A class RayTracingRender implements
a baseline ray tracing method using kd-trees. This method is used as the method against new algo­
rithms and data structures are compared. The Coherent Rendering method described in chapter 4
is implemented using the subclass SoftRasterizer.

However, when the method used is very similar to ray tracing, it is possible that several functional­
ities of ray tracing are duplicated. Hence, these rendering methods - Ray tracing using RBSP trees
(implemented by class RhspRTRenderer), Packet ray tracing (class PacketRayTracingRenderer -
also a method used as a baseline for comparison). Multi threaded packet ray tracer (class Mul-
tiPRTRenderer - another baseline rendering method), Row Tracing (class RowTracingRenderer
- implementing single threaded Row Tracing and Packet Row Tracing) and multi-threaded Row
tracer (class RowTracingRendererMT - implementing multi threaded Row Tracing and Packet
Row Tracing) - derive from RayTracingRenderer.
A. 1 SoftM'are A rch itectu re 167

The manager class - SceneManager - manages the delegation of work to the rendering methods
based on the user’s selection. Depending on the rendering method chosen, this class creates the
appropriate renderer object with chosen rendering parameters. The object then renders the scene
and displays the image.

As demonstrated, the addition of new rendering algorithms is very simple. The new algorithm is
implemented in its own class. The only modifications necessary to the implementation are to the
SceneManager class that has to create an object of the new Tenderer’s class.

Further, the several data structures are abstracted out of the rendering method so that a renderer
chooses whatever data structure it deems suitable. For eg., Row Tracing has been implemented on
kd-trees as well as octrees. This is easily achieved as the data structure classes are separated from
the rendering classes.

The data structures are also implemented in a similar manner. Since the structures investigated are
all tree structures, an abstract super class called Tree defines the main tree structure. All trees are
implemented as subclasses to this class.

The first data structure implemented was the kd-tree, defined in its own class KDTree.

kdTree The class represents a kd-tree. It provides the data variables and methods necessary to
create and use kd-trees.

A few of the important data members of the kd-tree class are as follows.

• noNodes - The number o f nodes in the kd-tree. Since the tree is stored as an array in
memory for efficiency, the number of nodes is used to allocate the nodes and to ensure that
out of bounds array items are not accessed.

• node Array - The nodes of the kd-tree are stored as an array in this structure. The array
consists of KDTreeNode items that span eight bytes each. Each KDTreeNode represents
either an internal node of the tree or a leaf node of the tree. The structures of these two
kinds of nodes are shown in Tables A. 1 A.2 respectively.

Represents Bits used


Unique pointer to children 32
Quantized value of split position 14
Leaf node flag 1
Unused bit 1
Split axis 16

Table A. 1: Kd-tree node structure.

Represents Bits used


Pointer to start of triangle list 32
Number of triangles in node 16
Leaf node flag 1
Unused bits 15

Table A.2: Kd-tree leaf node structure.


A .l Softw are A rch itectu re 168

• noTotalTriangles - The number of triangles contained by all the leaf nodes of the kdTree.

• triangleArray - In the leaf node of the kd-tree, an index an item of this array containing a
list of triangles is stored. The index points to the first triangle contained in the leaf node.
The leaf node also specifies the number of triangles in it. Together, the two members specify
the triangles contained in the leaf node.

• kdTreeSplitMethod - Indicates the splitting method of the construction process. The split
method could be one of space median, improved space median, SAH or octree. The octree
can be considered as a version of a kd-tree where space is split across all three axes and has
eight child nodes instead of three. However, the octree due to its uniformity does not need
its split position to be represented explicitly and can thus be represented with just 32 bits
that indicate the location of the first child node.

In addition to the data members, the kdTree class also provides methods to create the kd-tree with
several different heuristics. The main methods provided are:

• ConstructTree - The method that constructs the tree. It is a recursive method that works
in two passes. Due to the fact that nodes are stored in an array, the first pass counts the
number of nodes and triangles necessary and the second pass includes the data in these
nodes and triangles. The main factor in the quality of the kd-tree is the split method and
this is implemented in a separate method that returns the split position to be used for the
particular construction step. Upon determination of the split position, the construction step
then classifies the triangles as being on one of the sides or both sides of the split and recurses
if necessary.

• CountNodesAndTriangles - As mentioned earlier, the construction step is a two pass pro­


cess. This method is performed in the first pass and only counts the number of nodes and
triangles that the final tree must contain.

• findPosSAH - Uses the Surface Area Heuristic to determine the locally optimal SAH posi­
tion. The method uses several auxiliary methods to achieve this. It first clips the triangles in
the node by the bounding planes to obtain the potential split positions. Then, it sorts these
split points along all three axes. At each of these points, the SAH cost is determined along
three axes and the point with the minimum cost is determined as the locally optimum SAH
position.

• SortTriangles - This is the method that classifies the triangle as being to the left or to the
right of the split in consideration. An in place method is used that minimises slow memory
allocation and de-allocation operations.

• saveTreeBin - Once the tree with the given parameters is created, it is saved to the hard disk
for further use. The tree is stored in a binary format that is just a binary dump of the bytes
in the nodeArray followed by the bytes in the triangleArray.

• loadTreeBin - If a scene already has a tree constructed and saved, this method loads the tree
from the hard disk rather than constructing the tree from scratch.

• TraceRay - The method traces a ray through the kdTree. It initialises several variables
necessary for the traversal and calls the RecnrsiveRayTraversal method.

• RecursiveRayTraversal - The tree is recursively traversed by a ray in front-to-back order


until either a leaf node is reached or until an exit criteria is reached.
A . 1 Softw 'a re A rchitectu re 169

• ProcessLeafNode - If the recursive traversal reaches the leaf node, then this method pro­
cesses the leaf node and the ray. The ray is intersected with all the triangles in the node and
if there are one or more intersected triangles, the one with the closest intersection - as given
by the t intersection parameter - is returned.

As the KDTree class shows, the class for each data structure defines all the functionalities nec­
essary. They define construction of the structure with several heuristics if necessary. Since the
KDTree was initially developed in our system for ray tracing, the ray traversal code is integrated
in this class. However, this is planned to be moved to the RayTracingRenderer class to make the
design more coherent. In addition, the structure class is responsible for serialising - loading and
saving the structure from and to hard disks - the structure.

Since the application considers the octree as a special form of kd-trees, a separate class has not
been created for octrees. Instead, the kdTreeSplitMethod variable indicates that the tree is an octree
and the ConstructTree method ensures that an octree is created.

The other structure developed is the RBSP tree - described in Chapter 3. Although it is similar to
kd-trees, it has very specific construction and ray traversal methods and hence has been included
in its own class. The class defines the construction, loading and saving of RBSP trees. Rendering
using RBSP trees is handled by the RBSPTreeRenderer class.

A .l .2 Scen e Structure

Another consideration is the lower level consideration of how the actual scene is represented
in memory. This can have a major impact on the efficiency and extensibility of the system. It is
necessary to represent the scene so that it incorporates the main features of most rendering systems
like textures, lights, vertex normals, etc. in addition to the actual geometry. It is necessary to point
out at this time that the geometry of the scene consists solely of triangles. Also, the scene has
to be represented so that vertices shared between triangles are not duplicated. In addition, it also
necessary to abstract the scene from the rest of the application so that the actions pertaining to it
are achieved independently. The class Scene is thus implemented to fulfil these objectives.

Scene - The Scene class represents scenes to be rendered. The structure of an object of this class
is as follows:

• vertices - This is a list of vertices of the scene's triangles. The list contains four floating
point values for each vertex. Although, a vertex can be represented by just three floats, four
values are used with a dummy value at the end so that the data is better aligned.

• nbOfVertices - Indicates how many vertices exist in the scene. This is necessary as the
vertices are stored in an array that does not contain size information.

• triangles - This is a list of indices that point to a vertex in the vertices array. Each triangle
is indicated by the indices of the first coordinate of each of the three vertices forming the
triangle. The coordinates of the vertices are obtained by taking the three consecutive floating
point number for each vertex.

• nhOJTriangles - Indicates the number of triangles in the scene.

• vertexNonnals - A method similar to the vertices is used to represent the normals at every
vertex. The number of vertex normals is the same as the number of vertices. Hence, the
A .l Softw are A rch itectu re 170

same number - nbOfVertices - can be used. Four floating point values in which one is a
dummy is used to indicate the normal vector at the vertex.

• textureDimensions - The dimensions of the textures used are stored in this value. It stores
two integers per texture to indicate the length and the breadth of the texture.

• textures - This is a two dimensional byte array consisting of RGB values of a texture. The
array contains all the textures used in the scene and is referenced to obtain the colour due to
the texture.

• vertexCoIorsOrTextureCoordinates - This is the colour of a vertex or its texture coordinates.


The texture coordinates allow determining the colour at a pixel after the texture has been
applied. In case it is a colour, an RGB value is stored.

• triangleTextureNb - This is a pointer to a texture for every triangle in the scene. The tex­
ture’s colour is mapped onto the triangle to obtain the textured triangle.

• lightSourcePatches - Indicates the lights in a scene. Every light source is indicated with a
list of triangles that form the light source. The representation is similar to how triangles are
stored.

• lightSourceRadiance - Provides the radiance values for every light source, and is indicated
by one floating point value per light source.

• materials - This is a list of materials contained in the scene. The materials are defined in a
separate class that define values for diffuse, ambient and specular components in addition
to the opacity, refraction index and shininess values for the material. This list contains a list
of materials used by the scene.

• nbOfMaterials - The number of materials contained in the scene.

• vertexMateriallndex - A pointer to the material at this particular vertex.

The Scene class also contains a few methods to manipulate the scene. The main methods of the
Scene class are:

• LoadScene - This method loads a scene stored in this format to memory from the hard disk.

• SaveScene - Saves the currently loaded scene to the hard disk.

• AutoGenerateNormals - Generates the normals at the vertices using the triangle’s normal.

• InvertNormals - On a few occasions, the normals are directed opposite to how the scene
is structured. In these cases, this method can invert the normals so that the triangles of the
scene are oriented in the right direction.

• ComputeBoundingBox - Computes the bounding box of the scene. The bounding box of
the scene is represented by six coordinates - the minimum and maximum coordinates along
each axis. This is computed by determining the minimum and maximum coordinates of all
the triangles in the scene.

• operator+= - Combines two scenes to obtain a combined scene incorporating the two
scenes.

The class thus designed represents the scene and the necessary components with minimal dupli­
cation of data. The application currently loads and operates on only one scene at a time, though
A . l S o ftw a re A rch itectu re 171

loading and operating on multiple scenes is relatively easy. Again, the SceneManager class man­
ages the application by being the central object with which other components communicate. The
SceneManager class - being the main class sending and receiving messages - is thus an important
class of the application.

SceneM anager The SceneManager is a class that manages the operations of the application. It
is the direct interface between the user interface, the back end rendering classes and other classes
that can be independent. It is thus responsible for calling the appropriate methods when the user
requests an action. Its main data members are:

• scene - The scene object that is to be rendered.

• renderer - The renderer object that is responsible to render the scene. According to the
method chosen, the appropriate renderer object is constructed and the render method is
called.

• tree - The tree using which the scene is rendered.

Using the above data members, the SceneManager is responsible for loading the scene and the
corresponding tree. It allows the user to specify the parameters with which s/he wants to render
the scene. It also provides functionality to change the type of tree being used and the rendering
method used, to import scenes of different formats into the application and finally to save the
parameters for a rendering or even to save the scene being rendered in native format. It is to be
noted that the SceneManager does not actually implement any of the operations and just calls the
corresponding method from the corresponding object. The methods that allow the SceneManager
to achieve its responsibilities are:

• ChangeRenderingStyle - The method allows changing the rendering type from OpenGL
based to ray tracing, ray tracing with RBSP trees, Row Tracing and packet ray tracing. The
method creates a new renderer of the appropriate type and calls its render method.

• ChangeKDTreeMethod - Changes the kd-tree’s splitting method between space median,


SAH and octree types.

• ChangeKDTreeParameters - Changes other parameters of the kd-tree like the maximum


depth of the tree and number of primitives contained in the leaf node.

• ChangeRBSPTreeParameters - Similar to the kd-tree, the parameters of the RBSP tree can
be changed. In addition to the maximum depth and the number of triangles in the leaf node,
the RBSP tree has an additional parameter - the number of axes used to construct the tree.

• LoadOrConstructTree - Checks if the tree has already been constructed. If it has been, the
tree is loaded into the main memory. Otherwise, the tree construction method is called.

• SavePreferences - Preferences are parameters of a particular rendering scenario. It defines


the scene to render, camera parameters, the tree to use along with its main properties. These
parameters are saved for further retrieval by the LoadPreferences method. The preferences
make it easy to compare several algorithms and data structures with a given scenario. It is
also helpful when debugging problematic scenarios.

• LoadPreferences - This loads the preferences including the scene and the tree given by the
parameters. Subsequently, it applies the camera parameters stored in the parameters file to
the camera so that the viewpoint is exactly as it was when the preferences were saved.
A .2 Scene a n d Tree D a ta S tru ctu re R ep re se n ta tio n 172

• Load3DS, LoadPly, LoadlW, LoadPlyDir - Allows a user to import scenes of various for­
mats into the format used by the application. The method calls an importer that imports the
scene.

• LoadScene - Loads the scene from memory by calling the appropriate method in the Scene
class.

• SaveScene - Saves the scene to the hard disk by calling the appropriate method in the Scene
class.

• LoadSceneFromXML - Using XML files, several scenes can be combined into one scene.
This is very useful when comparing several different methods of rendering. The SceneM­
anager calls the XML importer to achieve this functionality.

The description of the SceneManager clarifies the role of the SceneManager as being integral to
the extensibility and simplicity of the design. By being an intermediary class between the various
components of the application, it allows the application to be modular, simple and extensible.

Thus, with this combination of individual rendering classes for each rendering method, separate
classes for every major data structure, a detailed Scene class and a manager class in the SceneMan­
ager class, the design is flexible enough that developing a combination of structures and rendering
methods is possible very easily.

A.2 Scene and Tree Data Structure Representation

The structures for the scene and the tree aim to maximise memory efficiency. This also achieves
good cache performance. While the members and methods of these structures has been described
earlier, the section below aims to explain them with more clarity.

A .2.1 S cene Structure

In SectionA. 1.2, the class structure for representing a scene was described. However, from the
class structure, it maybe difficult to identify the relations between the various data members of the
scene. Figure A.2 aims to clarify the structure of the scene and its representation in memory once
it has been loaded.
A .2 S cen e a n d Tree D a ta S tru ctu re R ep rese n ta tio n 173

RGB textures

Pointers to
textures

Triangle
0 0 1
texture index

Triangle list

vertex coordinate
list

vertex normal
coordinate list

vertex texture
coordinates . . II I I I la II 111
0

F ig u re A .2: S tru ctu re o f a S c e n e in m em ory.

Figu re A .2 sho w in g only the m ain o b je c ts o f th e scen e, in d ic a te s that the scen e o b ject c o n sists o f
a n u m b er o f array s fo r the m ain ele m e n ts o f the scen e. E ach trian g le in the sc en e c o n sists o f three
p o in ters to the vertex array that c o n sists o f a list o f all the vertices. T h e v ertex list is co m p o se d o f
four co m p o n en ts p er v ertex - the A', Y a n d Z c o o rd in a te s a n d a fo u rth c o m p o n e n t that is alw ays
the value 1. T he ad d itio n a l co m p o n e n t is ad d ed fo r m em o ry a lig n m en t issu es. T he vertex n o rm als
are sim ilarly sto red . T he vertex texture c o o rd in a te s are sto red in a sim ilar array, but w ith three
co m p o n en ts per vertex as it is a c c essed o n ly d u rin g the sh ad in g p art o f the ren d e rin g . T extures
are loaded into m em o ry as 2D R G B values and an a rray o f p o in te rs to the 2D tex tu res is created .
If a trian g le is tex tu red , a p o in te r in the tex tu re index array in d ic a te s the p a rtic u la r tex tu re fo r the
trian g le in co n sid e ra tio n . In this way, the m a jo r c o m p o n en ts o f the scen e are rep resen ted .

A.2.2 Kd-tree Data Structure

T he kd-tree has a stru c tu re that m in im ises the m e m o ry usag e, th u s e n ab lin g o p tim al cach e usage.
It co n sists o f a k d -tree class th a t rep re se n ts the tree. T h e no d es are defined in this class. In o rd er
to save m em o ry and m ax im ise co h e re n ce , the no d es o f the tree are sto red as an array. D ue to
this rep resen ta tio n , the tw o child nodes can be easily in d icated by ju s t one index as long as the
tw o child nodes are stored in a d jac en t item s o f the array. In ad d itio n to sav in g m em ory, the array
rep resen tatio n en su re s th at su b -trees are sto red clo ser, m ean in g that if a n o d e is lo ad ed into the
cache, several a d jac en t n o d es th at are also lo ad ed have a h ig h p ro b a b ility o f b ein g accessed .
A .3 U ser In terfa ce D esig n 174

In this rep resen tatio n , the first elem en t o f the array is the ro o t n ode o f the tree. T he tw o ch ild
nod es o f the root node are sto red as the next tw o ele m e n ts in the array. S im ilarly , each o f the
nod es has tw o child n o d es u n less they are le a f nodes.

O bviously, nodes can be o f tw o types - intern al n o d es and le a f n odes. Both kin d s o f nodes can be
in d icated w ith ju s t 8 b y tes (64 bits). Internal no d es need to sto re the split po sitio n , the axis th at
splits it. an ind icatio n o f if it is a le a f node and fin ally a p o in te r to the first child node.

R ep resen ts B its used


U nique p o in ter to c h ild ren 32
Q u an tised value o f split p o sitio n 14
L e a f n o d e flag 1
U n u sed bit 1
S plit axis 16

Table A .3: k d -tre e node stru c tu re

E ach internal node is sto red w ith the above stru ctu re. H ow ever, w h en a le a f node is to be re p re ­
sented. it has to store d ifferen t in fo rm atio n - the n u m b er o f tria n g le s in the node, the tria n g le s in
the node and an in d ic a to r sp ecify in g if it is a le a f node.

R ep re se n ts Bits used
P o in ter to start o f tria n g le list 32
N u m b er o f tria n g les in node 16
L e a f node flag 1
U n u sed bits 15

T able A .4: k d -tree le a f n ode stru c tu re

S to rin g the actual trian g les in the node is im p ractical. H en ce, an in d ex is sto red . T h e index p o in ts
to a triangle in the list o f trian g les th at co n ta in s all the trian g le s o f all the le a f nodes. T he trian g le
is the first trian g le in the node. In a d d itio n , the n u m b er o f tria n g le s in the n ode is also stored. T he
first triangle and the n u m b e r o f trian g le s thus en ab le id e n tifica tio n o f the trian g les in the node.
T hese trian g les are stored as in d ices p o in tin g to the list o f trian g le s in the S cen e object. T h e S c e n e
object then allow s access to the v ertices o f the trian g le. T he re p re se n ta tio n o f trian g les is m ad e
clea r by F igure A .3

T he data thus rep re se n te d p ro v id es a m o d u lar and efficient w ay to a c c ess the d ata. It m in im ise s
red u n d an t data being sto red keep in g the stru ctu res sm all. T h is m ak es the a p p licatio n m em o ry
efficient and also im p ro v es p erfo rm a n c e by b etter c a ch e c o h eren c e an d red u ced paging.

A.3 User Interface Design

A s an ap p licatio n d esig n e d m ain ly fo r research p u rp o ses, the u se r in terfa ce for the a p p lic a tio n had
several aim s. It w ould have to be an a p p lica tio n that:

• Is C ross p latfo rm .
A .3 U ser In terfa ce D esig n 175

triangle number of
start index triangles in
leaf node

leaf nodes

kd-tree global
triangle list

Scene
triangle list

Scene
vertex
coordinates Q

F ig u re A .3: O b tain in g th e le a f n ode trian g les.

• Is sim ple to use / has a m in im al learn in g curve.

• A llow s easy selectio n o f a scene.

• A llow s easy m a n ip u latio n o f the cam era.

• A ssists deb u g g in g .

• E nables re n d erin g w ith d ifferen t m eth o ds.

• E nables selectio n o f differen t d ata stru ctu res.

• E nables c o m p a riso n b etw een d ifferen t m eth o d s.

T he interface has been d esig n ed w ith these a im s in m ind. T h e u ser in terfa ce d esig n o f the a p p li­
cation w ith the above aim s in m ind, is d etailed below .

Cross platform - A lth o u g h , cu rre n tly the a p p lic atio n o n ly w o rk s on W in d o w s, it w as d e cid e d


early on during the d e v elo p m en t that a cro ss p la tfo rm a p p lica tio n is d esired . C o m p atib ility w ith
the L inux / U nix p la tfo rm w as d e sira b le due to the ro b u stn e ss o f the sy stem as also due to the
p o ssib ility o f fa ste r p erfo rm a n c e . F o r this p u rp o se, FLTK - th e F ast L ig h t T oolkit - w as se lected
as th e G U I too lk it. F ro m F L T K 's w eb site [FLT08] -

‘FL TK (p ro n o u n ced “fu lltic k ”) is a cro ss-p la tfo rm C + + G U I to o lk it fo r U N IX /L in u x


( X I 1), M ic ro so ft W in d o w s, and M acO S X. FLTK p ro v id es m o d ern G U I fu n c tio n a l­
ity w ith o u t the b lo at and su p p o rts 3D g rap h ic s v ia O p e n G L and its b u ilt-in G L U T
em u latio n .

FLTK is d esig n e d to be sm all and m o d u la r en o u g h to be statically linked, but w orks


fine as a sh ared library. FLTK also in clu d es an e x ce lle n t U I b u ild e r called F L U ID
A .3 U ser In terfa c e D esig n 176

that can he used to create a p p lic atio n s in m in u te s.’

A s the qu o te fro m FL TK su g g ests, it is a c ro ss p la tfo rm G U I d e v e lo p m e n t to o lk it that is lig h tw eig h t.


A d d itio n ally it su p p o rts O p e n G L - a key featu re fo r g ra p h ic s a p p lic a tio n s. D ev elo p m en t o f the
u ser interface is easily ach ie v ed th ro u g h the p ro v id ed F L U ID tool that is fairly sim p le to use.
T h ese facto rs m ad e the d ec isio n to use FL TK a stra ig h tfo rw a rd one.

Sim ple to use / has a m inim al learning curve - T he ap p lic a tio n d ev elo p e d by us has to be
sim ple to use. It w as n ece ssa ry that it p o p u la rly used fu n c tio n s co u ld be ac cessed w ith m in im al
effort. T h u s, it w as im p erativ e that the ap p lica tio n fo llo w ed a stan d ard a p p ro ach - that it had a
G U I w here all the fu n c tio n s w ere a c ce ssib le u sin g a m o u se. T h e m ain w in d o w s o f the ap p licatio n
can be show n by F ig u re A .4.

09

• Bom T(*trg ■Oxfftmzrd C Cyhndnt* Bom Tramg


r Born Tracrtg - SS£ Cpamota C sprint* Row Tracng
r Pom Trarng Pacws Tracing r sphere* *>t» Trsang
r Bom Tracng • MJb Trewoea
r Pac*rt Bow Tranng Mu* Trr
r Porn Tracing

F ig u re A .4: A p p lic a tio n W indow s

T he figure show s the a p p lic a tio n ’s sim p licity . It has fo u r w in d o w s w ith the larg est one b ein g the
o ne in w hich the im age is ren d ered .

Easy selection of scene W h en a scen e is lo ad ed , it is in a p p lic a tio n 's m e m o ry an d is d isp lay ed


in the largest w indow . T h e scen e is ch o se n by selec tin g the ap p ro p riate item in the File m enu
show n in F igu re A .5. A s the figure show s, the m en u e n a b le s scen es o f several d ifferen t fo rm ats to
be loaded using the G U I. In this m anner, the scen e is se lected in an easy m anner.

Easy m anipulation of cam era - To c o m p a re a lg o rith m s, it is n ece ssa ry to set a v iew p o in t


th at is suitable. To achieve th is, easy n av ig a tio n th ro u g h th e scen e and settin g the n ecessary
v iew p o in t and c a m e ra p a ra m ete rs is n ecessary . T h e a p p lic a tio n e n ab le s th is easily by allo w in g
A. 3 U ser In terfa ce D esign 177

Load Model
Save Model
lmport3DS
Import Pty
Import Ply Directory.
Import IW
Import xml model.
Load KdTree
Save Kd Tree
Load RBSPTree

Load Octree
Save RBSP Tree
Save image __
Load Preferences
Save Preferences

F ig u re A .5: File M enu

the u ser to rotate, tra n sla te and scale the m o del u sing the m o u se. T h e a p p lic a tio n a lso allow s
setting the p ersp ectiv e an g le and the z o o m fa c to r o f the cam e ra. T h e u ser c an m o v e / ro ta te the
cam e ra by d rag g in g the right m o u se b utton and the left m o u se bu tto n resp ectiv ely . In a d d itio n ,
the a p p licatio n p ro v id e s these fu n ctio n alitie s by w ay o f slid ers th at allow line tu n in g th e cam era
settin g s if necessary. F ig u re A .6 show s the m ain w in d o w in w h ich the scen e can be n a v ig a ted w ith
the m ouse. F ig u re A .6 also show s the w in d o w w ith slid ers w ith w h ich ro ta tio n s an d tra n sla tio n s
can be applied to the m odel.

F ig u re A .6: S ettin g the c a m e ra p a ram e te rs

Assist debugging - D u rin g the d ev elo p m en t o f new a lg o rith m s and d a ta stru c tu re s, th e re are
bound to be several in stan ces w h ere th in g s do not w o rk as ex p ected . D u rin g th e se in sta n c e s, it is
A .3 U ser In terfa ce D esig n 178

useful to v isu alise the m eth o d and the u n d erly in g d ata stru ctu res.

W h en a new stru etu re is b ein g d ev elo p e d , it n eed s to be v isu a lise d so th at the p ro b lem s c an be
identified and fixed w ith m in im al effort. T h e u se r in terfa ce allo w s the u ser to easily select o n e o f
the several d ata stru c tu re s im p le m e n ted an d to v isu alise it. F ig u re A . l sh o w s how several a sp ects
o f a data stru ctu re can be v isu alised d ep e n d in g on w h at is n ecessary .

■L
G l | Ray-Tracing [ G lj DACIsJ 0 0 _ R T ! KD T rees j Options [ Row Tracing | Testing

Display

17 Su rfa ces r O ctree r BSP Tree L levels Filled

r Wireframe r BSP Tree L levels

17 Tree r BSP Tree Last two levels

r V ertices r BSP Tree Leaf N od es


r BSP f ree Empty N od es
r Normals
r BSP Tree
F Bounding Box
r Show splitting a x es
(7 Textured Model _
1 Make U se of Display Lists for s c e n e rendering

Auxiliary GL Display Lists


l ' ' “ ..........
r Frustum/Camera DisplayLlst Current Frustum |
F U st 1
ru st 2
ru st 3
r u st 4 i
-------

F ig u re A .7: O p tio n s to v isu a lise the scen e and the d ata stru etu re

T he F igure A .8 sh o w s the R B S P tree v isu alised u sing the w ire fra m e m eth o d w h ere each n o d e ’s
edges are show n. D u ring the d ev elo p m e n t o f the stru ctu re, this v isu a lisa tio n en ab le d us to ju d g e
the q u ality o f the tree and to id en tify the p ro b lem s. A s F ig u re A . l sh o w s, several o th er ty p e s o f
v isu alisatio n s have been ad d ed to assist the d ev elo p m en t p rocess. T h e se v isu a lisa tio n s also show
the c h a ra cteristic s and q u ality o f the new stru ctu re dev elo p ed .

Enable rendering with different methods - T he u se r in te rfa ce m ak es it very sim p le to c h o o se


the ren d e rin g m eth o d . A s F ig u re A .9 sh o w s, the re n d erin g m eth o d can be c h o sen and c h an g ed
easily, m aking it easy to co m p a re d ifferen t re n d erin g m eth o d s. C o m b in e d w ith the ab ility to select
and chan g e data stru c tu re s fo r the scen e, this is a very effectiv e m eth o d to test several a lg o rith m s
o v er different stru ctu res.

Enable selection of different data structures - A s m en tio n e d earlier, selectin g the u n d erly in g
d ata stru ctu re is very im p o rtan t for the resea rc h . T h e ability to select d ifferen t u n d erly in g d ata
stru ctu res and ch a n g e it at run tim e is one o f the key fe a tu re s o f the a p p licatio n . W h en a stru c tu re
is selected , the a p p lic a tio n a ttem p ts to find the file in w h ich the tree is stored. If the file is fo u n d ,
the ap p licatio n lo ad s the tree. O th e rw ise , the ap p lic atio n calls the tree co n stru c tio n p ro c e ss w ith
the selected p aram eters. T he w in d o w a lso sh o w s the d ifferen t p aram ete rs that can be selected .
A .4 Sum m ary 179

F ig u re A .8: R B S P tree v isu a lise d on the B u n n y.

T his also m ak es it p o ssib le to c o m p are d ifferen t v ariatio n s o f th e sam e stru c tu re that are built w ith
d ifferen t param e te rs.

Enable com parison between different m ethods - T h e u se r in terfa ce w ith its ab ility to select
several differen t re n d erin g m eth o d s and several d ifferen t d ata stru ctu re s is a v ery effective to o l to
co m p are the d ifferen t m eth o d s im p le m e n te d w ith differen t u n d e rly in g stru ctu res.

A s show n, the u ser in terfa ce h as been c u sto m ise d so that several im p o rta n t fea tu res fo cu sin g on
research are p o ssib le e asily and w'ith m in im al effort. T h e a p p lic a tio n fu lfils its g o al o f bein g easy
and sim ple to use w hile at the sam e tim e o ffering several p o w e rfu l fe a tu re s fo r re se a rc h e rs. It
is fully d evelo p e d using the F L U ID tool that m ak es it e asy to add new e le m e n ts and to m ak e
m o d ificatio n s to ex istin g featu res. T h e c o m b in a tio n m ak es th e u se r in terfa ce a good fit fo r the
ap p licatio n .

A.4 Summary

A s the section show s, the so ftw are w as im p le m e n te d to assist the research . T h e sy stem is d e ­
v eloped in a m o d u la r m anner. Several d ata stru ctu re s and a lg o rith m s can be used to re n d e r a
p a rtic u la r scene. T h is allow s e a sy c o m p a riso n b etw een the re n d e rin g m eth o d s. T he sy stem also
A .4 Sum m ary 180

Rendering parameters

Rendering type OpenGL


Ray Tracing
Soft Monte-Carlo
Hardware-Based Monte Carlo
Soft Rasterization
Packet Ray Tracing
BSP Ray Tracing
kd and rbsp tree rendering - benchmark
Row Tracing
DACIS test
Multi Packet Ray Tracing

F ig u re A .9: S electin g a ren d e rin g m eth o d

Raytracng O' 0*Os'oO_»T k© Trees optora Paw Trxmg TfJtng OC Re^rrecng G) 0*03 00_RT KD Trw» Ofeons 1Mow Traewq Twarg

M M tM Ptt ft) 1* iVINLogK 94

HDTnee Tx*fsA>‘ tSrdu(r3«-«<i vj xz>Tntc Typ* E 2 E 3 H C E 3 3 5 B B B


Of MC
S*n pmuc*o Pjraweom MamrMBX 3
^ Ocwe 3: ot
3pe Poi«K> SIW BUMBSP Trt« I u**J or Bower* |
tmptv Spy* 60S (TdiTrr

P Auto trmnjhr icr*j « n : O Petxretf) r Auto TenrmM* «r*y «ona Kt SA* Peautefli
Ten*w*te 1 roue m n » & eountfcqto.JT" KX03 Tentmatr * node arts a %tf Dounong tR»(< >0«
Tmrwuw * brycov n toMTop *[ • . • Tef"wu* *0MlC0««t0itl0p*I JiOOOO

F ig u re A. 10: S e lectin g a d a ta stru ctu re

has a u ser in terfa ce that is sim p le and p ro v id es w ays to access the key fe a tu re s w ith m in im al effort.
Finally, it is ex ten sib le en a b lin g d ev elo p m e n t and use o f new ren d erin g m eth o d s. A ll in all. the
sy stem arch itec tu re has been a m ajo r fa c to r in e n ab lin g o u r research .
Appendix B

Optimising Row Tracing with SSE


instructions

B .0.1 S IM D In stru ction s

SIMD - Single Instruction, Multiple Data - instructions, though available earlier on other archi­
tectures, are a recent development on X86 processors that enable data level parallelism [SSE09b]
[SSE09a], They enable operating on a larger set of data using a single instruction. Intel’s SSE
instructions allow performing four floating point operations in parallel using a single instruction.
Judicious use of SSE instructions can significantly accelerate the performance. Row Tracing pro­
vides several instances where the use of such instructions are very beneficial. Intrinsics enable
implementing these instructions relatively easily and are supported by both Visual C++ [VCI09]
and Intel C++ compilers [Int09]. A few of these instances and the application of SSE to these are
detailed below.

Triangle intersection an d clipping Intersecting triangles in a leaf node and clipping the in­
tersection segment - as necessary for the leaf node processing - is one of the most frequently
performed operations of the algorithm. It is paramount that this part of the algorithm is as efficient
as possible. The process, described in detail in Sections 5.6.1 and 5.6.3, is implemented using
SIMD instructions to optimise them.

The SSE code below determines whether there is an intersection between the row plane and a
triangle and computes the intersection line segment if there is an intersection. The SSE instructions
in the code below leads to a performance boost by vectorising the operations a few operations. The
operations three dot products, vector additions and vector subtractions are achieved using SSE.
//load the three vertices of the triangle into SSE variables
ssePl = __mm_loadu_ps (verts + scTrs [x] );
sseP2 = _mm_loadu_ps(verts+scTrs[x+1]);
sseP3 = _mm_loadu_ps(verts+scTrs[x+2]);

//find the signed distances between the three points and the Row
Plane
rl = _mm_mul_ps(ssePl, sselmagePlane);
r2 = _mm_mul_ps(sseP2, sselmagePlane);
r3 = _mm_mul_ps(sseP3, sselmagePlane);
r4 = _mm_shuffle_ps(r1, r2, _MM_SHUFFLE(1,0,1,0));

181
182

r5 = _mm_shuffle_ps(rl, r2, _MM_SHUFFLE(3,2,3,2))


r6 = _mm_shuffle_ps(r2, r3, _MM_SHUFFLE(1,0,1, 0) )
r7 = _mm_shuffle_ps(r2, r3, _MM_SHUFFLE(3,2,3,2))
r4 = _mm_add_ps(r4, r5);
r6 = _mm_add_ps(r 6 , r7);
r3 = _mm_shuffle_ps(r4, r 6 ,_MM_SHUFFLE(3,2,2,0));
r4 = _mm_shuffle_ps(r4, r 6 ,_MM_SHUFFLE(3, 3, 3, 1) );
r7 = _mm_add_ps(r4, r3);

//check if there is an intersect ion


dirs = 7&(_mm_movemask_ps(r7) ) ;
if((dirs==0) II (dirs == 7))
return false;
//Find dO/dO-dl , dl/ dl~d2, d2/ d2-d0
rl = _mm_shuffle_ps(r7, r7, _MM_SHUFFLE(0,0,2,1));
rl = _mm_sub_ps(r7, rl);
rl = _mm_div_ps(r7, rl);
//Linearly interpolate to get pi, p2, plane intersection point
r2 = _mm_sub_ps(sseP2, ssePl);
r3 = _mm_shuffle_ps(rl, rl, _MM_SHUFFLE (0, 0, 0, 0) );
r2 = _mm_mul_ps(r2, r3);
r5 = _mm_add_ps(r2, ssePl);
//Linearly interpolate to get p2, p3, plane intersection point
r2 - _mm_sub_ps(sseP3, sseP2);
r3 = _mm_shuffle_ps(r1, rl, _MM_SHUFFLE(1,1,1,1));
r2 = _mm_mul_p s (r 2, r3);
r4 = _mm_add_ps(r2, sseP2);

//Select the right two points out of three


r2 = _mm_sub_ps(ssePl, sseP3);
r3 _mm_shuffle_ps(r 1 , rl, _MM_SHUFFLE(2,2,2, 2) ) ;
r2 = _mm_mul_ps(r 2 , r 3);
r 3 = _mm_add_ps (r2, sseP3);

rl = _mm_cmpgt_ps(r7, ZERO_ SSE) ;


r2 = _mm_shuffle_ps(r 1 , rl, _MM_SHUFFLE(0,0,2, l));
r2 = _mrn_xor_ps (r2 , rl);
r2 = _mm_andnot_ps(r2, MASK _TRUE);

r 1 = _mm_shuffle_ps(r 2 , r 2 , _MM_SHUFFLE(0,0,0, 0 ) );
r 6 = _mm_shuffle_ps(r 2 , r 2 , _MM_S HU F F L E (1,1,1, 1 ) );

r2 = _mm_and_p s (r 3, r 1 );
r5 = _mm_andnot_ps(r1, r5);
r 5 = _mrri_or_ps (r 5, r2 ) ;

r2 = _mm_and_ps(r3, r 6 );
r 4 = _mir._andnot_ps (r6 , r4) ;
r 4 = _mir'._or_ps (r 4 , r2) ;
//Select the right two points out of three

L istin g B .l: T ria n g le -R o w in te rse c tio n u sing SSE.

O nce the in tersec tio n seg m en t is found, it is to be e n su re d th at the seg m en t is fu lly in front o f the
n ear plane. T he co d e below is called o n ly if th e n o d e 's b o u n d in g box lies on b oth sid es o f the near
plane. SSE in stru ctio n s o p tim ise the o p e ra tio n by v ecto risin g the c a lc u latio n o f tw o do t p ro d u cts,
clip p in g w ith the n ear p lane an d finding the poin t in tersec tio n (if there is o n e) - b o th o f w h ich use
the p aram etric eq u atio n o f the line.
183

//handle cases when the intersection segment is partly in front


//and partly behind the near plane
if (bbPartlylnFront)
{

r9 =_mm_mul_ps(nearPlaneSSE, r4);
r8 =_mm_mul_ps(nearPlaneSSE, r5);
r6 =_mm_unpacklo_ps(r9, r 8 );
r3 =_mm_unpackhi_ps(r9, r 8 );
r9 =_mm_add_ps(r 6 ,r3);
r3 = _mm_movehl_ps( r9, r9);
r9 =_mm_add_ps(r9,r3);
float dl = M128_F32(r9)[0];
float d2 = M128_F32(r9)[1];
char plEehind = (((dl > 0 ) ) != gv_nearPlaneFarPlaneSign);
char p2Eehind = (((d2 > 0 ) ) != gv_r.earPlaneFarPlaneSign) ;
if (plBehind && p2Benind)
return false;
if (plBehind)
{
dl = d l / (dl-d 2 );
r9 = _mm_sub_ps(r5. r 4 )
rl = _mm_set_psl(dl);
r9 = _mm_mul_ps(r9, rl)
r4 - _mrr;_add_ps (r4, r 9)
ii
else if(p2Behind)
/
I
dl = d 2 / (d2 -dl);
r9 = _mm_sub_ps(r4, r 5)
rl - _mm_ser__psl (dl);
r9 = _mm_mul_ps(r 9, rl)
r5 = _mm_a d d_p s (r 5, r 9)
}

L isting B.2: S S E v ersio n o f alg o rith m th at h a n d les c a ses w h en the trian g le part b ein g ren d ered is
p artly in fro n t an d p a rtly b eh in d the view p o in t.

Finally, the code below en su res that o n ly p arts o f se g m en ts that are w ith in the n o d e ’s b o u n d in g
box are co n sid ered . S im ila r to the ab o v e o p e ra tio n , S S E code v e cto rise s finding the in tersectio n
to the three entry and exit p lan es by u sin g the p a ra m e tric eq u a tio n o f the line.

//clamp the intersect ion line to the bounding box


r 9 = _mm_sub_ps(r5,r4);
r 8 = _mm_rcp_ps(r9);
r 8 = _mm_min_ps(TNFTNITY_SSE_V2, r 8 );
tl = _mm_sub_ps(minVertexSSE, r4);
tl = _mm_mul_ps(1 1 , r 8 );
1 2 = _mm_sub_ps(maxVertexSSE, r4);
t 2 = _mm_mul_ps(1 2 , r 8 );
13 = _mm_min_ps(tl,t 2 );
t 2 = _mm_ma x _ p s (1 1 ,1 2 );
tl = _mm_shuffle_ps(13, t3, _MM_SHUFFLE(0,0,2,1))
tl = _mm_max_ps(1 1 , 13) ;
1 3 = _mm_shuffle_ps(tl, tl, _MM_SHUFFLE(1,1,1,1))
tl = _mm_max_s s (1 1 ,13);
tl = _mm_max_ss(tl,ZERO_SSE);
13 = _mm_shuffle_ps(t2, t2, _MM_SHUFFLE(0,0,2,1))
184

t 3 = _mm_min_ps(13, 1 2) ;
t2 = _ m m _ s h u f f l e _ p s (t3, t3, _MM_SHUFFLE(1,1,1,1));
t3 = _mm_min_ss(t3, t 2 ) ;
t3 = _ m m _ m i n _ s s (t3,ONE_SSE);
i f( _m m_ u c o m i g t _ s s ( t l , t3))
return false;
r 8 = _m m_ sh uf £ l e _ p s ( 1 1 , 1 1 , _MM_SHUFFLE ( 0 , 0 , C , 0 ) ) ;
r7 - _ m m _ s h u f f l e _ p s ( t 3 ,1 3 ,_MM_SHUFFLE(0, 0, 0, 0) ) ;
tl = _mm_inul_ps (r9, r 8 );
tl = _ m m _ a d d _ p s (r4, tl);
t2 - _ m m _ m u l _ p s (r9, r7);
t2 = _ m m _ a d d _ p s (r4 , t2) ;
//clamp the intersection line to the bounding box

return true;

L isting B.3: S S E versio n o f C la m p in g the in te rse c tio n line to the h o u n d in g box.

Node Projection O verestim ate onto the Row Finding the node projection overestimate is an
operation that is undertaken at every traversal step. Hence, it is imperative that this is as optimised
as possible. S S E allows the calculation of this overestimate using the code below. The optimisa­
tion occurs due to the vectorisation of calculation of four dot products, scalar multiplication to a
vector and addition of a constant to all the components of the vector.
void CalculateNodeOverestimate()
<
minVertexSSE = *wMinDiag;
maxVertexSSE = *wMaxDiag;
r5 =*xMaxDiag;
rl -*xMinDiag;
//////////////////////////////////////////////////
//FourDotProds ( tl, 12, r8, r9,
m3, m3, mO, mO) ;
r l - _mm_mul_ps(maxVertexSSE, m3);
r4 = _mm_mul_ps(minVertexSSE, m3);
r5 =_mm_mul_ps(r5, mO);
rl =_mm_mul_ps(r1, mO);

r 8 = _mm_movelh_ps( r7, r4);


r9 - _mm_movehl_ps(r4 , r7);
r l = _mm_movelh_ps( r5, rl);
r4 =_mm_movehl_ps(rl, r5);

r8 =_mm_add_ps(r 8 , r9) ;
r4 =_mm_add_ps(r4 , r l ) ;

rl = _mm_shuffle_ps(r 8 , r4, _MM_SHUFFLE(2,0,2,0));


r8 =_mm_shuffle_ps(r 8 , r4, _MM_SHUFFLE (3,1,3,1));

r 8 =_mm_add_ps(r 8 ,r7);
//FourDotProds - Results are in r8
//r8 = {m3.wl, m3.w2, mO.xl, m0.x2)
////////////////////////////////////////////////

rl =_mm_shuffle_ps(r 8 , r 8 , _MM_SHUFFLE(1,1,0,0));
rl = _mm_movehl_ps(r 8 , r 8 );
r1 =_mm_rcp_ps(r 1 ) ;
185

r 1 = _mm_mul_ps(r7, rl) ;
r 1 = _mm_mul_ps(r 1 , HALF_WIDTH_SSE);
r 8 = _mm_add_ps(r 1 , HALF_WIDTH_SSE);

r5 = _mm_movehl_ps(r 8 , r 8 );
r1 = _mm_min_ps(r 8 , r5) ;
rl = _mm_shuffle_ps(rl, rl, _Mt4_SHUFFLE (1, 1, 1, 1)
rl = _mm_mi n_s s (r 7, rl) ;

r 3 = _mm_ma x_p s (r 8 , r5) ;


r4 = _mm_shuffle_ps(r3, r3, _MM_S H U F F LE (1,1,1,1)
r 5 = _mm_ma x_s s (r 3, r 4) ;

L isting B .4: F in d in g the row o v e re stim a te using S SE

R a s te r is in g th e L a s t 8 P ix e ls - W ith eig h t p ix els, the p ro ce ssin g can be ea sily d one using tw o


S S E units to o p tim ise the p ro cess. T h e co d e below sh o w s the im p le m e n ta tio n in w h ich each
f l o a t co m p o n en t o f an S SE v ariab le c o rre sp o n d s to a pixel. By u sin g tw o S S E u nits and tw o
iteratio n s, the tria n g le s fo r the eig h t pix els are d ete rm in e d .

RasteriseSPixelsSSE( ) / / i n t minX)
(

int minX = minXInt-(minXInt&7), maxX = minX+7, i, startX, endX;


LrlntLlne = iritPoints;
lineTrs = ( ml28 * ) (gv_lineTriangles+minX) ;
r5 = MINUS_ONE;
r 6 = MINUS_ONE;
M12 8_I32(rl)[0] - gv_lineOcclusionMap[(occlMaxDepthStartIndexrminX)>>3];
rl = _mm_shuffle_ps(r1,r1,_MM_SHUFFLE(0,0,0,0));
r 8 = _mm_and_ps(rl, MASK_OCCL_LOW);
r 8 = _mm_cmpeq_ps(r8 , ZERO_SSE);
r9 = _mm_and_ps(r1, MASK_OCCL_HIGH);
r9 = mm cmpeg ps(r9, ZERO_SSE);

first4Pixels = _mm_set_psl(minX);
minXPlus4 = _mm_add_ps(first4Pixels, FOUR);

first4Pixels - _mm_add_ps(first4Pixels, zeroToThree);


second4Pixels = _mm_add_ps(minXPlus4, zeroToThree);
maxXml2 8 = __mm_set_psl (maxX) ;

for(i=0; i < intPointSize; i++)


{
startX = trIntLine->xl;
startX = MAX_2(startX,minX);
endX = trIntLine->x2;
endX - MIN_2(endX, maxX);

if (startX <= endX)


{

startXM128 = _mm_set_psl(startX);
endXM128 = _mm_set_psl(endX);
bM128 = _mm_set_psl(trIntLine->b);
aM128 - _mm_set_psl(trIntLine->a);
M128_I32(trM128)[0] = trIntLine->tr;
t rM128 = _miri_shuf f le_ps (trM128, trM128, _MM_SHUFFLE (0, 0,0, 0) );
186

////Rasterise the first 4 pixels


r l = _mm_cmple_ps(first4Pixels, endXM128);
r3 = _mm_cmpge_ps (first4Pixels., startXM128)
r 7 = _mm_and_ps(r7, r3);
r 3 = _mm_cmp1e_p s (first4Pixels, minXPlus4);
r 3 = _mm_and_ps(r3, r7);
r 3 = _mm_andnot_ps(r3, MASK_TRUE);
r7 = _mm_mul_ps(aM128, first4Pixels);
r l = _mm_add_ps( r l , bM128);
r l = _mm_o r_p s ( r l , r 3);
rl = _mm_o r_p s (r3, trMl28);
r 4 = _mm_cmpgt_p s (r 7, r 5);
r 4 = _mm_and_ps(r4, r 8 );
r 5 = _mm_max_ps(r7, r5);
r 1 = _mm_and_ps(r4, rl);
r2 = _rnm_andnol_ps (r4, 1ineTrs [0 ] );
lineTrs[0] = _mm_or_ps(rl, r2);

//Rasterise the second 4 pixels


r l = _mm_crr.ple_ps (second4Pixels, endXM128) ;
r l = _mm_and_ps( r l , _mm_cmpge_ps(second4Pixels, startXMl28))
r 3 = _mm_and_ps( r l , _mm_cmple_ps(second4Pixels, maxXml28));
r 3 = _mm_andnot_ps(r3, MASK_TRUE);
r l = _mm_mijl_ps (aMl 28, second4Pixels );
r l = _mm_a dd_p s (r 7, bM 12 8 );
r l = _mm_or_ps(r7, r3);
r 3 = _mm_or_ps(r3, trM128);
r 4 = _mm_cmpgt_ps( r l , r 6 );
r4 = _rnm_and_ps (r 4 , r 9) ;
r6 _mm_max_ps(r7, r 6 );
rl = _mm_and_ps(r4, r3);
r 2 = _mm_andnot_ps(r4, lineTrs[l]);
lineTrsfl] = _mm_or_ps(rl, r2) ;
}
trIntLine++;

}
r5 - _mm_cmpneq_ps(r5, MINUS_ONE);
r 6 = _mm_cmpneq_ps(r 6 , MINUS_ONE);

unsigned char shadedFlag= _mm_movemask_ps(r5) I (_mm_movemask_ps(r 6 ) <<


4) ;
UpdateOcclusionMapBy 8 (minX, shadedFlag);
}

L istin g B.5: R a ste risin g the last eig h t pix els u sing SSE.
Appendix C

Low level Optimizations

Low level o p tim iz atio n s en ab le a w ell d e sig n ed alg o rith m to ru n even faster. S om e o f the low'
level o p tim iza tio n s used in the im p lem en tatio n w ere m ulti th re a d in g the ap p lica tio n and the use
o f data level p ara lle lism th ro u g h the use o f S S E in stru c tio n s. M ulti th rea d in g p ro v id ed a sp eed -u p
o f aro u n d 3.5x - 3.9x fo r m any ren d erin g s. S S E p ro v id e d a sp e e d -u p o f 2x - 3x the n o n-S S E
m eth od . T he use o f these tw o fo rm s o f o p tim iz a tio n has en a b le d us to sp eed up o u r a lg o rith m to
co m p etitiv e levels.

SSE

T he code fo r a few' m eth o d s o p tim ized w ith S S E are given below . In the below code, it is to be
n o ted that r l , r ‘2. r3. r l. r5. rG. r l . r8 . r 9 are S S E v a riab les that are defin ed as class v ariables.

F our dot P roducts w ith SSE

T he m eth o d is a p art o f the m eth o d that finds th e p ro je c tio n o f the X -c o o rd in ate o f tw o points
o n to the im age line in the row tracin g alg o rith m . It c o m p u te s fo u r d o t p ro d u c ts - 11.m l , 1 2 .m 2 ,
1 3 .m 3 , 14.m 4 - and sto res th e value in r8 . S in ce w e do not use a stru ctu re o f array s as re c ­
o m m en d ed as th e best m eth o d to use S S E . w e have to use shuffles to h o rizo n tally add the value.
H ow ever, the n u m b e r o f shuffles and h o riz o n ta l m o v es is k ep t to a m in im u m .

FourDotProds(11, 12, 13, 14, ml, m2, m3, m 4 )


{
rl = _mm_mul_ps(1 1 , ml);
r4 = _mm_mul_ps(12, m2);
r5 = _mm_mul_ps(13, m3);
rl = _mm_mul_ps(14, m4);

r8 = _mm_movelh_ps( r7, r4);


r9 = _mm_movehl_ps(r4, r7);
r3 = _mm_movelh_ps( r5, rl);
r4 = _mm_movehl_ps(r1, r5);

r 8 = _mm_add_ps(r 8 , r9);
r4 = _mm_add_ps(r4, r3);

187
188

r3 = _mm_shuffle_ps(r 8 , r4, _MM_SHUFFLE(2,0,2,0));


r 8 = _mm_shuffle_ps(r 8 , r 4 , _MM_SHUFFLE(3,I,3,1));

r 8 = _mm_add_ps(r 8 ,r3);
}

If a stru ctu re o f array s as su g g ested fo r use w ith S S E w ere to be u sed , the d ata w o u ld first have to
be reo rg an ized into this lay o u t. C o n seq u en tly , the d o t p ro d u c ts c o u ld be c a lc u la te d w ith a red u ced
n u m b er o f in stru ctio n s.

T he layout can be c h an g ed to a S S E frien d ly n atu re by u sing the m acro a lread y defined as a p art
o f V isual C ++. T he m acro can be given by

_MM_TRANSPOSE4_PS(rowO, row], row2, row3) {


ml28 tmp3, tmp2, trapl, tmpO;

tmpC = _mm_shuffle_ps((rowO), (rowl), 0x44)


tmp2 = _mm_shuffle_ps((rowO), (rowl), OxEE)
tmpl = _mm_shuffle_ps((row2), (row3), 0x44)
tmp3 - _mm_shuffie_ps((row2), (row3), OxEE)

(rowO) = _mm_shuffle_ps(tmpO, tmpl, 0x88)


(rowl) = _mm_shuffle_ps(tmpO, tmpl, OxDD)
(rcw2) = _mm_shuffle_ps(tmp2, tmp3, 0x88)
(rew3) = _mm_shuffle_ps(tmp2, tmp3, OxDD)

T he m acro tak es in the row s in the n orm al fo rm at and tra n sp o se s it so that all the x. y and z
co m p o n en ts are now in a single row' each. In th is fo rm a t, fo u r dot p ro d u cts can be achieved w ith
few er in stru ctio n s. If l x . l y . l z and m x . m y . m z in d icate the c o m p o n e n ts o f the four v ecto rs
arran g ed in a stru ctu re o f array s form at, then th e fo u r dot p ro d u cts c an be ach ie v ed very e a sily as
show n below.

FourDotProds (lx, ly, lz, mx, my,


{
r7 = _mm_mul_ps(lx, mx);
r4 = _mm_mul_ps(ly, my);
r5 = _mm_mul_ps(lz, mz);

r 8 = _mm_add_ps(r7, r 4);
r4 = _mm_add_ps(r 8 , rS);

T hree dot P roducts w ith SSE

In row tracing , th ere are also c a ses w hen th ree dot p ro d u c ts are n ecessary . T h is is w'hen a p o in t's
X and Z c o -o rd in a tes are to be p ro jected o n to the im ag e row. T h o u g h , the sam e m eth o d used for
fo u r dot pro d u cts c an be used, it is p o ssib le to e lim in ate one m u ltip ly in stru ctio n w hen th ree dot
p ro d u cts are calc u la te d . T h u s, three dot p ro d u c ts are ach ie v ed by the belo w code that c o m p u te s
11.d. 12.d. 13.d.
ThreeDotProds(11, 12, 13, d)
(

rl - _mm_mul_ps(1 1 , d) ;
189

r2 = _mm_mul_ps(1 2 , d) ;
r 3 = _mm_mul_p s ( 1 3, d) ;

r4 = _mm_shuff 1 e_p s (rl, r2 _MM_SHUFFLE(1, 0, 1, 0 ) )


r5 = _mm_shuf fle_ps (rl, r2 _MM_SHUFFLE(3, 2, 3, 2} )
r6 = _mm_shuffle_ps (r2 , r3 _MM_SHUFFLE(1, 0,1,0))
r7 = _mm_sh u f f 1 e_p s (r2 , r3 _MM_SHUFFLE(3,2, 3,2))

r 4 = _mm_add_ps(r4 , r5) ,
r 6 = _mm_add_ps(r 6 , r7) ;

r 3 = _mm_shuffle_ps (r 4 , r6 _MM_SHUFFLE(3, 2, 2, 0) )
r 4 = _mm_shuffle_ps (r4 , r6 _MM_SHUFFLE (3,3,3,!))

r7 = _mm_add_ps(r4, r 3) ,

D e te rm in in g e n tr y a n d ex it p la n e s fo r R B S P tre e s w ith S SE

For R B S P trees, due to the existence of several axes, determining the entry and exit planes can be
done in groups of four axes using SSE . S S E thus allows the computation of four entry and exit
planes in the same time as one plane when S S E is not used - improving performance significantly.
The below code achieves this by considering groups of four axes each.
for(i~0, k= 0; i < noSplitPlanes; i+=4, k++)
1
dirKec = jnm_adci_ps (_mm_mul_ps (vDirSSEX, planeNormalsSSEAl1:k*3]),
_mm_add_ps(_mm_mul_ps(vDirSSEY, planeNormalsSSEAll[k*3+1]),
_mm_muT_ps(vDirSSEZ, planeNormalsSSEAl1[k *3+ 2])));

flag - _mm_cmpeq_ps (dirRec, ZEF.O) ;


flag = _mm_and_ps(f1ag, FEPSILON_M128);
dirRec - _mm_add_ps(dirRec, flag);

tempt ^_mm_rcp_ps(dirRec);
dirRec = _mm_sub_ps(_mm_add_ps(tempi,tempi),_mm_mul_ps(_mm_mul_ps(tempi,
tempi),di rRec));

tempi = _mm_mul_ps(tSSEMin[k], dirRec);


temp2 = _mm_mul_ps(tSSEMax[k] , dirRec);

flag = _mm_cmpgt_ps(dirRec, ZERO);


tSSEO = _mm_or_ps (_mm_and_ps (tempi, flag), _mrr._ar.dnot_ps (flag, temp2));
tSSEl = _mm_or_ps(_mm_and_ps(temp2, flag), _mm_andnot_ps(flag, tempi));

tminSSE = _mm_max_ps(tminSSE, tSSEO);


tmaxSSE = _mm_min_ps(tmaxSSE, tSSEl);

_mm_store_ps((t+2*i), _mm_unpacklo_ps(tSSEO, tSSEl));


_mm_store_ps((t+2*i+4), _mm_unpackhi_ps(tSSEO, tSSEl));

signs = _mm_mcvemask_ps(flag) ;
rayDirs[i] = (signs & 1);
rayDirs[irl] = (signs & 2)>>1;
rayDirs[i+2] = (signs & 4)>>2;
rayDirs[i + 3] = (signs & 8 )>>3;
}
190

tmin = MAX_2(tminSSE.ml28_f32[0], tminSSE.ml28_f32[1]);


tmin = MAX_2(tmin, tminSSE.ml28_f32[2]);
tmin = MAX_2(tmin, tminSSE.ml28_f32[3]);

tmax = MIN_2(tmaxSSE.ml28_f32[0], tmaxSSE.ml26_f32[1]);


tmax = MIN_2(tmax, tmaxSSE.ml2 8_f32[2]);
tmax = MIN_2 (tmax, tmaxSSE.m!28_f32[3]);

if(tmax < 0 I| tmin > tmax)


return - 1 ;

R ow / P lane intersection

For the row tracin g alg o rith m , it is n ece ssa ry to p e rfo rm a row / p lan e in te rse c tio n at each traversal
step. T he test involves tw o dot p ro d u cts fo llo w ed by a test o f th e signs. It can be im p lem en ted in
S S E to achieve sp eed -u p as show n below'.

ImageLinelntersectsBBSSE()
{
rl = _mm_mul_ps(sselmagePlane, minVertexSSE);
r2 - _mm_mul_ps(sselmagePlane, maxVertexSSE);

r3 = _mm_unpacklo_ps(rl, r2);
r4 - _mm_unpackhi_ps(r1, r2J;

r3 = _mm_add_ps(r3,r4);
r4 = _mm_shuffle_ps(r3,r3, _MM_SHUFFLE(3,2,3,2));
r3 = _mm_add_ps(r3,r4);

r3 = _mm_cmpgt_ps(r3, zero);
r3 = _mm_and_ps(r3, one);
r4 = _mm_shuffle_ps(r3,r3, _MM_SHUFFLE(0,0,0,1));
return _mm_comineq_ss(r3,r4);
)

Row / T riangle intersection clam p in g in SSE

In tersectin g tria n g le s in a le a f node and clip p in g the in te rse c tio n se g m en t - as n ecessary fo r the
le a f node p ro c e ssin g - is o n e o f the m o st freq u e n tly p e rfo rm e d o p e ra tio n s o f the alg o rith m . It
is p aram o u n t th at th is part o f the alg o rith m is as efficien t as p o ssib le. T h e p ro cess, d escrib ed in
detail in section 5.6.1 and 5 .6 .3 , is im p lem en te d u sing S1M D in stru c tio n s in o rd er to o p tim ize
them .

T he below cod e lists the S S E code to d e te rm in e w h e th e r there is an in tersec tio n b e tw een the row
plane and a trian g le and c o m p u tes the in te rsec tio n line seg m en t if th ere is an in tersec tio n . T he
S S E in stru ctio n s in the below' code v ecto rizes the o p eratio n by p e rfo rm in g the three dot p ro d u cts
and v ecto r a d d itio n s and su b tractio n s leading to a p e rfo rm a n c e boost.

//load the three vertices of the triangle into SSE variables


ssePl = _mm_loadu_ps(verts+scTrs[x]);
sseP2 = _mm_loadu_ps(verts+scTrs[x+1]);
191

sseP3 = _mm_loadu_ps(verts+scTrs[x+2]);

/ / f i n d t h e s i g n e d d i s t a n c e s b e t w e e n t h e t h r e e p o i n t s a n d t h e Row
Plane
rl - _mm_mul_ps(ssePl, sselmagePlane);
r2 = _mm_mul_ps(sseP2, sselmagePlane);
r3 = _mm_mul_ps(sseP3, sselmagePlane);
r4 - _mm_shuffle_ps(rl, r2, _MM_SHUFFLE(1,0,1,0))
r5 = _mm_shuffle_ps(rl, r2,_MM_SHUFFLE(3,2,3,2))
r 6 = _mm_shuffle_ps(r2, r3, _MM_SHUFFLE(1,0,1,0))
r~ - _mm_shuffle_ps(r2, r3, _MM_SHUFFLE(3,2,3,2))
r4 = _mm_add_ps(r4, r5);
r 6 = _mm_add_ps(r 6 , r l ) ;
r3 = _mm_shuffle_ps(r4, r 6 ,_MM_SHUFFLE(3,2,2,0));
r4 - _mm_shuff1e _ p s (r4, r 6 ,_MM_SHUFFLE(3,3,3,!));
r l = _mm_add_ps(r4, r3);

/ / c h e c k i f t h e r e i s an i n t e r s e c t i o n
dirs = 7&(_mm_movemask_ps(r l ));
if((dirs==0) (dirs == 7))
return false;
/ / F i n d dO/dO -di , d l / d l - d 2 , d 2 / d2-dO
rl = _mm_shuffle_ps(r7, r l , _MM_SHUFFLE(0, 0, 2, 1) ) ;
rl = _mm_sub_ps(r7, rl);
rl = _mm_div_ps(r7, rl);
/ / L i n e a r l y i n t e r p o l a t e t o g e t p i , p2, p l a n e i n t e r s e c t ion p o i n t
r2 = _mm_sub_ps(sseP2, ssePl);
r3 - _mm_shuffle_ps(rl, rl ,_MM_SHUFFLE(0,0,0,0));
r2 = _mm_mul_ps(r2, r3);
r5 = _mm_add_ps(r2, ssePl);
/ / L i n e a r l y i n t e r p o l a t e t o g e t p2, p3, p l a n e i n t e r s e c t ion p o i n t
r 2 = _mm_sub_ps(sseP3, sseP2);
r3 = _mm_shuffie_ps(r1, rl, _MM_SHUFFLE(1,1,1,1));
r2 - _mm_mul_ps(r2, r3);
r4 = _mm_add_ps(r2, sseP2);

//S e l e c t t h e r i g h t t wo p o i n t s o ut o f t h r e e
r2 - _mm_sub_ps(sseP1, sseP3);
r3 = _mm_shuffle_ps(rl, rl, _MM_SHUFFLE(2,2,2,2));
r2 = _mm_mul_ps(r2, r3);
r3 = _inm_add_ps (r2, sseP3);

rl = _mm_cmpgt_ps(r7, ZERO_SSE);
r2 - _mm_shuffle_ps(rl, rl, _MM_SHUFFLE(0,0,2,1));
r2 = _mm_xor_ps(r 2 , rl);
r2 = _mm_andnot_ps(r2, MASK_TRUE);

rl = _mm_shuffle_ps(r2, r2, _MM_SHUFFLE(0,0,0,0));


r 6 = _mm_shuffle_ps(r2, r2, _MM_SHUFFLE(1,1,1,1));

r2 - _mm_and_ps(r3, rl);
r5 = _mm_andnot_ps(rl, r5);
r5 = _mm_or_ps(r5, r2 );

r2 = _mm_and_ps(r3, r 6 );
r4 = _mm_andnot_ps(r 6 , r4);
r4 = _mm_or_ps(r4, r2 );
//S e l e c t t h e r i g h t t wo p o i n t s out of three
192

O nce the in terse c tio n se g m en t is fo u n d , it is to be e n su re d th at the seg m en t if fu lly in front o f the


n e a r plane. T h e below c o d e is called o n ly if the n o d e 's b o u n d in g box lies on both sides o f the near
p lane. S S E in stru ctio n s o p tim iz e the o p eratio n by v ecto rizin g the c alc u la tio n o f tw o d o t p ro d u cts,
clip p in g w ith the n ear p la n e and finding the in terse c tio n poin t ( if th ere is o ne) - both o f w h ich
use the p a ra m e tric eq u a tio n o f the line.

/ / h a n d l e c a s e s when t h e I n t e r s e c t i o n segment is partly in fron t


//a n d p a r t l y behind the near plane
if(bbPartlyInFront)
<
r9 = _mm_mul_ps(nearPlaneSSE, r4);
r 8 = _mm_mul_ps(nearPlaneSSE, r5) ;
r 6 = _mm_unpacklo_ps(r9, r 8 );
r3 = _mm_unpackhi_ps(r9, r 8 );
r9 = _mm_add_ps(r 6 ,r3) ;
r3 = _rnm_movehl_ps( r9, r9) ;
r9 = _mm_add_ps(r9,r3);
float dl = M 128_F32(r9) [0 ];
float d2 = M128_F32(r9) (1] ;
char plBehind = ( ( (dl > 0 ) ) != gv_nearPlaneFarPlaneSign);
char p2Behind = ( ( (d2 > 0 ) ) != gv_nearPlaneFarPlaneSign);
if(plBehind && p2Behind)
return false;
if(piBehi nd)
{
dl = dl/(dl-d 2 );
r9 = _mm_sub_ps (r5, r4);
rl = _mm_set_psl(dl);
r9 — _mm_mul_ps(r9, rl);
r4 = _mm_add_ps(r4, r9) ;
}
else if(p2Behind)
{

dl = d 2 /(d 2 -dl);
r9 = _mm_sub_ps(r4, r5);
rl = _mm_set_psl(dl);
r9 = _mm__mul_ps (r9, rl);
r5 = _mm_add_ps(r5, r9);
}
)

Finally, the belo w code en su re s that o nly parts o f seg m en ts that are w ithin the n o d e ’s b o u n d in g
box are c o n sid e re d . S im ila r to the above o p eratio n . S S E co d e v ecto rizes finding the in tersec tio n
to the three en try and exit p lan es by u sin g the p a ra m e tric e q u a tio n o f the line.

/ / c l a m p the i n t e r s e c t i o n l i n e to the bounding box


r9 = _mm_sub_ps(r5,r4);
r 8 = _mm_r cp_p s (r 9);
r 8 = _mm_min_ps(INFINITY_SSE_V2, r 8 );
tl = _mm_sub_ps(minVertexSSE, r4);
tl = _mm_mul_ps(tl, r 8 ) ;
t2 = _mm_sub_ps(maxVertexSSE, r4);
1 2 = _mm_mul_ps(1 2 , r 8 );
1 3 = _mm_mi n _ p s (1 1 ,1 2 );
1 2 = _m m _iriax_ps ( 1 1 , 1 2) ;
tl = _mm_shuffle_ps(13, t3, _MM_SHUFFLE(0,0,2,1));
tl = _mm_max_ps(t1, t3);
1 3 = _mm_shuffle_ps(tl, tl, _MM_SHUFFLE(1,1,1,!));
193

tl = _mn._max_ss ( 1 1 , 13) ;
tl = _mm_max_ss(tl,ZERO_SSE);
t 3 = _nrr._shuf fie_ps (t2, t2, _MM_SHUFFLE (0,0,2, 1) );
t3 = _mm_min_ps(1 3, 1 2 );
t2 = _mm_shuffle_ps(t3, t3, _MM_SHUFFLE(1,1,1,1));
t3 = _mm_rriin_ss (t3, 1 2 ) ;
13 = _mm_min_s s (13, ONE_SSE);
if(_mm_ucomigt_ss (tl, 13) )
return false;
r 8 = _mm_shuffle_ps(tl,tl,_MM_SHUFFLE(0,0,0,0));
r7 - _mm_shuffle_ps(13,13,_MM_SHUFFLE(0, 0,0,0));
tl = _mm_jnul_ps (r9, r8 );
tl = _mm_add_ps(r4, tl);
t2 = _rnm_mul_ps (r9, r7);
t 2 = _mm_ 3 d d_ps(r4, t 2 ) ;
/ / c l a m p th e i n t e r s e c t ion l i n e to th e boun d ing box

return true;

Packet Ray tracing implementation

P acket R ay tracin g w as im p lem en te d w ith a versio n o f in terv al a rith m e tic , as d e scrib e d in c h a p ­


te r 2. T he versio n is im p lem en ted w ith o u t the use o f SSE . It uses tw o b o u n d ary rays fo r each axis
to traverse the e n tire pack et. T h e im p le m e n tatio n o f the recu rsiv e packet trav ersal m eth o d is given
below .

char RecursiveRayTraversalIntervalSSE (int nodeindex, float tmini, float tmaxi


)//, Interval *ti)
(
packetNodeTraversa1s++;
KDTreeNodel *node = nodeArray+nodelndex;
i f (IS_LEAF_P(node))
(

i f (node->params==3)
return 0;
return ProcessLeafNode(nodeindex);
)
u n signed char axisCur= GET_AXIS_P(node),
sign = signs[axisCur],
axisCurMinlndex = axisCur<<l;

float bbSp - GET_P0S_P(node) * NODE_DIVISION_PRECISION;

bbSp = b b [axisCurMinlndex] + (bb[axisCurMinlndext1] - b b [


axisCurMinlndex])*bbSp;
float terr.p = (bbSp - vpF [axisCur] ) ;
float tSpMax = temp*rayDirRecMax[axisCur];
temp = temp*rayDirRecMin[axisCur];

float tSpMin;
if (temp> tSpMax)
(

tSpMin = tSpMax;
tSpMax = temp;

else
194

tSpMin = temp;

if(tSpMax > tmini && tSpMax > 0)


{

temp = bb[axisCurMinlndex+sign];
b b [axisCurMinlndex+sign] = bbSp;
axisCur - RecursiveRayTraversallntervalSSE(node->leftNode+l-
sign, tmini, MIN_2(tmaxi,tSpMax));//, t i ) ;
bb[axisCurMinlndex+sign] = temp;
i f (axisCur)
return axisCur;
)

if (tSpMin < tmaxi)


{

temp = bb[axisCurMinlndex+l-sign];
b b [axisCurMin!ndex+1 -sign] = bbSp;
axisCur = RecursiveRayTraversallntervalSSE(node->leftNode+
sign, MAX_2(tmini, tSpMin), tmaxi);//, t i ) ;
b b [axisCurMinlndex+l-sign 1 = temp;
!
return axisCur;
)

Multi threading

A n o th e r valu ab le tool valu ab le in the o p tim iza tio n p ro cess is m ulti th re a d in g the a p p licatio n . T h is
w as im p lem en te d w ith the p th rea d s library that allo w ed im p lem e n tin g the th re a d s in a sim p le
m anner. T he below code show s how the ray trac in g co de w as m ulti th read ed .

CastRays()
{
int i;
for(i=0; i < noThreads; i++)
{

pthread_create(&t[i], NULL, RowTracingRenderer2MTSub::


do_thread, &rt[i]);
}
for(i=0; i < noThreads; i++)
(
pthread_join(t[i], NULL);
)
}

static void *RowTracingRenderer2MTSub::do_thread(void* param)


{
static_cast<RowTracingRenderer2MTSub*>(param)->CastRays();
return NULL;
}

T h e m eth o d c re a te s as m any th re ad s as defined an d then calls the ray tra cin g m eth o d d efin ed in
th e su b class - R o w T ra c in g R e n d c re r2 M T S u b in this case.

To be able to be called by a p th re a d , the m eth o d has to be a static fu n c tio n that did n o t d e ­


p en d on the o b ject. T he d o _ th rea d ’ m ethod is th u s defin ed as a static m eth o d th at tak es in the

(
195

‘RowTracingRenderer2MTSub’ object as a parameter and calls its ‘CastRays’ method. Objects


o f RowTracingRenderer2MTSub represent objects allocated to each thread performing the work
allocated to that thread.

You might also like