
Masaryk University

Faculty of Informatics

Evaluation of the suitability of
OptiX engine for rendering
signed distance functions.

Master’s Thesis

Daria Vasilenko

Brno, Spring 2021




This is where a copy of the official signed thesis assignment and a copy of the
Statement of an Author is located in the printed version of the document.
Declaration
Hereby I declare that this paper is my original authorial work, which
I have worked out on my own. All sources, references, and literature
used or excerpted during elaboration of this work are properly cited
and listed in complete reference to the due source.

Daria Vasilenko

Advisor: RNDr. Jan Byška, Ph.D.

Acknowledgements
I want to express my gratitude to my supervisor RNDr. Jan Byška,
Ph.D., for the valuable comments, important tips during the research,
and for the many reviews of the text of this thesis. I also want to thank
my consultant Mgr. Matúš Talčík for his valuable advice and many
ideas for improving the created framework.

Abstract
In this thesis, we consider the applicability of the new ray tracing
pipeline for visualizing signed distance fields, using the NVIDIA
OptiX 7 API as an example. We consider the visualization of both a
discrete signed distance field (a 3D volume texture) and a signed
distance field defined by a function. We have implemented a framework
based on NVIDIA OptiX 7, and we have also implemented visualization
methods in OpenGL and CUDA for comparison with the OptiX
implementation.

Keywords
ray tracing, ray marching, signed distance field, OptiX, CUDA, OpenGL

Contents
1 Introduction 1

2 Background and Related Works 3


2.1 Signed distance function . . . . . . . . . . . . . . . . . . 3
2.2 Discrete signed distance field . . . . . . . . . . . . . . . 6
2.2.1 Brute force . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Hierarchical organization . . . . . . . . . . . . . 7
2.2.3 Distance transform methods . . . . . . . . . . . 8
2.3 Signed distance fields rendering . . . . . . . . . . . . . 10
2.3.1 Basic ray marching . . . . . . . . . . . . . . . . . 10
2.3.2 Sphere tracing . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Cone tracing . . . . . . . . . . . . . . . . . . . . . 15
2.4 Ray tracing pipeline. NVIDIA OptiX 7 . . . . . . . . . . 17
2.4.1 Pipeline . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.2 Ray generation program . . . . . . . . . . . . . . 18
2.4.3 Intersection program . . . . . . . . . . . . . . . . 19
2.4.4 Any-hit program . . . . . . . . . . . . . . . . . . 19
2.4.5 Closest-hit program . . . . . . . . . . . . . . . . 19
2.4.6 Miss program . . . . . . . . . . . . . . . . . . . . 20
2.4.7 Exception program . . . . . . . . . . . . . . . . . 20
2.4.8 Callables programs . . . . . . . . . . . . . . . . . 20
2.4.9 Shader binding table . . . . . . . . . . . . . . . . 20
2.4.10 Ray payload . . . . . . . . . . . . . . . . . . . . . 21
2.4.11 Ray tracing with NVIDIA OptiX 7 . . . . . . . . 21

3 Implementation 23
3.1 Discrete signed distance field creation . . . . . . . . . . 24
3.2 Framework overview . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Used tools . . . . . . . . . . . . . . . . . . . . . . 27
3.2.2 Framework design . . . . . . . . . . . . . . . . . 27
3.2.3 Usage . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Signed distance functions rendering using NVIDIA OptiX 32
3.3.1 Ray marching calculation in the intersection pro-
gram . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.3.2 Ray marching calculation in the closest hit pro-
gram . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.3 Ray marching calculation in the ray generation
program . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 Signed distance functions rendering using OpenGL . . 37
3.5 Signed distance functions rendering using CUDA . . . 38

4 Results 39
4.1 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 The impact of the CUDA block size on performance. . . 42
4.3 The impact of different types of CUDA mathematical
operations on performance and visual result. . . . . . . 42
4.4 Comparison of different methods of visualization of
SDF functions . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5 Comparison of different methods of visualization of
discrete SDF functions . . . . . . . . . . . . . . . . . . . 50

5 Conclusion 55

A List of Electronic Attachments 57

B Requirements for running binaries and for compiling 59

C Instruction for building the framework 61

Bibliography 63

List of Tables
4.1 The dependence of the rendering speed (ms) of different
scenes on the size of the CUDA blocks. Less is better. 43
4.2 Performance (ms) speedup when using fast math for the
Mandelbulb scene. The bottom line shows the acceleration as a
percentage. 43
4.3 Performance (ms) results of different methods of
visualization of SDF functions. 47
4.4 Performance (ms) results of different methods of
visualization of discrete SDF functions. The number of
primitives in the accelerating structure is indicated in
parentheses. 54

List of Figures

1.1 Character rendered as raymarched procedural SDF made by


Inigo Quilez. Image from [1]. 1
2.1 a) A 2D subset and b) its 2D distance field. The subset’s
distance field represents its boundary, its interior (tinted
brown here for illustration) and the space in which it sits.
Image from [3] 4
2.2 3D fractals. Made in my project Fractals3D. 5
2.3 Visualization of basic ray marching. The red dots show all
the sampled points. 11
2.4 Example of skipping a small object in a scene because the
marching step is too large. 12
2.5 Visualization of ray marching. The red dots show all the
sampled points. The blue circles show areas that are
guaranteed to contain no objects. The dashed green lines
show the shortest vectors between each sampled point and the
scene. 14
2.6 The ray in sphere tracing passes parallel to the object
boundary. 16
2.7 Cone tracing. 17
2.8 Relationship of NVIDIA OptiX 7 programs. Green
represents fixed functions; gray represents user programs.
Image from [31]. 18
3.1 Computing the sign of the SDF at two points: 1 and 2.
Before intersecting with point 1, the ray intersects with the
mesh three times, so point one lies inside the mesh, and the
sign of the SDF in it will be negative. And before it intersects
with point 2, the ray intersects with the mesh six times, so
point 2 lies outside the mesh, and the sign of the SDF in it
will be positive. 25
3.2 Visualization of a single 2d layer of a three-dimensional
discrete SDFs calculated using implemented program. There
is discrete SDF for Stanford bunny on the left and discrete
SDF for Utah teapot on the right. 26
3.3 UML diagram of base classes. 27

3.4 Framework interface. 29
3.5 NVIDIA OptiX 7 pipeline. Image from [36]. 33
3.6 The OptiX pipeline. The traced ray intersects with the leaves
of the accelerating structure. Ray marching is calculated in
intersection program. 33
3.7 The OptiX pipeline. The traced ray intersects with the leaves
of the accelerating structure. Ray marching is calculated in
closest-hit program. 35
3.8 Artefacts in closest-hit method, if we compute ray marching
in one voxel. 37
4.1 Scenes that were used to compare the methods that use a
raymarching with a direct evaluation of SDFs for SDF
visualization. Top row from left to right: Sphere, DonutBox2,
SierpinskiTriangle, DonutBox4, Penguin. Bottom row from
left to right: DonutBox, MengerSponge, DonutBox3, Julia,
Mandelbulb. 40
4.2 Scenes that were used to compare the methods that are based
on a raymarching using precomputed 3d volumetric textures.
Top row from left to right: Cornell box, Sphere, Stanford
Bunny, Cow. Bottom row from left to right: Teapot, LPS
Head, Stanford Armadillo, Dragon. 41
4.3 Performance using default math and fast math (less is
better). (*) - uses fast math. 44
4.4 Visualization of Mandelbulb fractal using different math
types. 45
4.5 Visual results of different methods of visualization of SDF
functions. 46
4.6 Examples of SDFs AABBs using in OptiX. 46
4.7 The ratio of the four most frequently used instructions when
calculating the DonutBox4 SDF. FFMA - fp32 multiply and
add, FMUL - fp32 multiply, IMAD- integer multiply and
add, FADD - fp32 add. Charts are from the NVIDIA Nsight
Compute profiler. 47
4.8 Warp state for DonutBox4 scene. Charts are from the
NVIDIA Nsight Compute profiler. 48

4.9 Number of ray marching steps for different visualisation
methods. Increasing the number of steps corresponds to a
color change from dark blue to light blue, white, and then red.
Dark red corresponds to the maximum number of steps (20
steps, in this case). A larger number of iterations leads to
lower performance. 49
4.10 Visual results for different grid sizes of discrete SDF. 50
4.11 Visual results of different methods of visualization of discrete
SDF functions. 51
4.12 Visual artifacts in OptiX IS method. 52
4.13 Performance (ms) results of different methods of
visualization of discrete SDF functions. 53

1 Introduction
The signed distance function of a set defines the distance from any
point in space to the boundary of the set. It has negative values at
points inside the set, positive values outside the set, and it is equal to
zero at the set’s boundary. Instead of representing the geometry in a
traditional triangle-based way, signed distance fields can be used as
a more elegant and compact solution. So, for example, to visualize a
sphere, instead of rendering several hundred triangles, we can easily
calculate the elementary equation:
f(p) = √(x² + y² + z²) − r,   p ∈ R³    (1.1)

Moreover, signed distance fields allow performing geometric operations on them such as boolean operations, blending, warping, bending,
twisting, etc. Therefore, signed distance fields become a powerful tool
for procedural modeling. An example of the SDF model rendered by
sphere tracing is shown in Figure 1.1.

Figure 1.1: Character rendered as raymarched procedural SDF made by Inigo Quilez. Image from [1].

Sphere tracing (ray marching) is a rendering technique for signed distance fields. The idea behind ray marching is to move forward along
the traced ray by the distance to the nearest scene object. Eventually,
the ray will get close enough to the surface that the point can be
considered as the intersection point with the object of the scene.
With a new ray tracing pipeline on the modern GPUs, we took
a step toward real-time ray tracing. The new pipeline is supported
by such APIs as Vulkan, DirectX 12, and NVIDIA OptiX 7. Though the
pipeline is oriented mostly toward ray tracing triangle meshes (the new
ray tracing cores compute ray-triangle intersections and perform
BVH traversal at the hardware level), it still supports custom geometry
and can build an acceleration structure over it. Since sphere tracing is very
similar to ray tracing, we want to determine whether it makes sense
to implement sphere tracing on the new ray tracing pipeline. For the
evaluation, we have chosen OptiX 7.

2 Background and Related Works
This chapter provides background information for this thesis. It consists
of four parts. The first part presents the concept of signed distance
functions (see Chapter 2.1) and describes the mathematical principles
necessary for rendering geometry represented by them. The second part
gives an overview of discrete signed distance fields (see Chapter 2.2)
and describes methods for constructing a discrete signed distance
function. The third part discusses the current state of the art in rendering
signed distance functions and different rendering techniques for them
(see Chapter 2.3). The last, fourth part presents the GPU real-time
ray tracing pipeline (see Chapter 2.4) and explains all terms associated
with NVIDIA OptiX 7 that will be used further in this thesis.

2.1 Signed distance function

Let’s consider a subset Ω of a metric space X with metric d. Then a
distance function (or distance field) [2, 3] φ(x) is defined as:

φ(x) = min(d(x, x′)), for all x′ ∈ ∂Ω,    (2.1)

where ∂Ω is the boundary of Ω.
For any point x in the metric space X, the subset’s distance field φ(x)
defines the distance from this point to the boundary of the subset.
The distance can be signed to distinguish between the inside and outside
of a subset (see Figure 2.1). The signed distance function has
negative values at points x inside Ω, it increases as x approaches
the boundary of Ω, where the signed distance function is zero, and it
takes positive values outside of Ω.

Figure 2.1: a) A 2D subset and b) its 2D distance field. The subset’s
distance field represents its boundary, its interior (tinted brown here
for illustration) and the space in which it sits. Image from [3].

A signed distance function (or SDF) [2, 3] is an implicit function f
with |f(x)| = φ(x) for all x. Thus,

f(x) = φ(x),   for all x ∈ ∂Ω+,
f(x) = 0,      for all x ∈ ∂Ω,        (2.2)
f(x) = −φ(x),  for all x ∈ ∂Ω−,

where ∂Ω− is the interior region of ∂Ω and ∂Ω+ is the exterior region
of ∂Ω.
Since in this thesis we render signed distance functions only for
sets in three-dimensional Euclidean space, let’s give a definition of the
signed distance function specifically for our case.

The distance function [4] φ(p) : R³ → R of a set of surface points Ω
is defined as the distance from point p = (x, y, z)ᵀ to the closest point
in Ω:

φ(p) = min(||p − q||), for all q ∈ Ω,    (2.3)

where the surface S = {p : φ(p) = 0} and ||·|| is the Euclidean distance
on R³.

Figure 2.2: 3D fractals. Made in my project Fractals3D.

A signed distance function f(p) is then defined in the same way as in
Formula 2.2, with φ(p) = min(||p − q||).
The simplest example of a signed distance function f(p) is the
SDF for a sphere with center at the origin and with radius r, which is
described by the implicit equation x² + y² + z² − r² = 0. Then we can
represent its SDF as follows:

f(p) = √(x² + y² + z²) − r,   p ∈ R³    (2.4)

For any point p = (x, y, z) ∈ R³, the function f(p) : R³ → R
gives the signed distance between point p and the closest point on
the surface of the given sphere. For any point p outside the sphere
the SDF’s sign will be positive, for any point p on the surface of the
sphere the SDF will be zero, and for any point p inside the sphere
the SDF’s sign will be negative. For example, for a point p = (0, 2r, 0)
the function value f(p) is equal to r, and this point is located outside the
sphere with the closest point on the surface r units away. A point p
with coordinates (r, 0, 0) lies on the surface of the sphere, since for it
f(p) = 0.
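
To make this example concrete, Equation 2.4 translates directly into code. The following is a minimal CUDA-style sketch; the helper name sdSphere is our own choice, not taken from any cited source:

#include <cuda_runtime.h>
#include <math.h>

// Signed distance from point p to a sphere of radius r centered at the origin
// (Equation 2.4): negative inside the sphere, zero on its surface, positive outside.
__host__ __device__ inline float sdSphere(float3 p, float r)
{
    return sqrtf(p.x * p.x + p.y * p.y + p.z * p.z) - r;
}

For instance, sdSphere(make_float3(0.0f, 2.0f * r, 0.0f), r) returns r, matching the worked example above.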
Examples of more complex signed distance functions are shown
in Figure 2.2. Also a lot of examples of different signed distance func-
tions can be found on the website of Inigo Quilez [1], in Keenan
Crane’s work [5] from 2005, Syntopia blog [6] or on the site Mandel-
bulb.com [7].

2.2 Discrete signed distance field


Simple geometric shapes, such as spheres, ellipsoids,
parallelepipeds, cones, and cylinders, can be easily represented in
an implicit form with their signed distance functions. However, there
are also several challenges in using pure signed distance functions.
Firstly, it is hard to derive signed distance functions for complex objects.
Secondly, in rendering algorithms for signed distance functions, such
as ray marching [8] or ray tracing of implicit surfaces [9], we have to
evaluate SDFs at different points in space for every pixel. For simple
shapes, such as the sphere, this is not costly, but evaluating complex
SDFs with a large number of mathematical operations takes much
more time. This makes a purely implicit representation of signed
distance functions impractical for real-time rendering. Therefore,
instead of implicit SDFs, discrete signed distance fields are often used.
We represent the discrete distance field as a three-dimensional voxel
grid (for SDFs in three-dimensional space). In each voxel we store
the shortest distance from the voxel center to the object, for which
we want to construct a signed distance function. The distance from
an arbitrary point in space to the object is reconstructed from local
sampled values using an interpolation function.
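
As a sketch of how such a reconstruction might look in code, the following CUDA-style function performs trilinear interpolation of a distance value from a flat voxel array; the function name sampleSdfGrid, the x-fastest memory layout, and the voxel-space coordinates are our own illustrative assumptions (in practice a CUDA 3D texture with hardware trilinear filtering can serve the same purpose):

#include <cuda_runtime.h>

// Trilinearly interpolate a discrete SDF stored as a flat nx*ny*nz array
// (x fastest, then y, then z). p is given in continuous voxel coordinates,
// i.e. (0,0,0) is the center of the first voxel.
__device__ float sampleSdfGrid(const float* sdf, int nx, int ny, int nz, float3 p)
{
    // Clamp so that the 2x2x2 neighborhood stays inside the grid.
    int x0 = min(max((int)floorf(p.x), 0), nx - 2);
    int y0 = min(max((int)floorf(p.y), 0), ny - 2);
    int z0 = min(max((int)floorf(p.z), 0), nz - 2);
    float fx = p.x - x0, fy = p.y - y0, fz = p.z - z0;

    // Weighted sum of the eight surrounding samples.
    float result = 0.0f;
    for (int dz = 0; dz < 2; ++dz)
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx) {
                float w = (dx ? fx : 1.0f - fx) *
                          (dy ? fy : 1.0f - fy) *
                          (dz ? fz : 1.0f - fz);
                result += w * sdf[((z0 + dz) * ny + (y0 + dy)) * nx + (x0 + dx)];
            }
    return result;
}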
Triangle mesh representation is the most common method of rep-
resenting 3D geometry. Therefore, it is very important to be able to
translate triangle meshes into signed distance fields. However, we can
calculate the exact SDF not for all triangle meshes but only for meshes
that are closed, orientable 2-manifolds [10]. In practice, this means
that we require that [10]:

• The mesh does not contain any self-intersections: triangles may share only edges and vertices and must be otherwise disjoint.

• Each edge must be adjacent to exactly two triangles.

• Triangles incident on a vertex must form a single cycle around that vertex.

If these conditions are met, we can easily determine which part of the
space is inside or outside the mesh, and we can build an SDF. Several
ways to construct a discrete signed distance field for triangle meshes are described below.

2.2.1 Brute force


The most straightforward and most understandable algorithm for
computing a discrete SDF for a triangle mesh is the brute force one.
This algorithm can be briefly described as follows: “For each voxel of
a grid, calculate the shortest distance to each face in a mesh and choose
the smallest of them.” We build a three-dimensional grid, inside which
our model will be entirely located (for example, we can calculate the
axis aligned bounding box (AABB) of the model and divide it into
cells). And we compute the SDF in each cell of the grid. We can build
a grid with cells of the same size, but this will result in many empty
cells in the constructed grid (cells inside which there are no triangles
of the mesh). This will, in turn, lead to an increase in the size of the
required memory since we will also have to store useless empty cells.
To solve this problem, we need to build a grid with different cell sizes.
We choose the size of the cells so that the grid is dense near the surface
of the mesh and as sparse as possible inside and outside the mesh.
This will reduce the total amount of memory required to store the
mesh SDF and improve the approximation of the SDF.

2.2.2 Hierarchical organization


The brute force algorithm calculates the exact SDF value for each
grid cell. But despite its simplicity and accuracy, this approach is
impractical, as it leads to excessively high computation costs. The
brute force algorithm requires N*M steps, where N is the number of
cells in the grid and M is the number of triangles in the mesh. Therefore,
different ways to speed up this algorithm have been proposed. One
such way, for example, is to use hierarchical data structures. Payne
and Toga [11] organized all triangles of the mesh into a hierarchical tree of
bounding boxes. And each triangle occupied only one box at each level.
They could quickly estimate the minimum and maximum distance
from a given point to all triangles inside the box. And based on these
values, immediately discard the unsuitable triangles. Strain [12] used
a quadtree to speed up the calculation of the distance to a 2D interface,
and his proposed algorithm runs in O(N) space and O(N log N) time,
where N is the number of elements in the polygonal interface.

2.2.3 Distance transform methods


The distance transform methods are based on the fact that we can
calculate the distance values only for the grid points that are near the
border of the triangle mesh, and then estimate the values for all remaining
grid cells from the values obtained at the mesh border
using a distance transformation. A distance transformation (DT) is
an operation that converts a grid with calculated values at the border
into a grid in which the minimum distance to the object is stored in
all cells. This method is not as accurate as brute force. It may cause
artifacts to appear since the values for cells located far from the mesh
surface are not calculated precisely but based on the calculated values
of the SDF at the surface.
Several distance transform methods exist. There are two classifica-
tions of these methods: according to distance estimation and according
to propagation.
The distances can be propagated through the volume using [10]:

• Sweeping scheme. We start the propagation at one corner of the grid and continue it voxel-by-voxel, row-by-row to the opposite corner. It usually requires multiple passes in several different directions.

• Wavefront scheme. We propagate the distance from the object’s surface outward, in order of increasing distance to the surface, and we continue until the values for all voxels are calculated. The advantage of this method is that we can stop the calculation as soon as we reach the desired distance isolevel. So we can calculate the SDF values not in the entire constructed grid but only up to the distance from the mesh surface that we need.

We can estimate the distance value of a given voxel from the known
values of its neighbors using [10]:

• Chamfer DTs [13, 14, 15]. Voxels in which the mesh boundary lies are initialized either by the minimum distance or zero, and all remaining voxels are initialized by infinity, i.e., a suitably large number. We use a mask of size N×N×N (where N is an odd number; in practice, N = 3, 5, or 7 is usually used). The center of the mask is placed above each voxel. For each voxel in the mask, we calculate the sum of the value in this voxel and the distance from this voxel to the center of the mask. After that, in the voxel at the mask center, we write the minimum of these sums. The formula for calculating the new value in the central voxel is as follows:

f_{i,j,k}^{m+1} = min_{(p,q,r) ∈ mask} ( f_{p,q,r}^{m} + d((i,j,k), (p,q,r)) ),    (2.5)

where f_{i,j,k}^{m+1} is the new value in the central voxel, f_{p,q,r}^{m} is the previous value in the mask voxel centered at (p, q, r), and d() is the distance between the two voxels. The process continues until no voxel value changes.
• Vector (or Euclidean) DTs [16, 17, 18, 19]. Very similar to Chamfer DTs. However, instead of distances, voxels are initialized with a vector to the nearest point on the surface, and those vectors are then propagated to neighboring voxels.
• Fast Marching Method (or Eikonal solvers) [20, 21]. It is a
wavefront scheme that calculates the distance values from a set
of the distance values at the object boundary. The idea of the
algorithm looks something like this:
1. We freeze the voxel with a known distance value.
2. Calculate the distance for the adjacent voxels to the frozen
voxel.
3. Find the voxel with the minimum value and freeze it.
4. Recalculate the values in all neighboring voxels (except frozen ones).
5. We loop back and freeze the voxel that now contains the
minimum distance, and so on.
Thus, the set of frozen voxels (voxels for which we have determined the final distance) expands, and the front of the frozen voxels is a layer of voxels in which we calculate the distance value and which are not yet frozen.
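
As an illustration of the Chamfer scheme described above, the following host-side C++ sketch performs a single propagation pass with a full 3×3×3 mask over a flat grid and reports whether any value changed, so the caller can iterate until convergence (the function name chamferPass, the grid layout, and the Euclidean neighbor weights are our own assumptions):

#include <algorithm>
#include <cmath>
#include <vector>

// One Chamfer DT propagation pass over an nx*ny*nz grid stored x-fastest.
// Each voxel takes the minimum of (neighbor value + distance to that neighbor)
// over a 3x3x3 mask, as in Equation 2.5. Returns true if any value changed.
bool chamferPass(std::vector<float>& f, int nx, int ny, int nz)
{
    bool changed = false;
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                float best = f[(z * ny + y) * nx + x];
                for (int dz = -1; dz <= 1; ++dz)
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx) {
                            int px = x + dx, py = y + dy, pz = z + dz;
                            if (px < 0 || py < 0 || pz < 0 ||
                                px >= nx || py >= ny || pz >= nz)
                                continue;
                            float w = std::sqrt(float(dx * dx + dy * dy + dz * dz));
                            best = std::min(best, f[(pz * ny + py) * nx + px] + w);
                        }
                if (best < f[(z * ny + y) * nx + x]) {
                    f[(z * ny + y) * nx + x] = best;
                    changed = true;
                }
            }
    return changed;
}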

2.3 Signed distance fields rendering


We can visualize the signed distance field of an object using fairly
simple algorithms, the main idea of which is to find the intersection
of a ray shot from the camera into the scene with the surface of the
object, on which the SDF value is equal to zero. In this part, we
describe some of the most common methods for visualizing signed
distance functions: basic ray marching, sphere tracing, and cone tracing.

2.3.1 Basic ray marching


Basic ray marching [22] is one of the 3D rendering techniques used to
render signed distance functions. Basic ray marching is somewhat similar
to traditional ray tracing. Like in ray tracing, we send a ray from the
camera into the scene for each pixel and try to find the intersection
of this ray with the objects of the scene. But in basic ray marching,
instead of solving the intersection equations of the ray with various
objects in the scene and calculating the intersection point analytically,
we “step” along the ray. Each time we move the same distance until
we find the intersection point of the ray with an object or until we
consider that this ray does not intersect with any object in the scene
(Figure 2.3). Below is the basic ray marching algorithm for rendering
an SDF (Algorithm 1).

Algorithm 1 Basic ray marching


1: for all pixel do
2: samplingPoint = r0
3: k=0
4: while SDF (samplingPoint) > 0 do
5: k = k+1
6: samplingPoint = r0 + ∆t ∗ k ∗ rd
7: end while
8: Perform shading and write color to pixel
9: end for

Figure 2.3: Visualization of basic ray marching. The red dots show all
the sampled points.

In standard constant step basic ray marching [22] the coordinates of the next sample point can be calculated using the following equation:

r(k) = r0 + ∆t · k · rd,    (2.6)
where r0 is the origin of the ray, rd is the unit direction vector of the ray,
∆t is the predefined step size and k ∈ {0, 1, 2, ..., n} is the step number.
After completing the loop in the algorithm, the point that we get is
the intersection point of the constructed ray with the scene object.
In practice, the basic ray marching loop uses more exit conditions.
For example, in some implementations, the maximum possible dis-
tance that we can move along the ray is added. It means that we will
not visualize all objects that are further than this distance. This allows
us, for example, not to draw objects that occupy a tiny area of the
screen compared to the area of the entire screen. Also, some imple-
mentations use a predefined maximum number of loop iterations (i.e.,
the maximum number of steps along the ray). This is done to ensure
that the loop ends if the ray does not intersect with any object in the
scene. The basic ray marching algorithm uses a parallelepiped that
bounds the hypertexture volume [22]. If the ray does not intersect the
bounding parallelepiped, then
the algorithm proceeds to process the next pixel. However, if the ray
intersects the parallelepiped, then as the starting point of basic ray
marching, we use the point of entry of the ray into the parallelepiped.
Moreover, one of the conditions for completing the basic ray marching
cycle is that the ray goes beyond the parallelepiped. This reduces the
number of times we need to evaluate the signed distance function,
making the algorithm run faster.

Figure 2.4: Example of skipping a small object in a scene because the marching step is too large.
Unfortunately, this algorithm has its disadvantages, which arise
because we use a fixed-length step when we are sampling the ray. On
the one hand, if the step size is too large and the scene consists of small
objects comparable in size to the step length, we can skip these objects
when moving along the ray (Figure 2.4). On the other hand, if the
step size is too small, we will move along the ray very slowly. It means
that it will take us a long time to find the intersection of the ray and
the scene object, making this algorithm unsuitable for those tasks in
which the speed of program execution plays an important role. Using
this algorithm, we can accurately determine whether a given point is
inside the object or outside. However, we cannot guarantee that we
will determine the intersection point of the ray and the scene object
with sufficient accuracy, again since we use a fixed-size step.
Several methods exist to solve the latter problem. For example, if
the currently sampled point is inside the object, we can reduce the basic
ray marching step and march back on the ray until the sample point
exits the object. And then again, we can reduce the marching step and
continue sampling towards the object. We can continue this process
until we reach the desired accuracy of the result. Another possible
solution is to use the bisection method or binary search method. If the
sign of the SDF at the current sampling point is negative (that is, we
are inside an object), and the value at the previous point is positive
(outside the object), we use binary search to find the point where the
SDF is equal to zero. However, such solutions, although they help
find more precisely the intersection of the ray and the object, are very
time-consuming.
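
As an illustration of the binary search refinement just described, the following CUDA-style sketch bisects between the last sampled point outside the object and the first sampled point inside it; the names refineBisection and sceneSDF are our own assumptions:

#include <cuda_runtime.h>

__device__ float sceneSDF(float3 p);   // assumed: the signed distance function of the scene

// Refine the hit point by bisection between pOut (SDF > 0) and pIn (SDF < 0).
__device__ float3 refineBisection(float3 pOut, float3 pIn, int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        float3 mid = make_float3(0.5f * (pOut.x + pIn.x),
                                 0.5f * (pOut.y + pIn.y),
                                 0.5f * (pOut.z + pIn.z));
        if (sceneSDF(mid) > 0.0f)
            pOut = mid;   // midpoint is still outside: move the outside end
        else
            pIn = mid;    // midpoint is inside: move the inside end
    }
    return pIn;           // point close to the zero level set of the SDF
}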

2.3.2 Sphere tracing

The primary problems of basic ray marching associated with using a
fixed-length step when sampling the ray were solved by the sphere
tracing (or ray marching, as it is now often called) algorithm, proposed
by Hart [8]. We still walk along the ray with this approach, but instead
of a fixed-length step, we use a dynamically adjusted step along the
marching path (Figure 2.5). The step size is determined by the distance
obtained from a distance function at a given sample point. The basic
sphere tracing algorithm is shown below (Algorithm 2).
The idea of the method is quite simple. At each iteration, we calculate
the value of the signed distance function at the current point
p1, which, according to the SDF definition, is equal to the minimum
distance from the current point to the scene. Let p2 be the nearest
scene point to p1. Then we can consider a sphere with the center at p1 and
the radius ||p1p2||. We can notice that there will be no object
inside this sphere. Therefore, it is safe to step by this radius along the
ray because we know we will not pass through any surfaces. So now,
instead of moving at a small fixed step, as in basic ray marching, we
can immediately take the maximum step we know is safe without
going through a surface. In such a way, we can drastically reduce
the number of steps required to hit an object’s surface. We apply this
procedure until our distance is less than the specified threshold e.

Figure 2.5: Visualization of ray marching. The red dots show all the
sampled points. The blue circles show areas that are guaranteed to
contain no objects. The dashed green lines show the shortest vectors
between each sampled point and the scene.

Algorithm 2 Sphere tracing


1: for all pixel do
2: samplingPoint = r0
3: step = SDF (samplingPoint)
4: while step > e do
5: step = SDF (samplingPoint)
6: samplingPoint+ = step ∗ rd
7: end while
8: Perform shading and write color to pixel
9: end for

In standard sphere tracing the coordinates of the next sample point can be calculated using the following equation:
r(k + 1) = r(k) + SDF(r(k)) · rd,   r(1) = r0,    (2.7)
where r0 is the origin of the ray, rd is the unit direction vector of
the ray, k ∈ {0, 1, 2, ..., n} is the step number, e is the required accuracy
of the solution, and SDF() is the signed distance function of the scene.
After completing the loop in the algorithm, the point that we get is
the intersection point of the constructed ray with the scene object.
With this simple step size adjustment, stepping over small objects
in the scene is eliminated. Objects can no longer be skipped since the
step size automatically decreases when we are near an object.
It is easy to see that this technique is very similar to basic ray
marching and suffers more or less from the same problems. We also
need to determine the maximum possible distance that we can move
along the ray so that we do not continue ray marching indefinitely if
the ray does not intersect with any object in the scene. Alternatively,
we can also limit the maximum number of steps along the ray to a
sufficiently large constant. It should also be noted that the step size
should be limited from below by a certain constant. If the ray passes
parallel to the object boundary and very close to it (Figure 2.6), the
step size is significantly reduced. Then it can lead to the fact that we
will move very slowly along the object boundary and never intersect
with it, which will lead to significant performance overhead.
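
Putting Equation 2.7 and the exit conditions above together, a sphere tracing loop might look as follows in CUDA-style code. This is only a sketch; the function name traceSphere and the chosen limits are our own, and the unit-sphere sceneSDF stands in for an arbitrary scene SDF:

#include <cuda_runtime.h>
#include <math.h>

// Example scene SDF: a unit sphere at the origin (Equation 2.4 with r = 1).
__device__ float sceneSDF(float3 p)
{
    return sqrtf(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Sphere tracing along the ray origin + t * dir (Equation 2.7).
// Returns the ray parameter t of the hit, or -1.0f if no hit is found
// within maxSteps iterations or before tMax is exceeded.
__device__ float traceSphere(float3 origin, float3 dir,
                             float tMax, int maxSteps, float epsilon)
{
    float t = 0.0f;
    for (int i = 0; i < maxSteps && t < tMax; ++i) {
        float3 p = make_float3(origin.x + t * dir.x,
                               origin.y + t * dir.y,
                               origin.z + t * dir.z);
        float d = sceneSDF(p);     // no surface is closer than d, so d is a safe step
        if (d < epsilon)
            return t;              // close enough to the zero level set
        t += d;
    }
    return -1.0f;                  // miss, or the step/distance limits were reached
}

The lower bound on the step size mentioned above can be added by replacing the last increment with t += fmaxf(d, minStep) for some small minStep.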

2.3.3 Cone tracing


Cone tracing [23, 24, 25] is an extension of the sphere tracing algorithm.
In cone tracing, we represent a ray as a cone or circular pyramid that
expands in the marching direction. The ray is now determined by its
origin, direction, and angle of spread. To determine the spread angle,
we consider the centerline of the cone and measure the angle between it
and the cone boundary at the apex of the cone. This angle is chosen so
that, when the ray is sent from the eye, the radius of the cone at the
distance of the virtual screen is equal to the width of a pixel. At each
sampling step, we move along the centerline of the cone (Figure 2.7)
and compare the current radius of the cone with the minimum distance
to the nearest surface in the scene (that is, with the value of the SDF
at this point). If the SDF value is less than the current radius, the cone
is assumed to have crossed the object [26].

Figure 2.6: The ray in sphere tracing passes parallel to the object boundary.
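
A single cone marching step, as described above, can be sketched as follows in CUDA-style code (traceCone, sceneSDF, and tanHalfAngle are our own names; the radius test is the only difference from the sphere tracing loop):

#include <cuda_runtime.h>

__device__ float sceneSDF(float3 p);   // assumed: the signed distance function of the scene

// March along the cone centerline; report a hit once the SDF at the current
// point becomes smaller than the local cone radius t * tanHalfAngle.
__device__ float traceCone(float3 origin, float3 dir, float tanHalfAngle,
                           float tMax, int maxSteps)
{
    float t = 1e-3f;                           // small offset from the cone apex
    for (int i = 0; i < maxSteps && t < tMax; ++i) {
        float3 p = make_float3(origin.x + t * dir.x,
                               origin.y + t * dir.y,
                               origin.z + t * dir.z);
        float d = sceneSDF(p);
        if (d < t * tanHalfAngle)              // the cone cross-section reached the surface
            return t;
        t += d;                                // sphere-tracing style advance
    }
    return -1.0f;
}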
The most popular use of cone tracing is its use in the eponymous
real-time global illumination calculation algorithm [24, 25]. Voxel cone
tracing is used to create different global illumination effects such as
glossy reflections, ambient occlusion, indirect diffuse lighting, transparency,
and soft shadows. In this algorithm, we rasterize the entire scene
into a set of voxels containing material properties (such as color) and
surface normals. For the voxels, the incoming or outgoing radiance from
light sources is calculated (i.e., direct lighting is calculated), and
the result is recorded in a 3D texture, which is linearly filtered using a
3D mipmapping scheme. Voxel cones are then traced throughout the
filtered mipmap in order to approximate indirect lighting. A group of
cones is created for each calculated surface pixel. And we march along
the centerline of the cone and use the voxel representation of direct
illumination to accumulate indirect illumination for the surface pixel.
This approach is similar to ray marching but is much more efficient
due to the use of larger (and variable) sampling steps.

Figure 2.7: Cone tracing.

2.4 Ray tracing pipeline. NVIDIA OptiX 7

Today if we want to use a GPU ray tracer utilizing new ray tracing cores,
we have several choices: DirectX Ray Tracing [27, 28], Vulkan Ray
Tracing [29, 30], or OptiX Ray Tracing [31, 32]. Ray tracing pipelines
are very similar for these three application programming interfaces
(APIs) [31], as they are based on the ray tracing technology offered
by NVIDIA and use NVIDIA’s graphics products. But since our work is
written using NVIDIA OptiX 7, and for the sake of uniformity of the
terms and formulations used, the explanation of the ray tracing pipeline
in this chapter is presented in terms of NVIDIA OptiX 7.
The following terms will be used later in the text: The “host” is the
processor that begins the execution of an application. The “device” is
the GPU with which the host interacts. A “build” is the creation of an
acceleration structure on the device as initiated by the host [31].
NVIDIA OptiX (OptiX) is a graphics engine for real-time ray trac-
ing rendering using NVIDIA graphics processing units (GPUs). OptiX
uses CUDA technology to perform calculations on GPUs, which is
only supported on Nvidia’s graphics products. Although OptiX is a
graphics engine, it can be used in non-graphical computing. The scope
of use of OptiX is all computationally intensive tasks to which ray trac-
ing can be applied (here, "ray tracing" refers to a method for analyzing
and investigating geometric systems by calculating the propagation
of waves or particles). For example, OptiX is used in radiation and
electromagnetic research [33], collision analysis [34, 35], and in many
other fields.

Figure 2.8: Relationship of NVIDIA OptiX 7 programs. Green represents fixed functions; gray represents user programs. Image from [31].

2.4.1 Pipeline
The main idea of the OptiX engine is that most ray tracing algo-
rithms can be implemented using combinations of a small set of
programmable operations (which are called “programs” in OptiX)
[35]. The combination of user programs and hardcoded OptiX code
forms the ray tracing pipeline. The pipeline in OptiX consists of 8 pro-
grammable programs, each of which operates on a single ray at a time:
Ray generation, Intersection, Any-hit, Closest-hit, Miss, Exception,
Direct callables, Continuation callables (see Figure 2.8). Exception
programs can be called from any program or scene traversal (traversal,
i.e., search through the geometric data in the scene).

2.4.2 Ray generation program


Ray generation program is the entry point into the ray tracing pipeline,
invoked by the system in parallel for each pixel, sample, or other user-
defined work assignments [31]. After calling optixLaunch on the host,
ray generation is called on the device for each thread. All optixLaunch
calls are asynchronous, so we need to use CUDA streams and events for
synchronization. When the ray generation program calls the optixTrace
function, a ray is generated, for which the intersection, any-hit, closest-
hit, and other functions are called, according to the pipeline.
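
As a rough illustration of what such an entry point looks like, below is a minimal sketch of a ray generation program. The launch-parameter struct Params, its fields, the pixelRayDirection helper, and the payload handling are our own simplified assumptions; only the optix* calls belong to the OptiX 7 device API:

#include <optix.h>

// Example launch parameters; the layout of this struct is our own assumption.
struct Params {
    uchar4*                framebuffer;
    unsigned int           width, height;
    float3                 camEye;
    OptixTraversableHandle handle;
};
extern "C" __constant__ Params params;

// Assumed helper (not shown) that builds the camera ray direction for a pixel.
__device__ float3 pixelRayDirection(uint3 idx);

extern "C" __global__ void __raygen__renderFrame()
{
    const uint3 idx = optixGetLaunchIndex();
    float3 dir = pixelRayDirection(idx);

    // Three 32-bit payload slots carry the resulting color back from the
    // closest-hit or miss program (see Chapter 2.4.10).
    unsigned int r = 0, g = 0, b = 0;
    optixTrace(params.handle, params.camEye, dir,
               0.0f, 1e16f, 0.0f,                       // tmin, tmax, rayTime
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0,                                 // SBT offset, SBT stride, miss SBT index
               r, g, b);                                // payload registers p0..p2

    params.framebuffer[idx.y * params.width + idx.x] =
        make_uchar4((unsigned char)r, (unsigned char)g, (unsigned char)b, 255);
}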

2.4.3 Intersection program


Intersection program implements a ray-primitive intersection test, invoked
during traversal. To determine the intersection of a ray with the
geometric data, NVIDIA OptiX 7 searches a graph of nodes composed of accel-
eration structures and transformations. This search is called a traversal;
the nodes in the graph are called traversable objects or traversables
[31]. When the search reaches the leaves of the accelerating structure,
ray-primitive intersection testing is performed. Intersection programs
are used to allow the user to specify a custom geometric primitive
intersection. But other than that, OptiX has built-in intersection sup-
port for triangles. On NVIDIA RTX GPUs, this can be executed on
special purpose hardware to allow exceptionally efficient traversal
of the acceleration structure [36]. So triangles are the most efficient
primitive to intersect on RTX GPUs, often by a large margin.

2.4.4 Any-hit program


Any-hit program is called when a traced ray finds a new, potentially
closest, intersection point, such as for shadow computation [31]. These
programs are called during traversal for each ray-to-object intersection
found (even if that intersection is not the nearest one). The user can
optionally stop further traversal and search for the nearest intersection
point by calling optixTerminateRay. This is done, for example, when we
need to calculate the shadows and we trace the shadow ray since we
are interested in the presence of any intersection of the shadow ray
with the scene, and not necessarily the nearest one.

2.4.5 Closest-hit program


Closest-hit program is called when a ray-traversal is completed and
traced ray finds the closest intersection point [31]. This program typ-
ically computes material shading and passes the results back to the
ray generation program.

2.4.6 Miss program

Miss program is called when a ray-traversal is completed and a traced


ray misses all scene geometry [31]. These programs can be used, for
example, to compute a background color.

2.4.7 Exception program

Exception program is invoked for conditions such as stack overflow


and other errors [31]. OptiX also supports user-defined exceptions
that can be thrown from any program.

2.4.8 Callables programs

Direct callable programs are similar to a regular CUDA function call
and are called immediately. Continuation callable programs,
unlike direct callables, are executed by the scheduler. Callable programs
allow for additional programmability within the standard set
of NVIDIA OptiX 7 programs [31]. They allow the application to plug
in functionality to a shader at runtime, for example, different shading
effects in response to user input.

2.4.9 Shader binding table

The shader binding table (SBT) binds geometric data to programs and
their parameters. The table stores information about which programs
need to be executed and with which arguments during the ray tracing
of the scene. The SBT mechanism is very flexible. It allows users to
perform many different settings depending on the needs of a partic-
ular application. For example, an application can specify different
programs and data to be used for different types of rays (for example,
shadow rays, camera rays, or ambient occlusion rays)[31, 36].
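
For illustration, an SBT record is typically declared as a small struct whose opaque header is filled by optixSbtRecordPackHeader and whose remaining fields hold the per-program parameters. The record payload HitGroupData below is our own example, not the layout used by any particular application:

#include <cuda_runtime.h>
#include <optix.h>
#include <optix_stubs.h>

// Example per-hit-group data; the contents are an assumption for illustration.
struct HitGroupData {
    cudaTextureObject_t sdfTexture;   // e.g. a 3D texture holding a discrete SDF
};

// An SBT record = opaque header (filled by OptiX) + user data.
struct __align__(OPTIX_SBT_RECORD_ALIGNMENT) HitGroupRecord {
    __align__(OPTIX_SBT_RECORD_ALIGNMENT) char header[OPTIX_SBT_RECORD_HEADER_SIZE];
    HitGroupData data;
};

// Host side: pack the program group header and fill the user parameters.
// hitgroupPG is assumed to be an already created OptixProgramGroup.
void fillHitgroupRecord(OptixProgramGroup hitgroupPG, HitGroupRecord& rec,
                        cudaTextureObject_t sdfTex)
{
    optixSbtRecordPackHeader(hitgroupPG, &rec);  // writes the opaque header
    rec.data.sdfTexture = sdfTex;                // parameters read by the programs
}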

2.4.10 Ray payload


The ray payload is used to pass data between the ray generation pro-
gram that calls optixTrace and the rest of the programs that are called
during ray traversal. Payload values are passed to and returned from
optixTrace [31]. They are used, for example, to return the background
color from the miss program or the color of the crossed object from
the closest hit program.
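
A common pattern (used, for example, in the OptiX 7 tutorials referenced in Chapter 2.4.11) is to pack a pointer to a per-ray data struct into two 32-bit payload registers. A sketch, with the struct PerRayData being our own example:

#include <optix.h>

struct PerRayData { float3 color; };   // example per-ray data (assumption)

// Split a 64-bit pointer into two 32-bit payload values before optixTrace.
static __device__ void packPointer(void* ptr, unsigned int& p0, unsigned int& p1)
{
    const unsigned long long v = reinterpret_cast<unsigned long long>(ptr);
    p0 = static_cast<unsigned int>(v >> 32);
    p1 = static_cast<unsigned int>(v & 0xffffffffu);
}

// Reassemble the pointer inside a closest-hit, any-hit, or miss program.
static __device__ PerRayData* getPerRayData()
{
    const unsigned long long v =
        (static_cast<unsigned long long>(optixGetPayload_0()) << 32) |
        optixGetPayload_1();
    return reinterpret_cast<PerRayData*>(v);
}

// Usage: in ray generation, packPointer(&prd, p0, p1) and pass p0, p1 to optixTrace;
// in the miss or closest-hit program, getPerRayData()->color = ...;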

2.4.11 Ray tracing with NVIDIA OptiX 7


Using the OptiX 7 engine in a ray tracing application usually involves
the following steps [31]:
1. Create acceleration structures for the scene.

2. Define all the necessary programs, such as ray generation, miss, intersection, closest-hit, any-hit, and so on, and create a pipeline that will contain all the programs that should be called during ray tracing.

3. Create a shader binding table that includes references to these programs and their parameters.

4. Launch a device-side kernel using optixLaunch, which will invoke a ray generation program to begin traversal and the execution of the other programs.
If the reader wants to learn more about ray tracing using OptiX 7,
then the following materials will be very useful:
• OptiX 7/7.1 tutorial on GitLab [37] or GitHub [38]. This is a
"tutorial" in how to set up a full Scene - i.e., OptiX context, mod-
ule, programs, pipeline, SBT, accel struct (AS), build inputs,
texture samplers, etc., in the newly introduced OptiX 7.

• NVIDIA OptiX 7.3 Programming Guide [31].

• OptiX developer forum [39]. It contains many topics about OptiX 7.

• NVIDIA ray tracing documentation [40].

• Many articles, webinars, and videos dedicated to real-time ray tracing [41].

3 Implementation
In order to evaluate the suitability of the OptiX engine for rendering
signed distance functions, a framework based on the OptiX engine
and OpenGL was created. This framework supports two methods for
rendering geometry defined using the SDF:
1. a ray marching (sphere tracing proposed by Hart [8]) with a
direct evaluation of SDFs
2. a ray marching (sphere tracing) using precomputed volumetric
textures
Both of these methods use the standard raymarching algorithm de-
scribed in Chapter 2.3.2. But in the first method, the value of the SDF
at each step is calculated by evaluating the value of an explicitly spec-
ified SDF. And in the second method, the value of the SDF at each
step is read from the 3d volume texture, which stores a pre-calculated
discrete SDF of the scene.
In order to evaluate the effectiveness of using OptiX for SDF ren-
dering, both of these methods were implemented using:
• NVIDIA OptiX (see Chapter 3.3)
• OpenGL (see Chapter 3.4)
• CUDA (see Chapter 3.5)
Also, to create the SDF volumetric texture, a separate program
that calculates the discrete SDF for triangle meshes was implemented.
This chapter describes the implementation of the created framework.
It consists of five parts. The first part describes the implementation of
an algorithm for constructing a discrete SDF for triangle meshes. The
second part provides a general description of the framework. The third
part describes how to render the geometry of a given SDF using the
OptiX engine (see Chapter 3.3). The SDF rendering with raymarching
implementation in different parts of the OptiX pipeline (in the closest
hit program, intersection program, and ray generation program) is
described separately. The fourth part describes the implementation
using OpenGL (see Chapter 3.4). And the last fifth part describes
implementations using only CUDA (see Chapter 3.5).

3.1 Discrete signed distance field creation


In order to create a discrete SDF for a given triangle mesh, a separate
program was implemented in C++. The program is located in the
src\helper folder. The program was parallelized using OpenMP [42].
In order to load the triangle mesh, the tinyobjloader 2.0 library [43]
was used. This program requires its input arguments in the following
order, separated by spaces:

1. the path to the .obj file that stores the data of the triangle mesh

2. the path to the file where we want to save the calculated discrete
SDF

3. the size of the grid (number of cells) along the x-axis

4. the size of the grid along the y-axis

5. the size of the grid along the z-axis

The output of this program is .bin file that contains in the following
order:

1. the x, y, z grid dimensions (each a 32-bit int)

2. the upper-right point of the triangle mesh's AABB (three 32-bit float values)

3. the lower-left point of the triangle mesh's AABB (three 32-bit float values)

4. the calculated SDF values for each grid cell (each a 32-bit float)
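
As an illustration of this layout, a reader for the produced .bin file could look like the following C++ sketch (the struct and function names are our own; the field order follows the list above):

#include <cstdint>
#include <fstream>
#include <vector>

struct DiscreteSdf {
    int32_t nx = 0, ny = 0, nz = 0;   // grid dimensions
    float   upperRight[3] = {};       // AABB corner stored first in the file
    float   lowerLeft[3]  = {};       // AABB corner stored second
    std::vector<float> values;        // nx * ny * nz distance samples
};

// Reads the .bin layout described above; returns false on an I/O failure.
bool loadDiscreteSdf(const char* path, DiscreteSdf& out)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&out.nx), sizeof(int32_t));
    in.read(reinterpret_cast<char*>(&out.ny), sizeof(int32_t));
    in.read(reinterpret_cast<char*>(&out.nz), sizeof(int32_t));
    in.read(reinterpret_cast<char*>(out.upperRight), 3 * sizeof(float));
    in.read(reinterpret_cast<char*>(out.lowerLeft), 3 * sizeof(float));
    out.values.resize(size_t(out.nx) * out.ny * out.nz);
    in.read(reinterpret_cast<char*>(out.values.data()),
            out.values.size() * sizeof(float));
    return bool(in);
}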

In order to create a discrete SDF for a triangle mesh, we chose the


simple brute force algorithm described in Chapter 2.2.1, which is the
slowest algorithm, but at the same time gives the most accurate result.
This particular algorithm was chosen since the performance of the
program that creates a discrete SDF is not essential for us, and the
aim of this thesis was not to implement a fast, complex algorithm for
constructing a discrete SDF.

Figure 3.1: Computing the sign of the SDF at two points: 1 and 2. Before
intersecting with point 1, the ray intersects with the mesh three times,
so point one lies inside the mesh, and the sign of the SDF in it will be
negative. And before it intersects with point 2, the ray intersects with
the mesh six times, so point 2 lies outside the mesh, and the sign of
the SDF in it will be positive.

The main idea of the algorithm is as follows:

1. Calculate AABB for a triangle mesh.

2. Build a voxel grid by splitting the AABB into voxels according


to the input parameters of the program.

3. For each voxel in the grid:

(a) Construct a ray passing through the row in which the voxel lies. The direction of the ray is {0.0, 0.0, 1.0} and the origin of the ray is {currentVoxel.x, currentVoxel.y, bottomLeft.z − epsilon}, where currentVoxel is the center of the current voxel and bottomLeft is the lower-left point of the mesh AABB. Moreover, we make a shift along the z-axis by a small value epsilon. This is done to correctly calculate the intersection of the ray with the mesh triangles if the mesh boundary exactly coincides in this voxel with the AABB boundary.

Figure 3.2: Visualization of a single 2D layer of a three-dimensional discrete SDF calculated using the implemented program. The discrete SDF for the Stanford bunny is on the left and the discrete SDF for the Utah teapot is on the right.

(b) Find all intersections of the ray constructed in the previous step with the mesh triangles. Suppose the number of intersections of this ray with triangles before the ray reaches the current sample point (in which we compute the value of the SDF) is even. In that case, this point lies outside the mesh, and the sign of the SDF in it will be positive. And if the number of intersections is odd, then this point lies inside the mesh, and the sign of the SDF in it will be negative (see Figure 3.1).
(c) Calculate the distance from the currentVoxel to the nearest
triangle of the mesh using brute force algorithm.
(d) Save the calculated SDF value.
To calculate the intersection of a ray and a triangle in this program,
the Möller–Trumbore ray-triangle intersection algorithm [44] is used.
Figure 3.2 shows several discrete SDFs calculated using this algorithm.
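
A condensed sketch of the per-voxel work described in steps (a)–(d) above might look like this; the helper functions distancePointTriangle and rayIntersectsTriangle (e.g. Möller–Trumbore [44]) are assumed to exist, and all names are our own:

#include <cfloat>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 a, b, c; };

// Assumed helpers (not shown): exact point-triangle distance and a
// ray-triangle intersection test that returns the ray parameter of the hit.
float distancePointTriangle(const Vec3& p, const Triangle& t);
bool  rayIntersectsTriangle(const Vec3& origin, const Vec3& dir,
                            const Triangle& t, float& tHit);

// Brute-force signed distance of one voxel center to the whole mesh.
float voxelSignedDistance(const Vec3& voxelCenter, const Vec3& bottomLeft,
                          const std::vector<Triangle>& mesh, float epsilon)
{
    // (c) Unsigned distance: minimum over all triangles.
    float dist = FLT_MAX;
    for (const Triangle& tri : mesh)
        dist = std::fmin(dist, distancePointTriangle(voxelCenter, tri));

    // (a) + (b) Sign: count intersections of a +z ray, started slightly below
    // the AABB, up to the voxel center; an odd count means "inside".
    Vec3 origin = { voxelCenter.x, voxelCenter.y, bottomLeft.z - epsilon };
    Vec3 dir    = { 0.0f, 0.0f, 1.0f };
    int count = 0;
    float tHit;
    for (const Triangle& tri : mesh)
        if (rayIntersectsTriangle(origin, dir, tri, tHit) &&
            tHit < voxelCenter.z - origin.z)   // only hits before the sample point
            ++count;

    // (d) Negative inside the mesh, positive outside.
    return (count % 2 == 1) ? -dist : dist;
}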

3.2 Framework overview


This section provides a general description of the created framework.
The list of used tools is given. The general design of the framework is
described. And a quick guide on how to use it is given.

3.2.1 Used tools


The following tools were used to create the framework:

• NVIDIA OptiX 7.2 [32] graphics engine for ray tracing using
CUDA technology 11.2

• OpenGL 4.3, GLEW 2.1.0#9 for visualization of the obtained


results

• GLFW 3.3.2 library for creating windows, creating OpenGL


context, and controlling input

• Mathematical library GLM 0.9.9.8

• Dear ImGui 1.81 [45] for creating a graphical interface

• The CPU program code is written in C++17.

3.2.2 Framework design


When the program is launched, the Application class is created. It calls the Application::Init() function and creates all the program's main classes: Window, Camera, Scene, RenderController, Gui (see Figure 3.3).

Figure 3.3: UML diagram of base classes.

The Window class is responsible for creating the application window and for calling window callback functions.
The Camera class encapsulates all the camera control functionality.
The Scene class contains all the necessary information about the entire geometry of the scene. When the Scene::LoadSDF() function is called, a new scene described by a discrete SDF is loaded from a .bin file and saved to a 3D texture, which is then used in ray marching. When the Scene::CreateAABBs() function is called, the bounding volume of the loaded scene is created. The bounding volume is constructed from the voxels of the discrete SDF.
The RenderController class is responsible for rendering the scene (read more below).
The Gui class is responsible for the user interface.
The Application::Run() function starts the main loop of the program, which calls the Camera::Update() function for updating the camera position and direction, the RenderController::Draw() function for drawing the scene, and Gui::Update() for drawing the user interface.
The RenderController class contains CudaRenderer and OpenGLRen-
derer classes. The RenderController interacts with the Gui class and,
based on the data received from the user, calls one of two render-
ers, CudaRenderer or OpenGLRenderer, which will render the scene
geometry in different ways. Also, the RenderController class is responsible
for rendering the 3D volume texture and the scene AABB (see
Chapter 3.2.3).
The CudaRenderer is responsible for rendering the geometry spec-
ified by an SDF using OptiX or using CUDA. Since OptiX is based on
CUDA, it was decided to combine both of these approaches into one
class. For more information about rendering the geometry of a given
SDF using OptiX, see Chapter 3.3, and using CUDA see Chapter 3.5.
OpenGLRenderer is responsible for rendering the geometry spec-
ified by SDF using OpenGL. For more information about rendering
using OpenGL, see Chapter 3.4.

Figure 3.4: Framework interface.

3.2.3 Usage

All the requirements for running and compiling the framework are
specified in Appendix B. There is also additional instruction on how
to build the project (see Appendix C).
The framework has three windows (see Figure 3.4): the main win-
dow that displays the rendering result and two additional windows:
Rendering parameters and Stats. The Stats window displays the pa-
rameters of the used CPU and GPU, and the rendering speed in frames
per second (fps) and in ms/frame.
To change the viewing angle of a scene, the user must click on the main
window and then use the following controls: WASD, Space, Left Shift
and mouse movement with the right mouse button held down.

The user can switch between 6 possible rendering types:


• OptixTexture. OptiX raymarching using precomputed volumet-
ric textures is used.
• OptixRealSDF. OptiX raymarching with a direct evaluation of
SDFs is used.
• OpenGLTexture. OpenGL raymarching using precomputed vol-
umetric textures is used.
• OpenGLRealSDF. OpenGL raymarching with a direct evalua-
tion of SDFs is used.
• CUDAVolumeTexture. CUDA raymarching using precomputed
volumetric textures is used.
• CUDARealSDF. CUDA raymarching with a direct evaluation
of SDFs is used.
In addition, depending on the rendering used, the user is provided
with different sets of parameters to customize as they wish. Below is
a list of all the parameters that exist in this framework, with a brief
description of them and a list of rendering types in which they can be
used.
• Step number. It defines the maximum number of steps along
the ray in the ray marching algorithm. The higher the value of
this parameter, the longer it will take to render each frame, but
at the same time, the fewer visual artifacts will appear in the
resulting image. Available for all rendering types.
• Epsilon. Defines the ray marching error (as the SDF boundary,
we are satisfied with the boundary in which the SDF value
lies in the interval (0, epsilon)). The closer the value of this
parameter is to zero, the longer it will take to render each frame,
but at the same time, the fewer visual artifacts will appear in
the resulting image. Available for all rendering types.
• Visualize marching steps count. It allows visualizing the number
of ray marching steps for each pixel. Available for all rendering
types.

• Ambient color, Diffuse color, Specular color, Shininess allow us to customize the color of the objects in the scene.

• Background color allows us to customize the color of the background.

• OptiX program. The value of this parameter can be: ClosestHit, Intersection, or RayGeneration. This parameter determines in which of the OptiX pipeline programs we will calculate ray marching. Available only for OptixTexture and OptixRealSDF rendering types. This parameter was provided to compare how the choice of the OptiX program affects the speed of image rendering.

• Volume texture. Using this parameter, we can choose which precomputed volumetric texture to render (for OptixTexture, OpenGLTexture, CUDAVolumeTexture rendering types) and determine based on which volume texture to build an AABB for rendering the "real" SDF in the OptixRealSDF rendering type. Available for OptixTexture, OptixRealSDF, OpenGLTexture, CUDAVolumeTexture rendering types (examples of pre-calculated volume textures can be found in the \data\sdf folder).

• Visualize AABBs. It enables AABB visualization. Available for OptixTexture, OpenGLTexture, CUDAVolumeTexture rendering types.

• Eps. It becomes available after enabling the Visualize AABBs parameter. This parameter is used to determine which voxels from the 3D volume texture to render. All voxels that contain a value greater than Eps will not be rendered, and all voxels with values less than or equal to Eps will be used for rendering. Available in OptixTexture, OpenGLTexture, CUDAVolumeTexture rendering types. This option allows us to immediately exclude from rendering those voxels that are far from the surface of the SDF, which speeds up the rendering of images.

• Visualize volume texture. This parameter allows us to enable the visualization of the used volume texture. Available for OptixTexture, OpenGLTexture, CUDAVolumeTexture rendering types.

• Axis. It becomes available after enabling the Visualize volume texture parameter. Allows us to select the axis along which to render the 3D volume texture layers. Available for OptixTexture, OpenGLTexture, CUDAVolumeTexture rendering types.

• SDF. Allows us to select an SDF that will be rendered using ray marching with a direct evaluation of SDFs. We can choose from Sphere, MengerSponge, SierpinskiTriangle, Julia, Mandelbulb, DonutBox1, DonutBox2, DonutBox3, DonutBox4, Penguin. Available for OptixRealSDF, OpenGLRealSDF, CUDARealSDF rendering types.

3.3 Signed distance functions rendering using NVIDIA OptiX
This part describes the rendering of SDF functions using NVIDIA
OptiX. The features of rendering with ray marching calculation in
the intersection program, the closest-hit program, and the ray generation
program are described separately. For each of these three methods,
the OptiX pipeline will be described in detail. The shortened version
of the pipeline used in this framework is shown in Figure 3.5.

3.3.1 Ray marching calculation in the intersection program


After calling optixLaunch() on the host, the ray generation program
__raygen__renderFrame() is called on the device for each thread. In this
program, by calling the optixTrace() function, a ray is created and its
tracing is started, for which the intersection, any-hit, closest-hit, and so
on programs are then called, according to the pipeline (see Figure 3.5).
When generating the ray, we must specify the tmin and tmax parameters
of the ray. After returning from the optixTrace() function, we read the
resulting pixel color and save it into the framebuffer.

Figure 3.5: NVIDIA OptiX 7 pipeline. Image from [36].

Figure 3.6: The OptiX pipeline. The traced ray intersects with the leaves
of the accelerating structure. Ray marching is calculated in the intersection
program.

The __intersection__radiance() intersection program is called when
the search reaches the leaves of the accelerating structure with which
this ray intersected. In our case, the leaves of the accelerating structure
are voxels of the discrete SDF. This program is called for each inter-
sected leaf (see Figure 3.6). In each such program, we find two points
of intersection of the voxel and the ray: the point of entry of the ray
into the voxel and the point of exit of the ray from the voxel. After that,
we run the ray marching algorithm with a start point equal to the first
point of intersection of the ray and voxel and with an endpoint equal
to the second point of intersection of the ray and voxel. Suppose ray
marching has found the intersection point of the ray with the SDF. In
that case, we use the optixReportIntersection() function, which takes as
input the distance t from the beginning of the ray to the point where
the intersection occurred, to report an intersection. If tmin ≤ t ≤ tmax
(where tmin and tmax are the starting and ending ray parameters set
in the ray generation program), then the any-hit
program is called after the intersection program ends. Otherwise, the
miss program is called.
The __anyhit__radiance() any-hit program consists of calling a single
optixTerminateRay() function, which records the hit and stops traversal.
Based on the parameter t passed from the intersection program in
the optixReportIntersection() function, OptiX decides which of the in-
tersection points is the nearest and calls the closest hit program for
it.
The __closesthit__radiance() closest-hit program calculates the color
of the pixel that the intersection occurred with and returns it to the
ray generation program.
The __miss__radiance() miss program is called when we do not
intersect with the scene geometry and returns the background color
to the ray generation program.
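
A condensed, illustrative sketch of this variant is given below; it is not a copy of the framework code. HitGroupData, intersectAabb() (a slab test of the kind sketched in Section 3.4), and rayMarchSegment() (the ray marching loop sketched earlier, restricted to a segment) are assumed names, while the OptiX device functions (optixGetSbtDataPointer(), optixReportIntersection(), optixTerminateRay(), and so on) are the real API calls.

#include <optix.h>

// Illustrative per-hit-group data assumed to be reachable through the SBT.
struct HitGroupData {
    OptixAabb* voxels;     // AABB of every voxel of the discrete SDF
    int        maxSteps;
    float      eps;
};

// Assumed helpers defined elsewhere in the framework.
__device__ bool  intersectAabb(float3 o, float3 d, const OptixAabb& box,
                               float& t0, float& t1);
__device__ float rayMarchSegment(float3 o, float3 d, float t0, float t1,
                                 int maxSteps, float eps);

extern "C" __global__ void __intersection__radiance()
{
    const HitGroupData* data =
        reinterpret_cast<HitGroupData*>(optixGetSbtDataPointer());

    const float3    orig = optixGetWorldRayOrigin();
    const float3    dir  = optixGetWorldRayDirection();
    const OptixAabb box  = data->voxels[optixGetPrimitiveIndex()];

    float t0, t1;                                   // entry and exit of the ray in the voxel
    if (!intersectAabb(orig, dir, box, t0, t1))
        return;

    // Ray marching is restricted to the segment inside this voxel.
    const float tHit = rayMarchSegment(orig, dir, t0, t1,
                                       data->maxSteps, data->eps);
    if (tHit >= 0.0f)
        optixReportIntersection(tHit, 0);           // report t, hit kind 0
}

// The any-hit program accepts the hit and stops traversal.
extern "C" __global__ void __anyhit__radiance()
{
    optixTerminateRay();
}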

3.3.2 Ray marching calculation in the closest hit program

The ray generation program in this method (see Figure 3.7) is the
same as the ray generation program in ray marching calculation in
the intersection program. Here we also start the tracer ray from the
program by calling optixTrace(). And after returning from the optix-
Trace() function, we read the resulting pixel color and save it to the
framebuffer.

Figure 3.7: The OptiX pipeline. The traced ray intersects with the leaves
of the accelerating structure. Ray marching is calculated in the closest-hit
program.

On the other hand, the intersection program in this method is
very different from the intersection program described in the previous
method. It is also called when the search reaches the leaves of the
accelerating structure with which the ray intersected. However, now
we do not compute the intersection of the ray and the voxel and do not run
ray marching. Instead, inside each called intersection program, we
compute the coordinates of the center of the intersected voxel. And
as the parameter t, which optixReportIntersection() takes as input, we
pass length(rayOrigin − voxelCenter ), i.e., the distance from the be-
ginning of the ray to the center of the voxel with which the intersection
occurred. After that, the intersection programs are terminated, and
the any-hit program is called for each of them.
The any-hit program in this method also consists of calling a single
optixTerminateRay() function, which records the hit and stops traversal. And
based on the parameter t, it is determined which of the intersection
points is the nearest. Since we passed the distance from the origin to
the center of the intersected voxel as the parameter t, the closest-hit

program will be called for the closest voxel to the beginning of the ray
(i.e., closest to the camera).
In this method, ray marching is called in the closest-hit program.
As soon as this program has been called, we calculate the closest to
the camera intersection point of the voxel and the ray (i.e., point of
entry of the ray into the voxel). After that, we run the ray marching
algorithm from this intersection point to the end of the ray, which
we set in the ray generation program. If ray marching has found the
intersection point, we return to the ray generation program the color
of the scene object defined by the SDF. Otherwise, we return the
background color. It is important to note that in this method, we run
ray marching from the intersection point of the ray and the voxel to
the end of the traced ray, and not between the two intersection points
of the ray and the voxel as in the previous
method. We do it because in this method the closest-hit program will
always be called for the closest voxel to the camera along the traced
ray. However, this voxel can be located far from the surface of the
object defined by the SDF. Suppose we compute ray marching only
in this voxel (as done in the previous method). In that case, we will
not find an intersection with the surface (since the SDF value in this
voxel will be very different from zero), and we will paint the pixel in
the background color, which will lead to the appearance of artifacts
(see Figure 3.8). For this reason, we bound ray marching from above
only by the ray length. This allows ray marching to continue beyond
the current voxel and keep searching for a point where the SDF value
is close to zero.
The miss program is also called when we do not intersect with
the scene’s geometry and returns the background color to the ray
generation program.
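
For comparison, a sketch of the two corresponding programs in this variant is shown below, again with the assumed data layout and helpers from the previous sketch rather than the actual framework code. The ray end set in the ray generation program is assumed to be available in the closest-hit program (here as an illustrative rayEnd field), since the closest-hit program itself only knows the t value of the reported hit; shade() is an illustrative shading helper.

#include <optix.h>

// Assumed data and helpers (illustrative, see the previous sketch).
struct HitGroupData { OptixAabb* voxels; int maxSteps; float eps;
                      float rayEnd; float3 background; };
__device__ bool   intersectAabb(float3 o, float3 d, const OptixAabb& box,
                                float& t0, float& t1);
__device__ float  rayMarchSegment(float3 o, float3 d, float t0, float t1,
                                  int maxSteps, float eps);
__device__ float3 shade(float3 o, float3 d, float t);

extern "C" __global__ void __intersection__radiance()
{
    const HitGroupData* data =
        reinterpret_cast<HitGroupData*>(optixGetSbtDataPointer());
    const OptixAabb box  = data->voxels[optixGetPrimitiveIndex()];
    const float3    orig = optixGetWorldRayOrigin();

    // Report the distance to the voxel center as t, so that the closest-hit
    // program is later invoked for the voxel nearest to the camera.
    const float cx = 0.5f * (box.minX + box.maxX) - orig.x;
    const float cy = 0.5f * (box.minY + box.maxY) - orig.y;
    const float cz = 0.5f * (box.minZ + box.maxZ) - orig.z;
    optixReportIntersection(sqrtf(cx * cx + cy * cy + cz * cz), 0);
}

extern "C" __global__ void __closesthit__radiance()
{
    const HitGroupData* data =
        reinterpret_cast<HitGroupData*>(optixGetSbtDataPointer());
    const OptixAabb box  = data->voxels[optixGetPrimitiveIndex()];
    const float3    orig = optixGetWorldRayOrigin();
    const float3    dir  = optixGetWorldRayDirection();

    float t0, t1;
    intersectAabb(orig, dir, box, t0, t1);           // entry point into the voxel

    // March from the voxel entry point up to the end of the traced ray,
    // not only inside this voxel (see the discussion above).
    const float tHit = rayMarchSegment(orig, dir, t0, data->rayEnd,
                                       data->maxSteps, data->eps);
    const float3 c = (tHit >= 0.0f) ? shade(orig, dir, tHit) : data->background;

    // Return the color to the ray generation program via the payload registers.
    optixSetPayload_0(__float_as_uint(c.x));
    optixSetPayload_1(__float_as_uint(c.y));
    optixSetPayload_2(__float_as_uint(c.z));
}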

3.3.3 Ray marching calculation in the ray generation program


The pipeline in this method is very simple. We use only the ray generation
program to render the geometry defined by the SDF. We do not call
the optixTrace() function, as in the two previous methods. And this
means that we do not use the intersection, closest-hit, any-hit, miss
programs, and accelerating structures. We construct a ray and compute
the intersection point of this ray and the AABB of the scene geometry.

Figure 3.8: Artifacts in the closest-hit method if we compute ray marching
in one voxel only.

After that, in the same program, we run ray marching from the nearest
to the camera intersection point and to the far one. If ray marching has
found an intersection point with a scene object, we write the object’s
color to the framebuffer. And if the intersection point was not found,
then we write the background color in the framebuffer.
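
A sketch of such a ray generation program is given below. The Params structure, its fields, and the colors are illustrative (the framework's actual launch parameters differ); intersectAabb() is the slab test sketched in the next section and rayMarchSegment() the ray marching loop sketched earlier.

#include <optix.h>

// Illustrative launch parameters.
struct Params {
    uchar4*  frameBuffer;
    unsigned width, height;
    float3   camPos, camU, camV, camW;    // camera position and basis vectors
    float3   aabbMin, aabbMax;            // AABB of the scene geometry
    int      maxSteps;
    float    eps;
};
extern "C" __constant__ Params params;

// Assumed helpers (see the other sketches).
__device__ bool  intersectAabb(float3 o, float3 d, float3 bmin, float3 bmax,
                               float& t0, float& t1);
__device__ float rayMarchSegment(float3 o, float3 d, float t0, float t1,
                                 int maxSteps, float eps);

extern "C" __global__ void __raygen__renderFrame()
{
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();

    // Build the primary ray for this pixel from the camera basis.
    const float u = 2.0f * (idx.x + 0.5f) / dim.x - 1.0f;
    const float v = 2.0f * (idx.y + 0.5f) / dim.y - 1.0f;
    float3 d = make_float3(params.camW.x + u * params.camU.x + v * params.camV.x,
                           params.camW.y + u * params.camU.y + v * params.camV.y,
                           params.camW.z + u * params.camU.z + v * params.camV.z);
    const float inv = rsqrtf(d.x * d.x + d.y * d.y + d.z * d.z);
    d.x *= inv; d.y *= inv; d.z *= inv;

    // No optixTrace(): intersect the scene AABB and ray march directly here.
    uchar4 color = make_uchar4(20, 20, 20, 255);          // background color
    float t0, t1;
    if (intersectAabb(params.camPos, d, params.aabbMin, params.aabbMax, t0, t1) &&
        rayMarchSegment(params.camPos, d, t0, t1, params.maxSteps, params.eps) >= 0.0f)
        color = make_uchar4(200, 200, 200, 255);          // object color (flat, for brevity)

    params.frameBuffer[idx.y * params.width + idx.x] = color;
}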

3.4 Signed distance functions rendering using OpenGL

This method renders the SDF using OpenGL vertex and fragment
shaders, passing all the necessary data to them via uniforms. There is
no complex OptiX pipeline, and we do not use acceleration structures,
which are a significant part of all programs utilizing OptiX.
Rendering the SDF using OpenGL is very similar to rendering the
SDF using the OptiX ray generation program. We also launch a ray at each
pixel and compute ray marching. And if we find the intersection point
of the ray with the object of the scene, then we paint this pixel in the
object’s color. And if the intersection point is not found, then we paint
the pixel in the background color.

A significant difference is how we compute the start and end points
for ray marching. When rendering discrete SDFs, just like in the OptiX
ray generation method, we compute the intersection points of the
constructed ray and the AABB of the scene. And as the starting point
of ray marching, we take the intersection point closest to the camera,
and as the endpoint, the farthest intersection point. We need to do this
since the value of the discrete SDF, unlike the continuous one, is not
defined in the entire space but only in its part (where we pre-computed
it and saved it in the texture).
When we render the geometry given by an explicit SDF function,
we do not use any accelerating structures and start ray marching from
the beginning of the ray and limit it to some constant from above.
This was done to compare the performance of ray marching that uses
OptiX acceleration structures with that of unoptimized ray marching in
OpenGL.
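
The computation of the two AABB intersection points is a standard slab test. A minimal sketch is shown below, written here as CUDA-style C++ with illustrative names; the GLSL version used in the fragment shader is analogous.

// Slab test: returns true if the ray origin + t * dir intersects the box and
// writes the entry (t0) and exit (t1) parameters. Division by zero direction
// components is handled by IEEE infinity arithmetic.
__device__ bool intersectAabb(float3 orig, float3 dir,
                              float3 bmin, float3 bmax,
                              float& t0, float& t1)
{
    const float ix = 1.0f / dir.x, iy = 1.0f / dir.y, iz = 1.0f / dir.z;

    const float tx1 = (bmin.x - orig.x) * ix, tx2 = (bmax.x - orig.x) * ix;
    const float ty1 = (bmin.y - orig.y) * iy, ty2 = (bmax.y - orig.y) * iy;
    const float tz1 = (bmin.z - orig.z) * iz, tz2 = (bmax.z - orig.z) * iz;

    t0 = fmaxf(fmaxf(fminf(tx1, tx2), fminf(ty1, ty2)), fminf(tz1, tz2));
    t1 = fminf(fminf(fmaxf(tx1, tx2), fmaxf(ty1, ty2)), fmaxf(tz1, tz2));

    t0 = fmaxf(t0, 0.0f);        // the start of ray marching is never behind the camera
    return t1 >= t0;             // hit if the exit parameter is not smaller than the entry
}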

3.5 Signed distance functions rendering using CUDA


We use CUDA kernels (similar to OpenGL shaders and OptiX pro-
grams) to render the SDF in this method. The algorithm of this method
entirely coincides with the algorithm on OpenGL. OptiX is a tracing
engine based on CUDA, and it performs many actions implicitly, hidden
from the user, for example, building accelerating structures,
transforming the traced ray from one coordinate system to another,
traversing the BVH tree, and so on. Therefore, this method was
implemented to understand whether OptiX introduces a performance
overhead compared to using "pure" CUDA.
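
A minimal sketch of such a kernel is shown below; one thread is responsible for one pixel, and intersectAabb() and rayMarchSegment() are the assumed helpers from the previous sections. Shading is omitted for brevity, so the colors are placeholders.

#include <cuda_runtime.h>

// Assumed device helpers from the previous sketches.
__device__ bool  intersectAabb(float3 o, float3 d, float3 bmin, float3 bmax,
                               float& t0, float& t1);
__device__ float rayMarchSegment(float3 o, float3 d, float t0, float t1,
                                 int maxSteps, float eps);

// One thread per pixel: build the primary ray, march it, and write the color.
__global__ void renderKernel(uchar4* frameBuffer, int width, int height,
                             float3 camPos, float3 camU, float3 camV, float3 camW,
                             float3 bmin, float3 bmax, int maxSteps, float eps)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    const float u = 2.0f * (x + 0.5f) / width  - 1.0f;
    const float v = 2.0f * (y + 0.5f) / height - 1.0f;
    float3 dir = make_float3(camW.x + u * camU.x + v * camV.x,
                             camW.y + u * camU.y + v * camV.y,
                             camW.z + u * camU.z + v * camV.z);
    const float inv = rsqrtf(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
    dir.x *= inv; dir.y *= inv; dir.z *= inv;

    uchar4 color = make_uchar4(20, 20, 20, 255);          // background
    float t0, t1;
    if (intersectAabb(camPos, dir, bmin, bmax, t0, t1) &&
        rayMarchSegment(camPos, dir, t0, t1, maxSteps, eps) >= 0.0f)
        color = make_uchar4(200, 200, 200, 255);          // object (flat shading)

    frameBuffer[y * width + x] = color;
}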

4 Results
In this chapter we present the results of comparing OpenGL, CUDA,
OptiX intersection program (OptiX IS), OptiX closest-hit program
(OptiX CH), OptiX ray generation program (OptiX RG) methods for
visualizing SDF and discrete SDF on different scenes.

4.1 Test setup


The performance tests were performed on a PC with the following char-
acteristics:

• Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

• NVIDIA GeForce RTX 2060 GPU

• 16.0 GB DDR4 RAM

• Microsoft Windows 10 OS

To compare the methods that use ray marching with a direct
evaluation of SDFs for SDF visualization, we created 10 SDFs (see
Figure 4.1) of various complexity (the SDFs are listed in order of the
increasing complexity of the mathematical functions that describe
them):

1. Sphere with a center at the origin and a unit radius

2. The Menger sponge fractal equation (also known as the Menger cube)

3. Sierpinski triangle fractal

4. Julia set fractal equation

5. Mandelbulb fractal

6. 1 box with six donuts created by us using constructive solid geometry

Figure 4.1: Scenes that were used to compare the methods that use
ray marching with a direct evaluation of SDFs for SDF visualiza-
tion. Top row from left to right: Sphere, DonutBox2, SierpinskiTrian-
gle, DonutBox4, Penguin. Bottom row from left to right: DonutBox,
MengerSponge, DonutBox3, Julia, Mandelbulb.

7. 2 boxes with six donuts in each created by us using constructive solid geometry

8. 3 boxes with six donuts in each created by us using constructive solid geometry

9. 4 boxes with six donuts in each created by us using constructive solid geometry

10. Penguin created by us using constructive solid geometry

To compare the methods that are based on ray marching using
precomputed 3d volumetric textures, we calculated discrete SDF func-
tions of sizes 20x20x20, 50x50x50, 100x100x100 for the following 3d
models (see Figure 4.2):

1. Cornell box model (32 triangles, 48 vertices)

2. Sphere model (960 triangles, 482 vertices)

3. Stanford Bunny model (5002 triangles, 2503 vertices)

4. Cow model (5804 triangles, 2903 vertices)

5. Teapot model (8404 triangles, 4203 vertices)

Figure 4.2: Scenes that were used to compare the methods that are
based on ray marching using precomputed 3d volumetric textures.
Top row from left to right: Cornell box, Sphere, Stanford Bunny, Cow.
Bottom row from left to right: Teapot, LPS Head, Stanford Armadillo,
Dragon.

6. LPS Head model (17684 triangles, 8844 vertices)


7. Stanford Armadillo model (99976 triangles, 49990 vertices)
8. Dragon model (124943 triangles, 249881 vertices)
and discrete SDF functions of sizes 20x20x20, 50x50x50, 100x100x100,
200x200x200 for the following SDFs:
1. Menger sponge
2. Sierpinski triangle
3. Stanford Bunny model (5002 triangles, 2503 vertices)
4. Julia set
5. Mandelbulb
6. 1 box of donuts
7. Penguin
All tests were performed with a fixed screen resolution of 1280x1024.
For the measurements, a timer was started at the beginning of the
application loop and stopped at the end, and the average result over
1000 rendered frames was calculated. The NVIDIA Nsight Compute
profiler was also used to analyze the CUDA kernels.
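
A minimal sketch of this measurement, assuming an illustrative renderFrame() function that draws a single frame (the framework's actual application loop also processes input and GUI events):

#include <chrono>
#include <cstdio>

void renderFrame();   // assumed: draws one frame with the currently selected method

// Average frame time over 1000 rendered frames.
void measureAverageFrameTime()
{
    constexpr int kFrames = 1000;
    const auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < kFrames; ++i)
        renderFrame();
    const auto end = std::chrono::high_resolution_clock::now();

    const double totalMs =
        std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("average frame time: %.3f ms\n", totalMs / kFrames);
}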

4.2 The impact of the CUDA block size on performance
CUDA uses a large number of separate threads for calculations; often,
each calculated element corresponds to a single thread. All threads are
grouped into a hierarchy. The upper level (grid) corresponds to the
kernel and combines all the threads running this kernel. A grid is an
array of blocks, and each block is an array of threads. Each block
is an entirely independent set of interacting threads; threads from
different blocks cannot interact with each other. From a hardware
point of view, threads are executed in so-called warps, groups of
consecutive threads that are (physically) executed simultaneously and
can interact with each other. The size of a warp is 32 threads. We want
the number of active warps to be as close to the maximum as possible.
Occupancy is defined as the ratio of active warps on a Streaming
Multiprocessor (SM) to the maximum number of active warps supported
by the SM, and it depends strongly on the number of threads allocated
per block. Therefore, very small block sizes (e.g., 32 threads per block)
may limit performance due to low occupancy. Very large block sizes,
for example, 1024 threads per block, may also limit performance if
resource limits are hit (e.g., registers per thread or shared memory
usage). It is usually best to use block sizes from 128 to 512; the optimal
number of threads per block depends on the used GPU.
Different numbers of threads per CUDA block were tested, and
the 8x32 size was the fastest on the used GPU (see Table 4.1). Therefore,
all the other tests were done using the 8x32 CUDA block size.
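
For reference, a sketch of the corresponding host-side launch, assuming the illustrative renderKernel from Section 3.5 (the argument list is therefore also illustrative):

#include <cuda_runtime.h>

__global__ void renderKernel(uchar4* frameBuffer, int width, int height,
                             float3 camPos, float3 camU, float3 camV, float3 camW,
                             float3 bmin, float3 bmax, int maxSteps, float eps);

// Launch with the 8x32 block size that proved fastest in Table 4.1.
void launchRender(uchar4* frameBuffer, int width, int height,
                  float3 camPos, float3 camU, float3 camV, float3 camW,
                  float3 bmin, float3 bmax, int maxSteps, float eps)
{
    const dim3 block(8, 32);                            // 8 * 32 = 256 threads per block
    const dim3 grid((width  + block.x - 1) / block.x,   // round up so every pixel is covered
                    (height + block.y - 1) / block.y);
    renderKernel<<<grid, block>>>(frameBuffer, width, height,
                                  camPos, camU, camV, camW,
                                  bmin, bmax, maxSteps, eps);
    cudaDeviceSynchronize();                            // wait for the frame to finish
}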

4.3 The impact of different types of CUDA mathematical operations
on performance and visual result

Scene               8x8    8x16   16x8   16x16  8x32   32x8   16x32  32x16  32x32
Sphere              2.056  2.334  2.024  2.016  2.016  2.246  2.250  2.252  2.315
MengerSponge        2.371  2.478  2.321  2.338  2.306  2.499  2.423  2.533  2.612
SierpinskiTriangle  3.403  3.444  3.401  3.426  3.395  3.516  3.457  3.541  3.617
Julia               2.790  2.780  2.794  2.823  2.759  2.995  2.827  3.023  3.071
Mandelbulb          6.053  5.900  5.919  5.914  5.898  6.107  6.049  6.092  6.273
Bunny               2.096  2.340  1.996  2.081  2.080  2.251  2.139  2.266  2.323
Teapot              2.102  2.332  2.033  2.091  2.088  2.267  2.145  2.272  2.309

Table 4.1: The dependence of the rendering speed (ms) of different
scenes on the size of the CUDA blocks. Less is better.

                    CUDA    OptiX IS  OptiX CH  OptiX RG
Default math (ms)   10.231  8.513     15.538    14.012
Fast math (ms)      5.898   6.536     9.999     8.873
Acceleration, %     42.352  23.223    35.648    36.675

Table 4.2: Performance (ms) speed-up when using fast math for the
Mandelbulb scene. The bottom line shows the acceleration as a percentage.

CUDA supports two types of runtime mathematical operations. They can
be distinguished by their names: the names of functions of the first type
begin with two underscores, __functionName() (such as __sinf(x)), and the
names of functions of the second type are written without underscores,
functionName(). By default, CUDA uses the more precise but slower
mathematical operations functionName(). The __functionName() operations
are faster as they map to fewer instructions on the device than the
functionName() operations, but on the other hand, they are less accurate.
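
As an illustration, two versions of a small piece of device code of the kind that appears in the fractal SDFs are shown below (the function itself is made up for this example); the second uses the fast intrinsics. Compiling with nvcc --use_fast_math performs a similar substitution automatically for a whole translation unit.

// Precise library versions of the expensive math functions.
__device__ float sdfTermDefault(float a, float b)
{
    return powf(fabsf(sinf(a)), 8.0f) + cosf(b);
}

// Fast intrinsic versions: they map to fewer device instructions but are less
// accurate (note that __powf is computed as __expf(y * __logf(x)) and therefore
// needs a non-negative base, which fabsf() guarantees here).
__device__ float sdfTermFast(float a, float b)
{
    return __powf(fabsf(__sinf(a)), 8.0f) + __cosf(b);
}
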
The performance increase after replacing mathematical functions
with their fast analogs is shown in Figure 4.3. The use of fast math-
ematical functions gives an increase in performance for all methods
using CUDA and on all tested scenes. As we can see in Figure 4.3,
the usage of fast mathematical operations does not give a large increase
in performance on SDFs without many expensive functions such as
sin, cos, pow, and sqrt (Sphere, MengerSponge, SierpinskiTriangle). But
on scenes whose SDF contains many trigonometric functions (for
example, Mandelbulb), the acceleration is much more significant (see
Table 4.2).

Figure 4.3: Performance using default math and fast math (less is
better). (*) - uses fast math.

Although fast mathematical operations are less accurate, their
usage does not affect the visual result (see Figure 4.4). Therefore,
methods using fast mathematical operations are used in all further tests.

(a) Default math. (b) Fast math.

Figure 4.4: Visualization of the Mandelbulb fractal using different math types.

4.4 Comparison of different methods of visualization of SDF functions
Usage of different methods (OpenGL, CUDA, OptiX IS, OptiX CH,
OptiX RG) based on ray marching with a direct evaluation of SDFs
does not affect the visual result (see Figure 4.5).
The performance, on the contrary, depends very much
on the used method. Table 4.3 shows the results of comparing the
performance of the five proposed methods on ten different scenes.
The test objects were rendered at a distance at which they fit entirely
on the screen while still being close enough to be seen in detail.
Also, the OptiX acceleration structure in this case consists only of the
AABBs of the scene objects (see Figure 4.6). As we can see, on simple
SDFs with a small number of arithmetic operations (Sphere,
MengerSponge), the fastest results were obtained using "pure" CUDA.
The OpenGL method shows the fastest result (and by a considerable
margin) on all other scenes. In scenes with SDFs of medium complexity,
CUDA shows the second-best result after OpenGL, and the methods
based on OptiX are the slowest, although only by a small margin.
However, in scenes with complex SDF functions, CUDA loses
significantly in performance compared to the methods based on
OpenGL or OptiX.

(a) OptiX IS. (b) OptiX CH. (c) OptiX RG.

(d) OpenGL. (e) CUDA.

Figure 4.5: Visual results of different methods of visualization of SDF functions.

(a) Julia AABB. (b) Donut box AABBs.

Figure 4.6: Examples of SDF AABBs used in OptiX.

There are several reasons why CUDA is slow. When calculating
complex SDF functions, we have to perform a lot of mathematical
operations (see Figure 4.7) on one thread. Therefore, one thread re-
quires a large number of registers. But the number of registers
available per thread is limited to 255, and to achieve 100% occupancy,
we can have at most 32 registers per thread.

Scene                OpenGL  CUDA     OptiX IS  OptiX CH  OptiX RG
Sphere               2.392   2.321    4.183     3.041     2.810
Menger sponge        2.605   2.573    5.563     6.033     5.533
Sierpinski triangle  3.783   3.961    5.713     6.301     4.453
Julia                2.794   3.723    5.901     5.303     5.542
Mandelbulb           2.753   10.231   8.513     15.538    14.013
Penguin              3.206   75.164   24.637    59.531    31.123
DonutBox1            10.750  53.713   13.213    33.012    30.123
DonutBox2            19.593  112.317  30.513    82.304    67.013
DonutBox3            33.531  193.013  53.273    143.332   114.813
DonutBox4            46.150  248.803  89.071    305.123   269.713

Table 4.3: Performance (ms) results of different methods of visualization
of SDF functions.

Usually, the longer the computation, the more the performance degrades,
because register usage grows, the number of active warps decreases,
and low occupancy occurs. For example, when calculating the DonutBox4
SDF function, one thread uses 76 registers, the occupancy drops to 60%,
and instead of the possible 32 active warps, only 20 are used.
Also, many threads stall without performing instructions while
calculating ray marching (see Figure 4.8). The reason for this is the
following. The thread block is executed on the multiprocessor in parts
(warps).

Figure 4.7: The ratio of the four most frequently used instructions
when calculating the DonutBox4 SDF. FFMA - fp32 multiply and add,
FMUL - fp32 multiply, IMAD - integer multiply and add, FADD - fp32
add. Charts are from the NVIDIA Nsight Compute profiler.

Figure 4.8: Warp state for DonutBox4 scene. Charts are from the
NVIDIA Nsight Compute profiler.

Multiple warps can be executed simultaneously on a single
multiprocessor. Tasks inside a warp are executed in Single Instruction
Multiple Data (SIMD) style, i.e., only one instruction can be executed
simultaneously in all threads inside warp. And until all threads in
the warp are terminated, the multiprocessor cannot replace the active
executable warp with a new one. Thus, if 31 threads in the warp have
already finished their calculations, and 1 is performing very long cal-
culations, 31 threads in the warp will stall until that thread finishes
its calculations. In the ray marching task, each thread corresponds to
one traced ray. But each ray needs a different number of ray marching
iterations: one ray can intersect the SDF, another can pass close and
almost parallel to its surface, and a third can miss and terminate after
a few iterations. Thus, some threads complete much faster than
others and have to stall while waiting for the remaining ones. And the
more calculations the threads need to perform, the longer the faster
threads will stall. For this reason, CUDA takes a long time to calculate
complex SDFs containing a large number of mathematical operations.
Since OptiX is based on CUDA, it suffers from the same problems
listed above. But the use of OptiX acceleration structures gives a per-
formance gain compared to the implementation on pure CUDA: using
them, we greatly reduce the number of pixels for which we need to
calculate ray marching, and we also shorten the ray segment along
which ray marching has to be calculated.
If we compare the three OptiX-based methods separately, the fastest
method for rendering scenes with simple SDF (Sphere, MengerSponge,
Julia) is OptiX RG. OptiX IS and OptiX CH lose out to this method
because they use OptiX's accelerating structures. The overhead of
searching through the structure is not compensated by the SDF cal-
culation time, since the SDFs in these examples are very simple. But on

more complex SDFs, OptiX IS is the fastest, because with this method
we limit the length of the ray along which ray marching needs to be
calculated to the boundaries of the AABB of the scene object, which
leads to a decrease in the number of ray marching steps. In OptiX CH,
we restrict the traced ray only on one side, at the AABB face closest
to the camera, and therefore we have to compute more ray marching
steps (see Figure 4.9). And even though in OptiX RG we also limit
ray marching on both sides to the same values as in OptiX IS (which
means that the number of ray marching steps also matches), OptiX IS,
for some internal implementation reason, calculates complex SDFs
much faster.

(a) OptiX CH. (b) OptiX IS. (c) OptiX RG.

(d) OpenGL. (e) CUDA.

Figure 4.9: Number of ray marching steps for different visualization
methods. Increasing the number of steps corresponds to a color change
from dark blue to light blue, white, and then red. Dark red corresponds
to the maximum number of steps (20 steps, in this case). A larger
number of iterations leads to lower performance.

Figure 4.10: Visual results for different grid sizes of discrete SDF.

4.5 Comparison of different methods of visualization of discrete SDF functions

In this part, we analyzed the results of using various visualization
methods for rendering discrete SDF represented as a three-dimensional
voxel grid.
The quality of the visual results depends very much on the size of
the grid in which the values of the discrete SDF are stored. Figure 4.10
shows comparisons of visual results for different grid sizes. As we can
see, the smaller the voxel size, the better the visual result we get. The
significant disadvantage of the visualization methods based on OptiX
acceleration structures is that the maximum number of primitives per
acceleration structure is limited to 2^29 (this value may vary depending
on the GPU generation). This means that a voxel grid of size
512x512x512 already almost entirely occupies all the available memory
if all voxels are used in the accelerating structure.
In Figure 4.11 we can see that the OpenGL, CUDA, OptiX CH, and
OptiX RG methods, based on ray marching using precomputed
volumetric textures, give the same visual results. The visual result is
different only for the OptiX IS method. In this method, the visual
artifacts shown
in Figure 4.12 periodically appear. The reason for the appearance of
such artifacts is that the traced ray passes exactly between two voxels
of the accelerating structure, and OptiX then sometimes does not detect
the intersection with the object.

(a) OptiX IS. (b) OptiX CH. (c) OptiX RG.

(d) OpenGL. (e) CUDA.

Figure 4.11: Visual results of different methods of visualization of
discrete SDF functions.

Table 4.4 and Figure 4.13 show the results of performance testing
on different scenes for different methods. As we can see, the slowest
of all methods is OpenGL. The performance for all other methods is
about the same. But the fastest ones are OptiX CH and CUDA. Also,
when using the OptiX IS method, there is a decrease in performance
as the number of elements in the accelerating structure increases. This
is because in this method we compute ray marching for each voxel
intersected by the ray and then select the smallest of the obtained
values. Therefore, as the number of voxels increases, the number of
mathematical operations performed increases, and the performance
decreases.

Figure 4.12: Visual artifacts in OptiX IS method.

Figure 4.13: Performance (ms) results of different methods of visualization
of discrete SDF functions.

Scene                 OpenGL  CUDA   OptiX IS  OptiX CH  OptiX RG
Teapot (15666)        3.602   3.383  3.394     3.353     3.339
Armadillo (18750)     3.609   3.354  3.354     3.332     3.363
Cow (24907)           3.577   3.352  3.402     3.331     3.336
Dragon (30567)        3.636   3.347  3.353     3.324     3.377
Bunny (31534)         3.597   3.328  3.423     3.380     3.350
Head (41365)          3.598   3.337  3.341     3.312     3.412
Sphere (46736)        3.634   3.338  3.424     3.341     3.383
Penguin (77919)       3.613   3.369  3.386     3.337     3.375
Cornell box (111569)  3.603   3.342  3.451     3.373     3.368
Sierpinski (546904)   3.607   3.358  4.341     3.835     3.421
Julia (1253886)       3.614   3.342  5.318     4.217     3.415

Table 4.4: Performance (ms) results of different methods of visualization
of discrete SDF functions. The number of primitives in the accelerating
structure is indicated in parentheses.

5 Conclusion
In order to evaluate the suitability of the OptiX engine for rendering
signed distance functions, a framework based on the OptiX engine and
OpenGL was created. This framework supports the visualization of
SDF and discrete SDF. To visualize both types of SDF functions, three
different methods based on OptiX were written: the OptiX IS method with
ray marching implementation in the intersection program, OptiX CH
method with ray marching implementation in the closest hit program,
and OptiX RG method with ray marching implementation in the ray
generation program. Additionally, methods for visualizing SDF and
discrete SDF using "pure" CUDA were implemented. Moreover, a
separate program was written for calculating the discrete SDF for
triangular meshes.
Using the created framework, all of the above methods were compared
in terms of their performance and the quality of the visual result.
All the proposed methods for visualizing SDF give the same visual
results, but the performance of the methods is very different. So for
visualization of simple SDF functions, CUDA or OpenGL are best
suited. And for visualizing more complex SDFs, OpenGL is best suited.
All methods based on OptiX have much lower performance, so they
are not suitable for visualizing SDF functions. Thus, it is best to use
OpenGL to visualize the SDF, which gives good results on all the
tested scenes, regardless of their complexity.
For the visualization of discrete SDF, "pure" CUDA is best suited.
Its performance is almost independent of the size of the visualized
SDF. OptiX-based methods are not suitable for visualizing discrete
SDF functions: even though on small discrete SDFs their performance
is very close to that of CUDA, with an increase in the size of the
discrete SDF the performance of these methods drops significantly,
and OptiX becomes unsuitable for rendering large discrete SDFs.
Thus, OptiX is not suitable for visualizing geometry defined by
SDF.

A List of Electronic Attachments
Along with the thesis, the attachments are included in Application.zip
file consisting of:

• folder bin(win64) contains a main.exe file for running the application

• folder cmake contains cmake scripts

• folder data/glsl contains glsl shaders

• folder data/images contains some images created using the application

• folder data/models contains .obj files of used triangle meshes

• folder data/sdf contains precomputed discrete SDFs

• folder nsight_compute_reports contains reports created using NVIDIA Nsight Compute

• folder src contains source code

• folder third_party contains third party source code

B Requirements for running binaries and for
compiling
For running binaries:

• Any NVIDIA GPU of Compute Capability 5.0 (Maxwell) or higher.

• NVIDIA OptiX 7.2.

• OptiX requires a driver numbered 455 or higher.

• Windows 10 64-bit.

For compiling:

• Any NVIDIA GPU of Compute Capability 5.0 (Maxwell) or higher.

• NVIDIA OptiX 7.2 (+ driver numbered 455 or higher).

• CUDA toolkit 11.2.

• C/C++ compiler. A compiler compatible with the used CUDA Toolkit is required.

• Windows 8.1/10 64-bit.

C Instructions for building the framework
1. Install everything required for compiling from Appendix B

2. Install vcpkg

3. Install glew, glfw, glm, tinyobjloader using vcpkg

4. Install cmake

5. Specify the paths to vcpkg, CUDA, and OptiX in cmake (variables
PATH_TO_VCPKG, OptiX_INCLUDE, OptiX_INSTALL_DIR,
CUDA_SDK_ROOT_DIR). You can use cmake -D var=value or
the cmake GUI.

6. Compile

Bibliography
1. Website [online]. Inigo Quilez [visited on 2021-04-23]. Available
from: https://www.iquilezles.org.
2. OSHER, Stanley; FEDKIW, Ronald. Signed Distance Functions.
In: Level Set Methods and Dynamic Implicit Surfaces. New York,
NY: Springer New York, 2003, pp. 17–22. isbn 978-0-387-22746-7.
Available from doi: 10.1007/0-387-22746-6_2.
3. FRISKEN, Sarah; PERRY, Ronald. Designing with Distance Fields.
In: 2005, vol. 2005, pp. 58–59. Available from doi: 10.1109/SMI.
2005.16.
4. HAUGO, Simen; STAHL, A.; BREKKE, E. Continuous Signed
Distance Functions for 3D Vision. 2017 International Conference on
3D Vision (3DV). 2017, pp. 116–125.
5. CRANE, Keenan. Ray Tracing Quaternion Julia Sets on the GPU
[online]. [N.d.] [visited on 2021-04-23]. Available from:
https://www.cs.cmu.edu/~kmcrane/Projects/QuaternionJulia/paper.pdf.
6. Syntopia. Generative art, 3D fractals, creative computing [online].
Mikael Hvidtfeldt Christensen, 2018 [visited on 2021-04-23].
Available from: http://blog.hvidtfeldts.net.
7. Mandelbulb.com [online]. Matthew Haggett [visited on 2021-04-
23]. Available from: https://www.mandelbulb.com.
8. HART, John. Sphere Tracing: A Geometric Method for the An-
tialiased Ray Tracing of Implicit Surfaces. The Visual Computer.
1995, vol. 12. Available from doi: 10.1007/s003710050084.
9. KNOLL, A.; HIJAZI, Y.; KENSLER, A.; SCHOTT, Mathias;
HANSEN, C.; HAGEN, H. Fast Ray Tracing of Arbitrary Implicit
Surfaces with Interval and Affine Arithmetic. Computer Graphics
Forum. 2009, vol. 28.
10. JONES, M.W.; BAERENTZEN, J.A.; SRAMEK, M. 3D distance
fields: a survey of techniques and applications. IEEE Transactions
on Visualization and Computer Graphics. 2006, vol. 12, no. 4, pp. 581–
599. Available from doi: 10.1109/TVCG.2006.56.


11. PAYNE, B.A.; TOGA, A.W. Distance field manipulation of surface
models. IEEE Computer Graphics and Applications. 1992, vol. 12,
no. 1, pp. 65–71. Available from doi: 10.1109/38.135885.
12. STRAIN, John. Fast Tree-Based Redistancing for Level Set Com-
putations. Journal of Computational Physics. 1999, vol. 152, no. 2,
pp. 664–686. issn 0021-9991. Available from doi: https://doi.
org/10.1006/jcph.1999.6259.
13. BORGEFORS, Gunilla. Distance transformations in digital im-
ages. Computer Vision, Graphics, and Image Processing. 1986, vol. 34,
no. 3, pp. 344–371. issn 0734-189X. Available from doi: https:
//doi.org/10.1016/S0734-189X(86)80047-0.
14. ROSENFELD, Azriel; PFALTZ, John L. Sequential Operations in
Digital Picture Processing. J. ACM. 1966, vol. 13, no. 4, pp. 471–
494. issn 0004-5411. Available from doi: 10.1145/321356.321357.
15. SVENSSON, Stina; BORGEFORS, Gunilla. Digital Distance Trans-
forms in 3D Images Using Information from Neighbourhoods up
to 5×5×5. Computer Vision and Image Understanding. 2002, vol. 88,
no. 1, pp. 24–53. issn 1077-3142. Available from doi: https://
doi.org/10.1006/cviu.2002.0976.
16. DANIELSSON, Per-Erik. Euclidean distance mapping. Computer
Graphics and Image Processing. 1980, vol. 14, no. 3, pp. 227–248. issn
0146-664X. Available from doi: https://doi.org/10.1016/0146-
664X(80)90054-4.
17. MULLIKIN, James C. The vector distance transform in two and
three dimensions. CVGIP: Graphical Models and Image Processing.
1992, vol. 54, no. 6, pp. 526–535. issn 1049-9652. Available from
doi: https://doi.org/10.1016/1049-9652(92)90072-6.
18. SATHERLEY, Richard; JONES, Mark W. Vector-City Vector Dis-
tance Transform. Computer Vision and Image Understanding. 2001,
vol. 82, no. 3, pp. 238–254. issn 1077-3142. Available from doi:
https://doi.org/10.1006/cviu.2001.0915.
19. CUISENAIRE, O; MACQ, B. Fast Euclidean Distance Transforma-
tion by Propagation Using Multiple Neighborhoods. Computer
Vision and Image Understanding. 1999, vol. 76, no. 2, pp. 163–172.
issn 1077-3142. Available from doi: https://doi.org/10.1006/cviu.1999.0783.
20. SETHIAN, J A. A fast marching level set method for monoton-
ically advancing fronts. Proceedings of the National Academy of
Sciences. 1996, vol. 93, no. 4, pp. 1591–1595. issn 0027-8424. Avail-
able from doi: 10.1073/pnas.93.4.1591.
21. BÆRENTZEN, Jakob Andreas. On the implementation of fast march-
ing methods for 3D lattices. 2001.
22. PERLIN, K.; HOFFERT, E. M. Hypertexture. SIGGRAPH Comput.
Graph. 1989, vol. 23, no. 3, pp. 253–262. issn 0097-8930. Available
from doi: 10.1145/74334.74359.
23. AMANATIDES, John. Ray Tracing with Cones. SIGGRAPH Com-
put. Graph. 1984, vol. 18, no. 3, pp. 129–135. issn 0097-8930. Avail-
able from doi: 10.1145/964965.808589.
24. CRASSIN, Cyril. GigaVoxels: A Voxel-Based Rendering Pipeline For
Efficient Exploration Of Large And Detailed Scenes. [Online]. 2011
[visited on 2021-04-23]. Available from:
https://maverick.inria.fr/Publications/2011/Cra11/CCrassinThesis_EN_Web.pdf.
PhD thesis. Universite de Grenoble.
25. CRASSIN, Cyril; NEYRET, Fabrice; SAINZ, Miguel; GREEN, Si-
mon; EISEMANN, Elmar. In: SIGGRAPH 2011 : Technical Talk
[SIGGRAPH 2011 : Technical Talk]. ACM SIGGRAPH, 2011.
Available also from:
https://research.nvidia.com/sites/default/files/publications/GIVoxels-pg2011-authors.pdf.
26. Dynamic Occlusion with Signed Distance Fields. Advances in Real-
Time Rendering, SIGGRAPH 2015. [Online]. Wright, D., 2015 [vis-
ited on 2021-04-23]. Available from: https://cutt.ly/XbXsBYk.
27. DirectX Raytracing (DXR) Functional Spec [online]. DirectX [vis-
ited on 2021-04-27]. Available from: https://microsoft.github.
io/DirectX-Specs/d3d/Raytracing.html.
28. DIRECTX 12 ULTIMATE [online]. NVIDIA [visited on 2021-04-
27]. Available from: https://developer.nvidia.com/directx.
29. Vulkan [online]. NVIDIA [visited on 2021-04-27]. Available from:
https://developer.nvidia.com/vulkan.


30. Vulkan overview [online]. Khronos [visited on 2021-04-27]. Available
from: https://www.khronos.org/vulkan/.
31. NVIDIA OptiX 7.3 – Programming Guide [online]. NVIDIA [vis-
ited on 2021-04-27]. Available from: https://raytracing-docs.
nvidia.com/optix7/guide/index.html#preface#preface.
32. NVIDIA OPTIX RAY TRACING ENGINE [online]. NVIDIA [vis-
ited on 2021-04-27]. Available from: https://developer.nvidia.
com/optix.
33. FELBECKER, Robert; RASCHKOWSKI, Leszek; KEUSGEN, Wil-
helm; PETER, Michael. Electromagnetic wave propagation in
the millimeter wave band using the NVIDIA OptiX GPU ray
tracing engine. In: 2012 6th European Conference on Antennas and
Propagation (EUCAP). 2012, pp. 488–492. Available from doi:
10.1109/EuCAP.2012.6206198.
34. PARKER, Steven G.; FRIEDRICH, Heiko; LUEBKE, David; MOR-
LEY, Keith; BIGLER, James; HOBEROCK, Jared; MCALLISTER,
David; ROBISON, Austin; DIETRICH, Andreas; HUMPHREYS,
Greg; MCGUIRE, Morgan; STICH, Martin. GPU Ray Tracing.
Commun. ACM. 2013, vol. 56, no. 5, pp. 93–101. issn 0001-0782.
Available from doi: 10.1145/2447976.2447997.
35. PARKER, Steven G.; BIGLER, James; DIETRICH, Andreas;
FRIEDRICH, Heiko; HOBEROCK, Jared; LUEBKE, David;
MCALLISTER, David; MCGUIRE, Morgan; MORLEY, Keith;
ROBISON, Austin; STICH, Martin. OptiX: A General Purpose
Ray Tracing Engine. ACM Trans. Graph. 2010, vol. 29, no. 4. issn
0730-0301. Available from doi: 10.1145/1778765.1778803.
36. How to Get Started with OptiX 7 [online]. Keith Morley [visited on
2021-04-27]. Available from: https://developer.nvidia.com/
blog/how-to-get-started-with-optix-7/.
37. Siggraph 2019/2020 OptiX 7/7.1 Course Tutorial Code [online]. Ingo
Wald [visited on 2021-04-27]. Available from: https://gitlab.
com/ingowald/optix7course.
38. Siggraph 2019/2020 OptiX 7/7.1 Course Tutorial Code [online]. Ingo
Wald [visited on 2021-04-27]. Available from: https://github.
com/ingowald/optix7course.


39. OptiX developer forum [online]. NVIDIA [visited on 2021-04-27].
Available from: https://forums.developer.nvidia.com/c/
professional-graphics-and-rendering/advanced-graphics/optix/167.
40. NVIDIA Ray Tracing Documentation [online]. NVIDIA [visited on
2021-04-27]. Available from: https://raytracing-docs.nvidia.
com.
41. Get started with real time ray tracing [online]. NVIDIA [visited on
2021-04-27]. Available from: https://developer.nvidia.com/
rtx/raytracing.
42. OpenMP. The OpenMP API specification for parallel programming.
[Online]. OpenMP [visited on 2021-05-03]. Available from:
https://www.openmp.org.
43. tinyobjloader [online]. tinyobjloader [visited on 2021-05-03].
Available from: https://github.com/tinyobjloader/tinyobjloader.
44. MÖLLER, Tomas; TRUMBORE, Ben. Fast, Minimum Storage Ray/Triangle Inter-
section [online]. [N.d.] [visited on 2021-05-03]. Available from:
http://www.graphics.cornell.edu/pubs/1997/MT97.pdf.
45. Dear ImGui [online]. Omar Cornut [visited on 2021-05-03]. Avail-
able from: https://github.com/ocornut/imgui.

