Hardware-Accelerated High-Quality Reconstruction of Volumetric Data on PC Graphics Hardware

Markus Hadwiger*   Thomas Theußl†   Helwig Hauser*   Eduard Gröller†

* VRVis Research Center, Vienna, Austria
  {Hadwiger,Hauser}@VRVis.at, http://www.VRVis.at/vis/
† Institute of Computer Graphics and Algorithms, Vienna University of Technology, Austria
  {theussl,groeller}@cg.tuwien.ac.at, http://www.cg.tuwien.ac.at/home/
Abstract

We describe a method for exploiting commodity 3D graphics hardware in order to achieve hardware-accelerated high-quality filtering with arbitrary filter kernels. Our approach is based on reordering the evaluation of the filter convolution sum to accommodate the way the hardware works. We exploit multiple rendering passes together with the capability of current graphics hardware to index into several textures at the same time (multi-texturing).

The method we present is applicable in one, two, and three dimensions. The cases we are most interested in up to now are two-dimensional reconstruction of object-aligned slices through volumetric data, and three-dimensional reconstruction of arbitrarily oriented slices. As a fundamental building block, the basic algorithm can be used in order to directly render an entire volume by blending a stack of slices reconstructed with high quality on top of each other.

We have used bicubic and tricubic B-splines and Catmull-Rom splines, as well as windowed sinc filters, as reconstruction kernels. However, it is important to emphasize that our approach has no fundamental restrictions with regard to the filters that can be used.

Keywords: filtering, high-order reconstruction, slice rendering, volume rendering, 3D hardware acceleration

1 Introduction

A fundamental problem in computer graphics is how to reconstruct images and volumes from sampled data. The process of determining the original continuous data–or at least a sufficiently accurate approximation–from discrete input data is usually called reconstruction. In volume visualization, the input data is commonly given at evenly spaced discrete locations in three-space. In theory, the original volumetric data can be reconstructed entirely, provided certain conditions are honored (cf. sampling theorem [18]). In reality, of course, filtering is always a trade-off between performance and reconstruction quality. This is especially true for hardware implementations. Reconstruction in graphics hardware is usually done by using simple linear interpolation. In the most common case of two-dimensional texture mapping hardware, bilinear filtering is employed in order to reconstruct texture images. Analogously, 3D graphics accelerators that are able to use three-dimensional textures filter them using trilinear interpolation.

Linear interpolation is fast, but introduces significant reconstruction artifacts. On the other hand, a lot of research in the last few years has been devoted to improving reconstruction by using high-order reconstruction filters [8, 12, 13, 14, 21]. Among the investigated filters are piecewise cubic functions, as well as windowed ideal reconstruction functions (windowed sinc filters). However, these functions were usually deemed too slow to be used in practice. Using such high-order reconstruction filters in software indeed easily leads to exploding rendering times.

In this paper, we will show how to exploit consumer 3D graphics hardware for accelerating high-order reconstruction of volumetric data. The presented approach works in one, two, and three dimensions, respectively. The cases most interesting to us up to now are the reconstruction of images or slices in two dimensions, and reconstruction of oblique slices through volumetric data. An interesting application of such slices is to use them for direct volume rendering. Standard texture mapping hardware can be exploited for volume rendering by blending a stack of texture-mapped slices on top of each other [3]. These slices can be either viewport-aligned, which requires 3D texture mapping hardware [9, 23], or object-aligned, where 2D texture mapping hardware suffices [19]. Our high-quality filtering approach can be used to considerably improve the reconstruction quality of the individual slices in both of these cases, thus increasing the quality of the entire rendered volume.

The method we present works by evaluating the filter convolution sum in a different order than the one naturally used when filtering in software. Instead of gathering the contribution of all neighboring input samples in order to calculate a single output sample at a time, we distribute the contribution of a single input sample to all neighboring output samples. These contributions are added up over multiple rendering passes in hardware, eventually yielding the same result as the standard evaluation of the convolution sum. In order to improve performance, each pass employs multiple textures at the same time, for multiplying input samples by corresponding values in the filter kernel.

The structure of the paper is as follows. After discussing related work in section 2, we describe our method for hardware-accelerated high-order filtering in section 3. We discuss the basic idea and principle in sections 3.1 and 3.2, before applying it to reconstruction of images and axis-aligned slices of volumetric data in section 3.3, and arbitrarily oriented slices in section 3.4. In the latter case, although the target is two-dimensional, reconstruction has to be done in three dimensions. Three-dimensional reconstruction for three-dimensional images is described in section 3.5, where we describe how our approach can be applied to volume rendering using a stack of slices. Section 4 describes some issues of current commodity graphics hardware, especially with respect to what our method requires or can make use of. Section 5 presents results. Section 6 summarizes what we have presented, tries to draw some general conclusions, and wraps up with future work.

2 Related work

The work relevant to ours is twofold. First, we are using filter theory and results on what types of filters are good in which situations. Second, since we propose several methods and applications making heavy use of hardware acceleration, work that has been done in these areas is also very relevant.

There is a vast amount of literature on filter analysis and design.
Keys [7] derived a family of cardinal splines and showed, using a Taylor series expansion, that the Catmull-Rom spline, which is in this family of cubic splines, is numerically most accurate. Mitchell and Netravali [11] derived another family of cubic splines quite popular in computer graphics, the BC-splines. Marschner and Lobb [8] compared linear interpolation, cubic splines, and windowed sincs. They concluded that linear interpolation is certainly the cheapest option and will likely remain the method of choice for time-critical applications. Cubic splines perform quite well, especially those BC-splines that fulfill 2C + B = 1, which include the Catmull-Rom spline. Windowed sincs can provide arbitrarily good reconstruction, although they are much more expensive. Möller et al. [12] provide a general framework for analyzing filters in the spatial domain, again using a Taylor series expansion of the convolution sum. They use this framework to analyze the cardinal splines [12], affirming that the Catmull-Rom splines are numerically most accurate, and the BC-splines [13], affirming that the filters with parameters fulfilling 2C + B = 1 are numerically most accurate. Since numerical considerations alone may not always be appropriate, they also show how to use their framework to design accurate and smooth reconstruction filters [14]. Turkowski [22] used windowed ideal reconstruction filters for image resampling tasks. Theußl et al. [21] used the framework developed by Möller et al. [12] to assess the quality of windowed reconstruction filters and to derive optimal values for the parameters of Kaiser and Gaussian windows.

In the field of hardware-accelerated visualization, Meißner et al. [10] have done a thorough comparison of the four prevalent approaches to volume rendering, one of them being the use of texture mapping hardware. Hardware-accelerated texture mapping has been used to accelerate volume rendering for quite some time. The original SGI RealityEngine [1] introduced 3D textures. They can be used effectively for rendering volumes at interactive frame rates, as shown by Cabral et al. [3], and Cullip and Neumann [4], for example. These early approaches all stored a preshaded volume in a 3D texture, which had to be recomputed whenever the viewing parameters, lighting conditions, or transfer function were changed. Westermann and Ertl [23] introduced several approaches for accelerating volume rendering with graphics hardware that is capable of three-dimensional texture mapping and supports a color matrix. They were able to render monochrome images of shaded isosurfaces at interactive rates–without explicitly extracting any geometry–and performed shading on-the-fly, without requiring the volume texture to be updated. Meißner et al. [9] have extended this approach for semi-transparent volume rendering, and are able to apply both classification and colored shading at run-time without changing the volume texture itself. Their approach additionally requires the SGI pixel textures extension of OpenGL [20]. Dachille et al. [5] used a mixed-mode approach, applying a transfer function and shading in software, and rendering the resulting texture volume in hardware. Many of the approaches presented earlier have been adapted to standard PC graphics hardware later on, e.g., by Rezk-Salama et al. [19]. They are only using two-dimensional textures instead of three-dimensional ones, exploiting the multi-texturing and multi-stage rasterization capabilities of NVIDIA GeForce graphics cards. The high-quality filtering approach we are proposing can be used in conjunction with many of the methods previously described.

The OpenGL [17] imaging subset (introduced in OpenGL 1.2) offers convolution capabilities, but these are primarily intended for image processing and unfortunately cannot be used for the kind of reconstruction we desire. Furthermore, these features are usually not supported natively by the graphics hardware, especially not by consumer PC graphics accelerators. Hopf and Ertl [6] have shown how to achieve 3D convolution by building upon two-dimensional convolution in OpenGL.

3 Hardware-Accelerated High-Order Reconstruction

This section presents our approach for filtering input data by convolving it with an arbitrary filter kernel stored in multiple texture maps, exploiting low-cost 3D graphics hardware. This method can be used in order to achieve hardware-accelerated high-quality reconstruction of volumetric data. The basic principle is most easily illustrated in the case of one-dimensional input data (sections 3.1 and 3.2), and can also be applied in two (section 3.3), as well as three dimensions (sections 3.4 and 3.5).

3.1 Basic principle

Since we want to be able to employ arbitrary filter kernels for reconstruction, we have to evaluate the well-known filter convolution sum:

    g(x) = f[x] \ast h(x) = \sum_{i = \lfloor x \rfloor - m + 1}^{\lfloor x \rfloor + m} f[i] \, h(x - i)    (1)

This equation describes a convolution of the discrete input samples f[x] with a continuous reconstruction filter h(x). In the case of reconstruction, this is essentially a sliding average of the samples and the reconstruction filter. If the function to be reconstructed was bandlimited and sampled properly, using the sinc function as reconstruction filter would result in perfect reconstruction. However, the sinc function is impracticable because of its infinite extent. Therefore, in practice finite approximations to the sinc filter are used, which yield–depending on the properties of these approximations–results of varying quality. In equation 1, the (finite) half-width of the filter kernel is denoted by m.

In order to be able to exploit standard graphics hardware for performing this computation, we do not use the evaluation order usually employed, i.e., in software-based filtering. The convolution sum is commonly evaluated in its entirety for a single output sample at a time. That is, all the contributions of neighboring input samples (their values multiplied by the corresponding filter values) are gathered and added up in order to calculate the final value of a certain output sample. This “gathering” of contributions is shown in figure 1(a). This figure uses a simple tent filter as an example. It shows how a single output sample is calculated by adding up two contributions. The first contribution is gathered from the neighboring input sample on the left-hand side, and the second one is gathered from the input sample on the right-hand side. In the case of this example, the convolution results in linear interpolation, due to the tent filter employed. For generating the desired output data in its entirety, this is done for all corresponding resampling points (output sample locations).

Our method uses a different evaluation order. Instead of focusing on a single output sample at any one time, we calculate the contribution of a single input sample to all corresponding output sample locations (resampling points) first. That is, we distribute the contribution of an input sample to its neighboring output samples, instead of the other way around. This “distribution” of contributions is shown in figure 1(b). In this case, the final value of a single output sample is only available when all corresponding contributions of input samples have been distributed to it.

Figure 1: Gathering vs. distribution of input sample contributions (tent filter): (a) Gathering all contributions to a single output sample; (b) Distributing a single input sample’s contribution (tent filter not shown).

We evaluate the convolution sum in the order outlined above, since the distribution of the contributions of a single relative input sample can be done in hardware for all output samples (pixels) simultaneously. The final result is gradually built up over multiple rendering passes. The term relative input sample location denotes a relative offset of an input sample with respect to a set of output samples. In the example of a one-dimensional tent filter (like in figure 1(a)), there are two relative input sample locations. One could be called the “left-hand neighbor,” the other the “right-hand neighbor.” In the first pass, the contribution of all respective left-hand neighbors is calculated. The second pass then adds the contribution of all right-hand neighbors. Note that the number of passes depends on the filter kernel used, see below.

Thus, the same part of the filter convolution sum is added to the previous result for each pixel at the same time, yielding the final result after all parts have been added up. From this point of view, the graph in figure 1(b) depicts both rendering passes that are necessary for reconstruction with a one-dimensional tent filter, but only with respect to the contribution of a single input sample. The contributions distributed simultaneously in a single pass are depicted in figures 2 and 3, respectively. In the first pass, shown in figure 2, the contributions of all relative left-hand neighbors are distributed. Consequently, the second pass, shown in figure 3, distributes the contributions of all relative right-hand neighbors. Adding up the distributed contributions of these two passes yields the final result for all resampling points (i.e., linearly interpolated output values in this example).
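To make the reordering concrete, the following minimal CPU sketch (an illustration with hypothetical identifiers, not the paper's implementation) evaluates the convolution sum of equation 1 for a tent filter (m = 1) in both orders. The "distribution" loop is structured exactly like the rendering passes just described: one pass per filter tile, each pass touching every output sample once, and both orders print identical results.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Tent filter of width two (m = 1).
static float tent(float x) {
    return std::fabs(x) < 1.0f ? 1.0f - std::fabs(x) : 0.0f;
}

int main() {
    const std::vector<float> f = {0.2f, 1.0f, 0.4f, 0.8f};  // input samples
    const int density = 4;                                  // outputs per interval
    const int numOut = static_cast<int>(f.size()) * density;
    std::vector<float> gathered(numOut, 0.0f), distributed(numOut, 0.0f);

    // Gathering: for each output sample, sum all input contributions.
    for (int o = 0; o < numOut; ++o) {
        const float x = o / static_cast<float>(density);
        for (std::size_t i = 0; i < f.size(); ++i)
            gathered[o] += f[i] * tent(x - static_cast<float>(i));
    }

    // Distribution: one "pass" per filter tile (two for the tent filter);
    // each pass uses the input sample at one fixed relative offset for all
    // output samples -- the order that maps onto multi-pass hardware.
    for (int pass = 0; pass < 2; ++pass) {
        for (int o = 0; o < numOut; ++o) {
            const float x = o / static_cast<float>(density);
            const int i = static_cast<int>(std::floor(x)) + pass; // offsets 0, 1
            if (i >= 0 && i < static_cast<int>(f.size()))
                distributed[o] += f[i] * tent(x - static_cast<float>(i));
        }
    }

    // Both orders yield the same (linearly interpolated) result.
    for (int o = 0; o < numOut; ++o)
        std::printf("%5.2f  %5.2f\n", gathered[o], distributed[o]);
    return 0;
}
```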

3.2 Accommodating graphics hardware

The rationale for the approach described in the previous section is that, at the pixel or fragment level, graphics hardware basically operates by applying simple, identical operations to a lot of pixels simultaneously–or at least in very rapid succession, which for all conceptual purposes can be viewed as fully parallel operation from the outside. These operations have access to relatively few inputs. However, in general filtering based on a convolution of an input signal with a filter kernel, a potentially unbounded number of inputs needs to be available for the calculation of a single output value. Therefore, in order to exploit graphics hardware and accommodate the way it works, our method reorders the evaluation of the filter convolution sum. For a more elaborate discussion of the features of graphics hardware that can be exploited by our approach, as well as additional hardware-related issues, see section 4.

Section 3.1 has already adapted the usual evaluation order of the convolution sum in order to facilitate the use of graphics hardware. We are now going to describe in detail how this basic principle can be employed in order to achieve high-order reconstruction in hardware. The following facts are crucial to the operation of our method, and together form the reason why it works. If we look at filtering and the convolution of an input signal with a filter kernel at a single output sample location, we can observe that–even for a kernel of arbitrary width–each segment from one integer location of the kernel to the next is only relevant for exactly one input sample. In the case of a tent filter (which has width two) this can be seen in figure 1(a), by imagining the filter kernel sliding from the second input sample shown to the third input sample.

From now on we will be calling a segment of width one from one integer location in a filter kernel to the next an integer filter kernel tile, or simply filter tile. Consequently, a one-dimensional tent filter has two filter tiles, corresponding to the fact that it has width two. Looking again at figure 1(a), we can now say that each filter tile covers exactly one input sample, regardless of where the kernel is actually positioned.

Figure 2: Distributing the contributions of all “left-hand” neighbors; tent filter.

Figure 3: Distributing the contributions of all “right-hand” neighbors; tent filter.
If we now imagine all output sample locations (resampling points) between two given input samples simultaneously, instead of concentrating on a single output sample, we can further observe that:

• this area has width one, which is exactly the width of a single filter tile,
• all output samples in this area get a non-zero contribution from each filter tile exactly once,
• as many input samples yield a non-zero contribution as there are filter tiles in the filter kernel, and
• this number is, as defined above, the width of the filter kernel.

Now, instead of imagining the filter kernel being centered at the “current” output sample location, we note that an identical mapping of input samples to filter values can be achieved by replicating a single filter tile, mirrored in all dimensions (in the case of more than one dimension), repeatedly over the output sample grid, see figure 4. The scale of this mapping is chosen so that the size of a single tile corresponds to the width from one input sample to the next. Note that the need for mirroring filter tiles has nothing to do with the fact that a filter kernel is usually mirrored in the filter convolution sum (see equation 1). Mirroring is necessary to compensate for the fact that, instead of “sliding” the filter kernel, individual filter tiles are positioned at (and replicated to) fixed locations. Figure 4 shows how this works in the case of a one-dimensional cubic Catmull-Rom spline, which has width four and thus consists of four filter tiles. We calculate the contribution of a single specific filter tile to all output samples in a single pass. The input samples used in a single pass correspond to a specific relative input sample location or offset with regard to the output sample locations. That is, in one pass the input samples with relative offset zero are used for all output samples, then the samples with offset one in the next pass, and so on. The number of passes necessary is equal to the number of filter tiles the filter kernel used consists of. Note that the correspondence of passes and filter tiles shown in figure 4 is not simply left-to-right, reflecting the order we are using in our actual implementation in order to avoid unintentional clamping of intermediate results in the frame buffer (see section 4.4).

Remember that there are two basic inputs needed by the convolution sum: the input samples, and the filter kernel. Since we change only the order of summation but leave the multiplication untouched, we need these two available at the same time. Therefore, we employ multi-texturing with (at least) two textures and retrieve input samples from the first texture, and filter kernel values from the second texture. Actually, due to the fact that only a single filter tile is needed during a single rendering pass, all tiles are stored and downloaded to the graphics hardware as separate textures. The required replication of tiles over the output sample grid is easily achieved by configuring the hardware to automatically extend the texture domain beyond [0, 1] × [0, 1] by simply repeating the texture¹. In order to fetch input samples in unmodified form, nearest-neighbor interpolation has to be used for the input texture. The textures containing the filter tiles are sampled using the hardware-native linear interpolation. If a given hardware architecture is able to support 2n textures at the same time, the number of passes can be reduced by a factor of n. That is, with two-texture multi-texturing four passes are needed for filtering with a cubic kernel in one dimension, whereas with four-texture multi-texturing only two passes are needed, etc.

Our approach is not limited to symmetric filter kernels, although symmetry can be exploited in order to save texture memory for the filter tile textures. It is also not limited to separable filter kernels (in two and three dimensions, respectively), as will be shown in the following sections.

¹ In OpenGL, via a clamp mode of GL_REPEAT

Figure 4: Catmull-Rom spline of width four used for reconstruction of a one-dimensional function in four passes.

3.3 Reconstruction of images and aligned slices

When enlarging images or reconstructing object-aligned slices through volumetric data taken directly from a stack of such slices, high-order two-dimensional filters have to be used in order to achieve high-quality results.

The basic algorithm outlined in the previous section for one dimension can easily be applied in two dimensions, exploiting two-texture multi-texturing hardware in multiple rendering passes. For each output pixel and pass, our method takes two inputs: unmodified (i.e., unfiltered) image values, and filter kernel values. That is, two 2D textures are used simultaneously. One texture contains the entire source image, and the other texture contains the filter tile needed in the current pass, which in this case is a two-dimensional unit square. Figure 5 shows a two-dimensional filter kernel texture. For this image, a bicubic B-spline filter has been used. All sixteen tiles are shown, where in reality only four tiles would actually be downloaded to the hardware. Due to symmetry, the other twelve tiles can easily be generated on-the-fly by mirroring texture coordinates. Figure 6 shows one-dimensional cross-sections of additional filter kernels that we have used.

In addition to using the appropriate filter tile, in each pass an appropriate offset has to be applied to the texture coordinates of the texture containing the input image. As explained in the previous section, each pass corresponds to a specific relative location of an input sample. Thus, the slice texture coordinates have to be offset and scaled in order to match the point-sampled input image grid with the grid of replicated filter tiles. In the case of a cubic filter kernel for bicubic filtering, sixteen passes need to be performed on two-texture multi-texturing hardware.
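As an illustration of how one such pass could map onto OpenGL 1.2 with ARB_multitexture, consider the following sketch (our own, not the paper's source code; all identifiers such as imageTex, tileTex, passOffsetX/Y, and imgW/imgH are hypothetical, and texture objects are assumed to be created, filled, and pre-mirrored elsewhere). Texture unit 0 point-samples the source image, shifted by the pass's relative input sample offset; texture unit 1 bilinearly samples the replicated filter tile and modulates the image values; additive blending sums the passes in the frame buffer.

```cpp
#include <GL/gl.h>
#include <GL/glext.h>  // ARB_multitexture entry points assumed resolved

void renderFilterPass(GLuint imageTex, GLuint tileTex,
                      float passOffsetX, float passOffsetY,
                      int imgW, int imgH)
{
    // Unit 0: the source image, point-sampled so input samples arrive
    // unmodified; coordinates are shifted by this pass's relative offset.
    glActiveTextureARB(GL_TEXTURE0_ARB);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, imageTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_REPLACE);

    // Unit 1: the current filter tile, replicated over the output grid via
    // GL_REPEAT, sampled with the hardware-native bilinear interpolation,
    // and multiplied into the image samples via GL_MODULATE.
    glActiveTextureARB(GL_TEXTURE1_ARB);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, tileTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    // The contributions of all passes are summed in the frame buffer.
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);

    static const float s[4] = {0.0f, 1.0f, 1.0f, 0.0f};
    static const float t[4] = {0.0f, 0.0f, 1.0f, 1.0f};
    glBegin(GL_QUADS);
    for (int v = 0; v < 4; ++v) {
        // Image coordinates, offset to select this pass's input samples.
        glMultiTexCoord2fARB(GL_TEXTURE0_ARB, s[v] + passOffsetX / imgW,
                                              t[v] + passOffsetY / imgH);
        // One filter tile repeat per input sample interval.
        glMultiTexCoord2fARB(GL_TEXTURE1_ARB, s[v] * imgW, t[v] * imgH);
        glVertex2f(s[v], t[v]);
    }
    glEnd();
}
```

For bicubic filtering, such a pass would be issued sixteen times, once per combination of the four relative offsets in x and y, each time with the matching filter tile.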
As mentioned in the previous section, our approach is neither limited to symmetric nor separable filters. However, non-separable filter kernels may need additional passes if they contain both positive and negative areas, see section 4.4. Apart from this, separable as well as non-separable filter kernels are simply stored in two-dimensional filter tile textures.

Figure 5: Bicubic B-spline filter kernel; filter tiles separated by white lines.

3.4 Reconstruction of oblique slices

When planar slices through 3D volumetric data are allowed to be located and oriented arbitrarily, three-dimensional filtering has to be performed although the result is still two-dimensional. On graphics hardware, this is usually done by trilinearly interpolating within a 3D texture. Our method can also be applied in this case in order to improve reconstruction quality considerably. The conceptually straightforward extension of the 2D approach described in the previous section, simultaneously using two 2D textures, achieves the equivalent for three-dimensional reconstruction by simultaneously using two 3D textures. The first 3D texture contains the input volume in its entirety, whereas the second 3D texture contains the current filter tile, which in this case is a three-dimensional unit cube. In the case of a cubic filter kernel for tricubic filtering, 64 passes need to be performed on two-texture multi-texturing hardware. If such a kernel is symmetric, we download eight 3D textures for the filter tiles, generating the remaining 56 without any performance loss by mirroring texture coordinates. Due to the high memory consumption of 3D textures, it is especially important that the filter kernel need not be downloaded to the graphics hardware in its entirety if it is symmetric. Texture compression can also be used in order to minimize on-board texture memory consumption. Various compression schemes are available for this purpose [16].

Unfortunately, current hardware–especially the ATI Radeon–does not support multi-texturing with two 3D textures. Although the Radeon has three texture units, the most it can do is multi-texturing with one 3D texture and one 2D texture at the same time. Since we need data from two 3D textures available simultaneously, the missing functionality has to be emulated in this case. Conceptually, this is quite simple. It incurs a major performance penalty, however, and thus has only the character of a proof-of-concept. Assuming that only a single 3D texture is available, we provide the second input (after the filter kernel) to the filter convolution sum as the color of flat-shaded polygons. This is possible since the input volume need only be point-sampled. We generate these polygons on-the-fly by intersecting the voxel grid of the input volume with the plane of the slice we want to reconstruct. Each clipped part of the slice that is entirely contained within a single voxel is rendered as a polygon of a single color, the color being manually retrieved from the input volume and assigned to the polygon. At the same time, the current filter tile is activated as a 3D texture filtered via trilinear interpolation. For each output pixel and pass, the color of the polygon (which would normally be fetched from the second 3D texture that is not available) is multiplied by the kernel value fetched from the 3D texture and stored into the frame buffer. This approach ultimately yields the exact same result as multi-texturing with two 3D textures, but of course uses a lot more polygons. In the multi-texture approach, just a single polygon has to be drawn in each pass. If the second 3D texture is not available, several thousand polygons have to be drawn instead. If, however, a 2D texture is available concurrently with the 3D texture, it should be possible to adapt the approach for rendering oblique slices with trilinear interpolation described by Rezk-Salama et al. [19] to our method. In this case, instead of clipping the slice against all voxels, clipping would only need to be done against a set of planes orthogonal to a single axis. This would reduce the number of polygons needed to the resolution of the volume along that axis (e.g., 128).
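The fall-back pass could be sketched as follows (an illustration with hypothetical identifiers, not the paper's implementation; the clipping of the slice against the voxel grid and the point-sampled volume lookup are assumed to happen elsewhere). Each clipped polygon is drawn flat-shaded with its point-sampled volume value, modulated by the trilinearly interpolated 3D filter tile texture, and the passes are summed by additive blending.

```cpp
#include <GL/gl.h>   // GL_TEXTURE_3D requires OpenGL 1.2 (or EXT_texture3D)
#include <vector>

struct ClippedPoly {
    float value;            // point-sampled volume value of the voxel
    int   vertexCount;      // up to 8 vertices per clipped slice part
    float pos[8][3];        // vertex positions of the clipped polygon
    float tileCoord[8][3];  // coordinates into the 3D filter tile
};

void renderFallbackPass(GLuint tileTex3D,
                        const std::vector<ClippedPoly>& polys)
{
    glEnable(GL_TEXTURE_3D);
    glBindTexture(GL_TEXTURE_3D, tileTex3D);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    glEnable(GL_BLEND);        // sum the passes, as in the 2D case
    glBlendFunc(GL_ONE, GL_ONE);

    for (const ClippedPoly& p : polys) {
        // The flat-shaded polygon color stands in for the missing second
        // 3D texture; GL_MODULATE multiplies it by the trilinearly
        // interpolated kernel value fetched from the filter tile.
        glColor3f(p.value, p.value, p.value);
        glBegin(GL_POLYGON);
        for (int v = 0; v < p.vertexCount; ++v) {
            glTexCoord3fv(p.tileCoord[v]);
            glVertex3fv(p.pos[v]);
        }
        glEnd();
    }
}
```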
3.5 Volume rendering

Since we are able to reconstruct axis-aligned slices, as well as arbitrarily oriented slices, with our high-quality filtering approach, our technique can also be used for direct volume rendering by using one of the two major approaches of DVR exploiting texture mapping hardware. The first possibility is to render the volume by blending a stack of object-aligned slices on top of each other. Each one of these slices would be reconstructed with the high-quality 2D filtering approach outlined in section 3.3. All of these slices would then be blended via back-to-front compositing. The second possibility is to render the volume by blending a stack of viewport-aligned slices on top of each other, each one of them individually reconstructed with the 3D filtering approach of section 3.4. Analogously to the standard approach using 3D texture mapping hardware and trilinear interpolation, our method in this case requires the capability to use 3D textures, but achieves higher reconstruction quality, since the individual slices can be reconstructed with high-order filter kernels. Furthermore, our approach can also be used to reconstruct gradients in high quality, in addition to reconstructing density values. This is possible in combination with hardware-accelerated methods that store gradients in the RGB components of a texture [9, 23].

4 Commodity graphics hardware issues

This section discusses features and issues of current low-cost 3D graphics accelerators that are required or can be exploited by our method. We are specifically targeting widely available consumer graphics hardware on the PC platform, like the NVIDIA GeForce [15] and the ATI Radeon [2]. The graphics API we are using in our work is OpenGL [17].

4.1 Single-pass multi-texturing

A very important feature of today’s graphics hardware is the capability to texture-map a single polygon with more than one texture at the same time². Usually, the process of multi-texturing is accessed by programming several texture stages, the output of each stage becoming the input to the next. More recent functionality³ adds a lot more flexibility to this model, which is especially true for NVIDIA’s extensions for register combiners⁴ and texture shaders⁵, the latter only having been introduced with the GeForce 3 [16]. The de-facto standard on current graphics hardware is two-texture multi-texturing, where two textures can be used simultaneously. However, recent hardware like the ATI Radeon and the GeForce 3 is already able to use three and four textures, respectively. This can be exploited in order to reduce the number of rendering passes required.

² exposed via ARB_multitexture in OpenGL
³ EXT_texture_env_combine, NV_texture_env_combine4, etc.
⁴ NV_register_combiners, NV_register_combiners2
⁵ NV_texture_shader, NV_texture_shader2

Figure 6: Filter kernels we have used: (a) Cubic B-spline and Catmull-Rom spline; (b) Blackman windowed sinc (width = 4), also depicting the Blackman window itself.
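For reference, the two piecewise cubics plotted in figure 6(a) have the following standard closed forms (quoted from the general literature on BC-splines, not transcribed from the paper). Note that the Catmull-Rom spline changes sign exactly at |x| = 1, consistent with the integer zero crossings required in section 4.4.

```latex
% Cubic B-spline (BC-spline with B = 1, C = 0), width 4:
h_{B}(x) = \frac{1}{6}
\begin{cases}
  3\lvert x\rvert^{3} - 6\lvert x\rvert^{2} + 4 & 0 \le \lvert x\rvert < 1 \\
  (2 - \lvert x\rvert)^{3}                      & 1 \le \lvert x\rvert < 2 \\
  0                                             & \text{otherwise}
\end{cases}

% Catmull-Rom spline (B = 0, C = 1/2), width 4:
h_{CR}(x) =
\begin{cases}
  \tfrac{3}{2}\lvert x\rvert^{3} - \tfrac{5}{2}\lvert x\rvert^{2} + 1                    & 0 \le \lvert x\rvert < 1 \\
  -\tfrac{1}{2}\lvert x\rvert^{3} + \tfrac{5}{2}\lvert x\rvert^{2} - 4\lvert x\rvert + 2 & 1 \le \lvert x\rvert < 2 \\
  0                                                                                      & \text{otherwise}
\end{cases}
```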

4.2 3D texturing

For a long time, 3D textures have only been available on workstation-class graphics hardware [1]. Recently, however, they have also entered the realm of consumer graphics hardware, e.g., with the ATI Radeon. 3D textures have a number of uses, ranging from solid texturing to direct volume rendering. Analogously to the standard bilinear interpolation performed by 2D texture mapping hardware, the three-dimensional counterpart uses trilinear interpolation within the texture volume in order to reconstruct the texture at the desired locations. Direct volume rendering is easily able to exploit 3D texture mapping capabilities by blending a stack of 2D slices on top of each other [23]. These slices are usually aligned with the viewport and therefore have an arbitrary location and orientation with respect to the texture domain itself. Although 3D textures are not yet widely supported in the consumer marketplace, they will very likely become a standard feature in the immediate future. The method proposed in this paper employs three-dimensional texture mapping capabilities for reconstructing arbitrarily oriented slices through volumetric data.

4.3 Frame buffer precision and range

Practically all rendering algorithms employing multiple passes are prone to artifacts due to limited frame buffer precision. In addition to the quite obvious problem of insufficient precision, the integer or fixed-point nature of most frame buffers, together with the usual [0, 1] range, removes a lot of flexibility with regard to storing intermediate results. Frame buffers are usually employed to store the final image of a rendered scene. Hence, a frame buffer precision of eight bits per color channel easily suffices in a standard scenario—although even then the limited range can lead to problems with accurately displaying certain static images. However, if the frame buffer is used for holding the result of intermediate computations or for blending many textures on top of each other, accumulating color information over many passes, a resolution of eight bits is often too low. Unfortunately, the frame buffers of all currently available consumer 3D accelerators are still limited to at most eight bits of precision per color channel. The usual approach to avoiding excessive artifacts due to limited frame buffer precision is to calculate intermediate results with scaled-up values. This gives rise to the problem of unintentional clamping, where values exceed the [0, 1] range of the frame buffer and are therefore clamped to 1.0. With care, this can be avoided by reading back intermediate results after a couple of passes and temporarily storing them. After all the intermediate results have been calculated, they can be combined into the final image in the frame buffer, scaling them down on-the-fly.
4.3 Frame buffer precision and range the possibility of intermediate negative results. Usually, all posi-
tive passes are rendered first. Then, the absolute values of negative
Practically all rendering algorithms employing multiple passes are passes can be subtracted from the contents of the frame buffer.
prone to artifacts due to limited frame buffer precision. In addition
to the quite obvious problem of insufficient precision, the integer In our case, the requirement for signed numbers, or at least the
or fixed point nature of most frame buffers, together with the usual possibility of explicit subtraction–which in our method suffices for
[0; 1℄ range, removes a lot of flexibility with regard to storing in- all practical uses–, depends on the filter kernel that one wants to
termediate results. Frame buffers are usually employed to store use. In the simplest case of entirely positive kernels all intermedi-
the final image of a rendered scene. Hence, a frame buffer pre- ate and end results are positive as well. If the filter kernel is partially
cision of eight bits per color channel easily suffices in a standard negative, it can easily be used as long as all zero crossings are at in-
scenario—although even then the limited range can lead to prob- teger positions. If this is not the case, filter tiles containing entries
lems with accurately displaying certain static images. However, of differing sign need to be split into two separate tiles. The first
if the frame buffer is used for holding the result of intermediate tile must contain the positive entries with all negatives set to zero.
computations or for blending many textures on top of each other, Conversely, the second tile contains the absolute values of the neg-
accumulating color information over many passes, a resolution of ative entries, with all positives set to zero. These two split-up tiles
eight bits is often too low. Unfortunately, the frame buffers of all have to be processed in separate rendering passes, increasing the
currently available consumer 3D accelerators are still limited to at overall number of passes. Fortunately, kernels that switch between
most eight bits of precision per color channel. The usual approach negative and positive areas at arbitrary fractional locations are not
to avoiding excessive artifacts due to limited frame buffer precision very common. For instance, all practical interpolatory filters that
is to calculate intermediate results with scaled-up values. This gives are also separable satisfy the criterion that all zero crossings are at
rise to the problem of unintentional clamping, where values exceed integer positions. See figure 6 for the filter kernels we have used as
the [0; 1℄ range of the frame buffer and are therefore clamped to 1:0. an example. All of them change sign at the integers, if at all.
With care, this can be avoided by reading back intermediate results
after a couple of passes and temporarily storing them. After all the 6 SIGNED RGB NV, SIGNED APLHA NV, etc.
intermediate results have been calculated, they can be combined to 7 via glBlendEquationEXT(GL FUNC REVERSE SUBTRACT)
the final image in the frame buffer, scaling them down on-the-fly. 8 via EXT blend subtract
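The following sketch (again with hypothetical identifiers, not the paper's code) shows how a mixed-sign tile could be split into a positive and a negative part, and how EXT_blend_subtract would be configured for the negative passes; positive passes are rendered first with ordinary additive blending so that intermediate results never become negative.

```cpp
#include <GL/gl.h>
#include <GL/glext.h>  // GL_FUNC_ADD_EXT is from the companion
                       // EXT_blend_minmax extension
#include <cstddef>
#include <vector>

// Split one tile into max(h, 0) and max(-h, 0); each part is rendered in
// its own pass, so mixed-sign tiles double the pass count.
void splitTile(const std::vector<float>& tile,
               std::vector<float>& positivePart,
               std::vector<float>& negativePart)
{
    positivePart.resize(tile.size());
    negativePart.resize(tile.size());
    for (std::size_t i = 0; i < tile.size(); ++i) {
        positivePart[i] = tile[i] > 0.0f ?  tile[i] : 0.0f;
        negativePart[i] = tile[i] < 0.0f ? -tile[i] : 0.0f; // absolute value
    }
}

// Per-pass blend state: added for positive parts, subtracted from the
// frame buffer (destination minus source) for negative parts.
void setPassBlendState(bool negativePass)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);
    glBlendEquationEXT(negativePass ? GL_FUNC_REVERSE_SUBTRACT_EXT
                                    : GL_FUNC_ADD_EXT);
}
```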

Figure 7: Slice of MR head data set, reconstructed using: (a) Bilinear interpolation; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom
spline; (d) Bicubic B-spline.

                         Bicubic B-spline   Bicubic Catmull-Rom   Tricubic B-spline
kernel width             4                  4                     4
slice orientation        aligned            aligned               oblique
number of passes         16                 16                    64
slice resolution         256x256            256x256               128x128
render resolution        600x600            600x600               500x500
NVIDIA GeForce 2 [fps]   10.88              10.88                 -
ATI Radeon [fps]         8.91               8.91                  1.24

Table 1: Frame rates for different scenarios. Note that the tricubic case uses the fall-back algorithm.

5 Results

In our work, we are focusing on widely available low-cost PC graphics hardware. Currently, the two premier graphics accelerators in this field are the NVIDIA GeForce 2 and the ATI Radeon. We have implemented and tested our method on a GeForce 2 with 32MB RAM, and a Radeon with 64MB RAM. The graphics API we are using is OpenGL. Although the GeForce 3 is already on the horizon, it is not widely available yet.

The GeForce 2 supports multi-texturing with two 2D textures at the same time and offers the capability to subtract from the contents of the frame buffer. It does not support 3D textures, however. The Radeon supports up to three simultaneous 2D textures, as well as one 3D texture and one 2D texture at the same time. Unfortunately, it does not allow subtracting from the frame buffer. These properties of the graphics hardware we have used lead to a couple of restrictions with respect to which parts of our approach can be implemented on which platform.

As filter kernels we have used a cubic B-spline of width four and a cubic Catmull-Rom spline, also of width four. In addition to these, we have also tested a windowed sinc filter kernel.

We have reconstructed object-aligned planar slices on both the GeForce 2 and the Radeon. Figure 7(a) shows a part from a 256x256 slice from a human head MR data set. The image was rendered in a resolution of 600x600. In this figure, reconstruction was done with the hardware-native bilinear interpolation. Thus, interpolation artifacts are clearly visible when viewed on screen. The same slice with the same conditions but different reconstruction kernels is shown in figures 7(b) (Blackman windowed sinc), 7(c) (cubic Catmull-Rom), and 7(d) (cubic B-spline). The Blackman windowed sinc and Catmull-Rom kernels can only be used on the GeForce, due to the fact that the Radeon does not support frame buffer subtraction. Figure 9 (colorplate) shows slices reconstructed with different filters together with magnified regions to highlight the differences.

Since the GeForce 2 does not support 3D textures, we have tested our approach for reconstructing arbitrarily oriented slices only on the Radeon. Figure 8 shows a slice from a vertebrae data set with screws. The slice is not aligned with the slice stack, but instead has an arbitrary location and orientation with respect to the texture domain. Since the Radeon does not support multi-texturing with two 3D textures, we have used the fall-back approach for this case outlined in section 3.4.

Table 1 shows some timing results of our test implementation. We have used a Pentium III, 733 MHz, with 512MB of RAM, and the two graphics cards described above (GeForce 2 and Radeon). Note that the timings do not depend on the shape of the filter kernel per se, only on its width.

6 Conclusions and future work

We have presented a general approach for high-quality filtering that is able to exploit hardware acceleration for reconstruction with arbitrary filter kernels. Conceptually, the method is not constrained to a certain dimensionality of the data, or the shape of the filter kernel. In practice, limiting factors are the number of rendering passes and the precision of the frame buffer.

Our method is quite general and can be used for a lot of applications. With regard to volume visualization, the reconstruction of object-aligned, as well as oblique slices through volumetric data is especially interesting. Reconstruction of slices can also be used for direct volume rendering.

We are exploiting commodity graphics hardware, multi-texturing, and multiple rendering passes. The number of passes is a major factor determining the resulting performance. Therefore, future hardware that supports many textures at the same time–not only 2D textures, but also 3D textures–will make the application of our method more feasible for real-time use.

Since hardware that is able to combine multi-texturing with 3D textures is not yet available to us, we are still using a much slower “simulation” of our algorithm for reconstructing oblique slices. Thus, we are really looking forward to the immediate future where such hardware will be available. Still, we would also like to adapt the approach outlined by Rezk-Salama et al. [19] in order to considerably speed up the intermediate solution. We would also like to experiment with additional filter kernels. Please see http://www.VRVis.at/vis/research/hq-hw-reco/ for high-resolution images and the most up-to-date information on this ongoing work.
Figure 8: Oblique slice through vertebrae data set with screws: (a) Trilinear filter; (b) Part of (a) magnified; (c) Tricubic B-spline filter; (d) Part of (c) magnified.

7 Acknowledgments

Parts of this work have been carried out as part of the basic research on visualization at the VRVis Research Center in Vienna, Austria (http://www.VRVis.at/vis/), which is funded in part by an Austrian research program called Kplus.

References

[1] K. Akeley. RealityEngine graphics. In Proceedings of SIGGRAPH ’93, pages 109–116, 1993.

[2] ATI web page. http://www.ati.com/.

[3] B. Cabral, N. Cam, and J. Foran. Accelerated volume rendering and tomographic reconstruction using texture mapping hardware. In Proceedings of IEEE Symposium on Volume Visualization, pages 91–98, 1994.

[4] T. J. Cullip and U. Neumann. Accelerating volume reconstruction with 3D texture mapping hardware. Technical Report TR93-027, Department of Computer Science, University of North Carolina, Chapel Hill, 1993.

[5] F. Dachille, K. Kreeger, B. Chen, I. Bittner, and A. Kaufman. High-quality volume rendering using texture mapping hardware. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 1998, 1998.

[6] M. Hopf and T. Ertl. Accelerating 3D convolution using graphics hardware. In Proceedings of IEEE Visualization ’99, pages 471–474, 1999.

[7] R. G. Keys. Cubic convolution interpolation for digital image processing. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-29(6):1153–1160, December 1981.

[8] S. R. Marschner and R. J. Lobb. An evaluation of reconstruction filters for volume rendering. In Proceedings of IEEE Visualization ’94, pages 100–107, 1994.

[9] M. Meißner, U. Hoffmann, and W. Straßer. Enabling classification and shading for 3D texture mapping based volume rendering. In Proceedings of IEEE Visualization ’99, pages 207–214, 1999.

[10] M. Meißner, J. Huang, D. Bartz, K. Müller, and R. Crawfis. A practical evaluation of four popular volume rendering algorithms. In Proceedings of IEEE Symposium on Volume Visualization, pages 81–90, 2000.

[11] D. P. Mitchell and A. N. Netravali. Reconstruction filters in computer graphics. In Proceedings of SIGGRAPH ’88, pages 221–228, 1988.

[12] T. Möller, R. Machiraju, K. Müller, and R. Yagel. Classification and local error estimation of interpolation and derivative filters for volume rendering. In Proceedings of IEEE Symposium on Volume Visualization, pages 71–78, 1996.

[13] T. Möller, R. Machiraju, K. Müller, and R. Yagel. Evaluation and design of filters using a Taylor series expansion. IEEE Transactions on Visualization and Computer Graphics, 3(2):184–199, 1997.

[14] T. Möller, K. Müller, Y. Kurzion, R. Machiraju, and R. Yagel. Design of accurate and smooth filters for function and derivative reconstruction. In Proceedings of IEEE Symposium on Volume Visualization, pages 143–151, 1998.

[15] NVIDIA web page. http://www.nvidia.com/.

[16] NVIDIA OpenGL extension specifications document. http://www.nvidia.com/developer.

[17] OpenGL web page. http://www.opengl.org/.

[18] A. V. Oppenheim and R. W. Schafer. Digital Signal Processing. Prentice Hall, Englewood Cliffs, 1975.

[19] C. Rezk-Salama, K. Engel, M. Bauer, G. Greiner, and T. Ertl. Interactive volume rendering on standard PC graphics hardware using multi-textures and multi-stage rasterization. In Proceedings of Eurographics/SIGGRAPH Graphics Hardware Workshop 2000, 2000.

[20] Silicon Graphics, Inc. Pixel textures extension. Specification available from http://www.opengl.org, 1996.

[21] T. Theußl, H. Hauser, and M. E. Gröller. Mastering windows: Improving reconstruction. In Proceedings of IEEE Symposium on Volume Visualization, pages 101–108, 2000.

[22] K. Turkowski. Filters for common resampling tasks. In Andrew S. Glassner, editor, Graphics Gems I, pages 147–165. Academic Press, 1990.

[23] R. Westermann and T. Ertl. Efficiently using graphics hardware in volume rendering applications. In Proceedings of SIGGRAPH ’98, pages 169–178, 1998.
Figure 9: Slice from MR data set, reconstructed using different filters (right images show shaded portion of left image enlarged): (a) Bilinear
filter; (b) Blackman windowed sinc; (c) Bicubic Catmull-Rom spline; (d) Bicubic B-spline.
