
Experimental study of the influence of refraction on underwater three-dimensional reconstruction using the SVP camera model

Lai Kang,1,3,* Lingda Wu,1,2 and Yee-Hong Yang3

1College of Information System and Management, National University of Defense Technology, Changsha 410073, China
2The Key Laboratory, the Academy of Equipment Command and Technology, Beijing 101400, China
3Department of Computing Science, University of Alberta, Edmonton, Alberta T6G 2E8, Canada
*Corresponding author: lkang.vr@gmail.com

Received 6 July 2012; revised 26 September 2012; accepted 26 September 2012; posted 27 September 2012 (Doc. ID 172006); published 26 October 2012

In an underwater imaging system, a perspective camera is often placed outside a tank or in waterproof housing with a flat glass window. The refraction of light occurs when a light ray passes through the water–glass and air–glass interfaces, rendering the conventional multiple view geometry based on the single viewpoint (SVP) camera model invalid. While most recent underwater vision studies mainly focus on the challenging topic of calibrating such systems, no previous work has systematically studied the influence of refraction on underwater three-dimensional (3D) reconstruction. This paper demonstrates the possibility of using the SVP camera model in underwater 3D reconstruction through theoretical analysis of refractive distortion and simulations. Then, the performance of the SVP camera model in multiview underwater 3D reconstruction is quantitatively evaluated. The experimental results reveal a rather surprising and useful yet overlooked fact that the SVP camera model with radial distortion correction and focal length adjustment can compensate for refraction and achieve high accuracy in multiview underwater 3D reconstruction (within 0.7 mm for an object of dimension 200 mm) compared with the results of land-based systems. Such an observation justifies the use of the SVP camera model in underwater applications for reconstructing reliable 3D scenes. Our results can be used to guide the selection of system parameters in the design of an underwater 3D imaging setup. © 2012 Optical Society of America

OCIS codes: 330.1400, 010.7340.

1. Introduction

Image-based three-dimensional (3D) reconstruction has been widely studied during the last two decades [1]. For land-based systems, remarkable success has been achieved [2]. Accurate 3D reconstruction from images captured by underwater cameras, however, has not attracted much attention from the computer vision community until recently [3,4]. Image-based underwater 3D reconstruction faces challenges posed by underwater imaging and image distortion caused by the refraction of light. For underwater imaging, image quality degrades due to the effects of light attenuation and scattering. In order to improve the quality of underwater images, several algorithms have been proposed in the literature [5,6]. The other major challenge in underwater vision is caused by the refraction of light. In particular, a perspective camera is often placed outside a water-filled tank or in a waterproof housing with a flat glass window. In such an imaging system, the refraction of light occurs when a light ray passes through the water–glass and air–glass interfaces, rendering the conventional pinhole camera model invalid in theory. The refractive distortion is known to be highly nonlinear and
depends not only on the distance from the optical axis but also on the depth of a scene point [7].

Recently, there has been a growing interest in using a physically correct refractive camera model in underwater vision. Treibitz et al. [7] study the flat refractive geometry and show that a camera with a flat refractive interface inherently has a non-single viewpoint (SVP) property. Chari and Sturm [3] analyze theoretically the underlying multiview relationships between cameras of an underwater vision system. The authors show linear relations of such a system with the use of lifted coordinates [8] and demonstrate the existence of geometric entities such as the 12 × 12 refractive fundamental matrix. Unfortunately, no practical applications of these theoretical results are given in [3]. Chang and Chen [4] study a similar configuration involving multiple images of an underwater scene through a single interface. Refractive distortion is explicitly modeled as a function of depth. In their work, an additional piece of hardware called the inertial measurement unit [9] is required to provide the roll and pitch angles of the camera. Also, the normal of the refractive interface is assumed to be known. Based on this additional information, the authors derive a linear solution to the relative pose problem and a closed-form solution to the absolute pose problem. Sedlazeck and Koch [10] study the calibration of housing parameters for underwater stereo camera rigs. Rather than minimizing the reprojection error in image space, the reprojection error on the outer interface plane is minimized by deriving the virtual perspective projection for each 3D point [11]. One issue of this method, as reported in [10], is that the optimization process is time consuming (on the order of 3 h). More recently, Agrawal and Ramalingam [12] show that the underlying geometry of rays in the refractive camera model corresponds to an axial camera [13], based on which a general theory for calibrating such systems is proposed.

In order to analyze the distortions in non-SVP imaging systems, Swaminathan et al. [14] introduce a taxonomy of distortions based on the geometry of imaging systems and derive a metric to quantify caustic distortions. The authors also present an algorithm to compute minimally distorted images using simple priors of scene structure. Treibitz et al. [7] analyze the refractive distortion and show the error in SVP approximation of a refractive camera using simulations. The analysis of refractive distortion in this paper is similar to that of [7] but more extensive. Specifically, in addition to the simulations performed in [7], we also analyze the SVP approximation when a similarity transformation is applied to the 3D scene, which is more reasonable in the context of metric 3D reconstruction. Another interesting related work shows that spatiotemporal caustic patterns can be exploited for establishing correspondences between two underwater images [15]. It is noteworthy that the influence of refractive distortion on the accuracy of 3D reconstruction is not studied in [7,14,15]. Besides image-based methods, an active scanning system has also been developed for underwater 3D reconstruction [16].

2. Motivations and Contributions

This paper is motivated by the following three observations. First, although it is well known that refractive distortion cannot be modeled by the image space distortion model (e.g., lens radial distortion) in theory [7], the influence of refraction on the accuracy of 3D reconstruction has not been quantitatively evaluated in any previous work. Second, although flat refractive geometry has drawn increasing attention from the computer-vision community in recent years, algorithms explicitly modeling the refraction effect [4,7,10,12] still suffer issues such as efficiency and applicability, which limit their applications in practice. Specifically, existing algorithms are either too time-consuming [10] or require an object with known dimensions [7,11,12,17] or a priori information of camera motion [4] for calibration. In comparison, the SVP camera model has been well studied [1,2] and does not suffer any of the above issues. Thus underwater 3D reconstruction based on the SVP camera model would be more practical if the refraction effect could be compensated. Third, essential algorithms, such as fundamental matrix estimation, triangulation [1], multiview stereo [18], etc., involved in underwater 3D reconstruction, have not been well established in the literature. The task of reconstructing an accurate dense 3D model from underwater images based on the refractive camera model remains a challenging one.

Aiming at studying the performance of the SVP camera model in underwater 3D reconstruction, this paper has the following original contributions:

• We provide theoretical analysis of refractive distortion in images and make an important observation that refractive distortion can be well approximated using the SVP camera model after applying a similarity transformation to the 3D scene. This observation justifies the use of the SVP camera model to obtain accurate metric underwater 3D reconstruction up to an arbitrary similarity transformation.
• We quantitatively evaluate the performance of the SVP camera model in underwater 3D reconstruction with respect to several important parameters in an underwater imaging system using both synthetically generated images and real underwater images. The results of underwater 3D reconstruction are compared with the ground-truth 3D model obtained by a land-based system, using a new sparse-to-dense (S2D) point clouds alignment approach particularly suitable for image-based 3D reconstruction.
• The results reveal a rather surprising and useful yet overlooked fact that the SVP camera model with radial distortion correction and focal length adjustment can compensate for refraction and achieve high accuracy in underwater 3D reconstruction (within 0.7 mm for an object of dimension 200 mm) compared with the results of land-based systems.
Such an observation justifies the use of the SVP camera model in underwater applications for reconstructing a reliable 3D scene and also can be used to guide the selection of system parameters in the design of underwater 3D imaging systems.

The remainder of this paper is organized as follows. In Section 3, we analyze the refractive distortion and its SVP approximation in underwater imaging. In Section 4, we present details on the design of experiments. The qualitative and quantitative experimental results are presented in Section 5. Finally, we conclude in Section 6.

Fig. 1. (Color online) (a) An example showing the refraction effect: an image of a checkerboard submerged in a water-filled tank captured by a camera placed in front of a flat glass interface of the tank. Straight lines are bent in the image due to the refraction of light. (b) 2D illustration of a general underwater imaging system: a camera placed in a waterproof housing with a flat glass interface observing a straight line. See Table 1 for detailed description of notations.

3. Refractive Distortion and its Approximation

A. Characterizing Refractive Distortion

In the presence of a refractive interface, a light ray no longer travels in a straight line, resulting in distortion in the observed images captured using a typical camera [an example is shown in Fig. 1(a)]. As pointed out by Treibitz et al. [7], the distortion induced by refraction depends on the position of the object and cannot be modeled as lens radial distortion, which is an image space model. In this subsection, we characterize the refractive distortion with respect to the location of the observed image point and the parameters of an underwater imaging system. Without loss of generality, the two-dimensional (2D) case is considered. A 2D illustration of a general underwater imaging system and the description of the corresponding notations are shown in Fig. 1(b) and Table 1, respectively. In this example, the scene is chosen as a straight line for simplicity. Since the parameter vector $\Theta_n = (n_a, n_g, n_w)^\top$ consists of known constant refractive indices of air, glass, and water, the underwater imaging configuration is fully specified by the parameters $\Theta_c = (f, d_a, d_g, \theta_c)^\top$ and $\Theta_o = (d_w, \theta_o)^\top$. In particular, the scene structure of line L is determined by the distance da + dg + dw and the angle θo [see Fig. 1(b)]. In order to characterize the refractive distortion in image space, a light ray goes through the camera center and the observed image point is back projected. The light ray is refracted twice (one refraction at the air–glass interface and the other at the water–glass interface) before intersecting with the observed scene at a scene point. Therefore, the derivation of the refractive distortion at an observed image point xa with image coordinate ua involves determining three intersection points $x_g = (r_g, z_g)^\top$, $x_w = (r_w, z_w)^\top$, and $x_o = (r_o, z_o)^\top$, and the projection of xo onto the image plane.

Table 1. Notations Used in the Underwater Imaging System Shown in Fig. 1(b)

Notation   Description
c          Coordinates of the center of the perspective camera: $c = (0, 0)^\top$
f          Focal length of the perspective camera
L          Scene structure of a straight line
da         Distance between the camera and the inner glass interface
dg         Thickness of the glass interface
dw         Distance between the outer glass interface and the intersection between the z-axis and L
θc         Angle between the optical axis of the perspective camera and the normal of the glass interface
θo         Angle between the glass interface and L
θw         Angle between the z-axis and the observed light ray in water
θa         Angle between the z-axis and the observed light ray in air
xo         A scene point lying on L: $x_o = (r_o, z_o)^\top$
xw         Intersection between the observed light ray and the outer glass interface: $x_w = (r_w, z_w)^\top$
xg         Intersection between the observed light ray and the inner glass interface: $x_g = (r_g, z_g)^\top$
xa         Projection of the scene point xo (with image coordinate ua)
xp         Projection of the scene point xo (with image coordinate up) under the pinhole camera model

From Fig. 1(b), it is easy to see that the light ray intersects with the air–glass interface at xg with

$$r_g = d_a \tan\!\left(\theta_c + \arctan\frac{u_a}{f}\right), \qquad z_g = d_a. \quad (1)$$

Given xg, the intersection xw between the light ray and the water–glass interface can be determined according to Snell's law [19]:

$$n_a \sin\theta_a = n_g \sin\theta_g, \quad (2)$$

where the trigonometric functions can be calculated as
$$\sin\theta_a = \frac{r_g}{\sqrt{r_g^2 + d_a^2}}, \qquad \sin\theta_g = \frac{r_w - r_g}{\sqrt{(r_w - r_g)^2 + d_g^2}}. \quad (3)$$

Combining Eqs. (2) and (3), we get the coordinates of xw:

$$r_w = r_g + \frac{n_a r_g d_g}{\sqrt{d_a^2 n_g^2 - n_a^2 r_g^2 + n_g^2 r_g^2}}, \qquad z_w = d_a + d_g. \quad (4)$$

Similar to the way of calculating rw, ro can be represented by a function of rw by solving the following equations:

$$n_g \sin\theta_g = n_w \sin\theta_w, \qquad \sin\theta_w = \frac{r_o - r_w}{\sqrt{(r_o - r_w)^2 + (d_w - r_o \sin\theta_o)^2}}, \quad (5)$$

where sin θg is the same as in Eq. (3). Then, we get the coordinates of the scene point xo:

$$r_o = A_o, \qquad z_o = d_a + d_g + d_w - A_o \sin\theta_o, \quad (6)$$

where Ao is the solution of ro from Eq. (5). Based on the solution found by the Matlab symbolic toolbox, we have

$$r_o = \frac{r_w B^2 + n_g (d_w - r_w \sin\theta_o)(r_w - r_g) B - d_w n_g^2 (r_w - r_g)^2 \sin\theta_o}{B^2 - n_g^2 (r_w - r_g)^2 \sin^2\theta_o}, \quad (7)$$

where

$$B = \sqrt{d_g^2 n_w^2 + (n_w^2 - n_g^2)(r_g - r_w)^2}. \quad (8)$$

Under the pinhole camera model, the image point of xo is given by

$$u_p = p(x_o) = f \tan\!\left(\arctan\frac{r_o}{z_o} - \theta_c\right), \quad (9)$$

where p(·) represents the function explicitly relating xo and its image point up.

Definition 1. In the refractive camera model, the refractive distortion at a given image point is the difference between the projection of the corresponding scene point under the pinhole camera model and the given image point.

According to Definition 1, the difference between ua and up results in refractive distortion at ua, which is

$$E_1(u_a; \Theta_c, \Theta_o) = |u_a - u_p|. \quad (10)$$

Given an image point with coordinate ua and parameter vectors Θc and Θo, E1(ua; Θc, Θo) can be computed by sequentially calculating rg according to Eq. (1), rw according to Eq. (4), and the scene point xo according to Eq. (6), and then xo can be projected onto the image plane according to Eq. (9) to obtain up.

Figure 2(a) shows the refractive distortion in an image under five different system configurations. From this figure, we can see that with θc = 0, different system parameters da, dg, dw, and θo preserve the monotonicity of distortion along the radial direction. The refractive distortion under a more general configuration, in which θc is not equal to zero, is shown in Fig. 2(b) (denoted as original E1). Under this configuration, the distortion becomes much more complex than the previous one.

Remark 1. Under a general configuration, the refractive distortion in an image is neither symmetric with respect to the image center nor monotonic along the radial direction.

B. Approximation Based on the SVP Camera Model

Compared with refractive distortion, lens radial distortion in the SVP camera model is much simpler because it is an image space model. The observed image point ua and its undistorted image point u′p are related by [1]:

$$u_a = p'(u_p') = u_p'\!\left(1 + k_1 \left(\frac{u_p'}{f}\right)^2 + k_2 \left(\frac{u_p'}{f}\right)^4\right), \quad (11)$$

where k1 and k2 are two distortion coefficients and p′(·) represents the function that explicitly relates u′p and the observed image point ua. Given ua and the intrinsic camera parameter vector $\Theta_s = (f, k_1, k_2)^\top$, u′p is obtained by solving the following equation:

$$k_2 u_p'^5 + k_1 f^2 u_p'^3 + f^4 u_p' - f^4 u_a = 0. \quad (12)$$

Thus, the lens radial distortion at ua can be characterized as

$$E_0(u_a; \Theta_s) = |u_a - u_p'|. \quad (13)$$
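To make the sequential computation of E1 concrete, the following Python sketch traces one back-projected ray through both interfaces and evaluates Eq. (10) numerically. It is a minimal illustration, not the authors' code: the function name is ours, angles are assumed to be in radians, ua is measured from the principal point, all lengths share one unit, and the intersection with L is solved directly from the tangent relation (algebraically equivalent to Eq. (7)).

```python
import numpy as np

def refractive_distortion(u_a, f, d_a, d_g, d_w, theta_c, theta_o,
                          n_a=1.0, n_g=1.5333, n_w=1.3333):
    """Evaluate E1(u_a; Theta_c, Theta_o) of Eq. (10) by tracing the
    back-projected ray through the two flat interfaces of Fig. 1(b)."""
    # Eq. (1): intersection x_g with the air-glass interface.
    r_g = d_a * np.tan(theta_c + np.arctan(u_a / f))
    # Eqs. (2)-(3): Snell's law at the air-glass interface.
    sin_a = r_g / np.hypot(r_g, d_a)
    sin_g = (n_a / n_g) * sin_a
    # Eq. (4): intersection x_w with the water-glass interface.
    r_w = r_g + d_g * sin_g / np.sqrt(1.0 - sin_g**2)
    # Snell's law at the water-glass interface gives the ray slope in water.
    sin_w = (n_g / n_w) * sin_g
    tan_w = sin_w / np.sqrt(1.0 - sin_w**2)
    # Eqs. (5)-(7): intersect the in-water ray with the line L.  Writing
    # tan(theta_w) = (r_o - r_w) / (d_w - r_o*sin(theta_o)) and solving:
    r_o = (r_w + tan_w * d_w) / (1.0 + tan_w * np.sin(theta_o))
    z_o = d_a + d_g + d_w - r_o * np.sin(theta_o)
    # Eq. (9): pinhole projection of the scene point x_o.
    u_p = f * np.tan(np.arctan(r_o / z_o) - theta_c)
    # Eq. (10): refractive distortion at u_a.
    return abs(u_a - u_p)
```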

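The lens distortion side can be handled the same way. The sketch below (again our own, with hypothetical helper names) solves the quintic of Eq. (12) with a generic polynomial root finder and keeps the real root closest to the observed coordinate, which then gives E0 of Eq. (13):

```python
import numpy as np

def undistort_point(u_a, f, k1, k2):
    """Solve the quintic of Eq. (12) for the undistorted coordinate u_p',
    returning the real root closest to the observed coordinate u_a."""
    coeffs = [k2, 0.0, k1 * f**2, 0.0, f**4, -(f**4) * u_a]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-8].real
    return real[np.argmin(np.abs(real - u_a))]

def radial_distortion(u_a, f, k1, k2):
    """Eq. (13): lens radial distortion E0(u_a; Theta_s)."""
    return abs(u_a - undistort_point(u_a, f, k1, k2))
```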


Fig. 2. (Color online) (a) Refractive distortion versus image coordinate under different system configurations. (b) Refractive distortion (denoted as original E1) versus image coordinate under a general configuration (da = 100, dg = 10, dw = 1000, θc = 10°, θo = 30°) and approximations to the refractive distortion using two methods based on the SVP camera model. For all imaging setups used in this figure, the focal length f is 800 pixels and the unit of da, dg, and dw is pixels. See text for details.

To this end, both the refractive distortion and the lens radial distortion are characterized with respect to the location of the observed image point and some system parameters. The problem of modeling refraction based on the SVP model corresponds to approximating E1(ua; Θc, Θo) defined in Eq. (10) by E0(ua; Θs) defined by Eq. (13). In the following, we analyze three strategies to find such an approximation. ua^i (1 ≤ i ≤ M) denotes the coordinates of the ith image point, and xo^i the ith scene point.

(1) A straightforward way of finding the approximation is to directly search for the best intrinsic camera parameter Θs by minimizing the following objective function:

$$\arg\min_{\Theta_s} \sum_{1 \le i \le M} |E_1(u_a^i; \Theta_c, \Theta_o) - E_0(u_a^i; \Theta_s)|^2. \quad (14)$$

In this paper, we refer to this strategy as the naïve method, in which the scene structure is not allowed to change and only the intrinsic camera parameter Θs is optimized. The result of this strategy under the general configuration is shown in Fig. 2(b), where naïve E1 corresponds to the refractive distortion and radial (naïve) E0 the radial distortion using the optimized parameter Θs. Obviously, the SVP approximation obtained by this strategy is poor due to the large difference between naïve E1 and radial (naïve) E0.

(2) Recall that metric 3D reconstruction is up to an arbitrary similarity transformation [1], so we can apply a similarity transformation to the scene structure before calculating the SVP approximation. This strategy calculates the smallest error in SVP approximation under the assumption of perfect metric 3D reconstruction. It leads to an alternative approximation by minimizing

$$\arg\min_{\Theta_s, T_2} \sum_{1 \le i \le M} |E_2(u_a^i; \Theta_c, \Theta_o) - E_0(u_a^i; \Theta_s)|^2, \quad (15)$$

where

$$E_2(u_a^i; \Theta_c, \Theta_o) = |u_a^i - p(T_2(x_o^i))| \quad (16)$$

is a transformed version of Eq. (10). Here, the function p(·) is defined by Eq. (9) and T2 is a similarity transformation, which converts a scene point xo to

$$T_2(x_o) = sR_2 x_o + t_2, \quad (17)$$

where s is a scale factor, R2 is a 2 × 2 rotation matrix, and t2 is a 2 × 1 translation vector. Since there exists an ambiguity of an arbitrary similarity transformation in metric 3D reconstruction, the result of this strategy shows the smallest error of SVP approximation, assuming that the 3D reconstruction is accurate. Also, as an example, the result of this strategy under the general configuration is shown in Fig. 2(b), where transformed E2 corresponds to the optimized refractive distortion and radial (tran) E0 the optimized radial distortion. It is easy to see that the difference between the curves of transformed E2 and radial (tran) E0 is very small, which means that this strategy could be a good approximation without introducing distortion in the reconstructed 3D scene.

(3) In practice, since both the system parameters and the ground-truth scene structure are not available, we cannot calculate the refractive distortion based on Eq. (10) or apply a similarity transformation to the scene. The success in land-based 3D reconstruction can be attributed to the deployment of bundle adjustment [2,20], in which the scene points are adjusted separately rather than as a group under a similarity transformation. The intrinsic camera parameters and scene structure are simultaneously optimized by minimizing the following objective function:

$$\arg\min_{\Theta_s, y_o} \sum_{1 \le i \le M} |E_3(u_a^i; x_o^i) - E_0(u_a^i; \Theta_s)|^2, \quad (18)$$

where $y_o = \{x_o^1, x_o^2, \ldots, x_o^M\}$ is the set of M scene points and
$$E_3(u_a^i; x_o^i) = |u_a^i - p(x_o^i)|. \quad (19)$$

Since the scene is not constrained by a similarity transformation as in the second strategy, even better approximation in terms of the objective function value can be obtained using this strategy. The result of this strategy is not shown in Fig. 2 because the curves almost overlap the results of the second approximation strategy.

Remark 2. Refractive distortion can be well approximated by lens radial distortion correction and focal length adjustment under the SVP camera model if a proper similarity transformation is applied to the scene structure. This observation justifies the use of the SVP camera model in metric 3D reconstruction. In practice, the approximation can be even better because the transformation is not constrained to a similarity transformation. Although distortion would be introduced in the 3D reconstruction if there is no constraint of the similarity transformation, such distortion is negligible when multiple images are used, as will be demonstrated in our experiments.
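As an illustration of the naïve strategy of Eq. (14), the sketch below reuses the two helpers sketched in Section 3 and fits Θs = (f, k1, k2) by nonlinear least squares. The sampling grid is an illustrative assumption and the system configuration is that of Fig. 2(b); strategy (2) would additionally expose the parameters of T2 to the optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

# Example configuration matching Fig. 2(b); lengths in pixels,
# angles converted to radians.
cam = dict(f=800.0, d_a=100.0, d_g=10.0, d_w=1000.0,
           theta_c=np.deg2rad(10.0), theta_o=np.deg2rad(30.0))
u_samples = np.linspace(-320.0, 320.0, 65)   # image points u_a^i

# Refractive distortion E1 at each sampled image point (Eq. (10)).
e1 = np.array([refractive_distortion(u, **cam) for u in u_samples])

def residuals(theta_s):
    f, k1, k2 = theta_s
    e0 = np.array([radial_distortion(u, f, k1, k2) for u in u_samples])
    return e1 - e0

# Eq. (14): intrinsic parameters whose radial distortion best mimics E1.
fit = least_squares(residuals, x0=np.array([800.0, 0.0, 0.0]))
print("fitted Theta_s = (f, k1, k2):", fit.x)
```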
4. Experimental Design

In this paper, both synthetically rendered images and real images are used for experiments, where rendered images are captured by virtual imaging setups and real images are captured by real imaging setups in our laboratory. For the virtual imaging setups, the system parameters da, dg, dw, and f are all measured in pixels. For the real imaging setups, da, dg, and dw are measured in millimeters and the focal length f is measured in pixels. A non-SVP system can be approximated by an SVP one if the size of the object relative to its distance from the camera is too small or if the dimensions of the setup relative to the focal length are small, which basically means that the locus of caustics is small [14]. Also, if the field of view (FOV) of a non-SVP system is too small, it again can be approximated using SVP. In order to evaluate the performance of SVP-based underwater 3D reconstruction under general imaging setups, we have avoided the above cases in our experimental design. Specifically, both the size of objects and the camera parameters are selected to ensure that the refractive effect is significant and the objects occupy a large portion of the FOV. More details regarding this issue are given in the following subsections.

A. Imaging Setups

In order to generate realistic underwater images, we use POVRay [21] (a publicly available ray tracer) to create a synthetic camera placed in waterproof housing with a flat glass window. The synthetic 3D scene is a texture-mapped bunny [22] sitting on a square base with rounded corners. Two images of the synthetic imaging setup when the pool is empty and when it is filled with water are shown in Figs. 3(a) and 3(b), respectively. The setup in Fig. 3(a) is used to capture the ground-truth images (32 images sampled on a ring) and the setup in Fig. 3(b) is used to capture underwater images. A detailed structure of each camera in the setup shown in Fig. 3(b) is presented in Fig. 3(c). The focal length of the synthetic camera is 600 pixels and the FOV is about 56°. The resolution of the synthetic images is 640 × 480 pixels. In the simulation, the refractive indices of air, water, and glass are assumed to be $\Theta_n = (1, 1.3333, 1.5333)^\top$. The ratio of the object size to its distance from the camera is about 0.56, and the object occupies a large portion of the FOV of the setup (see captured underwater images in Fig. 5).

For capturing real ground-truth images and underwater images, we use two different setups. The first setup consists of a camera and a turntable placed in front of the camera [see Fig. 3(d)]. This setup is used to capture the ground-truth images of 3D objects. In the second setup, we employ a large water-filled tank with multiple flat glass interfaces in our lab. Two imaging methods under this setup are considered: first, a single camera is placed in front of the largest interface of the water tank [see Fig. 3(e)], and multiple views of an object are captured by turning the object; second, an eight-camera array is deployed around the water tank as shown in Fig. 3(f) with each camera placed in front of a separate flat glass interface. All cameras employed in the real imaging setups are the commonly used Point Grey Research Flea2 cameras with a focal length of 1800 pixels and a FOV of 32°. The resolution of the real images is 1032 × 776 pixels. The physical dimension of a pixel is 4.65 μm. Note that while the setup for capturing real underwater images is slightly different from the synthetic one, the underlying refractive geometry is exactly the same except that the Point Grey Research Flea2 camera has a smaller FOV than the virtual cameras in the synthetic imaging setups. For real experiments, the ratio of the object size to its distance from the camera ranges from 0.23 to 0.41 and the objects occupy nearly the full FOV of the setup (see captured underwater images in Fig. 9). Although the FOV of the real cameras is smaller than that used in synthetic experiments, the distortion in 3D reconstruction caused by the refractive effect is significant (see the last row of Fig. 9).

Fig. 3. (Color online) Image acquisition setups used in our study. See text for details.



Table 2. Synthetic Datasets and System Configurations (da and dg are measured in pixels)

Datasets   Nv             da                   dg                 θc
SC1        3 ≤ Nv ≤ 32    1000                 100                0°
SC2        32             1000 ≤ da ≤ 10,000   100                0°
SC3        32             1000                 100 ≤ dg ≤ 5000    0°
SC4        32             1000                 100                0° ≤ θc ≤ 30°

B. Datasets

Using the synthetic datasets allows us to systematically study the influence of the system parameters of underwater imaging systems, including the number of images Nv, the distance between the camera and its refractive interface da, the thickness of the refractive interface dg, and the angle between the optical axis of the camera and the normal of the glass interface θc. Note that changing da and dg also causes dw to change because da + dg + dw is fixed at 45,000 pixels. For all synthetic images, the angle between the normals of two adjacent housing interfaces is fixed at 11.25°. We use the notation cf(Nv/da/dg/θc) to specify the dataset captured under a specific synthetic system configuration. For example, cf(16/1000/10/10°) refers to the set of 16 images captured with da = 1000, dg = 10, and θc = 10° (length is measured in pixels). The four sets of synthetic datasets are summarized in Table 2.

For the experiments with real images, three aquarium decorations are used, namely Shell, Well, and Arch (see Fig. 4). The four real datasets used in our experiments are listed in Table 3. Specifically, the datasets shell_Ring, well_Ring, and arch_Ring are captured by the underwater imaging setup shown in Fig. 3(e) and the images are sampled on a circular ring. The dataset arch_Sparse consists of eight images captured by the underwater imaging setup shown in Fig. 3(f). In order to evaluate the quality of 3D reconstruction under varying system configurations, different values of the system parameters da, dw, and θc are selected for each dataset. Note that in practice the distance da is normally small and θc is within a few degrees, so the system configuration for the dataset arch_Ring (with large da and θc) is a more challenging one than for most practical systems. The dataset arch_Sparse poses challenges as well because the number of images is small. The thickness of the glass interface is around 6 mm for all real datasets.

Fig. 4. (Color online) Sample images of real objects captured in the air. The size (L × W × H mm) of (a) shell is 165 × 108 × 95, (b) well is 229 × 152 × 165, and (c) arch is 178 × 76 × 165.

C. Metric 3D Reconstruction and Evaluation Methodology

In this paper, we assume that the intrinsic parameters of the cameras are known and underwater 3D reconstruction based on the SVP camera model is performed in two steps. In the first step, the structure and motion (SaM) algorithm [2] is employed to obtain the camera calibration data and a sparse 3D point cloud of the scene. Specifically, the above SaM uses bundle adjustment [20] to refine both the camera parameters and the scene structure by minimizing reprojection errors as in Eq. (18). For comparison purposes, we investigate the performance of four different reconstruction strategies, namely NRF, RDist, FAdj, and RDist + FAdj. NRF refers to camera calibration without radial distortion correction or focal length adjustment, RDist refers to camera calibration with radial distortion correction but without focal length adjustment, FAdj refers to camera calibration with focal length adjustment but without radial distortion correction, and RDist + FAdj refers to using a combination of radial distortion correction and focal length adjustment in camera calibration. The objective functions used in the above four methods are variants of Eq. (18) obtained by adjusting different numbers of elements of Θs. Given the calibration data, the multiview stereo algorithm proposed by Furukawa and Ponce [23] is performed to reconstruct a dense 3D point cloud in the second step. For each underwater dataset, we evaluate the resulting dense 3D point cloud by comparing it with the ground-truth point cloud reconstructed from images captured in the air. For the reconstruction of the ground-truth model, the strategy RDist + FAdj is used.

Unlike existing evaluation benchmarks for 3D reconstruction in land-based systems [18,24], where the cameras are assumed to be calibrated, and thus the reconstructed model and the ground-truth model are already located in the same coordinate system, comparing 3D models obtained in our experiments is more difficult due to the ambiguity of metric 3D reconstruction for uncalibrated image sequences. Denote by M = {Xk | 1 ≤ k ≤ M} the ground-truth model consisting of M 3D points and by D = {Yk | 1 ≤ k ≤ N} the reconstructed model consisting of N 3D points; M and D need to be aligned for quantitative evaluation. Unfortunately, the most popular point cloud alignment algorithm, iterative closest point (ICP) [25], cannot be applied to our data for the following three reasons: (1) M and D are both the results of metric 3D reconstruction, each of which is defined up to an arbitrary similarity transformation. Hence we need to estimate not only the rigid transformation but also the scale factor between M and D. (2) ICP requires a good initialization of alignment that is not available
in our case. (3) Since M and D could have only a small partial overlap, a large portion of the data would be regarded as outliers.

Table 3. Real Datasets and the System Configurations (da and dw are measured in millimeters)

Dataset      Object   Nv   da    dw    θc
shell_Ring   Shell    32   10    400   ≈0°
well_Ring    Well     32   10    600   ≈0°
arch_Ring    Arch     32   200   600   ≈0°
arch_Sparse  Arch     8    5     500   ≈0°

To address the evaluation issue, we develop a new S2D point cloud alignment algorithm particularly suitable for registering point clouds obtained from image-based 3D reconstruction algorithms. The S2D algorithm consists of coarse alignment of the two models using sparse data and alignment refinement using dense 3D point clouds. Here, the sparse data M′ = {X′k | 1 ≤ k ≤ M′} and D′ = {Y′k | 1 ≤ k ≤ N′} refer to the ground-truth sparse 3D point cloud and the reconstructed sparse 3D point cloud, respectively. For each 3D point X′ in the sparse 3D point cloud M′, we calculate its feature descriptor as

$$\mathrm{Desc}(X') = \frac{1}{N_t} \sum_{i=1}^{N_t} \mathrm{Desc}(x_i), \quad (20)$$

where Desc(xi) (1 ≤ i ≤ Nt) is the 128-dimension scale invariant feature transform (SIFT) feature descriptor [26] of the ith image point used for triangulating X′ in the SaM algorithm. The feature descriptor for each point in D′ can be calculated in a similar way.
where Desc · xi 1 ≤ i ≤ N t  is the 128-dimension • The accuracy of reconstruction is defined as the
scale invariant feature transform (SIFT) feature de- root-mean-square distance of correspondences be-
scriptor [26] of the ith image point used for triangu- tween D and M, i.e.,
lating X0 in the SaM algorithm. The feature
descriptor for each point in D0 can be calculated in v
u
a similar way. Given the feature descriptors, putative u 1 X jD00 j
AccM; D  t dD00 i ; M2 ;
correspondences between M0 and D0 are established jD00 j i1
(26)
by performing fast approximate nearest-neighbor
search in a high-dimensional feature space [27].
Then, we run three-point absolute orientation with where D00 i represents the ith element of D00.
scale estimation [28] using RANSAC to get a robust
estimate of the similarity transformation between
M0 and D0 , which serves as an initial coarse align-
ment between M and D. The initial transformation
is refined by running a robust ICP alignment algo-
rithm [29] on the dense point clouds. In both the
coarse alignment and the alignment refinement,
outliers are identified as correspondences whose dis-
tances are larger than δ (δ  1% in this paper) times
the maximal size of the bounding box BBM of
the ground-truth model. Denoted by T 3 the final si-
milarity transformation, a 3D point Y in D is trans-
formed as
Fig. 5. (Color online) Visual comparison of images captured un-
T 3 Y  sR3 Y  t3 ; (21) der different system configurations. The image in (a) is the image
captured in the air. The image in (b) is the underwater image with
(da  1000, dg  100, θc  0°). The image in (c) is an overlay of the
where ss > 0 is the scale factor, R3 is a 3 × 3 rotation two images in (a) and (b). An overlay of the image in (b) and the
matrix, and t3 is a 3 × 1 translation vector. underwater image with (da  10;000, dg  100, θc  0°,
Based on the result of accurate alignment between (da  1000, dg  5000, θc  0°) and da  1000, dg  100, θc  30°
M and D, the underwater 3D reconstruction is are shown in (d), (e), and (f), respectively.

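With T3 estimated, the evaluation of Eqs. (22)–(26) reduces to nearest-neighbor distance queries between the aligned clouds. This sketch is an illustration under stated assumptions: M and D are n x 3 NumPy arrays, delta plays the role of δ = 1%, and a k-d tree stands in for exhaustive search.

```python
import numpy as np
from scipy.spatial import cKDTree

def evaluate(M, D, s, R3, t3, delta=0.01):
    """Eqs. (22)-(26): effectiveness, completeness, and accuracy of the
    reconstructed cloud D against the ground-truth cloud M."""
    TD = (s * (R3 @ D.T)).T + t3                  # Eq. (21) applied to D
    tol = delta * (M.max(axis=0) - M.min(axis=0)).max()   # delta * BB(M)
    d_DM, _ = cKDTree(M).query(TD)                # d(Y_k, M), Eq. (23)
    d_MD, _ = cKDTree(TD).query(M)                # d(X_k, D), Eq. (25)
    inliers = d_DM[d_DM < tol]                    # distances of the set D''
    eff = inliers.size / len(D)                   # Eq. (22)
    comp = np.count_nonzero(d_MD < tol) / len(M)  # Eq. (24)
    acc = np.sqrt(np.mean(inliers**2))            # Eq. (26)
    return eff, comp, acc
```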


5. Results

In this section, we present qualitative and quantitative evaluation of the performance of the SVP camera model in underwater 3D reconstruction. The results on synthetically rendered images and real images are presented in Subsections 5.A and 5.B, respectively.

Fig. 6. (Color online) Quantitative evaluation of underwater 3D reconstruction using the synthetic datasets listed in Table 2. Shown in (a), (b), and (c) are the results on dataset SC1. Shown in (d), (e), and (f) are the results on dataset SC2. Shown in (g), (h), and (i) are the results on dataset SC3. Shown in (j), (k), and (l) are the results on dataset SC4. See text for details.



A. Synthetic Data

Before presenting the results of 3D reconstruction, we first show a comparison of the visual difference between images captured under different system configurations. All the images shown in Fig. 5 are captured by cameras placed at the same position and with the same intrinsic camera parameters. The comparison in Fig. 5(c) shows that an object appears to be closer to the camera and the scene structure is distorted (the edge of the base is bent) when submerged in water due to refraction of light. It also shows that changing the distance between the camera and its refractive interface [see Fig. 5(d)] or changing the thickness of the refractive interface [see Fig. 5(e)] results in a similar shift along the radial direction. However, increasing the angle θc between the optical axis of a camera and the normal of the interface causes large distortion in the image [see Fig. 5(f)]. Actually, as will be seen in the following subsections, a large θc is also the main reason leading to poor underwater 3D reconstruction using the SVP camera model.

Fig. 7. (Color online) Comparison of 3D reconstruction using different methods. The results of NRF and RDist + FAdj on dataset cf(16/1000/100/0°) are shown in the first and second rows, respectively. The results of FAdj and RDist + FAdj on dataset cf(32/1000/100/15°) are shown in the third and last rows, respectively. From left to right: the initial pose of reconstructed point clouds (green) with respect to the ground-truth model (blue); a side view of the reconstructed point cloud and the ground-truth model after applying the S2D point clouds alignment algorithm; a top view of the reconstructed point cloud and the ground-truth model after applying the S2D point clouds alignment algorithm; mesh generated from underwater 3D reconstruction without texture mapping; texture-mapped mesh generated from underwater 3D reconstruction. (Regions of interest are highlighted.)

Next, we present quantitative results of systematic experiments using the datasets listed in Table 2. In order to facilitate the evaluation of the accuracy of 3D reconstruction, the ground-truth 3D model is scaled such that the maximal size of its bounding box is equal to 1 unit. The first set of experiments studies the influence of Nv using dataset SC1 [see Figs. 6(a)–6(c)]. The results show that RDist + FAdj performs significantly better than NRF and RDist in terms of effectiveness, completeness, and also the accuracy of 3D reconstruction. The performance of FAdj on this dataset is close to that of RDist + FAdj. The results also suggest that better results can be obtained by using more images and that the SVP camera model can achieve high accuracy of 3D reconstruction (e.g., the 3D error is below 0.004 units for RDist + FAdj using 16 images). The second set of experiments investigates the influence of da using dataset SC2. In all comparisons, superior performance is achieved by using RDist + FAdj and FAdj [see Figs. 6(d)–6(f)]. Note that although the accuracy of 3D reconstruction using NRF and RDist is
high (less than 0.006 units), the quality of the resulting 3D model is poor due to its low effectiveness and completeness. Compared with RDist + FAdj, FAdj performs equally well in terms of effectiveness and completeness of 3D reconstruction, while RDist + FAdj gives slightly more accurate results [see Fig. 6(f)]. No apparent influence of da on 3D reconstruction is observed from the results. The third set of experiments examines the influence of dg using dataset SC3. The results are similar to those of the previous set of experiments, except that RDist + FAdj consistently achieves about 1% better completeness of 3D reconstruction than FAdj [see Fig. 6(h)]. Again, no apparent influence of dg on 3D reconstruction is observed from the results. The last set of experiments focuses on the influence of θc using dataset SC4. This dataset poses more challenges because severe distortion is caused by increasing θc. Unlike the results from the previous experiments, the performances of the four reconstruction methods vary significantly from each other [see Figs. 6(j)–6(l)]. Overall, the performance of all four methods decreases (at different rates) with increasing θc. However, RDist + FAdj works fairly well when θc is below 15°, making it a practical choice because θc is within a few degrees, if not zero, in most underwater imaging systems.

Table 4. Performance of the SVP Camera Model on Real Images

Dataset      Method        Eff (%)   Comp (%)   Acc (mm)
shell_Ring   NRF           96.3767   94.5636    0.6409
             RDist         96.6886   96.4833    0.6018
             FAdj          98.6018   98.4560    0.4159
             RDist + FAdj  98.8064   98.6558    0.3875
well_Ring    NRF           62.3939   56.7704    1.1511
             RDist         89.0766   84.1615    1.0287
             FAdj          98.7560   96.7316    0.5721
             RDist + FAdj  98.5369   96.3586    0.5088
arch_Ring    NRF           72.7131   74.3449    0.9570
             RDist         92.3916   93.6748    0.8415
             FAdj          97.3044   96.4522    0.7043
             RDist + FAdj  97.2812   97.4996    0.6839
arch_Sparse  NRF           81.8516   22.5566    0.8822
             RDist         92.1610   29.5537    0.8463
             FAdj          97.9026   32.9819    0.7171
             RDist + FAdj  98.3631   34.0221    0.6915

Fig. 8. (Color online) Distribution of 3D errors for reconstruction on real underwater images. The 3D error of a reconstructed 3D point is defined by Eq. (23). Errors larger than 1% of the size of the ground-truth model are collected in the last bin.

To facilitate qualitative evaluation, some results of underwater 3D reconstruction on two datasets are shown in Fig. 7. For the dataset cf(16/1000/100/0°), the point cloud obtained using NRF is very noisy. This can be seen from the highlighted regions in the alignment between the reconstructed point cloud and the ground-truth model [see Fig. 7(a)]. For the dataset cf(32/1000/100/15°), FAdj
also gives poor results, as can be seen from the highlighted regions both in the alignment of point clouds and in the reconstructed mesh [see Fig. 7(c)]. In comparison, the results of RDist + FAdj on both datasets are of high visual quality, as shown in Figs. 7(b) and 7(d).

Fig. 9. (Color online) Comparison of 3D reconstruction using different methods. Shown in the first four rows are the results of RDist + FAdj on the datasets shell_Ring, well_Ring, arch_Ring, and arch_Sparse, respectively. From left to right: a sample underwater image from the dataset; the initial pose of reconstructed point clouds (green) with respect to the ground-truth model (blue); a side view of the reconstructed point cloud and the ground-truth model after applying the new S2D point clouds alignment algorithm; mesh generated from underwater 3D reconstruction without texture mapping; texture-mapped mesh generated from underwater 3D reconstruction. The last row shows the results of NRF on dataset well_Ring. From left to right: the initial pose of the reconstructed point cloud; two images of point cloud alignment; two images of mesh generated from underwater 3D reconstruction. (Regions of interest are highlighted.)

B. Real Data

With the success of the SVP camera model in underwater 3D reconstruction using synthetic data, next we present results on the real datasets listed in Table 3. Since the physical dimensions of the objects used in the real datasets are known, we are able to measure the accuracy of 3D reconstruction in millimeters. The results are summarized in Table 4, from which we see that the performance of each reconstruction method varies on each real dataset because refractive distortion depends on the scene structure. Nevertheless, the 3D reconstruction of RDist + FAdj is of high accuracy (ranging from 0.3875 mm for dataset shell_Ring to 0.6915 mm for dataset arch_Sparse). Note that the completeness of 3D reconstruction for dataset arch_Sparse is low for all four methods. This is because this dataset consists of only eight images and each camera can only see a part of the object in the setup shown in Fig. 3(f). The overall performance of RDist + FAdj is the best among the four methods and the performance of FAdj is close to that of RDist + FAdj, both of which coincide with the results on synthetic datasets. A more detailed analysis of the distribution of errors in 3D reconstruction is shown in Fig. 8. The superior performance of RDist + FAdj is clearly demonstrated, since the 3D errors concentrate in the lower value range for all four real datasets. We also present visualization of 3D reconstruction on the real datasets in Fig. 9 for comparison. In all cases, the strategy RDist + FAdj successfully compensates for refraction and reconstructs high-quality 3D models [see Figs. 9(a)–9(d)]. Without radial distortion correction and focal length
adjustment, refraction causes severe distortion in 3D reconstruction, as shown in Fig. 9(e), where both the alignment of point clouds and the reconstructed mesh are noisy.

6. Conclusions

Refraction of light has been recognized as one of the major challenges in underwater vision. While most existing work on underwater vision focuses on calibration, no previous work has systematically studied the influence of refraction on underwater 3D reconstruction. In this paper we provide theoretical analysis of refractive distortion in images and make an important observation that refractive distortion can be well approximated using the SVP model after applying a similarity transformation to the 3D scene. This observation indicates the possibility of using the SVP model for accurate underwater metric 3D reconstruction, which is also up to an arbitrary similarity transformation. We then systematically evaluate the performance of the SVP model in underwater 3D reconstruction using both synthetically rendered images and real images. The results reveal a rather surprising and useful yet overlooked fact that the SVP camera model with radial lens distortion correction and focal length adjustment can compensate for refraction and achieve high accuracy in underwater 3D reconstruction. Such an observation justifies the use of the SVP model in underwater applications for reconstructing a reliable 3D scene and can also be used to guide the selection of system parameters in the design of underwater 3D imaging systems.

This work was partially supported by the Chinese Scholarship Council (grant no. 2010611068), the Hunan Provincial Innovation Foundation for Postgraduate (grant no. CX2010B025), the Excellent Graduate Student Innovative Project of the National University of Defense Technology, the Natural Sciences and Engineering Research Council of Canada (NSERC), and the University of Alberta. We thank the anonymous reviewers for their constructive comments.

References
1. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. (Cambridge University, 2004).
2. N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from internet photo collections," Int. J. Comput. Vis. 80, 189–210 (2008).
3. V. Chari and P. Sturm, "Multiple-view geometry of the refractive plane," in Proceedings of British Machine Vision Conference (BMVA, 2009).
4. Y. Chang and T. Chen, "Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction," in Proceedings of International Conference on Computer Vision (IEEE, 2011).
5. P. C. Y. Chang, J. C. Flitton, K. I. Hopcraft, E. Jakeman, D. L. Jordan, and J. G. Walker, "Improving visibility depth in passive underwater imaging by use of polarization," Appl. Opt. 42, 2794–2803 (2003).
6. W. Hou, S. Woods, E. Jarosz, W. Goode, and A. Weidemann, "Optical turbulence on underwater image degradation in natural environments," Appl. Opt. 51, 2678–2686 (2012).
7. T. Treibitz, Y. Y. Schechner, C. Kunz, and H. Singh, "Flat refractive geometry," IEEE Trans. Pattern Anal. Mach. Intell. 34, 51–65 (2012).
8. P. Sturm and J. P. Barreto, "General imaging geometry for central catadioptric cameras," in Proceedings of European Conference on Computer Vision (Springer, 2008), pp. 609–622.
9. Z. Kukelova, M. Bujnak, and T. Pajdla, "Closed-form solutions to minimal absolute pose problems with known vertical direction," in Proceedings of Asian Conference on Computer Vision (Springer, 2010), pp. 216–229.
10. A. Sedlazeck and R. Koch, "Calibration of housing parameters for underwater stereo-camera rigs," in Proceedings of British Machine Vision Conference (BMVA, 2011), paper 118.
11. G. Telem and S. Filin, "Photogrammetric modeling of underwater environments," ISPRS J. Photogramm. Remote Sens. 65, 433–444 (2010).
12. A. Agrawal, S. Ramalingam, Y. Taguchi, and V. Chari, "A theory of multi-layer flat refractive geometry," in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 3346–3353.
13. S. Ramalingam, P. Sturm, and S. K. Lodha, "Theory and calibration algorithms for axial cameras," in Proceedings of Asian Conference on Computer Vision (Springer, 2006), pp. 704–713.
14. R. Swaminathan, M. D. Grossberg, and S. K. Nayar, "A perspective on distortions," in IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2003), pp. 594–601.
15. Y. Swirski, Y. Y. Schechner, B. Herzberg, and S. Negahdaripour, "CauStereo: range from light in nature," Appl. Opt. 50, F89–F101 (2011).
16. L. Bartolini, L. De Dominicis, M. Ferri de Collibus, G. Fornetti, M. Guarneri, E. Paglia, C. Poggi, and R. Ricci, "Underwater three-dimensional imaging with an amplitude-modulated laser radar at a 405 nm wavelength," Appl. Opt. 44, 7130–7135 (2005).
17. C. Kunz and H. Singh, "Hemispherical refraction and camera calibration in underwater vision," in Proceedings of MTS/IEEE Oceans (IEEE, 2008), pp. 1–7.
18. S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A comparison and evaluation of multi-view stereo reconstruction algorithms," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2006), pp. 519–528.
19. G. Glaeser and H.-P. Schrocker, "Reflections on refractions," J. Geom. Graph. 4, 1–18 (2000).
20. B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment—a modern synthesis," in Proceedings of the International Workshop on Vision Algorithms: Theory and Practice (Springer, 1999), pp. 153–177.
21. http://www.povray.org.
22. http://graphics.stanford.edu/data.
23. Y. Furukawa and J. Ponce, "Accurate, dense, and robust multiview stereopsis," IEEE Trans. Pattern Anal. Mach. Intell. 32, 1362–1376 (2010).
24. C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, "On benchmarking camera calibration and multi-view stereo for high resolution imagery," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2008), pp. 1–8.
25. P. J. Besl and N. D. McKay, "A method for registration of 3-D shapes," IEEE Trans. Pattern Anal. Mach. Intell. 14, 239–256 (1992).
26. D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis. 60, 91–110 (2004).
27. M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in Proceedings of International Conference on Computer Vision Theory and Applications (INSTICC, 2009).
28. B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt. Soc. Am. A 4, 629–642 (1987).
29. D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, "The trimmed iterative closest point algorithm," in Proceedings of International Conference on Pattern Recognition (IEEE, 2002), Vol. 3, pp. 545–548.

