Richard L. Marks, Michael J. Lee, and Stephen M. Rock
Underwater issues

Visual sensing for underwater applications has received little attention compared to other sensing strategies despite the general acknowledgment of its many strengths. This is primarily due to problems in underwater optical imaging including limited range, nonuniform lighting, and marine snow. The following sections describe these issues and the effect that each has on visual sensing.
Limited range

Several physical effects limit the achievable range of underwater visual sensing. A combination of the attenuation of light in water with light absorption and scattering by suspended matter causes the amount of reflected light to exponentially decay as a function of distance.

Figure 1: Underwater image. Note the nonuniform lighting and presence of marine snow.
Visual sensing approach

We have chosen a visual sensing approach which addresses the issues listed above. The visual sensing approach described below involves a combination of two basic ideas in computer vision: image filtering and image correlation. Image filtering is used to ensure that images will have certain characteristics. For example, a Gaussian filter can be used to smooth an image by removing spatial frequency information above a certain level. Image correlation is used to measure the likeness between sections of images, usually in order to establish correspondence. Maximum (perfect) correlation occurs when image sections are identical. The following sections provide a detailed description of the image filtering and correlation approaches we have chosen to utilize, followed by an explanation of why each was chosen.

Figure 2: Laplacian of Gaussian filter. (a) One-dimensional case. (b) Two-dimensional case, used for images.
Signum of Laplacian of Gaussian filter

The information present in typical images varies greatly. This makes it difficult for a single processing approach to perform successfully on all images. To address this, the information content of an image can first be filtered to extract only information of particular interest. Several common filters which have been used are: Gaussian filters, to lowpass the spatial frequencies in images; Gabor filters, to bandpass the spatial frequencies; and edge detectors, to extract the locations of sharp intensity transitions.

A filter with several interesting properties is the signum of Laplacian of Gaussian filter:

    I'(x, y) = sgn[ \nabla^2 ( \frac{1}{2\pi\sigma^2} e^{-(x^2+y^2)/2\sigma^2} ) * I(x, y) ]    (1)

I(x, y) corresponds to the original image intensities, and I'(x, y) corresponds to the output intensities. The Gaussian convolution smoothes the image and acts as a low-pass filter, limiting the range of spatial frequencies. The frequency roll-off is adjusted by varying σ, the width of the Gaussian. Increasing the width reduces the range of frequencies that is passed by the filter.

The Laplacian portion of the filter is a double differentiation of the Gaussian-smoothed image (Figure 2). The Laplacian is a high-pass filter which completely removes both constant (DC) and linear-varying intensity information. The Laplacian evaluates to zero at local maxima of the magnitude of the intensity gradient. Zero-crossings of the Laplacian of Gaussian are often used to detect edges in images [13]. The zero-crossings are locally more stable in the presence of noise than most other simple image properties, thus making them attractive features for tracking and pattern matching.

The final part of the filter is the signum function. This effectively binarizes the image, mapping all positive values of the Laplacian of Gaussian filtered image to white and all nonpositive values to black. The zero-crossings of the Laplacian of Gaussian occur at the borders between black and white. Thus, the image edge information is still present but further processing can be done on areas rather than lines. In the presence of noise, these areas can be tracked with greater accuracy than the actual edges [20].

Figure 3 demonstrates the three filtering steps described above.

Figure 3: Image filtering. (a) Original image. (b) Gaussian filtered image. (c) Laplacian of Gaussian filtered image. (d) Signum of Laplacian of Gaussian filtered image.
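The three filtering steps can be sketched with standard tools; the following is a minimal illustration using SciPy's combined Gaussian/Laplacian convolution (the choice of σ and the 0/1 encoding of black/white are our assumptions, not the paper's hardware implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def signum_of_log(image, sigma=2.0):
    """Signum of Laplacian of Gaussian: smooth, double-differentiate, binarize."""
    # Laplacian of Gaussian computed in one operation.
    log = gaussian_laplace(image.astype(float), sigma=sigma)
    # Signum step: positive responses -> white (1), nonpositive -> black (0).
    return (log > 0).astype(np.uint8)

# A vertical step edge: the filtered output is black/white with the
# zero-crossing of the Laplacian of Gaussian at the border.
img = np.zeros((32, 32))
img[:, 16:] = 255.0
binary = signum_of_log(img)
```

Because the output is binary, subsequent correlation reduces to counting agreeing pixels, which is what makes the hardware implementation fast.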
Correlation

The correspondence problem occurs in many computer vision applications [15, p.220]. One of the most common techniques used to address this problem is correlation. Correlation can be thought of as computing the likeness of image sections. One image section is correlated with many other image sections to generate a correlation surface. The correlation surface can be examined to find the best match.

Figure 5: Stereo geometry for two parallel cameras (Camera 1 and Camera 2).

Figure 6: Optical flow. (a) Image at time t. (b) Image at time t + Δt. (c) Resulting optical flow.

Stereo disparity (range)
One of the most common uses of correlation is stereo ranging [5, p.189]. For example, two parallel cameras perceive objects to be at different horizontal locations in the images (Figure 5). The difference between the locations is called the disparity and is inversely related to the distance of the object from the cameras (range). Correlation for ranging is a one-dimensional problem: matching sections must lie somewhere along a line which is a function of the camera geometry (known as the epipolar line). For the parallel setup shown, the epipolar lines for all sections are horizontal. This geometrical constraint greatly reduces the number of correlations required for stereo ranging.

Feature tracking

Correlation can also be used to track features as they move about the image. Feature tracking is similar to optical flow calculation, except the search area is moved as the feature is moved. Optical flow is analogous to Eulerian fluid dynamics analysis, and feature tracking is analogous to Lagrangian analysis.

The particular feature tracking approach we chose involves tracking features with respect to a starting image. All subsequent images are correlated with the first image. Thus, the feature which subsequent images are correlating against does not drift, and true positional offset information can be obtained (see Figure 7). Note that this approach implicitly assumes that the feature will not change significantly in the subsequent images.

A great deal of research has gone into algorithms for selecting "good" features. We have chosen to allow all image sections to be considered as potential tracking features.
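On binarized images, the correlation of two equal-size sections reduces to the fraction of agreeing pixels. A toy sketch of tracking against a fixed initial image follows; the exhaustive search, array sizes, and window placement are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def binary_correlation(a, b):
    """Fraction of pixels that agree between two binary image sections."""
    return np.mean(a == b)

def track_feature(initial, live, top, left, h, w):
    """Exhaustively search `live` for the section best matching the feature
    at (top, left) in `initial`; returns ((row, col) offset, best score)."""
    feature = initial[top:top + h, left:left + w]
    best, best_off = -1.0, (0, 0)
    for r in range(live.shape[0] - h + 1):
        for c in range(live.shape[1] - w + 1):
            score = binary_correlation(feature, live[r:r + h, c:c + w])
            if score > best:
                best, best_off = score, (r - top, c - left)
    return best_off, best

# Every live image is compared against the same initial image, so the
# tracked feature never drifts.
rng = np.random.default_rng(0)
initial = (rng.random((24, 24)) > 0.5).astype(np.uint8)
live = np.roll(initial, shift=(2, 3), axis=(0, 1))  # simulate a (2, 3) motion
offset, score = track_feature(initial, live, top=8, left=8, h=8, w=8)
```

In practice the search would be restricted to a window around the last known position rather than the whole image.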
Figure 7: Feature tracking with respect to an initial image. (a) Image at time t_initial, showing the correlation window. (b) Image at time t_subsequent, showing the X and Y offsets. Positional offset from the initial image is measured directly.

The binary correlation technique is a fast, highly parallel operation. This allows many measurements to be obtained between each camera frame and makes correlation of large image sections feasible. Correlation can be used to obtain a variety of sensory data, including velocity, position, and range measurements. Thus, the same implementation (namely hardware) can be used to gather many pieces of information. We are using special-purpose hardware developed by Teleos Research² to perform the signum of Laplacian of Gaussian correlation [21, 19].

Characteristics of approach for underwater sensing

The combination of the specific filtering and correlation techniques described above is uniquely suited for underwater visual sensing. The Laplacian of Gaussian filter reduces the negative effects of many of the described issues. It acts as a band-pass filter, rejecting both high and low spatial intensity frequencies. By properly adjusting the filter width of the Gaussian, the high-frequency information introduced by the presence of marine snow can be removed. In addition, lighting biases and gradual nonuniformities are effectively reduced by the Laplacian. The lighting effects are further minimized by the binary quantization of the signum operation. The Laplacian and signum operations address the problem of low ambient lighting and limited contrast as well. An impressive demonstration of the power of the signum of Laplacian of Gaussian filter is shown in Figure 8 for two images with differing lighting, contrast, and marine snow. The two filtered images correlate better than 90 percent, and the inside quarters of the images correlate better than 95 percent.¹

Figure 8: The signum of Laplacian of Gaussian filter. (a) Original image. (b) Filtered image. (c) Original image with simulated contrast reduction, nonuniform lighting, and marine snow. (d) Filtered image. Note the similarity to (b).

¹ Contrast was reduced by dividing image intensity by 4 and centering to middle range (gray). Nonuniform lighting was simulated by an intensity reduction factor which is a linear function of (r − r₀)² for r > r₀, radius measured from image center. Marine snow was simulated by 250 uniformly random-spaced solid white 3×3 pixel squares.

² Teleos Research, Palo Alto, CA. keith@teleos.com.

Applications of visual sensing

There are many underwater applications which could benefit from the use of visual sensing [14]. Because humans use vision as their primary sensor, many tasks are conceptually tied to visual stimuli. The most obvious of these are observational applications, for which the goal is to keep a particular object or scene centered in the image. An example of an observational application is fish (moving object) following.
Other types of applications use visual sensing to achieve different goals, such as vehicle positioning. Terrain-relative station keeping, ocean floor mosaicking, and mosaic navigation fall into this category. The following sections describe these four sample tasks in detail and present the requirements/specifications of each individually. The specific approaches we have developed are also presented briefly. Experimental verification of the approaches has been done using OTTER, an unmanned underwater robot (shown in Figure 9) being developed jointly by the Stanford Aerospace Robotics Laboratory and the Monterey Bay Aquarium Research Institute [25, 12].

Object following

The ability to automatically follow objects is necessary for certain underwater robot missions. Automatic object tracking is a component needed for a variety of military and scientific applications [2]. The particular application of object tracking we have investigated is the remote observation of fish by marine scientists. The goal of the fish-following application is to keep a single mobile undersea lifeform in the camera's field of view so that a scientist might be able to study it for an extended period (Figure 9).

Figure 9: Fish following. The robot keeps the turtle centered in the camera's field of view by changing location and moving the pan/tilt.

The task of following a fish is difficult for several reasons. The speeds at which many fish travel are large compared to typical vehicle sensor and actuator bandwidths. Fish are asymmetric objects without any well-behaved geometrical properties. They can quickly change orientation, and some are even deformable. Natural camouflage can make it difficult to distinguish fish from terrain, and they often remain still so that their motion will not alert predators.

The sensing difficulties are all facets of the same basic problem: quickly discerning the fish from the rest of the scene. This is a specific example of the segmentation problem, a widely studied topic in computer vision [15, p.113]. We have investigated two segmentation strategies, each to be used in different tracking scenarios. These strategies use fundamentally different visual measurements to discern the object from the background: range (disparity) and motion (optical flow). A more detailed account of these strategies can be found in [10].

Range segmentation

Range segmentation is useful for midwater tracking when no background terrain is within view. The fish can be separated from the rest of the scene by making the following assumption: range measurements closer than a certain threshold must correspond to a part of the fish, and larger measurements correspond to something which is not the fish. The effectiveness of this simple segmentation scheme is dependent upon the fish being large enough to produce several range measurements. Spurious measurements also degrade the performance of such a scheme considerably. To help limit spurious measurements beyond the efforts taken to validate each measurement individually, further reasonable assumptions about the fish can be made. For most cases, it is valid to assume the 2-D projection of the fish is a single closed polygon. This assumption can be used to filter out noncontiguous measurements (which have a high probability of being spurious).

Motion segmentation

Motion segmentation can be used in situations involving background terrain which is sufficiently near to impede range segmentation. The key assumption behind motion segmentation is that the object is moving noticeably with respect to the background. Optical flow measurements can be used to differentiate between the background and the fish in many cases. If most of the image consists of background with a relatively constant optical flow, an overall image velocity can be obtained. Optical flow measurements inconsistent with this motion can be assumed to be measurements corresponding to the fish. This assumption is fairly weak; realistic situations can be contrived for which it breaks down. However, the assumption holds for many cases in which the background is relatively smooth and perpendicular to the camera.

The effectiveness of this segmentation scheme is also dependent upon the size of the fish in the image. It must be large enough to affect several optical flow measurements but still be small compared to the background. Spurious measurements can be identified by making the same assumption of contiguity as in range segmentation.
Tracking performance

The actuation bandwidth which can be achieved for an underwater robot is limited by many operational parameters. Typical bandwidths fall well short of the speeds exhibited by many fish. The effective tracking bandwidth can be improved by independently actuating the cameras with respect to the robot [24]. The OTTER robot possesses a high-speed cable-drive pan/tilt to accomplish improved object tracking performance (Figure 10).

Figure 10: OTTER's pan and tilt system for a stereo camera pair.

Because of the directional nature of imaging, some amount of separate camera actuation is necessary for visual sensing to be most effective. The requirement of repositioning the entire robot in order to change the camera view is not acceptable for many applications. Also, in applications for which the desired goal is achieving a certain image for viewing, camera actuation alone may be sufficient. For many situations, robot positioning is unnecessary and can actually be detrimental due to excess power consumption and visual disturbance (e.g., mud clouds from the thrusters agitating the ocean floor).

Station keeping

Terrain-relative station keeping is an example of another observational task. Many applications require the ability to accurately maintain position for photographing, sampling, or performing other location-dependent functions. In many cases the desired position is relative to terrain such as the ocean floor or a canyon wall (Figure 11). Direct measurement of this relative position can produce a more accurate result than inferring position from other sensing systems.

The problem of visual station keeping of an underwater vehicle has been investigated by several researchers [18, 17, 16]. The approach presented in that work involves determination of the optical flow in a scene. Because the scene is assumed to be static, the optical flow can be attributed to vehicle motion.

Feature tracking

To avoid the possibility of drift due to motion integration, a feature tracking approach can be used in place of optical flow measurements. A suitably large section of terrain with adequate texture must be chosen as the feature to be tracked. The section is tracked with respect to an initial image which does not change with time. Thus, the initial image feature being tracked does not change, and correlation can be used to obtain a true measure of its positional change. For the optical flow approach, some form of segmentation is needed to keep the station keeping reference from drifting.

The feature tracking approach assumes that the projection of the terrain section will not change significantly in subsequent images. Changes can occur for many reasons including camera motion and rotation, lighting differences, and occlusion. Image filtering can be used to reduce the impact these effects have on correlation. However, large angle and zoom changes will warp the terrain section enough that correlation will not provide useful data. To be effective, the feature tracking requires a minimal amount of net camera motion in these dimensions.

Translation, rotation, and zoom

A single feature can be tracked to obtain the vertical and horizontal image offsets. Assuming the scene is static and camera orientation and range to the scene remain constant, these offsets can be used to determine a scaled measurement of the robot's translation from the following equations:
    x = R sin(Offset_vert)     (2)
    y = R sin(Offset_horiz)    (3)

These measurements use the coordinate system shown in Figure 9 and assume the cameras are fixed to the vehicle and are pointed parallel to the z-axis. Note that Offset_horiz and Offset_vert are angular image measurements and R is the distance between the camera and the feature being tracked. For small offsets, the approximation sin x ≈ x can be invoked to linearize the equations.

Multiple features can be tracked to obtain additional measurements of both rotation and scaled zoom (or range) [8]. Figure 12 shows the case of two features, one of which is chosen at the center of the image. Assuming only twist rotation occurs, and the offsets, rotation, and change in zoom are small:

    x = R (t⃗₁ · ĵ)                        (4)
    y = R (t⃗₁ · î)                        (5)
    θ = ((t⃗₂ − t⃗₁) × r⃗) · k̂ / |r⃗|        (6)
    z = ((t⃗₂ − t⃗₁) · r⃗) / |r⃗|            (7)

The z equation computes a range measurement normalized to the range of the initial image. The true range R can be obtained using stereo correlation, although in many cases a rough estimate is adequate for station keeping. Results of the accuracy of this approach are presented in [8].

Figure 12: Two tracked features. t⃗₁ and t⃗₂ are 2-vector measures of feature position offsets. r⃗ is the 2-vector displacement between the two features (î and ĵ are the image axes).

Mosaicking

Video mosaicking of the ocean floor is an application for which visual sensing is uniquely qualified. Mosaics are composite images made up of multiple smaller images which have been properly matched. Due to the limited visual range in the ocean, mosaics are the only means for creating single images of large areas. Background related to automatic mosaic creation can be found in [11].

To build a mosaic, a robot vehicle must move about and collect snapshots. To piece the images together into the final mosaic, some positional measure must be determined for each image. Feature tracking correlation can be used to obtain the relative positional offsets between images. In this respect, mosaicking is closely related to station keeping. Station keeping consists of regulating about a constant desired position with respect to a fixed section of terrain. Mosaicking, however, involves measuring positional offsets and repeatedly selecting new sections of terrain for tracking as the previous section moves out of view. Automatic mosaicking can be achieved by controlling the robot to zero the error between a desired positional offset and the measured offset.

Projection geometry

The mosaic quality which can be achieved is largely a function of the projection geometry of the mosaicking system. Most cameras can be modelled as pinhole cameras which image using perspective projection. Mosaicking multiple images obtained using perspective projection is problematic for nonplanar scenes, although no problems arise when orthographic projection is used (Figure 13). The mosaic quality is directly related to how close the image projection geometry is to orthographic.

Figure 13: Projection geometry and mosaicking. (a) Perspective projection. The first and second eye observe completely different data in the overlapping parts of the images. (b) Orthographic projection. The first and second eye observe the same data in the overlapping parts of the images.

Perspective projection can also cause a problem for feature tracking correlation. For station keeping, the offsets between the initial image and subsequent images are nominally small. However, for mosaicking this is not the case. The offsets are on the order of the field of view of the camera. The perspective effect can greatly degrade correlation performance for terrains of significant relief.
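The translation, rotation, and zoom relations of Equations (2)-(7) can be written out numerically. In the sketch below, the offsets are taken in radians and the 2-D cross product returns the k̂ component; these conventions are our reading of the reconstructed equations, not a verified implementation:

```python
import numpy as np

def translation(offset_vert, offset_horiz, R):
    """Scaled translation from a single tracked feature, Eqs. (2)-(3).
    Offsets are angular image measurements; R is the feature range."""
    x = R * np.sin(offset_vert)
    y = R * np.sin(offset_horiz)
    return x, y

def rotation_and_zoom(t1, t2, r):
    """Twist rotation and normalized range change from two tracked
    features, Eqs. (6)-(7). t1, t2: 2-vector feature offsets;
    r: 2-vector displacement between the features."""
    d = np.asarray(t2, float) - np.asarray(t1, float)
    r = np.asarray(r, float)
    rn = np.linalg.norm(r)
    theta = (d[0] * r[1] - d[1] * r[0]) / rn   # k-hat component of d x r
    z = np.dot(d, r) / rn                      # normalized range change
    return theta, z

# Small-angle check: sin(x) ~ x is accurate to ~0.02% at a 2-degree offset.
x, y = translation(np.radians(2.0), np.radians(1.0), R=5.0)
```

For small offsets the sin terms can be dropped entirely, which is the linearization the text mentions.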
Figure 14: Single-column mosaic collection (labels: vehicle velocity; individual frames 1-4).

Single-column mosaics

Figure 14 shows how a robot can be used to generate single-column mosaics. The following algorithm can be used to gather images:
procedure make_single_column_mosaic()
{
    loop
        get_image()
        store_image()
        cor_window_pos = Center_Top_Of_Image
        repeat
            x = pos_of_best_cor_in_live_image()
            posError = Center_Bottom_Of_Image - x
            if automatic_mode then move_robot(posError)
        until posError < epsilon
        record_image_offset(x)
    end loop
}
The error between desired offset and current offset is computed by the above algorithm; this can be used for servoing to create automatic mosaics (hence the move_robot() command above). Note that the positional offset of every new image is measured with respect to the previous image only.

This algorithm has been tested experimentally for a variety of platforms. Figure 15 shows a mosaic created using Ventana, MBARI's remotely operated vehicle.

Figure 15: A four-image single-column mosaic of the ocean floor.
Multiple-column mosaics

The single-column algorithm can be extended to obtain images for mosaics of multiple columns by replacing Center_Top_Of_Image and Center_Bottom_Of_Image with values dependent upon the image number. For instance, at the end of each column, Center_Top_Of_Image could be replaced with Right_Center_Of_Image and Center_Bottom_Of_Image with Left_Center_Of_Image for a pattern like that in Figure 16. However, this simple extension of the single-column algorithm behaves poorly when slight errors occur in image offsets. This can be shown by examining the error propagation from image to image.

… separating them. The expected error from Equation (9) for this worst-case scenario would be proportional to n, or m² − 1. The total separation between the images is O(m), while the error is O(m²).

A mosaicking strategy for which the worst-case error is O(m) is shown in Figure 17. Instead of always measuring offsets between sequentially obtained images, offsets can be measured between an image and its adjacent image from the previous column. The expected worst-case error for this approach is proportional to m + m, using the same assumptions that were made about the errors to obtain the results in Equation (9).
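The benefit of shortening the measurement chain can be illustrated with a small simulation. The noise model below (independent zero-mean offset errors) is our assumption, standing in for the analysis behind Equation (9); under it the RMS error grows as the square root of the chain length rather than the worst-case linear growth, but the comparison between the two strategies is qualitatively the same:

```python
import numpy as np

def chained_rms_error(n_links, trials=2000, sigma=1.0, seed=1):
    """RMS position error after summing n_links noisy offset measurements."""
    rng = np.random.default_rng(seed)
    # Each trial accumulates n_links independent measurement errors.
    errors = rng.normal(0.0, sigma, size=(trials, n_links)).sum(axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

# Sequential collection of an m x m mosaic can chain ~m^2 measurements
# between two adjacent images; the adjacent-column strategy chains ~2m.
m = 10
rms_sequential = chained_rms_error(m * m - 1)
rms_adjacent = chained_rms_error(2 * m)
```

With sigma = 1 the two RMS values come out near sqrt(99) and sqrt(20) respectively, so the adjacent-column strategy accumulates markedly less error between neighboring mosaic images.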
Initial experiments

Although the problems for visual navigation may at first appear prohibitive, the experimental success we have shown for creating mosaics is nevertheless inspiring. The adjacent-column correlation strategy in particular faces many of the same issues. Initial experiments are planned for fixed-location pan/tilt navigation using mosaics generated from pan/tilt motion. This will demonstrate the ability to "look at that" while restricting the number of degrees of freedom which must be controlled.
References

[1] J.L. Barron, D.J. Fleet, and S.S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43-77, 1994.

[2] R. Blidberg. Autonomous underwater vehicles: current activities and research opportunities. Robotics and Autonomous Systems, 7(2-3):139-150, August 1991.

[3] F. Chaumette, P. Rives, and B. Espiau. Positioning of a robot with respect to an object, tracking it and estimating its velocity by visual servoing. In IEEE International Conference on Robotics and Automation, volume 3, pages 2248-2253, 1991.

[4] B. Espiau, F. Chaumette, and P. Rives. A new approach to visual servoing in robotics. IEEE Transactions on Robotics and Automation, 8(3):313-326, June 1992.

[5] O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.

[6] B.K.P. Horn. Robot Vision. MIT Press, 1986.

[7] J.S. Jaffe. Sensors for underwater robotic vision: status and prospects. In IEEE International Conference on Robotics and Automation, volume 3, pages 2759-2766, 1991.

[8] R. Marks, M. Lee, and S. Rock. Automatic visual station keeping of an underwater robot. In Proceedings of IEEE Oceans '94, Brest, France, September 1994. IEEE.

[9] R. Marks, S. Rock, and M. Lee. Analysis of strategies for automatic multi-column video mosaicking. Paper in progress.

[10] R. Marks, S. Rock, and M. Lee. Automatic object tracking for an unmanned underwater vehicle using real-time image filtering and correlation. In Proceedings of IEEE Systems, Man, and Cybernetics, France, October 1993. IEEE.

[11] R. Marks, S. Rock, and M. Lee. Real-time video mosaicking of the ocean floor. In Proceedings of IEEE Symposium on Autonomous Underwater Vehicle Technology, Cambridge, MA, July 1994. IEEE.

[12] R.L. Marks, T.W. McLain, D.W. Miles, S.M. Rock, G.A. Sapilewski, H.H. Wang, R.C. Burton, and M.J. Lee. Monterey Bay Aquarium Research Institute/Stanford Aerospace Robotics Laboratory Joint Research Program Summer Report 1992. Project report, Stanford Aerospace Robotics Laboratory and Monterey Bay Aquarium Research Institute, 1992.

[13] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the Royal Society of London, (207):181-217, 1980.

[14] D.W. Miles and G.A. Sapilewski. Uses of semi-autonomous underwater vehicles: a scientific survey. Technical report, Stanford Aerospace Robotics Laboratory and Monterey Bay Aquarium Research Institute, October 1992.

[15] V.S. Nalwa. A Guided Tour of Computer Vision. Addison-Wesley, 1993.

[16] S. Negahdaripour and J. Fox. Undersea optical stationkeeping: improved methods. Journal of Robotic Systems, 8(3):319-338, June 1991.

[17] S. Negahdaripour, A. Shokrollahi, J. Fox, and S. Arora. Improved methods for undersea optical stationkeeping. In IEEE International Conference on Robotics and Automation, volume 3, pages 2752-2758, 1991.

[18] S. Negahdaripour and C. Yu. Passive optical sensing for near-bottom stationkeeping. In Proceedings of IEEE Oceans 1990, pages 82-87, 1990.

[19] H.K. Nishihara. PRISM: a practical realtime imaging stereo matcher. In SPIE, pages 134-142, 1983.

[20] H.K. Nishihara. Practical realtime imaging stereo matcher. Optical Engineering, 23(5):536-545, October 1984. Also in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, edited by M.A. Fischler and O. Firschein, Morgan Kaufmann, Los Altos, 1987.

[21] H.K. Nishihara and N.G. Larson. Towards a real-time implementation of the Marr-Poggio stereo matcher. In Proceedings of DARPA Image Understanding Workshop, pages 114-120, April 1981.

[22] S. Schneider. Experiments in the Dynamic and Strategic Control of Cooperating Manipulators. PhD thesis, Stanford University, Stanford, CA 94305, September 1989. Also published as SUDAAR 586.

[23] A. Verri and T. Poggio. Against quantitative optical flow. In Proceedings of 1st International Conference on Computer Vision, pages 171-180, 1987.

[24] H.H. Wang, R.L. Marks, S.M. Rock, M.J. Lee, and R.C. Burton. Combined camera and vehicle tracking of underwater objects. In Proceedings of Intervention/ROV '92, San Diego, CA, June 1992.

[25] H.H. Wang, R.L. Marks, S.M. Rock, and M.J. Lee. Task-based control architecture for an untethered, unmanned submersible. In Proceedings of the 8th Annual Symposium on Unmanned Untethered Submersible Technology, pages 137-147. Marine Systems Engineering Laboratory, Northeastern University, September 1993.