
Positioning and Scene Analysis from Visual Motion Cues within Water Column

S. Negahdaripour and M.D. Aykin
ECE Department, University of Miami, Coral Gables, FL 33146
(nshahriar{m.aykin}@{u}miami.edu)

M. Babaee
Department of Computer Science, Technical University of Munich, Munich, Germany
(babaee@mytum.de)

S. Sinnarajah and A. Perez
Gulliver Preparatory, Pinecrest, FL 33155
(sinn022{pere051}@students.gulliverschools.org)

Abstract: Over the last dozen or more years, many applications of vision-based positioning and navigation near the sea bottom and surface have been explored. Mid-water operations have primarily relied on traditional positioning systems, namely INS, DVL, gyros, etc. This paper investigates the application of a vision system for mid-water operations by exploiting stationary features within the water column. The unique nature of these environments, namely the abundance of randomly distributed targets over a wide field of view and range of depth, is ideal for the application of well-known motion vision methods for 3-D motion estimation and scene analysis. We demonstrate through experiments with water tank and ocean data how various visual motion cues may be used for passive navigation, environmental assessment, and target/habitat classification based on visual motion behavior.

I. INTRODUCTION

The ability to determine one's position and (or) achieve point-to-point navigation with precision is an important capability in the operation of robotic and (or) sensor platforms underwater. Subsea platforms have traditionally deployed various sensors, including gyros, DVL, INS, etc. Many such positioning systems have their advantages, but also limitations and drawbacks. For example, off-the-shelf sensors are cheap, but are slow, inaccurate, and subject to significant drift. More precise sensors are often pricey and (or) bulky, making them impractical for deployment on small low-cost platforms. The shortcomings, mainly inaccuracy and drift issues, can be reduced to some extent by fusing information (e.g., integrating and fusing estimates from a number of sensors of the same or different modalities). Over the last 15 years, applications of computer vision techniques exploiting the visual cues in underwater (video) imagery have provided new approaches, with many being implemented through integration with information from traditional devices (e.g., [6], [9]-[11], [14]-[18]). More precisely, it has been shown that a vehicle's trajectory and (or) motion can be estimated by tracking a number of stationary environmental/target features in image sequences. The visual cue comprises either the correspondences among the positions of a number of features over the imaged scene surfaces, or the apparent image-to-image variations and optical flow. The key advantage of the computer vision techniques is the fact that they can work well under scenarios where auxiliary positioning

systems perform worse, mainly when the platform undergoes slow motions and drifts. This is due to the fact that these other devices typically estimate motion and (or) position by integrating the vehicle's acceleration or velocity, which cannot be determined accurately at low speeds. In contrast, visual motion can be determined with the same accuracy, more or less independent of the motion size; for example, if the motion is too small to be reliably detected in a video sequence, every N-th frame may be processed. A positioning method employing visual motion estimation also suffers from the drift problem, which becomes significant over an extended period of operation, particularly where the frame-to-frame motions, as the main source of information, are integrated to determine the vehicle's position and (or) trajectory. However, fusion with auxiliary devices that measure instantaneous orientation (e.g., pitch/roll sensor, magnetometer) can significantly enhance accuracy [15]. Therefore, vision-based systems can complement, and enhance the performance of, an integrated positioning system.

Earlier work has primarily explored near-bottom or surface operations, developing capabilities that enable: 1) mapping of benthic habitats, e.g., reefs as well as shipwrecks, in the form of large-area photo-mosaics, e.g., [9], [11], [14], [17], [18]; 2) autonomous positioning and local navigation for the inspection of man-made structures (e.g., ship hulls, bridge pilings, and off-shore oil structures) [16]. Here, the surfaces of the target scene(s) to be imaged, mapped, documented, and (or) inspected offer abundant visual cues for establishing feature tracks and correspondences, the fundamental problem for self-motion detection and estimation. In this paper, we investigate the potential application of well-known visual motion estimation methods in support of operations within the water column.
Here, the stationary suspended particles within the water column play the role of points on natural and man-made object surfaces (e.g., surface texture of natural objects, marine growth, structural features and markings on ship hulls), and can actually provide much stronger and often ideal visual cues for the application of vision-based techniques. The advantages come from the fact that 1) these particles are randomly distributed; 2) they extend over the entire sphere of viewing directions; and 3) they rest at a large range of distances from the camera (vehicle).

These advantages can be captured and utilized effectively by one or more cameras covering a relatively large field of view, while employing various intelligent image-capture strategies to enhance image contrast and information content, and to simplify the fundamental feature-matching problem. To identify and discard non-stationary objects from the computations, a formulation based on well-known robust estimation methods can be readily employed (e.g., RANSAC [7]). We should emphasize that most, if not all, of the methods and underlying technical material discussed in this paper have previously been applied in some terrestrial-domain application. The main contribution of this paper is to demonstrate (perhaps in contrast to common belief) that the water column is an environment very rich in visual motion cues, often more so than near the sea floor or surface. Thus, it is very suitable for the application of visual motion/stereo methods for 3-D scene analysis and reconstruction, as well as for environmental assessment and target/habitat classification based on motion behavior and the size and distribution of suspended particles.

II. TECHNICAL BACKGROUND

The coordinates of a 3-D point P in the optical camera coordinate system at some reference (e.g., initial) position are denoted $\mathbf{P} = [X, Y, Z]^T$. For convenience, we often make use of homogeneous coordinates, represented by $\mathbf{P} = [X, Y, Z, 1]^T$. The image of P, formed by the perspective projection model, has coordinates $\mathbf{p} = (x, y, f)$:

$$ \mathbf{p} = \frac{f}{Z}\,\mathbf{P} \qquad (1) $$
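As a concrete illustration of the projection model in (1), the following sketch (with illustrative intrinsic values, not those of the paper's cameras) projects a 3-D point onto the image plane and converts the result to (column, row) pixel coordinates:

```python
import numpy as np

# Illustrative intrinsics (assumed values, not from the paper's cameras)
f = 8.0                # effective focal length [mm]
sx, sy = 0.01, 0.01    # pixel sizes [mm/pix]
cx, cy = 320.0, 240.0  # image center [pix]

def project(P, f):
    """Perspective projection p = (f/Z) P, eq. (1); returns p = (x, y, f)."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z, f])

def to_pixels(p, sx, sy, cx, cy):
    """Map image-plane coordinates [mm] to computer (column, row) coordinates."""
    x, y, _ = p
    return np.array([x / sx + cx, y / sy + cy])

P = np.array([0.5, -0.25, 2.0])   # 3-D point in the camera frame
p = project(P, f)                 # -> [2.0, -1.0, 8.0]
c = to_pixels(p, sx, sy, cx, cy)  # -> [520.0, 140.0]
print(p, c)
```

The inverse mapping (pixels back to image-plane coordinates) is what requires the calibrated internal parameters discussed below.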

Using homogeneous coordinates, it is convenient to express the perspective projection model in the linear form

$$ \mathbf{p} \cong C\,\mathbf{P}, \qquad C = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2) $$

where $\cong$ denotes equality up to scale. The real coordinates p (typically in [mm] units) are related to the computer coordinates, representing the (column, row) positions $\mathbf{p}_c = (c, r, 1)$ in an image, as follows:

$$ \mathbf{p}_c = M_p\,\mathbf{p}, \qquad M_p = \begin{bmatrix} 1/s_x & 0 & c_x/f \\ 0 & 1/s_y & c_y/f \\ 0 & 0 & 1/f \end{bmatrix} \qquad (3) $$

where $(s_x, s_y)$ [mm] are the horizontal and vertical pixel sizes, and $(c_x, c_y)$ [pix] are the coordinates of the image center. The internal camera parameters $(s_x, s_y, c_x, c_y, f)$ can be determined by calibration, and are necessary to transform from image measurements to 3-D world measurements. When only images from an uncalibrated camera are available (e.g., as for the ocean data in our experiments), the 3-D information can be determined only up to a projective transformation.

The transformation between the coordinate systems at any two viewing positions can be expressed in terms of 3 translation parameters t and 3 rotation parameters, the latter in the form of a 3x3 rotation matrix R satisfying the orthogonality constraint $RR^T = R^T R = I$ (I: 3x3 identity matrix). The point P in the new view has coordinates P':

$$ \mathbf{P}' = R\,\mathbf{P} + \mathbf{t} \qquad (4) $$

and maps onto the new image position p':

$$ \mathbf{p}' \cong C'\,\mathbf{P}, \qquad C' = C\,[R \,|\, \mathbf{t}] \qquad (5) $$

Thus, we obtain

$$ x' = \frac{\mathbf{c}'_1 \cdot \mathbf{P}}{\mathbf{c}'_3 \cdot \mathbf{P}}, \qquad y' = \frac{\mathbf{c}'_2 \cdot \mathbf{P}}{\mathbf{c}'_3 \cdot \mathbf{P}} \qquad (6) $$

where $\mathbf{c}'_i$ (i = 1, ..., 3) denote the rows of C', and P is in homogeneous coordinates. The displacement vector $\mathbf{v} = \mathbf{p}' - \mathbf{p}$ is the image motion of the 3-D point P. For small motions, it can be represented by [5]

$$ \mathbf{v} = \mathbf{v}_r + \mathbf{v}_t = A_r\,\boldsymbol{\omega} + \frac{1}{Z} A_t\,\mathbf{t} \qquad (7) $$

$$ A_r = \begin{bmatrix} xy/f & -(f + x^2/f) & y \\ (f + y^2/f) & -xy/f & -x \\ 0 & 0 & 0 \end{bmatrix}, \qquad A_t = \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \\ 0 & 0 & 0 \end{bmatrix} \qquad (8) $$

where f is the effective focal length of the camera, Z is the so-called depth of P, the 3-D vectors $\boldsymbol{\omega} = (\omega_x, \omega_y, \omega_z)^T$ and $\mathbf{t} = (t_x, t_y, t_z)^T$ are the rates of the rotational and translational motions, and $\mathbf{v}_r = (v_{rx}, v_{ry}, 0)^T$ and $\mathbf{v}_t = (v_{tx}, v_{ty}, 0)^T$ are the rotational and translational components of the image motion, respectively.

It readily follows that only the translational component $\mathbf{v}_t$ encodes information (in terms of the depth Z) about the scene structure, that is, the relative 3-D positions of spatial features; there is no structural cue in the absence of translation. Furthermore, the image motion v remains unchanged if Z and t are scaled by the same constant k. This well-known scale-factor ambiguity of monocular vision confirms that the translational motion and scene structure can be determined only up to scale. Without loss of generality, the depth of a particular feature can be fixed (as unit length), and the translational motion and all other feature positions can then be expressed in terms of this distance.

III. MOTION ESTIMATION

In monocular vision, the estimation of motion is limited to the direction of translation due to the well-known scale-factor ambiguity. Without loss of generality, we typically determine the unit direction of translation $\hat{\mathbf{t}} = \mathbf{t}/\|\mathbf{t}\|$.
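The scale-factor ambiguity can be verified numerically: scaling the depth Z and the translation t by the same constant k leaves the translational image motion in (7) unchanged. A minimal sketch, using the translational flow matrix of (8) with illustrative values:

```python
import numpy as np

f = 8.0             # effective focal length [mm] (illustrative)
x, y = 1.0, -0.5    # image point [mm] (illustrative)

def A_t(x, y, f):
    # Translational flow matrix from eq. (8)
    return np.array([[-f, 0.0, x],
                     [0.0, -f, y],
                     [0.0, 0.0, 0.0]])

t = np.array([0.1, 0.0, 0.5])          # translation rate
Z = 2.0                                 # feature depth
v1 = A_t(x, y, f) @ t / Z               # translational image motion
k = 7.3
v2 = A_t(x, y, f) @ (k * t) / (k * Z)   # depth and translation scaled by k
print(np.allclose(v1, v2))              # True: monocular scale ambiguity
```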

A. Translation Estimation

When the camera/vehicle simply translates in the water column, the image motion vectors simplify to

$$ \mathbf{v} = \frac{1}{Z} A_t\,\mathbf{t} = \frac{1}{Z} \begin{bmatrix} -f & 0 & x \\ 0 & -f & y \\ 0 & 0 & 0 \end{bmatrix} \mathbf{t} \qquad (9) $$

The displacement vectors intersect at a common point:

$$ \mathbf{x}_{foe} = \begin{bmatrix} x_{foe} \\ y_{foe} \end{bmatrix} = \frac{f}{t_z} \begin{bmatrix} t_x \\ t_y \end{bmatrix} \qquad (t_z \neq 0) \qquad (10) $$

The point $\mathbf{x}_{foe}$ is known as the focus of expansion/contraction (FOE/FOC) for forward/backward motion. Since the camera translation is along the FOE/FOC vector, the estimation of the translation vector reduces to locating the FOE [4], [8], [12], [13]. Whether the motion is forward or backward can be readily established by whether the image motion vectors point away from or towards the FOE/FOC. If $t_z = 0$, the image motion vectors become parallel to the direction $(t_x, t_y, 0)$, with the FOE/FOC (intersection point) moving towards infinity. The process involves first determining the computer coordinates $(r_{foe}, c_{foe})$ of the FOE based on the image motion of a few stationary features, transforming to the image coordinates $(x_{foe}, y_{foe})$ based on the camera calibration parameters, and finally establishing the direction of motion from $\hat{\mathbf{t}} = (x_{foe}, y_{foe}, f)$. Note that we require the camera internal parameters; otherwise, the estimation is limited to the location of the FOE/FOC in terms of computer coordinates.

It readily follows from (9) that the up-to-scale depth of each feature can be determined from its image displacement. A solution based on a least-squares formulation is given by [5]

$$ Z = \frac{v_{tx}(x - x_{foe}) + v_{ty}(y - y_{foe})}{\|\mathbf{v}_t\|^2} \qquad (t_z \neq 0) \qquad (11) $$

Alternatively, we can use

$$ Z = \frac{(x - x_{foe})^2 + (y - y_{foe})^2}{v_{tx}(x - x_{foe}) + v_{ty}(y - y_{foe})} \qquad (t_z \neq 0) \qquad (12) $$

For $t_z = 0$, where the FOE is at infinity, we simply use the up-to-scale solution for the FOE direction in the image motion equation.

B. Pure Rotation

The pure rotation of the camera can be readily determined from

$$ \mathbf{v} = \begin{bmatrix} xy/f & -(f + x^2/f) & y \\ (f + y^2/f) & -xy/f & -x \\ 0 & 0 & 0 \end{bmatrix} \boldsymbol{\omega} \approx \begin{bmatrix} 0 & -f & y \\ f & 0 & -x \\ 0 & 0 & 0 \end{bmatrix} \boldsymbol{\omega} \qquad (13) $$

where the approximation is valid for cameras with an average field of view (say, up to about 50-60 [deg] in water), since x << f and y << f over most of the image, but not for larger fields of view (e.g., in the periphery of a fisheye lens). The above equation comprises two linear constraints in terms of the three unknown rotational motion components $\boldsymbol{\omega}$. The image motions at a minimum of two feature points are sufficient to compute the rotational motion. A single point is sufficient if the rotation is limited to the pitch and roll of the vehicle/camera. Furthermore, the approximation is not necessary for the motion computation; it simply allows for the observation that the image motion induced by the pitch and roll of the camera is roughly constant for features in the central region of the image (where x << f and y << f).

C. Arbitrary Motion

The water column, as in deep-space imaging [1], [2], is an ideal environment for estimating arbitrary camera motions with good accuracy. Here, objects extend over a large depth within the f.o.v., with some points lying at an effectively infinite distance from the camera (Z -> infinity). Examination of (7) reveals that their image motions comprise solely the rotational component. Identifying such points and using their image motions, we first estimate $\boldsymbol{\omega}$, compute the rotation field $\mathbf{v}_r$ over the entire image and subtract it out, with the remainder giving the displacements due to the camera translation. The method in Section III-A can then be applied to compute the camera translation.

An alternative multi-step strategy can be adopted by noting that, over most of the central part of the image, the pitch and yaw motions $\omega_x$ and $\omega_y$, respectively (scaled by f >> x and f >> y), contribute more significantly to the image feature displacements than the roll $\omega_z$ does; the latter is significant primarily in the image periphery (larger x and y). This can be exploited to identify suitable distant features in the central region of the image (undergoing nearly constant image displacements) for the computation of the pitch and yaw components. Subtracting out their induced image motion $(v_{rx}, v_{ry}) = (-f\omega_y, f\omega_x)$, the remaining rotation $\omega_z$ can be determined by utilizing distant features in the image periphery. Next, by subtracting the induced image motion $(v_{rx}, v_{ry}) = (y\omega_z, -x\omega_z)$, we can finally compute the translational components from the image motion vectors that intersect at the FOE/FOC (corresponding to stationary scene features).

D. Moving Objects

Moving targets have image motions that differ, typically both in magnitude and direction, from those of stationary objects, which are constrained to move towards/away from the FOE/FOC. By computing the camera motion and removing the induced image motion, each independent object's motion can be analyzed. A particular implementation involves the following steps:

Camera Motion Estimation:
1) Utilize distant features in the central image region to estimate the pitch and yaw components. Compute and subtract out the image displacement due to these two components.
2) Estimate the roll (rotation about the optical axis) from distant features in the periphery, and subtract out its contribution to the image motion.
3) Compute the translational motion components using a number of features with image motion displacements that intersect at a common point (FOE/FOC).

Depth Estimation for Stationary Features: Points with image motion magnitude $\|\mathbf{v}_t\|$ below a certain threshold are treated as points at infinity. The depths of other stationary features (with image motion vectors $\mathbf{v}_t$ intersecting at the FOE) are computed from (11).

Moving Object Detection: The image motion of each object, if stationary, must pass through the FOE. This constraint is violated when the object has non-zero motion. Thus, we can identify and segment out these objects in order to analyze their motion behaviors.

At first glance, it appears that we are haunted by the chicken-and-egg nature of motion estimation (utilizing stationary objects) based on identifying stationary objects (discriminating between moving and stationary features). However, the nature of the water-column environment, namely the presence of stationary targets randomly distributed over a large range of distances, comes to our rescue. In particular, steps 1 and 2 are critical: locating some (a minimum of 2) stationary targets (at infinity) with image motions induced by the camera rotation. It goes without saying that when these points do not move in the image, there is no camera rotation. We can classify objects based on their image motions and employ a RANSAC-based implementation: we randomly select 2 points for the two-step process to compute the camera rotation, and de-rotate the image to be left with translation-induced image motions only. If the majority of motion vectors intersect at a common point (FOE/FOC), then the estimated rotation is accurate (and so is the translational motion based on the FOE). Otherwise, we repeat with a different sample.
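The pipeline of estimating the rotation from distant features, de-rotating, locating the FOE, and recovering up-to-scale depths can be sketched on synthetic data (all values illustrative; the rotation-induced displacement is approximated here by the mean motion of the "points at infinity", a simplification of steps 1 and 2):

```python
import numpy as np

def derotate(flows, distant_mask):
    """Steps 1-2 (simplified sketch): approximate the rotation-induced displacement
    as the mean motion of distant features (Z -> infinity) and subtract it."""
    return flows - flows[distant_mask].mean(axis=0)

def foe_least_squares(pts, flows):
    """Step 3: FOE as the least-squares intersection of the image-motion lines.
    Each line passes through pts[i] with direction flows[i]."""
    n = np.column_stack([flows[:, 1], -flows[:, 0]])   # line normals
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    b = np.sum(n * pts, axis=1)
    foe, *_ = np.linalg.lstsq(n, b, rcond=None)
    return foe

def depth_up_to_scale(pt, flow, foe):
    """Eq. (11): up-to-scale depth of a stationary feature."""
    d = pt - foe
    return float(np.dot(flow, d) / np.dot(flow, flow))

# Synthetic check: forward motion with FOE at (100, 80), plus a constant
# rotation-induced displacement of (-29, -5) pixels (values illustrative).
rng = np.random.default_rng(0)
foe_true = np.array([100.0, 80.0])
pts = rng.uniform(0.0, 500.0, size=(25, 2))
depths = rng.uniform(1.0, 10.0, size=25)
depths[:3] = 1e6                                # three "points at infinity"
flows = (pts - foe_true) / depths[:, None] + np.array([-29.0, -5.0])

derot = derotate(flows, depths > 1e5)           # remove rotation component
stationary = np.linalg.norm(derot, axis=1) > 1e-3
foe = foe_least_squares(pts[stationary], derot[stationary])
z10 = depth_up_to_scale(pts[10], derot[10], foe)  # recovers depths[10] up to scale
print(np.round(foe, 1))
```

A mover would show up here as a feature whose de-rotated motion line has a large residual distance from the recovered FOE, which is exactly the RANSAC consensus test described above.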
IV. FEATURE MATCHING

The correspondence problem is the primary complexity of feature-based motion and (or) stereo methods for 3-D motion estimation and scene reconstruction. Many existing techniques (e.g., SIFT) incorporate the estimation of a homography (projective transformation), or of affine or similarity transformations, to map points from one view to the next, confining the search for the correct match to the vicinity of the projected position. To determine the transformation, a robust estimation method is applied; for example, RANSAC uses random samples from some initial matches, assuming they comprise 50% or more inliers. These transformations are valid for simple motion models (e.g., camera rotation) and (or) points/objects lying roughly on a single plane. With randomly distributed features within the water column at different distances, many such planes can be defined, each passing through a small number of features; however, there is typically no dominant plane. Taking any one plane and treating the features (nearly) lying on it as the inliers for matching, these are significantly outnumbered by outliers, yielding no more than a handful of correct matches. Other complexities arise from significant variations in feature appearance. In our problem, certain imaging strategies can be applied to simplify the correspondence problem:
1) A camera with a small depth of field is focused at some desired distance d. Thus, only a small number of features (namely, those at and around distance d) appear in focus, simplifying the matching problem.
2) The camera is set at a low shutter speed (longer exposure time), allowing the feature dynamics to be recorded in one frame. The relatively continuous feature tracks on a blurred-out background can be utilized to establish feature matches.
Both strategies are also useful in facilitating the identification of distant objects, which enables the estimation of the rotational motion.

V. EXPERIMENTS

We present the results of various experiments on two data sets: 1) calibrated water-tank data with small spherical targets of different sizes, showing motion estimation and estimation of the camera trajectory; 2) uncalibrated ocean data, to demonstrate different technical aspects of this contribution.

A. Tank Scene

We simulate the water-column environment with a water-tank scene comprising several small spherical balls of different sizes at varying distances from the camera. The data consist of a total of 23 images, taken along 3 parallel tracks. On the first track, the camera moves away from the scene over 8 images. After translating right to the next track (roughly parallel to the scene), the camera then moves forward over the next 8 images. Following a NE-direction motion (forward and right) to reach the next track, the movement is backwards over the last 7 images. The motions are about 5 [in] along the track and 2.5 [in] from one track to the next. As selected results, we have depicted 3 sample images in Fig. 1(a), from the beginning and end of the first track, and the middle of the last track.
In (a), we have also superimposed in one image the estimated relative depths of the features (in meters) computed from (11) (recall that we can only compute relative depth from monocular motion cues). We have visually confirmed the consistency with the relative spatial arrangements of the features. In (a), we have shown the features (crosses) with their matches from the next frame (circles), which establish the frame-to-frame displacements. The red lines depict the directions of the image displacement vectors that (roughly) intersect at a common point; this being near the image center, the camera motion is determined to be mostly in the forward/backward direction. For the cases where the camera moves from one track to the next (middle image), the image motion vectors are (roughly) parallel. The errors are significant for distant targets with low SNR (small image motion relative to the inaccuracies of feature localization in one of the two views). These

Fig. 1. (a) Three selected images with features (+) and match positions (o) from the next view. The red lines are the image motion directions, showing intersection at the FOE (encircled +). These lines are parallel for side-way motion, where the camera moves from one track to the next. (b) Reconstructed camera trajectory for the water-tank experiment.
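The reconstructed trajectory of Fig. 1(b) is obtained by composing frame-to-frame motion estimates. A minimal dead-reckoning sketch, assuming each step provides a rotation R and a unit translation direction expressed in the current camera frame, with the monocular scale fixed externally (a simplified stand-in for the integration used in the experiment):

```python
import numpy as np

def integrate_trajectory(motions, scale=1.0):
    """Compose frame-to-frame (R, t_hat) estimates into camera positions in the
    frame of the initial view. t_hat is a unit translation direction in the
    current camera frame; `scale` fixes the monocular scale (assumed constant)."""
    R_acc = np.eye(3)        # accumulated orientation of current frame
    pos = np.zeros(3)
    traj = [pos.copy()]
    for R, t_hat in motions:
        pos = pos + scale * (R_acc @ t_hat)   # step re-expressed in initial frame
        R_acc = R_acc @ R
        traj.append(pos.copy())
    return np.array(traj)

# Illustrative: 3 forward steps of unit length with no rotation
steps = [(np.eye(3), np.array([0.0, 0.0, 1.0]))] * 3
print(integrate_trajectory(steps)[-1])   # -> [0. 0. 3.]
```

As the text notes, errors in each step accumulate, which is why the integrated trajectory drifts over long runs unless fused with orientation sensors.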

errors do not impact the results, since we apply a RANSAC-based implementation to reject the outlier image motion lines. For completeness, we have estimated the camera trajectory by integrating the frame-to-frame motions; see (b). As stated, we can determine the translational motion and the distances of scene targets up to scale only (from monocular cues). However, once the initial motion scale is set (here, we use our knowledge of the motion size for the experiment), the scaling can be established for subsequent images. A small motion in the y direction along each track is not unusual, since the estimated camera motions (and track positions) are expressed in the camera coordinate system at the initial position. With a slight camera tilt (downward in this case), the positions along the trajectory will retain a non-zero Y component. Based on the estimated positions, this tilt is roughly tan^-1(2/35) ~ 3 [deg], small enough not to be noticed without precise calibration.

VI. OCEAN DATA

Fig. 2 shows the results from various experiments with one ocean data set. It was recorded by a camera mounted on the side of a submersible platform as it moved through the water column. Therefore, the dominant camera motion is sideways, with occasional rotational effects due to changes in vehicle pitch and heading (yaw). In (a), we have depicted two images that are 9 frames apart,

starting at frame number 600 in the sequence. The circles show some features, and the diamonds are their positions in the next image (a). The image motions are defined by the vectors between them. Next, adding all 11 intermediate frames, as depicted in (a), we can readily recognize the parallel motion tracks of these stationary features, all pointing in the NW direction (dots showing the positions of the features from the two earlier views mark the start and end of each track). Because the image in (a) comprises many parallel tracks, the image gradients are perpendicular to these contours, and the large energy in the gradient direction can be readily observed in the Fourier domain (the frequency content of this image). This property is exploited by frequency-based methods for optical flow computation; see [3]. The Fourier transform of (a), depicted in (a), confirms this fact. The motion vector directions, and consequently the vehicle motion, can be deduced from any of the (a) panels. In (b), we have shown the same results from a different part of the sequence (frames 1170 to 1180), where the camera motion is more horizontal (the image motion vectors are nearly horizontal). Comparing the motion in these two cases, (a-a) and (b-b), we deduce that the vehicle moves at a higher pitch angle in the former case. The image in (c) is the first view of a sequence defined by frame numbers 1210 and 1216; circles depict features

Fig. 2. Various motion computation results. (a-a) Pure camera translation with various stationary features. (b-b) Pure camera translation with 4 stationary and 3 moving targets. (c,c) Features in one frame (circle) and matches from the other frame (dot) during translational and rotational camera motion, with 8 integrated in-between frames. (d,d) Both views with features in each frame (circle) and the other frame (dot), with (d) also showing the image motion vectors. (d) Rotation-compensated view, with new feature positions (circle), matches (dot), and image motion lines intersecting at a finite FOE. Features 1 and 2 are distant points used for the estimation of the rotation (inducing a displacement of roughly (-29,-5) pixels).

in each view, while dots are the matches from the other view. The vehicle undergoes rolling motion while changing heading to make a turn. This induces camera pitch and yaw

motions (vehicle roll becomes camera pitch, and heading change becomes the camera yaw motion). The integration of every frame from 1210 to 1216 in (c), showing the feature

Fig. 3. Another sequence (a,a), with image displacements comprising a rotation component of roughly (-25,-5) pixels (b), and the displacements after rotation compensation (b).
Fig. 4. Various water-column objects and tracks depicting their speed and motion behavior.

tracks, depicts the rotational motion. With motion comprising both rotation and translation, the image motion vectors no longer intersect at a common point (the FOE), as depicted in (d). Using distant points 1 and 2, we have estimated the image motion of (-29,-5) pixels due to the rotation. After rotation compensation, we construct the image in (d), which is simply translated relative to the first view. The image motions now intersect at a finite FOE. Note that points 1 and 2 become stationary after rotation stabilization.

Fig. 3 is another example from nearby frames (1200 and 1205), with image motions containing both rotational and translational components; see (a). The squares show the positions of each feature, the dots are the matches from the other view, and the dashed lines are the motion vectors. The integrated image, including all 6 frames from the sequence in (b), depicts the feature tracks. We have marked 3 selected points at infinity in the central region of the image, where the image motion is dominantly induced by the pitch and yaw camera motions. From these points, we estimate the induced image displacement of roughly (-25,-5) pixels. After compensating for the estimated rotation, the new image in (a) depicts parallel image motions intersecting at infinity (corresponding to camera motion parallel to the scene); this view is obtained by shifting the entire image by the constant image motion due to rotation. Distant features, shown by cyan diamonds in (a), now have nearly zero motion after rotation compensation.

In Fig. 4, we have depicted 3 integrated views starting from frames 129 (80 frames), 370 (55 frames), and 1362 (10 frames) of the sequence. From the Fourier transform (with the central portion enlarged to visually exaggerate it), one can still deduce the average translation direction of the vehicle (which is perpendicular to the oriented yellow blob near the center).
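The observation that the dominant translation direction can be read off the Fourier transform of an integrated (streaked) image can be sketched as follows: parallel streaks concentrate spectral energy along the direction perpendicular to them, which a second-moment analysis of the magnitude spectrum recovers (synthetic image; all values illustrative):

```python
import numpy as np

def dominant_orientation(img):
    """Angle [rad] of the dominant energy axis of the Fourier magnitude
    spectrum; for a streaked image this is perpendicular to the streaks."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    h, w = img.shape
    F[h // 2, w // 2] = 0.0                 # suppress the DC term
    fy, fx = np.indices(F.shape)
    fy = fy - h // 2
    fx = fx - w // 2
    # Energy-weighted second moments of the spectrum
    w2 = F ** 2
    mxx = np.sum(w2 * fx * fx)
    myy = np.sum(w2 * fy * fy)
    mxy = np.sum(w2 * fx * fy)
    return 0.5 * np.arctan2(2.0 * mxy, mxx - myy)

# Synthetic integrated image: horizontal streaks (motion along x)
img = np.zeros((128, 128))
img[20::8, :] = 1.0                         # bright horizontal lines
theta = dominant_orientation(img)           # ~ pi/2: energy perpendicular to streaks
print(theta)
```

For real integrated frames, the same moment analysis would be applied to the central portion of the spectrum, mirroring the enlarged central region shown in Fig. 4.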
Furthermore, the motion tracks of the different water-column habitats provide rich cues about their motion behaviors and speeds.

VII. SUMMARY

Environmental features, namely suspended particles and habitats, can be exploited for vision-based positioning and navigation within the water column. While any imaging system with sensitivity to the emitted energy from particular suspended particles can be employed, the visual motion cues in optical images have been studied here. We have aimed to demonstrate that such environments can be more ideal for motion vision techniques than most of the near-bottom or surface applications explored previously. The development of a robust vision system can benefit from the effective use of active imaging, including lighting and imaging sensor arrays that are tuned to specific visual motion cues, e.g., focus at different ranges, shutter-speed variation, directional lighting, etc. We are currently investigating a particular design for real-time processing. Furthermore, while we have focused on optical imaging in the visible band, the same advantages extend to other imaging devices that can record any form of emitted energy from the natural water-column particles.

ACKNOWLEDGMENT

We are deeply grateful to Drs. Scott Reed and Ioseba Tena from SeaByte Ltd, Edinburgh, Scotland (UK), who provided the ocean data. Murat D. Aykin is a first-year Ph.D. student supported by the Department of Electrical and Computer Engineering at the University of Miami. Mohammadreza Babaee is an M.S. student at the Technical University of Munich, carrying out his dissertation at the Underwater Vision and Imaging Lab (UVIL), University of Miami. He is funded in part by research Grant No. 2006384 from the US-Israel Binational Science Foundation (BSF). Shayanth Sinnarajah, who completed his junior year at Gulliver Preparatory in June '11, is completing his third academic year of research internship at UVIL. Alejandro Perez, who graduated from Gulliver Preparatory in June '11 (attending Harvard University in Fall '11), has done roughly one month of summer internship at UVIL during each of the last 3 summers.

REFERENCES
[1] http://www.spacetelescope.org/news/heic0701/
[2] http://www.dailygalaxy.com/my weblog/2009/04/78-billiona-h.html
[3] Beauchemin, S.S., Barron, J.L., "The computation of optical flow," ACM Computing Surveys, 27(3), September 1995.
[4] Branca, A., Stella, E., Attolico, G., Distante, A., "Focus of expansion estimation by an error backpropagation neural network," Neural Computing and Applications, pp. 142-147, 1997.
[5] Bruss, A.R., Horn, B.K.P., "Passive navigation," Computer Vision, Graphics, and Image Processing (CVGIP), 21(1), pp. 3-20, 1983.
[6] Gracias, N., Mahoor, M., Negahdaripour, S., Gleason, A., "Fast image blending using watersheds and graph cuts," Image and Vision Computing, 27(5), pp. 597-607, 2009.
[7] Fischler, M.A., Bolles, R.C., "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Comm. of the ACM, 24:381-395, June 1981.
[8] Jain, R., "Direct computation of the focus of expansion," IEEE Trans. Pattern Analysis and Machine Intelligence, 5(1), pp. 58-64, January 1983.
[9] Ludvigsen, M., Sortland, B., Johnsen, G., Singh, H., "Applications of geo-referenced underwater photo mosaics in marine biology and archeology," Oceanography, 20(4), pp. 140-149, 2007.
[10] Marks, R.L., Wang, H.H., Lee, M.J., Rock, S.M., "Automatic visual station keeping of an underwater robot," Proc. IEEE Oceans '94, 1994.
[11] Marks, R.L., Rock, S.M., Lee, M.J., "Real-time video mosaicking of the ocean floor," IEEE J. Oceanic Engineering, 20(3), pp. 229-241, 1995.
[12] Negahdaripour, S., Horn, B.K.P., "A direct method for locating the focus of expansion," Computer Vision, Graphics, and Image Processing, 46(3), pp. 303-326, June 1989.
[13] Negahdaripour, S., "Direct computation of the FOE with confidence measures," Computer Vision and Image Understanding, 64(3), pp. 323-350, November 1996.
[14] Negahdaripour, S., Xu, X., Khamene, A., "A vision system for real-time positioning, navigation and video mosaicing of sea floor imagery in the application of ROVs/AUVs," IEEE Workshop on Applications of Computer Vision, pp. 248-249, 1998.
[15] Negahdaripour, S., Barufaldi, C., Khamene, A., "Integrated system for robust 6-DOF positioning utilizing new closed-form visual motion estimation methods for planar terrains," IEEE J. Oceanic Engineering, 31(3), pp. 462-469, July 2006.
[16] Negahdaripour, S., Firoozfam, P., "An ROV stereovision system for ship hull inspection," IEEE J. Oceanic Engineering, 31(3), pp. 551-564, July 2006.
[17] Nicosevici, T., Gracias, N., Negahdaripour, S., Garcia, R., "Efficient 3-D scene modeling and mosaicing," J. Field Robotics, 26(10), pp. 757-862, October 2009.
[18] Rzhanov, Y., Linnett, L.M., Forbes, R., "Underwater video mosaicing for seabed mapping," Int. Conference on Image Processing, 2000.