
Jellyfish Detection Based on K-FOE Residual Map and Ring Segmentation



Xiufen Wang, Huiyuan Wang, Song Wang
School of Information Science and Engineering,
Shandong University, Jinan,
Shandong 250100, China


Abstract-Target detection in underwater videos is a challenging problem in computer vision, especially when the camera has ego-motion. A jellyfish detection system is proposed for processing video streams captured by moving cameras mounted on remotely operated vehicles (ROVs). The background motion vector convergence point, or focus of expansion (FOE), is first found by solving equations built from optical flow and is then refined by Kalman filter prediction (K-FOE). Object templates are initialized by binarizing the K-FOE residual map. Subsequently, a ring segmentation subsystem updates the primitive object mask according to the distance between the object mask center and the K-FOE. All objects are extracted once their updated masks are obtained. Experimental results on real video data show that the proposed system not only reduces false detections but also extracts small jellyfish well.
Keywords-Foreground detection; Kalman filter; Focus of
Expansion; Jellyfish detection
I. INTRODUCTION
The study of underwater targets plays an important role in
biological research, national defense and marine development.
Nowadays, marine animals are studied mostly through remotely operated vehicles (ROVs) rather than the traditional tow-net approach. Manually processing the resulting videos can take months or even years. Therefore, detecting, tracking and classifying targets in underwater videos by computer is a natural choice and also one of the key topics in computer vision. As a typical application, jellyfish detection is the main focus of this paper.
Object detection is an essential task in surveillance video analysis. It is a hard problem for underwater videos captured by sensors on ocean-going ROVs, where the cameras have ego-motion. Recently, researchers have made great efforts to tackle this problem in the study of autonomous underwater vehicles (AUVs) [1] and of tiny deep-ocean animals such as jellyfish [2-5].
Moving object detection (foreground detection) algorithms can be broadly categorized into static-background and dynamic-background ones. The videos studied here were captured by cameras that undergo only forward translation. In this case an important point, the focus of expansion (FOE), exists, from which most motion vectors of the image sequence appear to diverge. It carries important motion information and can be used to detect objects. The authors of [6] presented a moving obstacle detection approach using the FOE and its residual error. Those of [7] separated independently moving objects (IMOs) from the moving background caused by camera motion. The FOE can therefore be used to segment and classify moving objects against the background. In this paper, a novel approach to estimate the FOE with Kalman filter prediction, named K-FOE, is proposed. Meanwhile, a jellyfish detection system combining the K-FOE residual map and ring segmentation is presented. We implemented the system in Visual Studio 2005 with OpenCV 1.0 and tested it on 320×240 videos. The diagram of our approach is shown in Fig. 1.

II. OBJECT MASK
Fig. 2 shows how the object template is found. The FOE is first estimated by computing the optical flow and solving the equations containing the FOE by singular value decomposition (SVD), as in [7]. Then a Kalman filter is applied to predict a more accurate location. A residual error map of the corrected FOE is obtained next. Finally, the initial template of the jellyfish object is acquired by binarizing the map.

A. The Definition of FOE
The FOE is the point from which all optical flow vectors appear to emanate in a static scene when the observer is moving. The FOE is usually near the center of the image when the camera moves directly forwards, but it may not always lie inside the image boundary if the lens is tilted. The FOE of consecutive frames is representative of the general trend of camera movement. With the 3-D coordinate system set so that the Z axis is parallel to the optical axis of the camera and the X and Y axes are parallel to the image plane, the FOE is illustrated in Fig. 3.

Figure 1 The block diagram of the proposed approach (input video → object mask → segmentation system → detection result)

Figure 2 The block diagram of object mask generation (input video → optical flow → optical flow filter → solve equations → initial estimated FOE → Kalman filter prediction → accurate FOE → residual map → initial object mask)

Figure 3 Optical flow vectors converge to the FOE

The point FOE $(x_0, y_0)$ is defined as follows in computer vision:

$$x_0 = \frac{f T_X}{T_Z}, \qquad y_0 = \frac{f T_Y}{T_Z} \tag{1}$$
where $T_X$, $T_Y$ and $T_Z$ are the 3-D translational components and $f$ is the focal length of the camera.
For a point $P(X, Y, Z)$ in this 3-D space, its image projection is $p(x, y)$. The motion vector $(v_x, v_y)$ of $p(x, y)$ in the motion field is derived as follows under a pure camera translation:

$$v_x = \frac{T_Z x - T_X f}{Z}, \qquad v_y = \frac{T_Z y - T_Y f}{Z} \tag{2}$$

Substituting Eq. (1) into Eq. (2), Eq. (2) becomes:

$$v_x = \frac{T_Z}{Z}(x - x_0), \qquad v_y = \frac{T_Z}{Z}(y - y_0) \tag{3}$$
From Eq. (3), eliminating the common factor $T_Z/Z$ shared by both components, we have:

$$v_y x_0 - v_x y_0 = v_y x - v_x y \tag{4}$$
For the whole image, collecting Eq. (4) over all $N$ pixels gives:

$$\bar{A} \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \bar{b}, \tag{5.a}$$

where

$$\bar{A} = \begin{pmatrix} v_{y1} & -v_{x1} \\ v_{y2} & -v_{x2} \\ v_{y3} & -v_{x3} \\ \vdots & \vdots \\ v_{yN} & -v_{xN} \end{pmatrix}, \qquad \bar{b} = \begin{pmatrix} v_{y1} x_1 - v_{x1} y_1 \\ v_{y2} x_2 - v_{x2} y_2 \\ v_{y3} x_3 - v_{x3} y_3 \\ \vdots \\ v_{yN} x_N - v_{xN} y_N \end{pmatrix},$$

which is a linear system with two unknown parameters, the coordinates $(x_0, y_0)$ of the FOE.
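To make the construction of this linear system concrete, the following is a minimal C++ sketch (using the modern OpenCV C++ API rather than the OpenCV 1.0 C interface mentioned in the introduction) that stacks one row of Eq. (4) per pixel of a dense flow field. The function name buildFoeSystem and the data layout are illustrative assumptions, not taken from the original implementation.

```cpp
#include <opencv2/core.hpp>

// Build the whole-image system A_bar * (x0, y0)^T = b_bar of Eq. (5.a):
// each pixel contributes one row (v_y, -v_x) and one entry v_y*x - v_x*y.
static void buildFoeSystem(const cv::Mat& flow,   // CV_32FC2, (v_x, v_y) per pixel
                           cv::Mat& A_bar, cv::Mat& b_bar)
{
    const int N = flow.rows * flow.cols;
    A_bar.create(N, 2, CV_32F);
    b_bar.create(N, 1, CV_32F);

    int k = 0;
    for (int y = 0; y < flow.rows; ++y) {
        for (int x = 0; x < flow.cols; ++x, ++k) {
            const cv::Point2f v = flow.at<cv::Point2f>(y, x);
            A_bar.at<float>(k, 0) =  v.y;               // coefficient of x0
            A_bar.at<float>(k, 1) = -v.x;               // coefficient of y0
            b_bar.at<float>(k)    =  v.y * x - v.x * y; // right-hand side of Eq. (4)
        }
    }
}
```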

B. Optical Flow Computation
As mentioned above, optical flow between consecutive frames is indispensable for computing the FOE. Two classical optical flow algorithms, Lucas-Kanade [8] and Horn-Schunck [9], have been widely used to approximate the motion field of an image. The sparse optical flow used in the Lucas-Kanade algorithm seems advantageous, as pointed out in [7]. However, an essential drawback of sparse flow is its low accuracy in low-textured regions. The Horn-Schunck algorithm is therefore selected here for the subsequent processing, because it combines the optical flow constraint equation with a global smoothness regularization of the motion field. FOEs calculated from sparse and dense optical flow are compared in Fig. 4. For easier viewing, the FOE is drawn as a circle with its coordinates printed beside it.
Fig. 4 shows one frame of a continuous sequence. The FOE obtained from dense optical flow, (196,128), is more accurate and more stable than that from sparse optical flow, (191,150), when the camera motion is constant.
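The paper computes Horn-Schunck dense flow; as an illustrative stand-in, the sketch below computes a dense flow field with Farneback's algorithm, which is what current OpenCV releases expose for dense flow (the Horn-Schunck routine of OpenCV 1.0 is no longer part of the main API). The parameter values are placeholders, not those of the original system.

```cpp
#include <opencv2/video.hpp>

// Dense optical flow between two consecutive gray frames.
// flow is CV_32FC2: flow(y, x) = (v_x, v_y) for the pixel at (x, y).
static cv::Mat denseFlow(const cv::Mat& prevGray, const cv::Mat& currGray)
{
    cv::Mat flow;
    cv::calcOpticalFlowFarneback(prevGray, currGray, flow,
                                 0.5,  // pyramid scale
                                 3,    // pyramid levels
                                 15,   // averaging window size
                                 3,    // iterations per level
                                 5,    // pixel neighborhood for the polynomial fit
                                 1.2,  // Gaussian sigma for the fit
                                 0);   // flags
    return flow;
}
```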
However, the movements of some points do not represent the actual camera motion. For example, an optical flow vector that is very small or very large in the x direction, the y direction, or both will depart from the convergence pattern, so such points should be discarded. A range [nmin, nmax], set empirically to [0.5, 2.0] in our experiments, is used to filter the optical flows. Only flows within this range are retained for computing the residual map; those outside the range are eliminated by setting them to 0. For example, if the flows at $(x_2, y_2)$ and $(x_{k+1}, y_{k+1})$ are to be removed, the corresponding rows of $\bar{A}$ and $\bar{b}$ are set to 0, giving the filtered system shown in the following equations.


$$A \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = b, \tag{5.b}$$

where

$$A = \begin{pmatrix} v_{y1} & -v_{x1} \\ 0 & 0 \\ v_{y3} & -v_{x3} \\ \vdots & \vdots \\ v_{yk} & -v_{xk} \\ 0 & 0 \\ \vdots & \vdots \\ v_{yN} & -v_{xN} \end{pmatrix}, \qquad b = \begin{pmatrix} v_{y1} x_1 - v_{x1} y_1 \\ 0 \\ v_{y3} x_3 - v_{x3} y_3 \\ \vdots \\ v_{yk} x_k - v_{xk} y_k \\ 0 \\ \vdots \\ v_{yN} x_N - v_{xN} y_N \end{pmatrix}.$$
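A minimal sketch of this filtering step is given below, interpreting the range [nmin, nmax] as a bound on the absolute value of each flow component (one possible reading of the text above): rows of $\bar{A}$ and $\bar{b}$ whose flow falls outside the range are zeroed to form $A$ and $b$ of Eq. (5.b). The function name and the exact test are illustrative assumptions.

```cpp
#include <cmath>
#include <opencv2/core.hpp>

// Zero the rows of the whole-image system whose optical flow falls outside
// [nMin, nMax] (empirically [0.5, 2.0] in the paper), producing the filtered
// system A * (x0, y0)^T = b of Eq. (5.b).
static void filterFoeSystem(const cv::Mat& A_bar, const cv::Mat& b_bar,
                            cv::Mat& A, cv::Mat& b,
                            float nMin = 0.5f, float nMax = 2.0f)
{
    A = A_bar.clone();
    b = b_bar.clone();
    for (int k = 0; k < A.rows; ++k) {
        const float vy =  A.at<float>(k, 0);   // row layout is (v_y, -v_x)
        const float vx = -A.at<float>(k, 1);
        const bool keepX = std::fabs(vx) >= nMin && std::fabs(vx) <= nMax;
        const bool keepY = std::fabs(vy) >= nMin && std::fabs(vy) <= nMax;
        if (!keepX || !keepY) {                // outlier flow: discard this row
            A.at<float>(k, 0) = 0.0f;
            A.at<float>(k, 1) = 0.0f;
            b.at<float>(k)    = 0.0f;
        }
    }
}
```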

C. Solving the Equations
Once the proper set of optical flows is available, the next step is to solve Eqs. (5) for a rough estimate of the FOE. This is a linear problem, and we choose SVD to solve Eqs. (5) because of its robustness to round-off errors and its reliable performance, as in [7]. Solving the equations by SVD amounts to optimization under the least-squares criterion: the squared error is minimized to find the solution that best fits the data.
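The least-squares solve of Eq. (5.b) can be expressed with OpenCV's cv::solve and the DECOMP_SVD flag; this is a sketch of the idea, not the paper's exact code.

```cpp
#include <opencv2/core.hpp>

// Least-squares estimate of the FOE from the filtered system of Eq. (5.b).
// cv::solve with DECOMP_SVD minimizes ||A*foe - b||^2.
static cv::Point2f estimateFoe(const cv::Mat& A, const cv::Mat& b)
{
    cv::Mat foe;                              // 2x1 solution vector (x0, y0)
    cv::solve(A, b, foe, cv::DECOMP_SVD);
    return cv::Point2f(foe.at<float>(0), foe.at<float>(1));
}
```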

Figure 4 Dense and sparse optical flow and FOE: (a) dense optical flow; (b) sparse optical flow

D. Prediction by Kalman Filtering
The estimated FOE is still not accurate enough, and its location varies considerably between consecutive frames, because our model may not match the camera motion perfectly. Some correction is therefore required. Statistical filters can be applied to predict system states and correct deviations. The Kalman filter [10], a recursive solution to the discrete-data linear filtering problem, is selected to refine the estimate.
The current state of the Kalman filter is the location of the rough FOE estimate. The measured and predicted locations $(x_0, y_0)_t$ of the FOE are two-dimensional, with no external control, and the process and measurement noise are assumed to follow normal distributions. The initial parameters of the Kalman filter are therefore set as follows, where $A_K$ is the transition matrix, $B_K$ the control matrix, $H_K$ the measurement matrix, $Q_K$ the covariance matrix of the process noise and $R_K$ the covariance matrix of the measurement noise:
$$A_K = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad B_K = 0, \quad H_K = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad Q_K = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad R_K = \begin{pmatrix} 10 & 0 \\ 0 & 10 \end{pmatrix},$$

where $Q_K$ and $R_K$ are set based on experiments.
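These parameters map directly onto OpenCV's cv::KalmanFilter. The sketch below is a minimal illustration of the predict/correct cycle applied to the rough FOE, with a 2-D state and 2-D measurement as described above; the matrix values follow the settings given in the text, while the wrapper class itself is hypothetical.

```cpp
#include <opencv2/video/tracking.hpp>

// K-FOE: smooth the per-frame rough FOE estimate with a constant-position
// Kalman filter (state = measurement = (x0, y0), no control input).
class FoeKalman {
public:
    FoeKalman() : kf_(2, 2, 0, CV_32F) {
        cv::setIdentity(kf_.transitionMatrix);                      // A_K = I
        cv::setIdentity(kf_.measurementMatrix);                     // H_K = I
        cv::setIdentity(kf_.processNoiseCov, cv::Scalar(1));        // Q_K = I
        cv::setIdentity(kf_.measurementNoiseCov, cv::Scalar(10));   // R_K = 10 I
        cv::setIdentity(kf_.errorCovPost);
    }

    // Feed the rough FOE of the current frame, get the corrected one back.
    cv::Point2f update(const cv::Point2f& roughFoe) {
        kf_.predict();
        cv::Mat meas = (cv::Mat_<float>(2, 1) << roughFoe.x, roughFoe.y);
        cv::Mat post = kf_.correct(meas);
        return cv::Point2f(post.at<float>(0), post.at<float>(1));
    }

private:
    cv::KalmanFilter kf_;
};
```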

E. Initial Object Mask
$(x_0, y_0)$ is found from Eq. (5.b), in which $A$ and $b$ are already known. Substituting it into Eq. (5.a), i.e., multiplying it by the matrix $\bar{A}$ built from the original optical flow of the whole image, gives $\tilde{b} = \bar{A}\,(x_0, y_0)^T$. Since some optical flows were abandoned as described above, $A \neq \bar{A}$, so $\tilde{b}$ and $\bar{b}$ differ, and this error represents the difference between the background and the moving targets. The residual map is then obtained as

$$\text{Residual\_map} = \left|\tilde{b} - \bar{b}\right|. \tag{6}$$
A linear scaling is used to normalize the residual map into [0, 255], so that it resembles an 8-bit gray-scale image. A linear contrast enhancement step then further enhances the contrast of the residual image, followed by thresholding. Morphological closing and opening with 3×3 structuring elements are applied next to fuse small breaks and remove small holes in the binarized residual image. This finally yields the initial object mask.
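A sketch of these steps is given below, assuming the residual of Eq. (6) has already been computed per pixel and reshaped into an image; the threshold and the gain/offset of the contrast stretch are placeholders rather than the paper's tuned settings.

```cpp
#include <opencv2/imgproc.hpp>

// Turn a per-pixel residual image (CV_32F) into the initial binary object mask:
// normalize to [0,255], enhance contrast, threshold, then close/open with 3x3 kernels.
static cv::Mat initialObjectMask(const cv::Mat& residual, double thresh = 60.0)
{
    cv::Mat gray, enhanced, mask;
    cv::normalize(residual, gray, 0, 255, cv::NORM_MINMAX, CV_8U);  // linear scaling
    gray.convertTo(enhanced, CV_8U, 1.5, -30.0);                    // linear contrast stretch (placeholder gain/offset)
    cv::threshold(enhanced, mask, thresh, 255, cv::THRESH_BINARY);

    const cv::Mat k = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, k);   // fuse small breaks
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, k);    // remove small specks
    return mask;
}
```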

III. OBJECT SEGMENTATION
In this step, the binary target masks are first ANDed with the 8-bit gray-scale image of the current frame. Each target is then enclosed in a 2-D rotated box whose size, center, angle and average gray value represent the target. Because jellyfish in underwater images often appear brighter near the center of the image than at its edges, different distances from a target center to the FOE produce different intensities. Rings centered at the FOE are therefore drawn with different radii, generating annular regions over which ring segmentation is performed (Fig. 5(a)). Whether or not an object is kept depends on a per-region threshold, which is used to update the object mask.
Three parameters are kept for each object: C_point, the center of the object mask; A_value, the corresponding average intensity in the current gray frame; and Distance, the distance from the object center to the FOE. These parameters determine the region an object belongs to. The pseudo-code for updating one object mask is illustrated in Fig. 5(b), and a single extracted target is shown in Fig. 5(c).
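The per-object update can be sketched as below, assuming three concentric regions around the FOE delimited by radii r1 < r2 and one intensity threshold per region (threshold1 for the innermost region, and so on), as suggested by the example in Section IV. All names and numeric values here are illustrative.

```cpp
#include <cmath>
#include <opencv2/core.hpp>

struct ObjectInfo {
    cv::Point2f cPoint;   // C_point: center of the object mask
    float aValue;         // A_value: mean gray level of the object in the current frame
};

// Keep an object only if its mean intensity exceeds the threshold of the
// annular region (ring around the FOE) that its center falls into.
static bool keepObject(const ObjectInfo& obj, const cv::Point2f& foe,
                       float r1 = 60.0f, float r2 = 120.0f,
                       float threshold1 = 120.0f, float threshold2 = 90.0f,
                       float threshold3 = 60.0f)
{
    const float dx = obj.cPoint.x - foe.x;
    const float dy = obj.cPoint.y - foe.y;
    const float distance = std::sqrt(dx * dx + dy * dy);   // Distance to the FOE

    if (distance < r1) return obj.aValue > threshold1;     // region I
    if (distance < r2) return obj.aValue > threshold2;     // region II
    return obj.aValue > threshold3;                        // region III
}
```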

IV. EXPERIMENTAL RESULTS
Some experimental results are shown in Fig. 6. The images from the upper left to the lower right are the original color frame, the gray frame, the residual map, the object mask, the optical flow image with the FOE before (red) and after (green) Kalman prediction, the segmentation regions, the new object mask and the detection result.


Figure 5 Object segmentation: (a) segmentation regions; (b) pseudo-code for updating one mask; (c) block diagram of object mask updating


Figure 6 Experimental results: (a) color frame; (b) gray frame; (c) residual map; (d) original object mask; (e) optical flow and FOE; (f) distance to FOE; (g) updated object mask; (h) detection result

In Fig. 6, the locations of the FOE before and after Kalman prediction are (195,129) and (196,128), respectively. There is only one long jellyfish in the scene, and it is extracted well, although two object masks are generated because of noise in the initial binarized residual map. The mask at (220,88) lies in region I and its A_value is below threshold1, so it is removed from the updated mask. Thus the true object is finally found.

V. CONCLUSIONS
This paper has presented a moving object detection algorithm for underwater videos based on the K-FOE residual map and object mask updating with ring segmentation. Experimental results on real videos show that the FOE estimation is accurate when the camera undergoes only pure translation, and that the algorithm performs well for small, elongated jellyfish of low clarity against a complicated sea background. Some limitations remain, however: the algorithm can only detect small, low-clarity animals that are not submerged in the noise of suspended particles under insufficient optical density, and no comparison with other methods has been carried out in this underwater setting. Future work should focus on approaches to automatically set the thresholds based on the image set to be analyzed.

ACKNOWLEDGMENT
This work was supported in part by the National Natural
Science Foundation of China under grant no. 60872119 and the
Natural Science Foundation of Shandong Province under grant
no.2009ZRB01675.

REFERENCES
[1] S. Williams, O. Pizarro, M. Jakuba and N. Barrett, "AUV benthic habitat mapping in South Eastern Tasmania," in Proceedings of Field and Service Robotics 7, Springer, 2010, pp. 275-284.
[2] J. Rife and S. M. Rock, "Segmentation methods for visual tracking of deep-ocean jellyfish using a conventional camera," IEEE Journal of Oceanic Engineering, vol. 28, no. 4, pp. 595-608, October 2003.
[3] A. M. Plotnik and S. M. Rock, "Improving performance of a jelly-tracking underwater vehicle using recognition of animal motion modes," in Proceedings of the Unmanned Untethered Submersible Technology Conference (UUST), Durham, NH, August 2003.
[4] D. Walther, D. R. Edgington and C. Koch, "Detection and tracking of objects in underwater video," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Washington, DC: IEEE Computer Society, vol. 1, July 2004, pp. 61-73.
[5] L. Havasi, Z. Szlávik and T. Szirányi, "The use of vanishing point for the classification of reflections from foreground mask in videos," IEEE Transactions on Image Processing, vol. 18, no. 6, pp. 1366-1372, June 2009.
[6] N. Takeda, M. Watanabe and K. Onoguchi, "Moving obstacle detection using residual error of FOE estimation," in Proc. IEEE International Conference on Intelligent Robots and Systems, vol. 4, 1996, pp. 1642-1647.
[7] Y. Zhang, S. J. Kiselewich, W. A. Bauson and R. Hammoud, "Robust moving object detection at distance in the visible spectrum and beyond using a moving camera," in CVPR Workshops (CVPRW), 2006, pp. 131-138.
[8] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the Imaging Understanding Workshop, 1981, pp. 121-130.
[9] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, no. 1-3, pp. 185-204, 1981.
[10] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME - Journal of Basic Engineering, vol. 82, pp. 35-45, 1960.
