
2015 International Conference on Computational Intelligence and Communication Networks

Automatic Fish Recognition and Counting in Video Footage of Fishery Operations

Suhuai Luo, Xuechen Li
School of Design, Communication and IT, University of Newcastle, NSW, Australia
Suhuai.Luo@newcastle.edu.au, Xuechen.Li@uon.edu.au

Dadong Wang, Jiaming Li, Changming Sun
CSIRO Digital Productivity Flagship, NSW, Australia
Dadong.Wang@csiro.au, Jiaming.Li@csiro.au, Changming.Sun@csiro.au

Abstract—This paper presents an accurate and automatic algorithm to recognize and count fish in video footage of fishery operations. The unique character of the approach is that it combines machine learning techniques with statistical methods to make full use of the benefits of these algorithms. The approach consists of three major stages: video data preparation such as noise reduction, preliminary fish recognition with an artificial neural network to classify image areas into either fish or non-fish, and fine fish recognition and counting with statistical shape models. Experiment results of tuna recognition and counting using the proposed method are presented with performance validation and discussion.

Keywords-fish recognition; fish counting; machine learning; statistical shape models

I. INTRODUCTION

Active research has been conducted recently on automatic fish recognition, segmentation and counting. In 2013, CSIRO Australia organized a workshop on fisheries and environmental monitoring [1] which attracted contributions from Australia's leading researchers and organizations such as the Wealth from Oceans Flagship of CSIRO, leading Australian universities and the Australian Government's National Environmental Research Program. A competition on fish recognition is being organized by world leading researchers [2] in search of a solution to fish recognition. Evans [3] developed a fish recognition method based on an EM algorithm; it can recognize multiple fish in a single image. Spampinato et al. [4] proposed a rule based fish detection and tracking method: by setting rules manually, the system can automatically filter the targets that meet the rules and track these objects in video. Hsieh et al. [5] proposed a fish measurement method to measure the length of tuna. They employed the Hough transform and a projective transform to conduct line detection in automatic mode and to correct projective distortion of the fish images. Harvey et al. [6] developed a system to measure the length of tuna underwater; two cameras were used to calibrate the on-screen length of a fish against its real length. Huang et al. [7] proposed a classification tree based fish segmentation and recognition method to extract fish from complicated backgrounds and distinguish their species. Sixty-six types of features were extracted; these features are a combination of colour, shape and texture properties of different parts of the fish such as the tail, head, top, bottom and the whole fish. Ravanbakhsh et al. [8] proposed a level set based fish segmentation method. It combined level sets and principal component analysis (PCA), overcoming the weakness that the level set method itself does not have a shape prior. Yao et al. [9] proposed a new fish image segmentation method which combines the K-means clustering segmentation algorithm with mathematical morphology. The algorithm realized the separation between the fish image and the background under complex background conditions.

Despite such large efforts on fish recognition, the performance of state-of-the-art multimedia analysis techniques on such a task is still far from meeting real-world requirements in terms of identifying fish. Automatic extraction of fish from the video footage is challenging for the following reasons. Firstly, the month-long videos were acquired in an outdoor environment, including both day and night times and bright and cloudy days; therefore, the video background is very complicated. Secondly, the shape of the captured fish changes, depending on the position and angle from which the video is captured. Thirdly, the texture and intensity distribution of fish is similar to that of the background, causing a high rate of false positive recognition. Fourthly, the video scene contains multiple objects, making the recognition more difficult. For example, in Fig. 1, the tuna is partially exposed and it needs to be separated from other objects including the deck, boxes, and humans.

To tackle the difficulties of fish segmentation, we propose an automatic fish recognition algorithm that combines machine learning techniques with statistical methods. The approach consists of three major stages, including video data preparation (e.g., removing unrelated frames) and noise reduction, preliminary fish recognition with an artificial neural network (ANN) classifier [10], and fine fish recognition with statistical shape models (SSM) [11,12].

The rest of this paper is organized as follows. Section 2 describes the proposed method in detail; Section 3 gives the details of the experiments of applying the method to recognize and count tuna in the video, including the experiment setting, performance validation and discussion; this is followed by conclusions and a discussion of future work in Section 4.

II. METHOD

The method consists of four parts: pre-processing of the video frames; machine learning based fish recognition; statistical shape model based fish identification; and rule based fish counting.

A. Pre-processing

The video data is acquired by an outdoor shipborne video system in an uncontrolled illumination environment; therefore, the video quality is not ideal. The main problems are: 1) the resolution of the video is low, with shadows and a background that varies with time and weather; 2) the visual angle is not perpendicular; and 3) fish are often occluded by people who walk around the deck.

The first two problems significantly increase the difficulty of identifying the fish and can only be mitigated by adjusting the video system and the video acquisition conditions. The impact of the occlusion problem can be reduced by using pre-processing methods.

Fig. 1 illustrates a scene of a fisherman occluding a fish on the deck. Considering that the movement of the fisherman is faster than that of the fish, the frames are smoothed by averaging neighboring pixels to counteract the effect of motion. In our approach, the smoothing operation is applied at every point of the image; a sketch of one possible implementation of this smoothing is given below.
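As an illustration only, the following is a minimal sketch of this smoothing step. It assumes Python with OpenCV and NumPy (the paper does not name an implementation) and reads the description as a temporal average over a small window of neighboring frames, which matches the rationale that the occluder moves faster than the fish; the window size and file name are hypothetical.

```python
import cv2
import numpy as np

def smooth_frames(frames, window=5):
    """Average each frame with its neighbours so that fast-moving occluders
    (e.g. a crew member walking past) are blurred out while the slowly
    moving fish remains visible. `window` is an assumed, tunable size."""
    half = window // 2
    smoothed = []
    for i in range(len(frames)):
        lo, hi = max(0, i - half), min(len(frames), i + half + 1)
        stack = np.stack([f.astype(np.float32) for f in frames[lo:hi]])
        smoothed.append(stack.mean(axis=0).astype(np.uint8))
    return smoothed

# Example usage: load a clip and smooth it (file name is hypothetical).
cap = cv2.VideoCapture("deck_footage.mp4")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()
clean = smooth_frames(frames)
```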

Fig. 1 An example of a complicated fishing boat deck including fish, deck, objects, and humans

B. Machine learning based fish recognition

The fish can be recognized by using machine learning algorithms. The colors of the images are extracted as features for classification. The colors of fish are variable; taking tuna as an example, the colors could be black, blue, silver, or yellow. To represent these various colors consistently, the HLS (hue, lightness, saturation) color space is employed instead of the original RGB color space. An artificial neural network (ANN) [10] is trained and used as the classifier to judge whether a pixel belongs to a fish or not.

An ANN is one of the most widely used classifiers in image recognition. The advantage of an ANN is that it is a highly nonlinear system; therefore it is highly flexible and self-adaptive. Compared to other classical methods, it is more suitable for processing massive raw data that cannot be described by rules and equations.

An error back-propagation (BP) ANN classifier is employed to identify the fish from the background. The ANN has three layers: the first layer has 6 neurons, the second layer has 3 neurons, and the output layer has 1 neuron. The training data are selected from a sample of the video footage which contains all kinds of fish that appear in the video. Label 1 is given to fish and -1 to the deck or other objects on the deck.

The outputs of the ANN classifier indicate the possibility of a pixel belonging to a fish. Fig. 2(a) and Fig. 2(b) show an example of the input and output images of an ANN classifier. A threshold of 0 is used to select fish areas, as shown in Fig. 2(c). Fig. 2(d) shows the large connected objects, which are discussed in Section C. A sketch of this classification step is given below.
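As a rough sketch of this step, the snippet below trains a small back-propagation network on HLS pixel values and applies it per pixel. It assumes Python with OpenCV and scikit-learn, reads the paper's 6/3/1 layer description as two hidden layers of 6 and 3 neurons (one possible interpretation), and uses an assumed minimum component area when keeping the large connected objects of Fig. 2(d).

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_pixel_ann(hls_pixels, labels):
    """Train a BP network on (N, 3) HLS pixel features with targets +1 (fish)
    and -1 (deck or other objects). The tanh activation and the scikit-learn
    backend are assumptions; the paper only specifies a BP ANN."""
    net = MLPRegressor(hidden_layer_sizes=(6, 3), activation="tanh",
                       max_iter=2000, random_state=0)
    net.fit(hls_pixels, labels)
    return net

def fish_mask(frame_bgr, net, min_area=2000):
    """Classify every pixel, threshold the response at 0 and keep only the
    large connected components (min_area is an assumed value)."""
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS).reshape(-1, 3)
    score = net.predict(hls.astype(np.float32))
    mask = (score > 0).reshape(frame_bgr.shape[:2]).astype(np.uint8)
    n, cc, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    keep = np.zeros_like(mask)
    for i in range(1, n):                      # component 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_area:
            keep[cc == i] = 1
    return keep
```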

Fig. 2 The input and output of an ANN classifier and the fish recognition result

C. Statistical shape model based fish identification

A statistical shape model (SSM) [12] is trained to distinguish real fish from other objects whose color is similar to that of fish.

An object can be described by $n$ landmark points, which are manually determined in a set of $s$ training images. From these collections of landmark points, a point distribution model is constructed as follows. The landmark points $(x_1, y_1), \dots, (x_n, y_n)$ are stacked into shape vectors

$\vec{x}_i = (x_{i1}, y_{i1}, \dots, x_{in}, y_{in}), \quad i \in [1, s]$    (1)

where $\vec{x}_i$ is the shape of each object in the training set. PCA is applied to the shape vectors $\vec{x}_i$ to extract the mean shape $\bar{x}$, the eigenvectors $\vec{p}_j$ which control the deformation of the shape model, and the corresponding eigenvalues $\lambda_j$. The mean shape can be described as

$\bar{x} = \frac{1}{s} \sum_{i=1}^{s} \vec{x}_i$    (2)

The shapes in the training set can be approximated using the mean shape and a weighted sum of the deviations obtained from the first $t$ modes:

$\vec{x} = \bar{x} + P\vec{b}$    (3)

and

$\vec{b} = P^{T}(\vec{x} - \bar{x})$    (4)

where $P = (\vec{p}_1\ \vec{p}_2 \cdots \vec{p}_t)$ is the matrix of the first $t$ eigenvectors and $\vec{b} = (b_1\ b_2 \cdots b_t)^{T}$ is a vector of weights.
The above equations allow us to generate new examples of shapes by varying the parameters $b_j$ ($j = 1, 2, \dots, t$) within suitable limits, so that the new shape remains similar to those in the training set. Since the variance of $b_j$ over the training set is $\lambda_j$, the suitable limits are typically determined as

$-3\sqrt{\lambda_j} \le b_j \le 3\sqrt{\lambda_j}$    (5)

since most of the samples lie within three standard deviations of the mean.

To cover 99% of the shape variation in the training set, 4 principal components are kept. Fig. 3 shows the average fish shape (blue points) and its deformation limits (red points) obtained by adjusting each principal component. A sketch of this model construction is given below.
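For concreteness, the following is a minimal sketch of building such a point distribution model with PCA, following Eqs. (1)-(5). It assumes NumPy; the function names and the default of 4 retained modes mirror the description above, but the implementation itself is not taken from the paper.

```python
import numpy as np

def build_pdm(shapes, n_modes=4):
    """Point distribution model from Eqs. (1)-(5).
    shapes: (s, 2n) array; each row is (x1, y1, ..., xn, yn) for one shape."""
    mean = shapes.mean(axis=0)                        # Eq. (2)
    cov = np.cov(shapes - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:n_modes]        # keep the first t modes
    P, lam = eigvec[:, order], eigval[order]
    return mean, P, lam

def project(shape, mean, P, lam):
    """b = P^T (x - mean), clipped to +/- 3*sqrt(lambda) as in Eq. (5)."""
    b = P.T @ (shape - mean)                          # Eq. (4)
    return np.clip(b, -3.0 * np.sqrt(lam), 3.0 * np.sqrt(lam))

def reconstruct(mean, P, b):
    """x = mean + P b, Eq. (3)."""
    return mean + P @ b
```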
Fig. 3 Statistical shape model of fish

Fig. 2(d) shows the large connected objects in the deck area of the fish recognition result obtained from the ANN classifier. There are two objects: one is a fish and the other is some traces of a human. The statistical shape model can be used to identify the fish among such objects. Fig. 4 shows the application of the SSM: when the model is applied to a fish, it can fit the shape after a number of iterations. To obtain robust results, we specify that if 70% of the object and the fitted shape overlap, the object is identified as a fish; a sketch of this check is given below.
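The overlap test could look like the following sketch (NumPy assumed). Measuring the overlap against the candidate object's area is one possible reading of the 70% criterion, which the paper does not spell out.

```python
import numpy as np

def is_fish(candidate_mask, fitted_mask, overlap_ratio=0.70):
    """Accept a candidate connected component as a fish when at least 70% of
    its pixels are covered by the converged SSM fit (binary masks assumed)."""
    inter = np.logical_and(candidate_mask, fitted_mask).sum()
    return inter >= overlap_ratio * candidate_mask.sum()
```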
Fig. 4 Fit result of the SSM

D. Rule based fish counting

After recognition, the number of fish can be counted. We create a set of rules to count the number of fish in the video.

There are 6 possible cases in the per-frame fish recognition result:
• No fish or fish-like objects in the frame;
• One fish in the frame;
• Two fish in the frame;
• One fish and one fish-like object in the frame;
• One fish-like object in the frame;
• Two fish-like objects in the frame.

We use neighboring frames to remove fish-like non-fish objects. In most cases, fish-like non-fish objects appear at the same position in fewer than 3 neighboring frames, whereas a real fish appears at the same position in more than 3 neighboring frames. We use this rule to remove false positive objects; a sketch of this temporal filter is given after this paragraph. After the filtering, only the first 3 cases remain, and we assign the values 0, 1 and 2 to these cases respectively.
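A minimal sketch of such a persistence filter is shown below (NumPy assumed). The paper only states the 3-frame persistence rule; the centroid-distance linking and its tolerance are assumptions added to make the idea concrete.

```python
import numpy as np

def filter_short_tracks(detections_per_frame, max_dist=30.0, min_len=3):
    """detections_per_frame[t] is a list of (x, y) centroids detected in frame t.
    Detections that persist at roughly the same position for more than
    `min_len` consecutive frames are kept; shorter-lived ones are discarded."""
    kept = [set() for _ in detections_per_frame]
    tracks = []                 # each track: dict with last position, last frame, members
    for t, dets in enumerate(detections_per_frame):
        for d, pos in enumerate(dets):
            pos = np.asarray(pos, dtype=float)
            track = None
            for tr in tracks:   # link to a track that was seen in the previous frame
                if tr["last"] == t - 1 and np.linalg.norm(tr["pos"] - pos) <= max_dist:
                    track = tr
                    break
            if track is None:
                track = {"pos": pos, "last": t, "members": []}
                tracks.append(track)
            track["pos"], track["last"] = pos, t
            track["members"].append((t, d))
            if len(track["members"]) > min_len:   # persistent enough: keep the whole track
                for ft, fd in track["members"]:
                    kept[ft].add(fd)
    return kept                 # kept[t]: indices of detections in frame t judged to be real
```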
The counting rules are defined as follows (a sketch of the corresponding counter is given after this list):
If there are more than 3 consecutive "1"s, single fish counting begins;
If there are more than 4 consecutive "0"s, single fish counting finishes, and the fish count increases by 1;
If there are more than 3 consecutive "2"s after fish counting has begun, double fish counting begins;
If there are more than 4 consecutive "1"s or "0"s after double fish counting has begun, double fish counting finishes and the fish count increases by 1.
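The rules above can be read as a small state machine over the per-frame values 0, 1 and 2. The sketch below is one such reading in plain Python; in particular, the handling of the double-fish branch follows the interpretation given in the last rule rather than anything stated more explicitly in the paper.

```python
def count_fish(frame_labels):
    """Count fish from per-frame labels (0 = no fish, 1 = one fish, 2 = two fish)
    using the consecutive-run rules of Section II.D."""
    count = 0
    single = double = False
    run_val, run_len = None, 0
    for v in frame_labels:
        run_len = run_len + 1 if v == run_val else 1
        run_val = v
        if not single and v == 1 and run_len > 3:
            single = True                      # single fish counting begins
        elif single and not double and v == 2 and run_len > 3:
            double = True                      # double fish counting begins
        elif single and not double and v == 0 and run_len > 4:
            count += 1                         # the single fish has left the scene
            single = False
        elif double and v in (0, 1) and run_len > 4:
            count += 1                         # one of the two fish has left
            double = False
    return count

# Example: two fish appear together, then leave one after the other.
labels = [0] * 5 + [1] * 5 + [2] * 5 + [1] * 6 + [0] * 6
print(count_fish(labels))   # -> 2
```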

III. EXPERIMENT RESULTS AND DISCUSSION

The resolution of the video used to test our recognition method is 640×480 and its length is 1 hour. There are 7 fish in the video, including tuna, swordfish and mahi-mahi. The fish recognition result is shown in Fig. 5, where the blue line is the recognition result and the red line is the ground truth; the value of the lines is the number of fish in the corresponding frames. The recognition accuracy is 89.6%. Fig. 6 shows some of the fish recognition results. It can be seen that fish of different colors and shapes on the deck are recognized correctly.

The counting result is 9: the first 2 fish are counted incorrectly and the 6th fish is missed. The reason is that in many frames people occlude most of the fish, and for a very long time; the track of a single fish is broken in this case (see Fig. 7(a)), so the first two fish cannot be counted correctly. The 6th fish is small and very dark, and its color is very similar to the shadows on the deck; therefore it fails to be recognized (see Fig. 7(b)).

Fig. 5 Fish recognition result and ground truth

Fig. 6 Fish recognition results for tuna, swordfish and mahi-mahi

Fig. 7 The cases that are not recognized
IV. CONCLUSION AND FUTURE WORK

In this paper, an automatic fish recognition and counting method has been presented. Machine learning and statistical shape model methods are employed to recognize fish on the deck in video footage of fishery operations. The recognition result is accurate when the fish appears in full, and even when it is partially occluded. A rule based counting method is developed to count the number of fish present in the video, and the counting result is promising.

There is still much work to be done to improve the method. For example, more features can be used to describe the fish and distinguish them from the deck and other objects, and stereo vision based methods can be explored by using more than one video camera for recording. These improvements will mitigate the occlusion problem and increase the recognition accuracy.

REFERENCES
[1] D. Wang, K. R. Hayes, and L. Bischof, "Workshop on Acoustics and Automated Video Processing for Fisheries and Environmental Monitoring," CSIRO Report EP137390, July 2013.
[2] http://www.lifeclef.org/
[3] F. H. Evans, "Detecting fish in underwater video using the EM algorithm," in Proc. International Conference on Image Processing (ICIP), IEEE, 2003.
[4] C. Spampinato, E. Beauxis-Aussalet, S. Palazzo, C. Beyan, J. van Ossenbruggen, J. He, B. Boom, and X. Huang, "A rule-based event detection system for real-life underwater domain," Machine Vision and Applications, vol. 25, pp. 99-117, 2013.
[5] C.-L. Hsieh, H.-Y. Chang, F.-H. Chen, J.-H. Liou, S.-K. Chang, and T.-T. Lin, "A simple and effective digital imaging approach for tuna fish length measurement compatible with fishing operations," Computers and Electronics in Agriculture, vol. 75, pp. 44-51, 2011.
[6] E. Harvey, M. Cappo, M. Shortis, S. Robson, J. Buchanan, and P. Speare, "The accuracy and precision of underwater measurements of length and maximum body depth of southern bluefin tuna (Thunnus maccoyii) with a stereo-video camera system," Fisheries Research, vol. 63, pp. 315-326, 2003.
[7] P. X. Huang, B. J. Boom, and R. B. Fisher, "Hierarchical classification with reject option for live fish recognition," Machine Vision and Applications, vol. 26, pp. 89-102, 2014.
[8] M. Ravanbakhsh, M. R. Shortis, F. Shafait, A. Mian, E. S. Harvey, and J. W. Seager, "Automated fish detection in underwater images using shape-based level sets," The Photogrammetric Record, vol. 30, pp. 46-62, 2015.
[9] H. Yao, Q. Duan, D. Li, and J. Wang, "An improved K-means clustering algorithm for fish image segmentation," presented at Computer and Computing Technologies in Agriculture 2011 and 2012, 2013.
[10] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, no. 9, pp. 1423-1447, 1999.
[11] S. Luo and J. Li, "Accurate object segmentation using novel active shape and appearance models based on support vector machine learning," in Proc. 4th International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, China, July 2014, pp. 347-351.
[12] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Computer Vision and Image Understanding, vol. 61, pp. 38-59, 1995.

