
UNDERWATER MARINE OBJECT DETECTION

USING DEEP LEARNING

A PROJECT REPORT
(19EC74C)
Submitted by

BANUPRIYA.R (1911005)
SUVALAKSHMI.S (1911019)
UMA RAJA SELVI.M (1911021)

in partial fulfillment of the requirements for the award of the degree

of

BACHELOR OF ENGINEERING

in

ELECTRONICS AND COMMUNICATION ENGINEERING

NATIONAL ENGINEERING COLLEGE, KOVILPATTI

(An Autonomous Institution Affiliated To Anna University, Chennai)

K.R.NAGAR, KOVILPATTI – 628 503

ANNA UNIVERSITY, CHENNAI – 600 025

NOVEMBER 2022
BONAFIDE CERTIFICATE

Certified that this report titled “UNDERWATER MARINE OBJECT


DETECTION USING DEEP LEARNING” is the bonafide work of
“BANUPRIYA R (1911005), SUVALAKSHMI S (1911019), UMA RAJA
SELVI M (1911021)” who carried out this work under my supervision.

SIGNATURE SIGNATURE

SUPERVISOR HEAD OF THE DEPARTMENT

Dr.V.R.S.MANI, Dr.A.SHENBAGAVALLI,M.E.,Ph.D.,

Associate Prof(SG)/ECE Professor and Head,

Department of ECE, Department of ECE,

National Engineering College, National Engineering College,

K.R.Nagar, Kovilpatti–628503. K.R.Nagar, Kovilpatti-628503.

Submitted for the project viva voce examination held at NATIONAL


ENGINEERING COLLEGE, K.R.Nagar, Kovilpatti on ………………

INTERNAL EXAMINER EXTERNAL EXAMINER

ACKNOWLEDGEMENT

First and foremost, we express our wholehearted gratitude to the
Almighty for providing the wisdom and courage to carry out this work.

We wish to express our sincere thanks to the Director,
Dr.S.Shanmugavel, B.Sc., D.M.I.T., Ph.D., who helped us in carrying out our
work successfully.

We would like to express our sincere thanks to our principal


Dr.K.Kalidasa Murugavel, M.E., Ph.D., for providing us this opportunity to
do this work.

Our heartfelt acknowledgement goes to the Professor and Head of the


Department, Electronics and Communication Engineering,
Dr.A.Shenbagavalli, M.E., Ph.D., for her valuable and consistent
encouragement for carrying out the work.

We are equally grateful to our Supervisor, Dr.V.R.S.Mani, Associate
Prof.(SG), Department of Electronics and Communication Engineering, for his
encouragement.

We hereby acknowledge the efforts of all Staff members, Lab Technicians of


Electronics and Communication Engineering Department, our Parents and our
Friends whose help was instrumental in the completion of our work.

TABLE OF CONTENTS

CHAPTER NO    CONTENTS                                     PAGE NO

              ABSTRACT                                     viii
              LIST OF FIGURES                              v
              LIST OF TABLES                               vi
              LIST OF SYMBOLS, ABBREVIATIONS AND
              NOMENCLATURE                                 vii

1             INTRODUCTION                                 1

2             LITERATURE SURVEY                            2

3             PROPOSED WORK                                9
              3.1 PROBLEM STATEMENT                        9
              3.2 OBJECTIVE                                11
              3.3 METHODOLOGY                              13
              3.4 PERFORMANCE EVALUATION                   15
              3.5 YOLO METHODOLOGY                         19

4             RESULTS AND OUTPUT                           24

5             CONCLUSION                                   26

              REFERENCES                                   27

LIST OF FIGURES

FIGURE NO    CONTENTS                                               PAGE NO

3.1     The naive version of the Inception module                    12
3.2     Inception module with dimension reduction                    12
3.3     Block diagram of the proposed methodology                    14
3.4     Loss curves of training the Inception V2 based Faster
        R-CNN detector with the training set of ECUHO-2:
        (a) Classification loss, (b) Classifier localization loss,
        (c) RPN localization loss, (d) RPN objectness loss,
        (e) Total loss and (f) Clone loss                             15
3.5     Loss curves of training the Inception V2 based Faster
        R-CNN model with ECUHO-1: (a) Classification loss,
        (b) Classifier localization loss, (c) RPN localization
        loss, (d) RPN objectness loss, (e) Total loss, (f) Clone
        loss, and (g) Time required per step                          16
3.6     Validation set                                                17
3.7     Test set                                                      17
3.8     Training graphs                                               18
3.9     YOLO architecture                                             19
3.10    Darknet-53 architecture                                       22
3.11    Sample F4K dataset image showing single and multiple
        objects                                                       22
3.12    Object detection methodology and bounding box with
        objectness score                                              23
4.1     Obtained output                                               24

LIST OF TABLES

TABLE NO    CONTENTS                                       PAGE NO

3.1     Comparison of YOLOv5 and Faster R-CNN               11
3.2     Losses for training on ECUHO-1 & 2 datasets         14

LIST OF SYMBOLS, ABBREVIATIONS AND NOMENCLATURE

1. UIQM Underwater Image Quality Measure

2. UICM Underwater Image Colorfulness Measure

3. UISM Underwater Image Sharpness Measure

4. UIConM Underwater Image Contrast Measure

5. CNN Convolutional Neural Network

6. R-CNN Region-based Convolutional Neural Networks

7. YOLO You Only Look Once

8. FPS Frames Per Second

9. FPN Feature Pyramid Network

10. ReLU Rectified Linear Activation Unit

11. FC Layer Fully Connected Layers

12. ECUHO Edith Cowan University Halophila ovalis (dataset)

ABSTRACT

Deep learning, also known as deep machine learning or deep
structured learning, has recently achieved tremendous success in digital
image processing for object detection and classification, and deep learning
based techniques are rapidly gaining popularity and attention from the
computer vision research community. However, state-of-the-art object
detection techniques cannot be applied directly in underwater environments,
where images may be poorly resolved or dark, which greatly degrades object
detection accuracy. The underwater environment is physically very different
from air and requires additional algorithms to improve the imagery. There
has been a massive increase in the collection of digital imagery for the
monitoring of underwater ecosystems, including seagrass meadows. This
growth in image data has driven the need for automatic detection and
classification using deep neural network based classifiers. In the proposed
work, a deep learning framework is designed for the analysis of underwater
images and its performance is compared with other state-of-the-art
techniques.

CHAPTER-1

INTRODUCTION

Oceans are the lifeblood of Mother Nature, holding 97% of the
earth's water. They produce more than half of our oxygen and absorb most of
the carbon from our environment. Deep learning, the state-of-the-art machine
learning technology, provides potentially unprecedented opportunities for
detecting many underwater objects. To make sense of texts, images, sounds,
etc., deep learning transforms input data through more layers than shallow
learning algorithms do. Deep learning is replacing hand-crafted features with
efficient algorithms for feature learning and hierarchical feature extraction.
Deep learning attempts to build better representations of an observation (e.g.
an image) and to create models that learn these representations from
large-scale data. A survey of the current deep learning approaches for various
marine object detection and classification tasks would help researchers
understand the challenges and explore more efficient possibilities.

CHAPTER-2

LITERATURE SURVEY

1. TITLE OF THE PAPER: An Underwater Image Enhancement


Benchmark Dataset and Beyond
AUTHOR: Chongyi Li , Chunle Guo , Wenqi Ren , Runmin Cong ,
Junhui Hou , Sam Kwong and Dacheng Tao
PUBLISHED YEAR AND JOURNAL: 2019 & IEEE TRANSACTIONS
ON IMAGE PROCESSING

Usually, an underwater image is degraded by wavelength-dependent
absorption and scattering, including forward scattering and backward
scattering. These adverse effects reduce visibility, decrease contrast, and
even introduce color casts, which limit the practical applications of
underwater images and videos in marine biology, archaeology and marine
ecology. It is thus unclear how existing enhancement algorithms would
perform on images acquired in the wild and how progress in the field could
be gauged. To bridge this gap, the authors present the first comprehensive
perceptual study and analysis of underwater image enhancement using
large-scale real-world images. The benchmark evaluations and the proposed
Water-Net demonstrate the performance and limitations of state-of-the-art
algorithms, shedding light on future research in underwater image
enhancement.

2. TITLE OF THE PAPER: Human-Visual-System-Inspired Underwater
Image Quality Measures

AUTHOR: Karen Panetta, Chen Gao, Sos Agaian

PUBLISHED YEAR JOURNAL: 2015


&IEEE JOURNAL OF OCEANIC ENGINEERING

Underwater images suffer from blurring effects, low contrast, and grayed-
out colors due to the absorption and scattering effects under the water. Many
image enhancement algorithms have been developed for improving the visual
quality of underwater images. Unfortunately, no well-accepted objective
measure exists that can evaluate the quality of underwater images in a way
similar to human perception. The UIQM comprises three underwater image
attribute measures: the underwater image colorfulness measure (UICM), the
underwater image sharpness measure (UISM), and the underwater image
contrast measure (UIConM). These measures are also applied to the AirAsia
8501 wreckage images to show their importance in practical applications.

3.TITLE OF THE PAPER: Investigation of Vision-based Underwater
Object Detection with Multiple Datasets

AUTHOR: Dario Lodi Rizzini, Fabjan Kallasi, Fabio Oleari and Stefano
Caselli

PUBLISHED YEAR&JOURNAL: 2015&International Journal of


Advanced Robotic Systems

Underwater computer vision has to cope with distortion and attenuation


due to light propagation in water, and with challenging operating conditions.
Scene segmentation and shape recognition in a single image must be carefully
designed to achieve robust object detection and to facilitate object pose
estimation. The proposed method searches for a target object according to a
few general criteria that are robust to the underwater context, such as salient
colour uniformity and sharp contours. The datasets have been obtained using
stereo cameras of different quality, and differ in target object type and
colour, acquisition depth and conditions. The effectiveness of the proposed
approach has been experimentally demonstrated.

4. TITLE OF THE PAPER: Comprehensive Survey on Underwater
Object Detection and Tracking
AUTHOR: Girish Gaude, Samarth Borkar
PUBLISHED YEAR&JOURNAL: 30/Nov/2018 & International Journal
of Computer Sciences and Engineering

The recent developments in underwater video monitoring systems make
automatic object detection and object tracking a significant and challenging
task. The processing pipeline involves preprocessing, feature extraction,
object classification, object detection and tracking. Different techniques
proposed to detect moving objects in underwater videos, such as the
background subtraction model, the frame difference model and the optical
flow model, are discussed. Apart from this, a hybrid model combining
background subtraction with three-frame differencing is discussed to ensure
reliable moving object detection from underwater video with low contrast in
a complicated environment.

5. TITLE OF THE PAPER: Underwater Object Detection and
Reconstruction Based on Active Single-Pixel Imaging and Super-
Resolution Convolutional Neural Network

AUTHOR: Mengdi Li, Anumol Mathai, Stephen L. H. Lau, Jian Wei
Yam, Xiping Xu, and Xin Wang
PUBLISHED YEAR&JOURNAL: 5 January 2021

Various approaches have been developed for imaging objects under low-
light conditions and for improving the quality of underwater images.
Polarimetric imaging systems have been used to reduce the backscattering
effect and produce a good-quality image. Although polarization correction
can improve reconstruction quality, it also blocks part of the light incident
on the detector, thereby reducing the signal-to-noise ratio and complicating
the problem of imaging underwater. The proposed single-pixel imaging (SPI)
system can overcome the limitations of obtaining good-quality images in
low-light conditions. It employs a combination of the compressive sensing
(CS) algorithm and a neural network to enhance image quality while
reducing sampling time and data postprocessing. Based on the network
structure of a CS-SRCNN, the employed network successfully reconstructed
high-resolution (HR) images after being trained with multiple measurement
datasets.

6. TITLE OF THE PAPER: Underwater Image Processing and Object
Detection Based on Deep CNN Method
AUTHOR: Fenglei Han, Jingzheng Yao, Haitao Zhu, and Chunhui Wang
PUBLISHED YEAR & JOURNAL: 22 May 2020

Due to the importance of underwater exploration in the development and
utilization of deep-sea resources, autonomous underwater operation is
increasingly important for avoiding the dangerous high-pressure deep-sea
environment. For such autonomous operation, intelligent computer vision is
the most important technology.

CHAPTER-3
PROPOSED WORK
3.1 PROBLEM STATEMENT:

• DUAL PRIORITIES: OBJECT CLASSIFICATION AND LOCALIZATION

Region-based CNNs represent one popular class of
object detection frameworks. These methods consist of the generation of
region proposals where objects are likely to be located, followed by CNN
processing to classify them and further refine the object locations.

• SPEED FOR REAL-TIME DETECTION

Object detection algorithms need not only to accurately classify
and localize important objects, they also need to be incredibly fast at
prediction time to meet the real-time demands of video processing. Several
key enhancements over the years have boosted the speed of these algorithms,
improving test-time performance from the 0.02 frames per second (fps) of
R-CNN to the impressive 155 fps of Fast YOLO.

• MULTIPLE SPATIAL SCALES AND ASPECT RATIOS

ANCHOR BOXES
Multiple RoIs may be predicted at each position and are described
relative to reference anchor boxes. The shapes and sizes of these anchor boxes
are carefully chosen to span a range of different scales and aspect ratios. This
allows various types of objects to be detected with the hopes that the bounding
box coordinates need not be adjusted much during the localization task.

• MULTIPLE FEATURE MAPS

Single-shot detectors must place special emphasis on the issue of


multiple scales because they detect objects with a single pass through the CNN
framework. If objects are detected from the final CNN layers alone, only large
items will be found as smaller items may lose too much signal during down
sampling in the pooling layers.

• FEATURE PYRAMID NETWORK


The feature pyramid network (FPN) takes the concept of multiple
feature maps one step further. Images first pass through the typical CNN
pathway, yielding semantically rich final layers. Then to regain better
resolution, FPN creates a top-down pathway by up sampling this feature map.

• LIMITED DATA

The limited amount of annotated data currently available for object


detection proves to be another substantial hurdle. Object detection datasets
typically contain ground truth examples for about a dozen to a hundred classes
of objects, while image classification datasets can include upwards of 100,000
classes.

• CLASS IMBALANCE

Class imbalance proves to be an issue for most classification problems, and


object detection is no exception. Consider a typical photograph. More likely
than not, the photograph contains a few main objects and the remainder of the
image is filled with background.

3.2 OBJECTIVE:

To develop a deep learning based object detection algorithm for underwater
images.

Table 3.1 Comparison of YOLOv5 and Faster R-CNN

                                              YOLOv5        FASTER R-CNN

Inference speed
Detection of small or far-away objects
Little to no overlapping boxes
Missed objects
Detection of crowded objects

Faster R-CNN Based Object Detection

The latest addition to the CNN based object detection techniques, after
R-CNN and Fast R-CNN, is named Faster R-CNN. Fast R-CNN is faster than
R-CNN but was not able to achieve real-time detection for video data using
deep neural networks. Both R-CNN and Fast R-CNN are based on the
concept of calculating region proposals, which acts as the bottleneck.
Finding regions of interest in both R-CNN and Fast R-CNN depends on the
selective search method, which uses a greedy algorithm to merge superpixels
based on low-level features and is a slow operation.

Faster R-CNN Based on Inception V2

Though the initial Faster R-CNN was based on the VGG16 convolutional
neural network, detection accuracy has since been improved through
extensive experimentation with and crafting of CNN architectures and by
increasing the width and depth of the network. In this series of improvements
to object detection networks, the new network codenamed 'Inception'
achieved the highest accuracy in ILSVRC 2014.

Fig 3.1 The naive version of the Inception module

Fig 3.2 Inception module with dimension reduction
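
As a hedged illustration of the dimension-reduced module in Fig 3.2, the
following PyTorch sketch runs four parallel branches, with 1x1 convolutions
shrinking the channel count before the costly 3x3 and 5x5 filters; the channel
counts are arbitrary assumptions, not the exact Inception V2 configuration.

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception module with dimension reduction: 1x1 convolutions reduce
    channels before the 3x3 and 5x5 branches; outputs are concatenated."""
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 48, kernel_size=1), nn.ReLU(),
                                     nn.Conv2d(48, 64, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
                                     nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)

print(InceptionBlock(192)(torch.randn(1, 192, 28, 28)).shape)  # (1, 192, 28, 28)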

3.3 METHODOLOGY

Data collection

We have collected underwater images of Halophila ovalis growing in

shallow coastal waters of Western Australia and an experimental facility at

Edith Cowan University. Photographs were taken using a Fujifilm X-T30

mirror-less digital camera with XF 18-55mm F 2.8-4 R LM OIS lens. The

camera with the lens was encapsulated inside a compatible underwater housing.

Data labelling

All the images from both the real and laboratory environments are then
labelled using the 'LabelImg' software, a Python based graphical image
annotation tool. LabelImg uses Qt (an open-source widget toolkit) to create
its graphical user interface.
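
By default LabelImg saves one Pascal VOC XML file per image. The snippet
below is a minimal sketch of reading the boxes back out of such a file; the
file name and class name are hypothetical.

import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples from a LabelImg XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.find("xmin").text)),
                      int(float(bb.find("ymin").text)),
                      int(float(bb.find("xmax").text)),
                      int(float(bb.find("ymax").text))))
    return boxes

print(read_voc_boxes("halophila_0001.xml"))  # e.g. [("halophila", 12, 34, 120, 180)]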

Training Faster R-CNN Inception V2 Detector

For training the detector, we divided the whole ECUHO-1 dataset
into a training set and a testing set. The ECUHO-1 training set consists of
2,160 images, and the testing set consists of 539 images.
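
A minimal sketch of such a split; the folder layout, file extensions and
random seed are assumptions for illustration, not the report's actual code.

import random
import shutil
from pathlib import Path

random.seed(42)
images = sorted(Path("ECUHO-1/images").glob("*.jpg"))
random.shuffle(images)
splits = {"train": images[:2160], "test": images[2160:2699]}  # 2,160 + 539
for name, files in splits.items():
    out = Path("ECUHO-1") / name
    out.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, out / img.name)     # the image itself
        xml = img.with_suffix(".xml")        # its matching LabelImg annotation
        if xml.exists():
            shutil.copy(xml, out / xml.name)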

Fig 3.3 Block diagram of the proposed methodology

Table 3.2 Losses for training on ECUHO-1 & 2 datasets

3.4 PERFORMANCE EVALUATION

Once the training was completed, the Inception V2 based Faster R-CNN
object detector was tested and evaluated using COCO detection metrics.
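
As an illustration, the COCO detection metrics can be computed with the
pycocotools package; the two JSON file names below are placeholders, not
artifacts from this work.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("ecuho1_test_annotations.json")            # ground-truth annotations
coco_dt = coco_gt.loadRes("faster_rcnn_detections.json")  # detector outputs
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints mAP@[.5:.95], mAP@0.5, AR@100, and related metrics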

Fig 3.4 Loss curves of training Inception V2 based Faster R-CNN

detector with training set of ECUHO-2: (a) Classification loss, (b)

Classifier localization loss, (c) RPN localization loss, (d) RPN objectness

loss, (e) Total loss and (f) clone loss

Fig 3.5 Loss curves of training the Inception V2 based faster R-

CNN model with ECUHO-1: (a) Classification loss, (b) Classifier

localization loss, (c) RPN localization loss, (d) RPN objectness loss, (e)

Total loss, (f) clone loss, and (g) time required per step

The third important parameter is a false negative, which is a ground truth


bounding box that has not been detected.
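
The first two parameters are the true positive (a ground-truth box that is
correctly detected) and the false positive (a detection with no matching
ground truth). For reference, a small illustrative computation of precision
and recall; the counts are made up.

tp, fp, fn = 180, 90, 120
precision = tp / (tp + fp)   # fraction of detections that are correct
recall = tp / (tp + fn)      # fraction of ground-truth boxes that are found
print(f"precision = {precision:.3f}, recall = {recall:.3f}")  # 0.667, 0.600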

RESULTS AND DISCUSSION:

The precision and recall values of the Inception V2 based Faster R-CNN
object detection model were tabulated. From the results, we can see that
when the model trained on ECUHO-1 is tested with the corresponding testing
set, the mean average precision is 0.26 (close to the 0.28 achieved by the
Faster R-CNN Inception V2 object detector on the COCO dataset). For an
IoU threshold of 0.50, the precision increases to 0.668. Average recall over
100 objects in an image frame is 0.357; for ten objects, it is 0.129.

Fig 3.6 Validation set

Fig 3.7 Test set


Fig 3.8 Training graphs

The performance comparison of Mask R-CNN and YOLOv5 aims to
identify the best detection and recognition model for underwater object
detection. Based on the experiments, YOLOv5 outperformed Mask R-CNN
with a mAP@[.5:.95] score of 0.987 and faster training times.

YOLO and Faster R-CNN share some similarities: both use an anchor
box based network structure, and both use bounding box regression. What
distinguishes YOLO from Faster R-CNN is that it performs classification and
bounding box regression at the same time, a more elegant, single-pass way to
do regression and classification.

Faster R-CNN, on the other hand, detects small objects well since it has
nine anchors in a single grid cell; however, it fails to achieve real-time
detection with its two-stage architecture.

3.5 YOLO METHODOLOGY

A. Network architecture:

The YOLO architecture is based on a CNN, as shown in Fig 3.9.

Fig 3.9 YOLO architecture

Three versions of YOLO preceded YOLOv3. YOLOv1 was the first
implementation of the single-stage detector concept; it uses 1x1
dimension-reduction layers followed by 3x3 convolutional layers, with batch
normalization and the leaky ReLU activation function. The network consists
of 24 convolutional layers that extract features and two fully connected (FC)
layers that predict bounding boxes and their class probabilities. The final
output is a 7x7x30 tensor of bounding-box predictions: a 7x7 grid in which
each cell predicts 2 boxes x 5 values plus 20 class probabilities (2x5 + 20 =
30). This model can therefore detect at most 49 objects (one per grid cell),
but it produces a high localization error.

The improved version of YOLOv1 is YOLOv2, which was built mainly
to reduce localization error. YOLOv2 removed the final FC layers and added
batch normalization to all convolutional layers, which made the network
resolution independent and lowered the localization error. YOLOv2 used
Darknet-19, a network with 19 layers augmented with an additional 11 layers
to detect objects.

Both of the previous YOLO models can detect fewer than 20 classes; hence a
more advanced model, YOLO9000, was developed that can detect and
classify many more objects and classes. These models were then improved
by adding features such as residual blocks, skip connections and
up-sampling, and the result was named YOLOv3, which utilizes a 53-layer
network trained on the ImageNet database.

B. Bounding box forecasting:

YOLOv3 uses a single pipeline for feature extraction: the whole
image is passed through the convolutional network, which produces a square
output grid onto which the bounding boxes are anchored. The grid cell and
the anchor share a common centroid. The YOLO algorithm predicts location
offsets against the anchor box, tx, ty, tw, th, along with an objectness score
and class probabilities. The objectness score gives the confidence that an
object is present in the bounding box, and the class probability defines the
class to which it belongs. The predictions are converted to bounding box
coordinates, with (cx, cy) being the upper-left corner of the grid cell and pw
and ph being the width and height of the anchor, as depicted in the figure
and calculated as given in (1), (2), (3), (4):

bx = σ(tx) + cx        (1)
by = σ(ty) + cy        (2)
bw = pw · e^tw         (3)
bh = ph · e^th         (4)

where bx and by are the x- and y-coordinates of the box centre, bw and bh
are the box width and height, and σ is the sigmoid function. The measure of
overlap between the ground truth and the bounding box, called the objectness
score, is calculated by logistic regression. A value of "1" indicates a perfect
overlap of the bounding box and the ground truth, or an overlap above a
threshold, whereas if the overlap is not perfect and below the threshold the
value will be "0" and the bounding box is ignored. The objectness score
initially helps to filter the best bounding boxes: generally, those bounding
boxes with an objectness score greater than the threshold are kept first and
then considered for further filtering. Most object detection algorithms face
the problem of detecting the same object multiple times in different frames,
resulting in poor performance. YOLO [1] [2] [3] uses non-maximum
suppression (NMS) to solve this problem of multiple detections of the same
object. NMS uses a function called Intersection over Union (IoU), with a
minimum IoU threshold commonly set to 0.5. If B1 and B2 are two bounding
boxes, the IoU is determined as the ratio of the area of intersection of B1 and
B2 to the total area of their union.
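
A minimal sketch of IoU and greedy NMS as described above, not the
report's actual implementation; boxes are in (x1, y1, x2, y2) corner format.

def iou(a, b):
    """Intersection over Union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop any box overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(10, 10, 60, 60), (12, 12, 58, 62), (100, 100, 150, 160)]
scores = [0.9, 0.75, 0.8]
print(nms(boxes, scores))  # [0, 2]: the overlapping lower-score box is suppressed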

C. Class prediction:

YOLOv3 uses multi-label classification. Independent logistic
classifiers are used instead of the softmax function to reduce computational
complexity, which in turn improves system performance. For example, in
complex situations such as an open image dataset, an object can be labelled
both as a cat and as an animal, i.e. the labels overlap. Softmax performs
poorly here as it predicts the presence of a single class, which may not be the
desired result; hence binary cross-entropy is used in YOLOv3.
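
A hedged PyTorch sketch of this per-class logistic (sigmoid) classification
trained with binary cross-entropy; the tensor shapes and labels are
illustrative only.

import torch
import torch.nn.functional as F

logits = torch.randn(8, 80)            # class logits for 8 predicted boxes, 80 classes
targets = torch.zeros(8, 80)
targets[0, 15] = targets[0, 16] = 1.0  # one box may carry overlapping labels
probs = torch.sigmoid(logits)          # independent per-class probabilities (no softmax)
loss = F.binary_cross_entropy_with_logits(logits, targets)
print(probs.shape, loss.item())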

D. Predictions across scales:

Predictions are made at three different scales: 13x13, 26x26 and 52x52.
Features are extracted by Darknet-53 followed by a feature pyramid network.
The last stage of prediction is a 3-D tensor encoding the bounding box, a
confidence value that gives the objectness score, and the probability of the
object belonging to a particular class.
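
As a worked example of the three output shapes (assuming the 80-class
COCO setting, which is an assumption here rather than the report's dataset):

num_classes = 80                        # assumption: COCO-style class count
channels = 3 * (4 + 1 + num_classes)    # 3 anchors x (4 offsets + objectness + classes)
for grid in (13, 26, 52):
    print(f"{grid}x{grid}x{channels}")  # 13x13x255, 26x26x255, 52x52x255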

Fig 3.10 Darknet-53 architecture

Fig 3.11 Sample F4K dataset image showing single and multiple objects

Fig 3.12 Object detection methodology and bounding box with
objectness score

E. Feature extractor

YOLOv3 uses the Darknet-53 network for feature extraction, a
hybrid network derived from Darknet-19 and residual networks. It has a total
of 53 convolutional layers and is therefore called Darknet-53. Whereas
YOLOv2 uses Darknet-19 for feature extraction, YOLOv3 uses the 53-layer
Darknet-53 for the same purpose. Both YOLOv2 and YOLOv3 use batch
normalization.
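
A hedged PyTorch illustration of the residual pattern repeated throughout
Darknet-53 (a 1x1 channel-halving convolution, a 3x3 convolution, and a
skip connection), not the report's code.

import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """One Darknet-53 residual block: 1x1 conv halves channels, 3x3 conv
    restores them, and the input is added back as a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1),
            nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)  # skip connection

print(DarknetResidual(64)(torch.randn(1, 64, 52, 52)).shape)  # (1, 64, 52, 52)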

CHAPTER-4

RESULTS AND OUTPUT

Fig 4.1 Obtained output

{
"predictions":
[
{
"x": 486,
"y": 377.5,
"width": 52,
"height": 69,
"class": "echinus",
"confidence": 0.854
},
{
"x": 300,
"y": 478,
"width": 32,
"height": 62,
"class": "echinus",
"confidence": 0.842
},
{
"x": 393.5,
"y": 128,
"width": 27,
"height": 48,
"class": "echinus",
"confidence": 0.777
},
{
"x": 380,
"y": 490.5,
"width": 52,
"height": 43,
"class": "echinus",
"confidence": 0.686
},
{
"x": 413,
"y": 92,
"width": 16,
"height": 30,
"class": "echinus",
"confidence": 0.598
}
]
}
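
These predictions can be rendered onto the source frame; below is a hedged
OpenCV sketch, assuming the x and y fields are box-centre coordinates (the
half-pixel values above suggest this) and using hypothetical file names.

import json
import cv2

with open("predictions.json") as f:
    preds = json.load(f)["predictions"]
img = cv2.imread("frame.jpg")
for p in preds:
    # convert assumed centre + size to corner coordinates
    x1, y1 = int(p["x"] - p["width"] / 2), int(p["y"] - p["height"] / 2)
    x2, y2 = int(p["x"] + p["width"] / 2), int(p["y"] + p["height"] / 2)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    label = f"{p['class']} {p['confidence']:.2f}"
    cv2.putText(img, label, (x1, max(y1 - 4, 10)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("annotated.jpg", img)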

CHAPTER-5

CONCLUSION

In this work, recent approaches for detecting and classifying various
underwater marine objects using deep learning are discussed. Approaches are
categorized according to the targets of detection, and the features and deep
learning architectures used are summarized. It was necessary to highlight all
the approaches to marine data analysis in a single report so that it becomes
easy to focus on the possibilities for future work based on deep neural
network methods. It has been found that much work has been done on coral
detection and classification using deep learning, but no work has been done
for seagrass, which is equally vital to the oceanic ecosystem. The
effectiveness, accuracy and robustness of any detection and classification
algorithm can be increased significantly if both colour and texture based
features are combined. Combining hand-crafted features with neural
networks may bring better results for seagrass detection and classification.
Therefore, the opportunity exists to develop an efficient and effective deep
learning approach for underwater seagrass imagery, which will be the focus
of our future work.

REFERENCES:

[1] Beijbom, O., Edmunds, P. J., Kline, D.I., Mitchell, B.G., Kriegman, D.:
Automated annotation of Coral Reef Survey Images. In: IEEE Computer
Society Conference on Computer Vision and Pattern Recognition, pp. 1170–
1177. IEEE Press, Rhode Island (2012)
[2] Campos, M.M., Codina, G.O., Amengual, L.R., Julia, M.M.: Texture
Analysis of Seabed Images: Quantifying the Presence of Posidonia Oceanica
at Palma Bay. In: OCEANS 2013 - MTS/IEEE Bergen, pp. 1-6. IEEE Press,
Bergen (2013)
[3] Erftemeijer, P.L.A., Lewis III, R.R.R.: Environmental Impacts of
Dredging on Seagrasses: A review. Marine Pollution Bulletin 52, 1553–1572
(2006)
[4] Hedley, J.D., Roelfsema, C.M., Chollett, I., Harborne, A.R., Heron,
S.F., Weeks, S.J., Mumby, P.J.: Remote Sensing of Coral Reefs for Monitoring
and Management: A Review. Remote Sens. 8(118) (2016)
[5] Hinton, G.E., Osindero, S.: A fast Learning Algorithm for Deep Belief
Nets. Neural Comput. 18(7), 1527-1554 (2006)
[6] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with
Deep Convolutional Neural Networks. In: Advances in Neural Information
Processing Systems 25 (NIPS) (2012)
[7] Lavery, P.S., McMahon, K., Weyers, J., Boyce, M.C., Oldham, C.E.:
Release of Dissolved Organic Carbon from Seagrass Wrack and its
Implications for Trophic Connectivity. Marine Ecology Progress Series 494,
121–133 (2013)
[8] Li, X., Shang, M., Qin, H., Chen, L.: Fast Accurate Fish Detection and
Recognition of Underwater Images with Fast R-CNN. In: OCEANS 2015 -
MTS/IEEE Washington, pp. 1-5. IEEE Press, Washington, DC (2015)
[9] Lu, H., Li, Y., Zhang, Y., Chen, M., Serikawa, S., Kim, H.: Underwater
Optical Image Processing: A Comprehensive Review. Computer Vision and
Pattern Recognition. (2017)
[10] Mahmood, A., Bennamoun, M., An, S., Sohel, F., Boussaid, F., Hovey,
R., Kendrick, G., Fisher, R.B.: Coral Classification with Hybrid Feature
Representations. In: IEEE International Conference on Image Processing, pp.
519-523. IEEE Press, Arizona (2016)
[11] Oguslu, E., Erkanli, S., Victoria, J., Bissett, W.P., Zimmerman, R.C.,
Li, J.: Detection of Seagrass Scars using Sparse Coding and Morphological
Filter. In: Proc. SPIE 9240, Remote Sensing of the Ocean, Sea Ice, Coastal
Waters, and Large Water Regions, pp. 92400G, Amsterdam (2014)
[12] Qin, H., Li, X., Yang, Z., Shang, M.: When Underwater Imagery
Analysis Meets Deep Learning: A Solution at the Age of Big Visual Data. In:
OCEANS 2015 - MTS/IEEE Washington, pp. 1-5. IEEE Press, Washington,
DC (2015)
[13] Ravanbakhsh, M., Shortis, M., Shafait, F., Mian, A., Harvey, E., Seager,
J.: Automated Fish Detection in Underwater Images Using Shape-Based
Level Sets. The Photogrammetric Record 30(149), 46-62 (2015)
[14] Shiela, M.M.A., Soriano, M., Saloma, C.: Classification of Coral Reef
Images from Underwater Video using Neural Networks. Optics express 13(22),
8766–8771 (2005)
[15] Spampinato, C., Chen-Burger, Y.H., Nadarajan, G., Fisher, R.B.:
Detecting, Tracking and Counting Fish in Low Quality Unconstrained
Underwater Videos. In: 3rd International Conference on Computer Vision
Theory and Applications (VISAPP), pp. 514-519. Funchal, Madeira (2008)
[16] Stokes, M.D., Deane, G.B.: Automated Processing of Coral Reef Benthic
Images. Limnol. Oceanogr.: Methods. 7, 157-168 (2009)

[17] Teng, M.Y., Mehrubeoglu, M., King, S.A., Cammarata, K., Simons, J.:
Investigation of Epifauna Coverage on Seagrass Blades Using Spatial and
Spectral Analysis of Hyperspectral Images. In: 5th Workshop on Hyperspectral
Image and Signal Processing: Evolution in Remote Sensing, pp. 25-28.
Gainesville, Florida (2013)
[18] Vanderklift, M., Bearham, D., Haywood, M., Lozano-Montes, H.,
Mccallum, R., Mclaughlin, J., Lavery, P.: Natural Dynamics: Understanding
Natural Dynamics of Seagrasses in North-Western Australia. Theme 5 Final
Report of the Western Australian Marine Science Institution (WAMSI)
Dredging Science Node. 39 (2016)
[19] Deng, L., Yu, D.: Deep Learning: Methods and Applications.
Foundations and Trends in Signal Processing 7, 197-387 (2013)
[20] Schmidhuber, J.: Deep Learning in Neural Networks: An Overview.
Neural Networks. 61, 85-117 (2015)
[21] Song, H. A., Lee, S.Y.: Hierarchical Representation Using NMF. In:
ICONIP 2013. LNCS, vol. 8226, pp. 466-473. Springer, Heidelberg (2013)
[22] Liu, M.: AU-aware Deep Networks for facial expression recognition. In:
10th IEEE International Conference and Workshops on Automatic Face and
Gesture Recognition, pp. 1-6, IEEE Press, Shanghai (2013)
[23] Ciresan, D., Meier, U., Schmidhuber, J: Multi-column deep neural
networks for image classification. Computer Vision and Pattern Recognition.
3642-3649 (2012)
[24] Elawady, M.: Sparse Coral Classification Using Deep Convolutional
Neural Networks. MSc Thesis, Heriot-Watt University (2014)
[25] Villon, S., Chaumont, M., Subsol, G., Villeger, S., Claverie, T.: Coral
Reef Fish Detection and Recognition in Underwater Videos by Supervised
Machine Learning: Comparison between Deep Learning and HOG+SVM
Methods. In: 17th International Conference on Advanced Concepts for
Intelligent Vision Systems. Springer (2016)
[26] Py, Q., Hong, H., Zhongzhi, S.: Plankton Classification with Deep
Convolutional Neural Network. In: IEEE Information Technology,
Networking, Electronic and Automation Control Conference, pp. 132-136.
Chongqing (2016)
[27] Lee, H., Park, M., Kim, J.: Plankton classification on imbalanced large
scale database via convolutional neural networks with transfer learning. In:
IEEE International Conference on Image Processing (ICIP), pp. 3713-3717.
Phoenix (2016)
[28] Dai, J., Wang, R., Zheng, H., Ji, G., Qiao, X.: ZooplanktoNet: Deep
Convolutional Network for Zooplankton Classification. In: OCEANS, pp. 1-6.
Shanghai (2016)
[29] Hemminga, M.A., Duarte, C.M.: Seagrass Ecology. Cambridge
University Press, Cambridge (2004)
[30] National Data Science Bowl, https://www.kaggle.com/c/datasciencebowl
[31] Dieleman, S.: Classifying Plankton with Deep Neural Networks,
http://benanne.github.io/2015/03/17/plankton.html
