
Marine Creature Recognition in Aerial Video

Authors: Martin Jiang, Marcelo Jimenez, Angela Kerlin

Overview
Over the course of the summer, the 2021 SURP team continued the work started by the Shark
Spotting team on a convolutional neural network capable of detecting marine creatures in aerial
footage. The purpose of this summer's research was to extend or build a neural network model
that identifies sharks and other marine creatures in still images. The long-term goal of the
project is to develop an application that marine science researchers and students can use to
assist their research; a further long-term goal is an app that could help lifeguards or any
individual detect marine creatures. The final model was able to identify sharks with high
accuracy, but struggled to differentiate between distinct marine creatures.

Background
Identifying marine creatures in drone footage is both difficult and time consuming due to glare,
murky water, and variable lighting conditions; even for researchers devoted to marine science,
picking out creatures in the water can take considerable effort. This is why the SURP teams
have been working toward an AI-based solution for identifying creatures in the water, in hopes
of creating an application that can be used by researchers and students.

In the summer of 2020, the Shark Spotting team began the project by labelling about 4700
images to be used as the training dataset for the CNN. They also implemented a Faster-RCNN
model using PyTorch, and started a web application using Streamlit, a Python library used to
connect the model to an interactive front end.

Approach
Due to the open-ended nature of our project and the different directions our research could go,
we began the summer by considering the options for our summer research. After a few weeks of
discussion and carefully weighing those options, our team ultimately decided to move forward
with a new implementation of the model using YOLOv4 instead of Faster-RCNN, because
YOLOv4 can run detection in real time while offering accuracy comparable to the Faster-RCNN
model. We also decided to continue developing a web application that ties the AI models to a
frontend for marine scientists and students to use.

YOLOv4 is a fairly new algorithm, released by Alexey Bochkovskiy and collaborators in 2020
(though the original YOLO paper was published in 2015). Compared to the previous Faster-RCNN
model, the YOLO architecture is much simpler and runs much more quickly because it performs
classification and bounding-box regression in a single pass.

In addition to the model, this summer we completed a functional web application to enable
scientists and students to make use of the model. The Shark Spotting team from last summer
laid the groundwork for the user interface with a proof-of-concept application using Streamlit, a
Python frontend library. We continued this work in Streamlit to create a multi-image processing
app using the Faster-RCNN model also created last summer. However, the limitations of
Streamlit soon became apparent. Images were slow to load, and every user interaction with the
webpage reloaded the entire page rather than just updating the relevant portion. Due to these
limitations, we switched our frontend implementation from Streamlit to Dash by Plotly, while still
utilizing much of the backend groundwork. This transition began concurrently with the
development of the new YOLO model, and as a result the web app still uses the model created
by the Shark Spotting team rather than the YOLO model.

Because of the ongoing global COVID-19 pandemic and the surging Delta variant, our research
was conducted remotely. We used Slack, Discord, and Google Docs for communication and
documentation, and used a Google Colab Pro subscription along with GitHub for
implementation.

Implementation

Labeling
The labelling process was done by last year's SURP team. The previous dataset includes labels
for the following classes: Shark, Person, Sealion, Dolphin, and Boat. Our team did not expand
the labelling, since we focused on training the YOLOv4 model on the original dataset first and
assumed the existing dataset was sufficient because it already covered a variety of marine
creatures. This assumption overlooked the fact that most of the labeled images are of sharks,
which led the model to detect sharks well but perform noticeably worse on the other classes. A
suggestion for future groups is to train the model on a more balanced number of images per
class, or to expand the current dataset with images of marine creatures other than sharks.

Model
The implementation and training of the YOLOv4 model was conducted on Google Colab, and
can be accessed from the surpmarinecreatues@gmail.com Google Drive. Although we also
considered other cloud computing options such as Amazon AWS instances, we ultimately chose
Google Colab because we felt it allowed a more streamlined development, implementation,
testing, and verification process.
Challenges in Switching from Faster-RCNN to YOLO
There were many obstacles in the transition from Faster-RCNN to YOLO, and a lot of
self-teaching was involved in overcoming them to ultimately create a working solution. One big
obstacle during the initial stages of the transition was reformatting the labels and accessing the
images from LabelBox. With the current labelling workflow and dataset setup, the dataset is
exported from LabelBox as a JSON file, with each image and its associated label provided as a
link that points to somewhere on LabelBox's servers. YOLO, however, requires the training
dataset to be accessible on the local machine in a single directory, and it also requires a
different label format. We spent a few weeks looking at different ways to convert the JSON file
into the images and labels we needed, but due to the size of our dataset, most commercial
solutions required a license. Eventually, we found a Python script on GitHub that converts the
JSON into the associated images and labels (a sketch of this conversion logic appears at the
end of this subsection).

Another issue we ran into was corruption of the dataset folder. Initially, when we tried uploading
a compressed version of the dataset to Google Drive and unzipping it on the Colab virtual
machine, we ran into zipfile corruption errors. We were unsure how the zipfile became
corrupted, but suspect it could have happened while downloading or uploading the dataset to
Google Drive, or while running the Python conversion script. There may also simply be an issue
unzipping large files on Colab's virtual machines, as others have experienced similar problems
with no clear solution. After testing several possible fixes, we ultimately decided to upload the
uncompressed dataset to Google Drive and copy it over onto the virtual machine session.
Although this approach is more time consuming (it can take 1-2 hours to copy all the files over),
it was a key step in being able to train the model on Colab.
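
As an illustration, the core of that conversion step looks roughly like the sketch below. This is
not the script we used: the JSON key names ("Labeled Data", "Label", "objects", "bbox") and the
class list are assumptions about the LabelBox export format, and the output simply follows the
YOLO convention of one .txt label file per image with normalized box coordinates.

    # Minimal sketch of a LabelBox-JSON-to-YOLO conversion (illustrative only).
    # Key names and class names are assumptions about the export format.
    import json, os, urllib.request
    from PIL import Image

    CLASSES = ["Shark", "Person", "Sealion", "Dolphin", "Boat"]  # order defines the YOLO class ids

    def convert(export_json, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        for i, entry in enumerate(json.load(open(export_json))):
            img_path = os.path.join(out_dir, f"img_{i}.jpg")
            urllib.request.urlretrieve(entry["Labeled Data"], img_path)  # download the image
            w, h = Image.open(img_path).size
            lines = []
            for obj in entry["Label"]["objects"]:
                box = obj["bbox"]  # pixel-space box: top, left, height, width
                xc = (box["left"] + box["width"] / 2) / w                # normalize to [0, 1]
                yc = (box["top"] + box["height"] / 2) / h
                cls = CLASSES.index(obj["title"])
                lines.append(f"{cls} {xc:.6f} {yc:.6f} {box['width']/w:.6f} {box['height']/h:.6f}")
            with open(img_path.replace(".jpg", ".txt"), "w") as f:       # one label file per image
                f.write("\n".join(lines))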

Training
We trained our model using the default YOLOv4 architecture, following the recommendations
listed in the GitHub repo. We set the input dimensions to 608x608, used a batch size of 64 with
a subdivision of 32, and set the maximum number of training iterations to 10000, though the
model began to overfit at around 3000 iterations. Further experimentation with the configuration
file could be used to optimize the model.
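
For reference, a training session on Colab following the darknet README looks roughly like the
hypothetical cell below; the Drive path, data file, and cfg file names are placeholders, and the
cfg values are the ones described above.

    # Hypothetical Colab cell: mount Google Drive, copy the uncompressed dataset
    # onto the VM (the 1-2 hour step mentioned earlier), then launch training from
    # the pretrained convolutional weights with mAP tracking. Paths are placeholders.
    from google.colab import drive
    drive.mount('/content/drive')

    !cp -r "/content/drive/MyDrive/MarineCreatures/dataset" /content/darknet/data/obj
    %cd /content/darknet
    !./darknet detector train data/obj.data cfg/yolov4-obj.cfg yolov4.conv.137 -dont_show -map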

Frontend
The frontend implementation is written entirely in Python and stored in the project GitHub. After
much debate early in the summer, we decided to convert the existing Streamlit application into a
Dash application. Dash is a Python library that enables fast creation of dashboards written
entirely in Python. In addition to being noticeably faster than Streamlit, Dash is designed with
machine learning projects in mind and has support for Plotly components and Bootstrap, a
well-known CSS framework that enables responsive components. For information on how to run
the frontend, see the project GitHub README.
Website and File Structure
The website is split into three main pages: Home, a general description of the project; Process
Image, where the user can upload one or more images and then view the model-generated
labels; and Self Label Image, where the user can upload an image and label it themselves.
Each page can be accessed through the Bootstrap navbar component at the top of the page.
When the user clicks a link, the page is updated in the "pageSelect" callback (see the Callbacks
section), which returns the correct page layout.
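
As a sketch of how that routing can be wired up in Dash (the component ids and page layouts
below are illustrative placeholders, not necessarily the names used in the repo):

    # Illustrative "pageSelect"-style routing callback, using Dash 2-style imports.
    from dash import Dash, dcc, html, Input, Output

    app = Dash(__name__)
    homeLayout = html.Div("Home")                      # placeholder page layouts
    processImageLayout = html.Div("Process Image")
    selfLabelLayout = html.Div("Self Label Image")

    # dcc.Location tracks the browser URL; the callback swaps in the matching page.
    app.layout = html.Div([dcc.Location(id="url"), html.Div(id="pageContent")])

    @app.callback(Output("pageContent", "children"), Input("url", "pathname"))
    def pageSelect(pathname):
        if pathname == "/process-image":
            return processImageLayout
        if pathname == "/self-label":
            return selfLabelLayout
        return homeLayout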

Each page layout is defined in an individual file. In addition to the layout files, we also have
"app.py", "cfg.py", and "utilities.py". The configuration file contains global variables shared
between the app files, including an object that handles running the model and the Dash app
object; this Dash object is used to define callbacks and ultimately run the app. The app file sets
the app's layout attribute and calls "run_server()", which starts the Dash server. Finally, the
utilities file contains useful functions shared between the app files, such as "uploadedToPil",
which converts the default format of an image uploaded in Dash, a base64-encoded string, into
a more convenient format, a Pillow image object. This function is essential because the model
expects images in Pillow format. Although it is not currently used, this file also contains a
function to convert a Pillow image back into a base64-encoded string in order to display the
image in an html.Img() component.
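
For context, a helper along the lines of "uploadedToPil" can be as small as the sketch below
(the repo's actual implementation may differ). Dash's upload component hands each file over as
a "data:<mime>;base64,<data>" string:

    # Minimal sketch of an "uploadedToPil"-style helper: decode the base64 string
    # provided by dcc.Upload into a Pillow image object.
    import base64
    import io
    from PIL import Image

    def uploadedToPil(contents):
        header, encoded = contents.split(",", 1)  # drop the "data:image/...;base64," prefix
        return Image.open(io.BytesIO(base64.b64decode(encoded)))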
Plotly Graphs
Initially, in order to display the model-generated labels, we overlaid the labels onto images using
OpenCV (the cv2 library). However, displaying plain images did not allow the user to interact
with the labels. After the switch to Dash, this problem was solved with Plotly components, which
include graphs with built-in controls that allow the user to annotate images, as shown below.
The bar in the top right of the graph has buttons to download, zoom, create labels, and delete
labels. We decided to use this tool to display labeled images so that users can change or delete
the labels the model generates, or draw their own labels on the image.
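
A rough sketch of how such a graph can be built (the function name and configuration below are
illustrative; the repo may wire this up differently):

    # Illustrative sketch: display an image in a Plotly figure whose mode bar
    # exposes the draw/erase controls described above (Dash 2-style import).
    import numpy as np
    import plotly.express as px
    from dash import dcc
    from PIL import Image

    def labeledImageGraph(pil_image: Image.Image) -> dcc.Graph:
        fig = px.imshow(np.asarray(pil_image))    # render the image as a figure
        fig.update_layout(dragmode="drawrect")    # dragging draws a bounding box
        return dcc.Graph(
            figure=fig,
            config={"modeBarButtonsToAdd": ["drawrect", "eraseshape"]},
        )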

Callbacks
Dash uses a system of callbacks to handle user interaction with the web page. This callback
system has a steep learning curve, which slowed the conversion from Streamlit. The image on
the right shows the callback structure of the multi-image processing and display: ovals indicate
a component of the website and rectangles are callback functions. These functions are
triggered when an input (indicated with blue arrows) is changed, and they return some attribute
of the output component (green arrows). In addition, they may require some state, or attribute,
of other components, indicated with a red arrow. For example, when the user presses the
next-image button, the function "updateIndex" is called, which outputs the new index to
"imageIndex", a dcc.Store component that functions like a client-side cache. Because this
function modifies the existing index, it requires the current state of "imageIndex"; additionally, it
needs to know how many images were uploaded, so it also requires the state of
"uploadImages".

Each component can only be updated by one callback, and the order in which callbacks are
called is defined by a Dash-generated dependency tree that cannot be modified. As a result, the
structure of the callbacks can become fairly circular. For example, the Plotly graph component,
"displayProcessedImage", can only be updated by the "displayImages" callback, but it should be
updated whenever a user uploads a new set of images, when new labels are generated by the
model, or when the labels are modified by the user.
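
To make the pattern concrete, the "updateIndex" callback described above might look roughly
like the following sketch (component ids and the exact index bookkeeping are illustrative):

    # Illustrative "updateIndex" callback: the button click is the Input that
    # triggers it, while the current index and the uploaded images are only read
    # as State; the new index is written back into the dcc.Store.
    from dash import Dash, Input, Output, State

    app = Dash(__name__)  # in the repo this is the shared app object from cfg.py

    @app.callback(
        Output("imageIndex", "data"),
        Input("nextImageButton", "n_clicks"),
        State("imageIndex", "data"),
        State("uploadImages", "contents"),
    )
    def updateIndex(n_clicks, index, images):
        if not images:
            return 0
        return ((index or 0) + 1) % len(images)   # wrap around after the last image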
Model Integration
As we worked on the YOLO model and the user interface simultaneously, we ran out of time to
integrate the new model with the new UI. We began initial work to integrate the YOLO model
with the frontend; however, YOLOv4 operates very differently from last year's Faster-RCNN
model. The main GitHub repository includes the YOLOv4 darknet submodule, but it does not
currently work with the frontend.

As a result, the current UI instead uses the Faster-RCNN model from last year. The functionality
for the old model is wrapped in the "PyTorchModel" class. To run it, you must train the model,
save the weights, and place the weights file in the "UI" directory. From there, you can create a
model object by giving it the filename of the saved weights and run predictions by calling the
predict method with a Pillow-format image.
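
A hypothetical usage sketch (the module path, weights file name, and prediction output format
are assumptions; see the class in the repo for the real interface):

    # Hypothetical usage of the "PyTorchModel" wrapper: load saved Faster-RCNN
    # weights from the UI directory and run a prediction on a Pillow image.
    from PIL import Image
    from PyTorchModel import PyTorchModel  # assumed module layout

    model = PyTorchModel("fasterrcnn_weights.pth")              # placeholder file name
    predictions = model.predict(Image.open("drone_frame.jpg"))  # model-generated labels/boxes
    print(predictions)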

Experiments
Model
After training the model over the course of several days (about 3000 iterations), it was able to
attain a mean average precision (mAP) of about 80%. The average precision (AP) was 97.78%
for sharks, 79.27% for people, 84.36% for sea lions, and 92.83% for dolphins. We were also
able to run the model on new footage collected by Patrick Rex, which can be found in the videos
folder on the surpmarinecreatues@gmail.com Drive account.
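
Running the trained detector on that footage follows the darknet README's demo mode; a
hypothetical Colab cell (file names are placeholders) looks like:

    # Hypothetical Colab cell: run the trained YOLOv4 weights on a video and save
    # the annotated result to a new file. All file names are placeholders.
    !./darknet detector demo data/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_best.weights \
        shark_footage.mp4 -dont_show -out_filename shark_footage_labeled.avi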

Demo
We demoed our progress to Patrick Rex, a master's student at the CSU Long Beach Shark Lab.
He provided useful feedback on both the app and the model's behavior. Overall, he was pleased
with our progress and is excited for future developments.
Results

Object Detection with the YOLO model

The YOLO network is able to conduct inference on video much more quickly than the previous
Faster-RCNN model. The images shown above are just a few example screen captures from
the YOLO model running inference on unseen video footage. The model tends to categorize
creatures it is unsure of as sea lions, since their color and shape are similar to both sharks and
dolphins. Possible solutions include modifying the model parameters or adding more images of
sea lions and dolphins to the training dataset. Although the model struggles to classify creatures
correctly, it will at least detect creatures in the water, which can already save marine scientists
time when analyzing hours of footage.
Conclusions
Overall, this summer we achieved many of our goals, including creating a new and improved
model for detecting sharks and constructing an effective tool that marine scientists and students
can use to automatically spot sharks, dolphins, and other marine creatures. While we made
significant progress this summer, we intend to keep developing the frontend, host it remotely so
anyone can access it, and refine the model. Angela and Marcelo will continue this work in their
senior project this fall. We will continue working with Patrick Rex from CSU Long Beach for
additional help with labelling, app design, and data collection.

Acknowledgments
Thank you to the Santa Rosa Creek Foundation and the Center for Coastal Marine Sciences for
sponsoring this year’s project. We would also like to thank Chris Lowe and Patrick Rex from
CSU Long Beach Shark Lab for their helpful feedback and invaluable datasets. Finally, we
would like to thank the 2020 Shark Spotting Team for laying the groundwork of the project and
master’s student Daniel Moore for assisting in our work.

Useful Links
● Github: https://github.com/chelojimenez/MarineCreatures
● Colab: https://colab.research.google.com/drive/1lCGn2LHrPUJYX7uY5N6Di2qAW1mjiVJQ?usp=sharing

References
1. Bochkovskiy, A. (2020, May 15). YOLO v4 - Neural Networks for Object Detection.
GitHub. https://github.com/AlexeyAB/darknet.
2. Plotly. (n.d.). Dash overview. Dash by Plotly. https://plotly.com/dash/.
3. Kurfess, F. J., Nolan, G., Gounder, K., Skae, C., Daly, C., & Tan, D. (2021, July). Deep
Learning at a Distance: Remotely Working to Surveil Sharks. American Society for
Engineering Education (ASEE) 2021 Annual Conference, Long Beach, CA, U.S.A.
https://www.asee.org/public/conferences/223/papers/34739/view
