PROJECT DOCUMENTATION
Submitted by:
JIMMIE MUNYI
TLCM/MG/1094/05/17
Table of Contents
Declaration
Dedication
Acknowledgment
ABSTRACT
CHAPTER ONE
INTRODUCTION
1.3 Objectives
1.4 Justification
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
2.2.2 Smart Home Design for Disabled People Based on Neural Networks
2.3.5 Transfer Learning
CHAPTER THREE
PROJECT DESIGN
CHAPTER FOUR
SYSTEM DESIGN
CHAPTER FIVE
CONCLUSIONS AND RECOMMENDATIONS
5.1 DISCUSSION
5.2 RECOMMENDATION
5.3 CONCLUSION
REFERENCES
APPENDIX A
Questionnaire
APPENDIX B
Estimated budget
Work Plan
Declaration
I, Jimmie Munyi, hereby declare that this research project is my original work and has not been presented for the award of a degree or any similar purpose in any other institution. I certify that the intellectual content of this project is the product of my own work and that all assistance received in preparing it, and all sources used, have been acknowledged.
Student: Supervisor:
Dedication
This project is dedicated with profound admiration and appreciation to God Almighty for giving me strength and breath, and to my beloved parents and siblings for their moral support. I also thank our lecturers and project supervisors for giving me the opportunity to learn so much, and for their great patience while teaching me.
Acknowledgment
I would like to express my special gratitude to the Lord my God for how far we have walked together, and for His divine guidance, care and enabling throughout the project period. Special regards to my parents for funding my studies and providing all my necessities throughout this period. I would also like to extend my sincere gratitude to everyone who helped bring the project to successful completion.
ABSTRACT
This research project aims to use Deep Learning and Computer Vision to teach a computer to understand sign language through its different alphabets, and then to create a real-time, webcam-based application that translates sign language, so that deaf people can input information into a system using the method of communication they are already familiar with. The real-time webcam application serves as the simulation of the capabilities of my project. Over the last 10 years, Deep Learning and Artificial Intelligence have exploded, being applied everywhere, with research papers updated every time something new is discovered in the area. However, most of the research and progress in Deep Learning concerns the architectures themselves, and most effort in building Deep Learning applications is concentrated in the current hype areas, like self-driving cars and Transformer-based Natural Language Processing. To make sure that progress in Deep Learning is enjoyed by all human beings, we should ensure that we do not overlook any group. One group that is commonly left out is people living with disabilities, and little research is ongoing to make life easier for them using Deep Learning. I will design the research approach through mixed research techniques. The general design of the research and the methods used for data collection will primarily be observation and the utilization of currently available datasets. The first part of this document highlights the dissertation design, the second part discusses qualitative and quantitative data collection methods, and the last part illustrates the general research framework.
CHAPTER ONE
INTRODUCTION
Over the last 10 years the use cases of Deep Learning have exploded, and almost everyone is adopting some form of Deep Learning in their work and companies. The concepts of Deep Learning and Neural Networks have been around since 1943, but researchers did not invest much in them until 2012, when a group of researchers won the ImageNet challenge using neural networks. Another reason that has greatly influenced the explosion is that we now have powerful computers and Graphical Processing Units (GPUs), as well as numerous large datasets that are vital for training Neural Networks.
However, one drawback of the explosion is that much Deep Learning research is centered on a few areas that people believe are vital, like self-driving cars; this is justified, because breakthroughs in these areas can revolutionize how we live as a species. But if we want the benefits of Deep Learning to be enjoyed by all, we should be careful to broaden the research into all areas. One group commonly left out of technological improvement, not just AI, is people living with disabilities. This research tries to bridge that gap, focusing on one particular group: deaf people.
Deaf people communicate using sign language. Deep Learning has already proved that it can
understand images and videos using Computer Vision. Applications such as Face Recognition
are common now. We would only have to extend this area, and instead of teaching the computer
to understand faces, we could teach it to understand sign language.
This research project investigates whether we can bring the benefits gained from these advancements to the deaf.
I envision a system that uses deep learning and computer vision to teach a computer to understand sign language through its different alphabets, helping deaf people input information into the system using the method of communication they are already familiar with. The system will have real-time inference capability, recognizing sign language from the webcam.
1.3 Objectives
To deploy the computer system that has been taught to classify sign language in a real-time system that classifies signs on the go, as soon as they are illustrated.
1.4 Justification
The system will be efficient enough for practical use because it can classify images in real time, allowing communication with a deaf person to occur.
The system will aid the deaf in working with computers. It can be further developed so that they input data into a computer using Sign Language, which they are already comfortable with, instead of other methods that can prove challenging to them.
The system can be implemented even on a tight budget: once the Convolutional Neural Network is trained, all that is needed is a camera device that captures the sign language, and translation can be done. Even a cheap computer webcam can be used, which is in fact what this project uses.
For this project, I cover the design of the Convolutional Neural Network, training it, and using it to create the real-time inference system. The model will live on a remote server, and the final real-time system will communicate with it through a REST API.
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
This section presents a review of literature related to the study, as sourced from various scholars, publications and relevant professional journals. Many research reports, articles and books have been written on understanding the use of academic information sharing in higher learning institutions across the world, and studies have been done to explore the challenges of adopting Neural Networks for various Computer Vision tasks.
Since Sign Language Classification is a special application of Neural Networks in Computer Vision, this Literature Review presents literature related to the study of Neural Networks in general, Convolutional Neural Networks and Computer Vision. The review has been done in accordance with the research objectives.
2.2.2 Smart Home Design for Disabled People based on Neural Networks
In 2014, Ali Hussein, Mehdi Adda and Mima Atieh published this research paper, which looks into how smart homes can be designed for people living with disabilities.
2.3 Review of Theoretical Literature
2.3.1 Deep Learning
Deep learning is a computer technique to extract and transform data, with use cases ranging from human speech recognition to animal imagery classification, by using multiple layers of neural networks. Each of these layers takes its inputs from previous layers and progressively refines them. The layers are trained by algorithms that minimize their errors and improve their accuracy. In this way, the network learns to perform a specified task.
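The idea of layers progressively refining their inputs can be sketched in a few lines of Python. This is a toy illustration of the layered structure, not code from this project:

```python
import numpy as np

def relu(x):
    """Non-linearity applied after each layer."""
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Each layer refines the previous layer's output: out = relu(W @ in + b)."""
    for W, b in layers:
        x = relu(W @ x + b)
    return x
```

During training, an optimization algorithm adjusts each `W` and `b` so that the network's final output gets closer to the desired answer.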
Deep learning has power, flexibility, and simplicity. That is why it should be applied across many disciplines, including the social and physical sciences, the arts, medicine, finance, scientific research, and many more.
Here is an example of the many tasks in different areas at which deep learning, or methods heavily using deep learning, is now the best in the world:
Robotics: handling objects that are challenging to locate (e.g., transparent, shiny, or lacking texture) or hard to pick up.
What is remarkable is that deep learning has such varied applications, yet nearly all of deep learning is based on a single type of model: the neural network.
2.3.3 Convolutional Neural Networks
Convolutional Neural Networks are the de facto and most utilized type of Neural Network in Computer Vision and classification. They are modeled with inspiration from how the human eye operates. They remove the need for the manual, time-consuming feature extraction by practitioners that was required before Convolutional Neural Networks were conceived, and they provide a more scalable approach to image classification, leveraging ideas from matrix multiplication in linear algebra to identify patterns in images. They require a lot of computational power, which is why we normally train them using Graphical Processing Units (GPUs). They usually have three main layers that enable them to do their work: a Convolution Layer, a Pooling Layer and a Fully Connected Layer. The Convolution Layer is where most of the computation occurs and is the core building block of the network. The Pooling Layer performs dimensionality reduction on its input. Finally, the Fully Connected Layer performs the classification based on the features extracted by the previous layers and their different filters. There are many types of Convolutional Neural Networks, but the one I am going to utilize in this project is the Deep Residual Network, or ResNet.
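To make the first two layer types concrete, here is a minimal sketch of the convolution and pooling operations in plain NumPy. It is a toy single-channel version; real libraries implement these operations far more efficiently:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation, as in CNN libraries):
    slide the kernel over the image and take a weighted sum at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2x2(x):
    """2x2 max-pooling: keep the strongest response in each block,
    halving both spatial dimensions."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```

A Fully Connected Layer then maps the flattened, pooled features to one score per class.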
2.3.5 Transfer Learning
Transfer Learning is the reuse of a pre-trained model on a new problem. Since training Neural Networks can take a lot of time, transfer learning helps when creating new models because it leverages knowledge gained from previous training for the current project. Transfer Learning is going to be vital in this research project.
CHAPTER THREE
PROJECT DESIGN
A number of approaches are used in this research design. The purpose of this chapter is to describe the methodology of the research approach through mixed research techniques; the research approach also guides the researcher in arriving at the result findings. In this chapter, the general design of the research and the methods used for data collection are explained in detail, in three main parts. The first part highlights the collection of data and how the data is manipulated to make it more useful. The second part discusses the design of the Convolutional Neural Network and how the collected data is fed into the model. The last part illustrates the creation of the final system, which uses the trained model to predict new sign language signals from the webcam. This last system will be the simulation of my Research Project.
3.1.1 Data collection from participants
Willing participants will have pictures taken of their hands as they illustrate Sign Language with their fingers. Before participating, however, they need to sign the Permission to Participate in Project questionnaire available in Appendix A. Signing this document shows that they are willing for the pictures taken of their hands to be used in the project. The pictures will be taken against a standard white background to prevent differing backgrounds from affecting the quality of the model.
3.1.2 Use of currently available datasets
On top of the data I collect from people, I will use the American Sign Language Dataset from Kaggle, which can be found here: https://www.kaggle.com/grassknoted/asl-alphabet. It contains 87,000 images of 200x200 pixels in 29 classes: 26 classes representing the letters A to Z, and 3 classes representing space, delete and nothing. The nothing class will be useful for the model to predict when no sign language is being shown. The dataset is licensed under GPLv2, meaning we can use it for free in our projects.
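The dataset ships as one folder per class, so the class labels can be discovered directly from the directory layout. A small sketch; the folder names shown in the comment are assumptions based on the dataset description:

```python
from pathlib import Path

def list_classes(dataset_dir):
    """Return the sorted class labels, one per sub-folder
    (e.g. 'A'..'Z' plus 'space', 'del', 'nothing' for this dataset)."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())
```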
3.1.3 Data Augmentation
Data Augmentation comprises techniques that increase the amount of data by adding slightly modified copies of already-existing data, or newly created synthetic data derived from existing data. It acts as a regularizer and helps to reduce overfitting when training a Deep Learning model. It is going to be particularly useful for making our system universal: even when trained on pictures from right-handed people, it can generalize well to people who use their left hand to illustrate sign language.
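The left-hand/right-hand case above corresponds to a horizontal flip. As a minimal NumPy illustration (real augmentation pipelines, such as fastai's `aug_transforms`, also apply rotations, zooms and lighting changes):

```python
import numpy as np

def flip_horizontal(image: np.ndarray) -> np.ndarray:
    """Mirror an image left-to-right, turning a right-hand sign
    into its left-hand counterpart."""
    return image[:, ::-1].copy()
```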
3.2 DESIGN OF THE CONVOLUTIONAL NEURAL NETWORK
The design of this Convolutional Neural Network will be implemented in fastai, PyTorch and Python.
3.3 DESIGN OF THE FINAL SYSTEM
The final system will be used to simulate the capabilities of this project. When the code is run, it uses the webcam to record a person illustrating sign language, queries the trained model discussed in the previous part through a REST API, and interprets the sign language being illustrated in real time. The final system will be built using the OpenCV framework and Python.
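The exact API contract is an assumption, but the client side of such a REST call can be sketched with the standard library: base64-encode the captured frame and POST it to the model's endpoint. The URL and the `image`/`label` field names are illustrative, not the project's actual values:

```python
import base64
import json
from urllib import request

API_URL = "http://localhost:8000/predict"  # hypothetical endpoint for the hosted model

def build_payload(jpeg_bytes: bytes) -> bytes:
    """JSON body with the frame base64-encoded (assumed contract)."""
    return json.dumps({"image": base64.b64encode(jpeg_bytes).decode("ascii")}).encode()

def predict(jpeg_bytes: bytes) -> str:
    """POST one frame to the model API and return the predicted label."""
    req = request.Request(API_URL, data=build_payload(jpeg_bytes),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["label"]
```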
[Figure: system architecture, with the trained model hosted in cloud storage.]
3.4 SYSTEM REQUIREMENTS
CHAPTER FOUR
SYSTEM DESIGN
4.0 INTRODUCTION
This chapter presents the analysis and representation of the data collected for my Sign Language System. The purpose is to describe the data collected and to see how it can be used to build the final system. The findings of the analysis relate to the research questions.
To build this project, I utilized the public American Sign Language Dataset from Kaggle described in Chapter Three (https://www.kaggle.com/grassknoted/asl-alphabet): 87,000 images of 200x200 pixels in 29 classes, covering the letters A to Z plus space, delete and nothing.
Here is an image of all the letters in the American Sign Language alphabet and their corresponding alphabetic representation.
Here are some of the images that were used to train my model, coming from the public dataset as well as from data collected from local sources.
4.2 Creating and Training the Convolution Neural Network
Using the collected data, which looks like the sample shown above, I trained a ResNet-18 model using Transfer Learning, which enables us to get good results with less training data and compute power.
The model was trained using the free GPUs provided by Google Colab.
The main training loop and the results achieved were as follows:
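In outline, the loop follows the standard fastai pattern. The following is a sketch rather than the exact notebook code; the data path and validation split are assumptions:

```python
def train(data_dir="data/asl_alphabet_train", epochs=4):
    # fastai is imported lazily so this sketch stays importable without it installed.
    from fastai.vision.all import (ImageDataLoaders, Resize, aug_transforms,
                                   vision_learner, resnet18, accuracy)
    dls = ImageDataLoaders.from_folder(
        data_dir, valid_pct=0.2, seed=42,   # hold out 20% of images for validation
        item_tfms=Resize(200),              # dataset images are 200x200 pixels
        batch_tfms=aug_transforms(),        # data augmentation, including flips
    )
    learn = vision_learner(dls, resnet18, metrics=accuracy)  # ImageNet-pretrained ResNet-18
    learn.fine_tune(epochs)                 # transfer learning: train the head, then unfreeze
    return learn
```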
After only 4 epochs of training with transfer learning, I achieved a validation accuracy of 99.99%, which shows the benefits of using transfer learning instead of training from scratch.
4.2.2 Model Testing
Validation accuracy alone is usually not the best measure of a model's effectiveness, because the model sees the validation data after every epoch and may start memorizing it. To solve this, we usually use a separate test set that is put aside and used only once, after we have trained our model to our liking.
After achieving a high accuracy of 99.99% in training, I tested my now trained model against a
test dataset that I had set aside.
This method takes the test images, the model we want to test and the labels of the images, runs inference, and returns the accuracy on the test set as a percentage.
I then ran inference on my trained model and got the following results:
Again, I got a high accuracy of 100% on the test set, which suggests that the model actually learned to distinguish different sign language symbols rather than memorizing them.
After I was satisfied with my training results, it was time to build my final system, which would take video input from the user and give a real-time interpretation.
For the final system, I utilized OpenCV, which can capture streaming video from the webcam so that the system can classify which sign language is being illustrated.
Some basic setup for the environment and the model to run:
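The setup amounts to a handful of constants. The values below are illustrative assumptions, not the exact ones used in the project:

```python
import string

API_URL = "http://localhost:8000/predict"  # REST endpoint serving the trained model (assumed)
BOX = (100, 100, 300, 300)                 # x1, y1, x2, y2 of the blue capture box (assumed)
CLASSES = list(string.ascii_uppercase) + ["space", "del", "nothing"]  # the 29 dataset classes
```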
The main loop that runs the model on the webcam and gives a prediction based on the sign
language illustrated:
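A sketch of such a loop with OpenCV; the box coordinates are assumptions, and the call to the model's REST API is abbreviated to a comment:

```python
import numpy as np

BOX = (100, 100, 300, 300)  # x1, y1, x2, y2 of the blue capture box (assumed)

def extract_roi(frame: np.ndarray, box=BOX) -> np.ndarray:
    """Crop the region inside the blue box, where the user signs."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

def main():
    import cv2  # imported here so the helper above stays importable without OpenCV
    cap = cv2.VideoCapture(0)                   # default webcam
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        roi = extract_roi(frame)
        # ...send `roi` to the model's REST API and read back the predicted label...
        label = "?"
        x1, y1, x2, y2 = BOX
        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)  # blue box (BGR)
        cv2.putText(frame, label, (x1, y1 - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 2)
        cv2.imshow("Sign Language", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```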
4.4 System Output
The system predicts which sign is illustrated while it runs on the webcam. One places his or her hand in the blue box shown so that the model can predict what is being illustrated.
The frame above predicts nothing because no Sign Language is illustrated in the blue box.
Here are some outputs when I illustrate different Sign Language letters in the blue box:
A video version of the system working can be seen in this YouTube Video I uploaded to
demonstrate my system: https://www.youtube.com/watch?v=-nggi8EwfOA
CHAPTER FIVE
CONCLUSIONS AND RECOMMENDATIONS
5.1 DISCUSSION
This chapter discusses the objectives of the system stipulated in earlier chapters, the limitations of the system, and the conclusion and recommendations of the study.
5.2 RECOMMENDATION
A recommendation of the study is that further work be done to improve the Sign Language System. The system is ideal for the intended purpose; however, it would perform better if the following recommendations and suggestions are considered:
I. Improving the system to work on letters and numbers instead of just letters.
II. Porting the system to other convenient platforms, like a mobile app, which would be more accessible to everyone.
III. Improving the robustness of the system, which experiences some challenges when run against dark backgrounds.
5.3 CONCLUSION
Recognition of Sign Language using Computer Vision can prove very useful. One use-case that comes to mind: when video-calling a person who uses Sign Language to communicate, a system could run in the background that recognizes the sign language being illustrated and translates it into English words for the person on the other end. This would be beneficial to both parties.
In general, using advancements in technology such as Deep Learning to help people living with disabilities should be an area of concern for researchers and practitioners.
REFERENCES
1. Howard, Jeremy, and Sylvain Gugger. 2020. Deep Learning for Coders with Fastai and
PyTorch: AI Applications Without a PhD. 1st ed. O’Reilly Media, Inc.
2. Kinsley, Harrison, and Daniel Kukiela. Neural Networks from Scratch in Python.
3. Bradski, G. 2000. “The OpenCV Library.” Dr. Dobb’s Journal of Software Tools.
4. Clark, Alex, and Contributors. n.d. “Python Imaging Library (Pillow Fork).”
https://github.com/python-pillow/Pillow.
5. Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. “ImageNet: A Large-
Scale Hierarchical Image Database.” In CVPR09.
6. Elkins, Andrew, Felipe F. Freitas, and Veronica Sanz. 2019. “Developing an App to
Interpret Chest X-Rays to Support the Diagnosis of Respiratory Pathology with Artificial
Intelligence.”
7. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning
for Image Recognition.” CoRR abs/1512.03385. http://arxiv.org/abs/1512.03385.
8. He, Tong, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, and Mu Li. 2018. “Bag
of Tricks for Image Classification with Convolutional Neural Networks.” CoRR
abs/1812.01187. http://arxiv.org/abs/1812.01187.
9. Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. 2016. “Densely Connected
Convolutional Networks.” CoRR abs/1608.06993. http://arxiv.org/abs/1608.06993.
10. Jeremy Howard, Sylvain Gugger, and contributors. 2019. SwiftAI. fast.ai, Inc.
https://github.com/fastai/swiftai.
11. Kluyver, Thomas, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias
Bussonnier, Jonathan Frederic, Kyle Kelley, et al. 2016. “Jupyter Notebooks – a Publishing
Format for Reproducible Computational Workflows.” Edited by F. Loizides and B.
Schmidt. IOS Press.
APPENDIX A
Permission to participate in Project
The following refers to a study into the implementation of my project. The questions are intended for the user to give permission for the data I collect from them to be used in the project. I will be collecting hand images of people performing Sign Language in order to use them to train the Convolutional Neural Network.
1. Gender
Male Female
2. Age
3. Name:
4. Are you willing to let the images taken of your hand be used in the following research
project? (Tick the appropriate response)
Yes
No
5. Signature:
We promise to ensure anonymity of the data we collect from you and to respect your privacy.
APPENDIX B
Estimated budget
Item Quantity Price (ksh)
Printing 10 100
TOTAL 4600
Work Plan
This is the work plan used throughout the project; it is estimated to take 15 weeks.
Implementation 3 weeks D
Documentation 2 weeks E