Project Report on
GESTURE RECOGNITION USING MATLAB
SIGNATURE SIGNATURE
Mrs. Papiya Dutta Mr. Rajender Yadav
HEAD OF THE DEPARTMENT GUIDE
GYAN GANGA COLLEGE OF TECHNOLOGY
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to Dr. R.K. Ranjan, Principal, and Mrs. Papiya Dutta, H.O.D. of the
Electronics and Communication Department of Gyan Ganga College of Technology, for providing us with the
opportunity to do our minor project on "GESTURE RECOGNITION USING MATLAB".
This project bears the imprint of many people. We sincerely thank our project guide Mr. RAJENDER YADAV,
Assistant Professor, Department of Electronics & Communication, Gyan Ganga College of Technology,
Jabalpur, whose help, stimulating suggestions and encouragement helped us coordinate the project, especially in
writing this report. We would also like to acknowledge with much appreciation the crucial role of the officials
and other staff members of the institute who rendered their help during the period of the project work. Last but not
least, we take this opportunity to thank every team member for their supportive contributions and for the
comments and tips that improved our presentation skills and report writing and brought clarity to the software work.
Place: Jabalpur
Date:
TABLE OF CONTENTS
Chapter 1: Introduction to Hand Gesture Recognition
1.1 Introduction
1.2 Motivation
1.3 Gesture Analysis
Chapter 2: Objectives & Tools
2.1 Introduction
2.2 Objectives
2.3 Tools
Chapter 3: Literature Review & Algorithm
3.1 MATLAB Overview
3.2 Literature Review on Gesture Recognition
3.3 Neural Networks
3.4 Neuron Model
3.5 Perceptron
3.6 Image Database
3.7 Orientation Histogram
3.8 Operation
3.9 Algorithm
Chapter 4: Results & Discussion
APPENDIX I: Commands
APPENDIX II: Coding
CHAPTER 1
INTRODUCTION TO HAND GESTURE RECOGNITION
1.1 INTRODUCTION:
The aim of this project is to create a method to recognize hand gestures, based on a pattern recognition technique
developed by McConnell that employs histograms of local orientation. The orientation histogram is used as
a feature vector for gesture classification and interpolation.
Computer recognition of hand gestures may provide a more natural human-computer interface.
Hand gesture recognition is an important area of the computer vision and pattern recognition field. Gestures are a
way by which one can communicate non-verbally.
Gesture recognition is a field in which there is a large number of innovations. A gesture can be defined as a
physical action used to convey information. There are various input-output devices for
interacting with the computer, but nowadays the emphasis is on making human-computer interaction
easier, and for that purpose hand gesture recognition has come into the light. The hand can be used as an input
device by making its gestures understandable to the computer, and to this end the project aims at recognizing
various hand gestures.
1.2 MOTIVATION:
In this project, hand gesture recognition targets basic shapes made by the hand. Communication in our
daily life is generally vocal, but body language has its own significance: hand gestures and facial expressions
often play an important role in conveying information. A hand gesture is an ideal
option for expressing a feeling or conveying something simple, such as representing a number. It has many
areas of application; for example, sign languages are used for various purposes, and for people who are deaf
or unable to speak, sign language plays an important role. Gestures were the very first form of communication,
and this is what motivated us to carry out further work related to hand gesture recognition.
CHAPTER 2
OBJECTIVES & TOOLS
2.1 INTRODUCTION:
This project on "Gesture Recognition Using MATLAB" emphasizes easy and swift communication using a
minimum of easily accessible tools. The main objectives and tools used for this project are discussed
in the next sections.
2.2 OBJECTIVES:
2.3 TOOLS:
HARDWARE:
SOFTWARE:
Windows XP/7
MATLAB 7.01
CHAPTER 3
LITERATURE REVIEW & ALGORITHM
3.1 MATLAB OVERVIEW:
MATLAB is an interactive system whose basic data element is an array that does not require dimensioning.
This allows many technical computing problems to be solved, especially those with matrix and vector formulations,
in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or
FORTRAN.
The reason we decided to use MATLAB for the development of this project is its toolboxes. Toolboxes
allow learning and applying specialized technology. They are comprehensive collections of MATLAB
functions (M-files) that extend the MATLAB environment to solve particular classes of problems; among
others, MATLAB provides image processing and neural network toolboxes.
In 2004, MATLAB had around one million users across industry and academia. MATLAB users come from
various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and
research institutions as well as industrial enterprises.
GRAPHICS AND GRAPHICAL USER INTERFACE PROGRAMMING
MATLAB supports developing applications with graphical user interface features. MATLAB includes
GUIDE (GUI development environment) for graphically designing GUIs. It also has tightly integrated graph-
plotting features. For example, the function plot can be used to produce a graph from two vectors x and y:
x = 0:pi/100:2*pi;
y = sin(x);
plot(x,y)
A MATLAB program can produce three-dimensional graphics using the functions surf, plot3 or mesh.
[X,Y] = meshgrid(-10:0.25:10,-10:0.25:10);
f = sinc(sqrt((X/pi).^2+(Y/pi).^2));
mesh(X,Y,f);
axis([-10 10 -10 10 -0.3 1])
xlabel('{\bfx}')
ylabel('{\bfy}')
zlabel('{\bfsinc} ({\bfR})')
hidden off
This code produces a wireframe 3D plot of the two-dimensional unnormalized sinc function:
[X,Y] = meshgrid(-10:0.25:10,-10:0.25:10);
f = sinc(sqrt((X/pi).^2+(Y/pi).^2));
surf(X,Y,f);
axis([-10 10 -10 10 -0.3 1])
xlabel('{\bfx}')
ylabel('{\bfy}')
zlabel('{\bfsinc} ({\bfR})')
This code produces a surface 3D plot of the two-dimensional unnormalized sinc function:
Figure 3: Surface 3D plot of 2D sinc function
In MATLAB, graphical user interfaces can be programmed with the GUI design environment (GUIDE) tool.
MATLAB can call functions and subroutines written in the C programming language or FORTRAN. A
wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically
loadable object files created by compiling such functions are termed "MEX-files" (for MATLAB executable).
Libraries written in Perl, Java, ActiveX or .NET can be directly called from MATLAB, and many MATLAB
libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries.
Calling MATLAB from Java is more complicated, but can be done with a MATLAB toolbox which is sold
separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface),
(which should not be confused with the unrelated Java Metadata Interface that is also called JMI).
As alternatives to the MuPAD based Symbolic Math Toolbox available from MathWorks, MATLAB can be
connected to Maple or Mathematica.
3.2 Literature Review on Gesture Recognition:
Figure 4: Object recognition
SHAPE RECOGNITION: If the hand signals fall within a predetermined set, and the camera views a close-up of
the hand, an example-based approach may be used, combined with a simple method to analyze hand signals
called orientation histograms. These example-based applications involve two phases: training and running. In
the training phase, the user shows the system one or more examples of a specific hand shape, and the computer
forms and stores the corresponding orientation histograms. In the run phase, the computer compares the
orientation histogram of the current image with each of the stored templates and selects the category of the
closest match, or interpolates between templates, as appropriate. This method should be robust against small
differences in the size of the hand but would probably be sensitive to changes in hand orientation.
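As a rough illustration of the run phase, the sketch below compares the orientation histogram of a current image against stored templates and picks the closest one; Euclidean distance is one simple choice of comparison, the 19-bin size matches the report's feature vector, and the histogram values themselves are placeholders.
% Minimal sketch of the run phase: nearest stored template by Euclidean distance.
% Histogram values are placeholders; only the 19-bin size comes from the report.
numBins   = 19;
templates = rand(numBins, 5);             % five stored training histograms (placeholder)
current   = rand(numBins, 1);             % histogram of the current image (placeholder)
diffs = templates - repmat(current, 1, size(templates, 2));
dists = sqrt(sum(diffs.^2, 1));           % distance to each stored template
[dmin, best] = min(dists);                % index of the closest match
fprintf('Closest stored gesture: %d\n', best);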
3.3 NEURAL NETWORKS:
SUPERVISED LEARNING: Supervised learning is based on the system trying to predict outcomes for
known examples and is a commonly used training method. The network compares its predictions to the target answers
and "learns" from its mistakes. The data start as inputs to the input-layer neurons, which pass them
along to the next nodes. As the inputs are passed along, the connection weights are applied, and when the
inputs reach the next node the weighted values are summed and either intensified or weakened. This continues until
the data reach the output layer, where the model predicts an outcome. In a supervised learning system, the
predicted output is compared to the actual output for that case. If the predicted output is equal to the actual
output, no change is made to the weights in the system. But if the predicted output is higher or lower than the
actual outcome in the data, the error is propagated back through the system and the weights are adjusted
accordingly. This feeding of the error backwards through the network is called "back-propagation".
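A minimal sketch of this error-driven weight adjustment for a single linear output neuron is shown below; all numerical values are illustrative assumptions, not taken from the report.
% Minimal sketch of supervised, error-driven weight adjustment (delta rule).
% All numbers are illustrative assumptions.
p      = [0.5; 0.8];      % input vector
w      = [0.2 -0.1];      % current weights (row vector)
b      = 0;               % bias
target = 1;               % known answer for this example
lr     = 0.1;             % learning rate
a = w*p + b;              % predicted output (linear neuron for simplicity)
e = target - a;           % error between target and prediction
w = w + lr*e*p';          % weights adjusted in proportion to the error
b = b + lr*e;             % bias adjusted the same way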
UNSUPERVISED LEARNING: Neural networks which use unsupervised learning are most effective for
describing data rather than predicting it. The neural network is not shown any outputs or answers as part of the
training process; in fact, there is no concept of output fields in this type of system. The advantage of a neural
network for this type of analysis is that it requires no initial assumptions about what constitutes a group or how
many groups there are. The system starts with a clean slate and is not biased about which factors should be
most important.
Neural networks consist of sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm.
3.4 NEURON MODEL:
Figure 6: Neuron
The scalar input p is transmitted through a connection that multiplies its strength by the scalar weight w to
form the product wp, again a scalar. Here the weighted input wp is the only argument of the transfer function f,
which produces the scalar output a. The neuron on the right has a scalar bias, b. The bias is much like a weight,
except that it has a constant input of 1. The net input n to the transfer function, again a scalar, is the sum of the
weighted input wp and the bias b. This sum is the argument of the transfer function f, typically a step function
or a sigmoid function, which takes the argument n and produces the output a. Both w and b are adjustable scalar
parameters of the neuron. The central idea of neural networks is that such parameters can be adjusted so that
the network exhibits some desired or interesting behavior. Thus, the network can be trained to do a particular
job by adjusting the weight or bias parameters, or perhaps the network itself will adjust these parameters to
achieve some desired end. All of the neurons in the program written in MATLAB have a bias.
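The sketch below evaluates a single neuron exactly as described: net input n = wp + b followed by a transfer function f, here a hard-limit (step) function; the numeric values are assumptions for illustration.
% Minimal sketch of the single neuron described above (values are assumptions).
p = 2.0;                  % scalar input
w = 0.5;                  % scalar weight
b = -0.7;                 % scalar bias (behind a constant input of 1)
n = w*p + b;              % net input: weighted input plus bias
a = double(n >= 0);       % step (hard-limit) transfer function f
% A sigmoid transfer function would instead be: a = 1/(1 + exp(-n));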
3.5 PERCEPTRON:
The perceptron is a program that learns concepts, i.e. it can learn to respond with True (1) or False (0) for the
inputs presented to it by repeatedly "studying" examples.
The structure of a single perceptron is very simple. There are two inputs, a bias, and an output. Both the inputs
and the output of a perceptron are binary, that is, they can only be 0 or 1. Each of the inputs and the bias is
connected to the main perceptron by a weight, which is generally a real number between 0 and 1. When an
input is fed into the perceptron, it is multiplied by the corresponding weight. After this, the weighted inputs are
summed and fed through a hard-limiter, which is a function that defines the threshold for 'firing' the
perceptron. For example, the limiter could be a threshold at zero: f(n) = 1 if n >= 0 and f(n) = 0 otherwise. With
this limiter, if the sum of the inputs multiplied by the weights is -2, the limiting function returns 0, and if the
sum is 3, the function returns 1.
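To make the learning procedure concrete, the sketch below trains one perceptron with a zero-threshold hard-limiter on the logical AND function; the initial weights, learning rate and epoch count are assumptions for illustration.
% Minimal sketch: a single perceptron with a hard-limiter learning logical AND.
% Initial weights, learning rate and epoch count are assumptions.
P = [0 0 1 1;             % two binary inputs, four training examples
     0 1 0 1];
T = [0 0 0 1];            % desired outputs (AND)
w = [0 0]; b = 0; lr = 1; % initial weights, bias and learning rate
for epoch = 1:10
    for k = 1:4
        a = double(w*P(:,k) + b >= 0);   % hard-limiter: fire (1) if the sum >= 0
        e = T(k) - a;                    % difference between desired and actual output
        w = w + lr*e*P(:,k)';            % perceptron learning rule
        b = b + lr*e;
    end
end
% After training, double(w*P + b >= 0) reproduces the AND truth table.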
To make the gestures the same regardless of where they occur within the image borders, position is ignored
altogether, and a histogram is tabulated of how often each orientation element occurs in the image. Clearly,
this throws out information, and some distinct images will be confused by their orientation histograms. In
practice, however, one can choose a set of training gestures whose orientation histograms are substantially
different from each other.
One can calculate the local orientation using image gradients. In this project, two 3-tap x and y derivative
filters have been used. The outputs of the x and y derivative operators are dx and dy, and the gradient
direction is then atan(dy/dx). The edge orientation is used as the only feature presented to the neural
network. The reason for this is that, if the edge detector is good enough, it allows testing the network with
images from different databases.
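A sketch of this orientation computation is given below. The grayscale test image and the y-direction filter are assumptions; the report's own code uses convn and atan(dy./dx), while atan2 is used here to avoid division by zero.
% Minimal sketch of local-orientation computation with 3-tap derivative filters.
% The image file and the y filter are assumptions; the report uses atan(dy./dx).
I  = double(imread('hand.tif'));   % hypothetical grayscale hand image
I  = imresize(I, [150 140]);       % resize as in the algorithm
fx = [0 -1 1];                     % 3-tap x-derivative filter (from the report)
fy = [0 -1 1]';                    % assumed 3-tap y-derivative filter (transpose)
dx = conv2(I, fx, 'same');         % horizontal derivative
dy = conv2(I, fy, 'same');         % vertical derivative
theta = atan2(dy, dx);             % local edge orientation in radians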
Another feature that could have been extracted from the image is the gradient magnitude, using the formula
magnitude = sqrt(a^2 + b^2), where a = dx and b = dy.
Using the magnitude as well, though, would restrict testing of the algorithm to similar images only. Apart from
this, the images, before being resized, should be of approximately the same size; this refers to the size of the
hand itself on the canvas and not the size of the canvas. Once the image has been processed, the output is a
single vector containing a number of elements equal to the number of bins of the orientation histogram.
The figure shows the orientation histogram calculation for a simple image. Blurring can be used to allow
neighboring orientations to sense each other.
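The sketch below shows one way to turn the orientation image into the feature vector and to blur the histogram so that neighbouring orientations sense each other; the 19 ten-degree bins follow the appendix code, while the orientation data and the smoothing kernel are placeholders.
% Minimal sketch: bin orientations into a 19-element histogram and blur it.
% Orientation values and smoothing kernel are placeholders;
% the 19 ten-degree bins follow the appendix code.
theta = atan(randn(150,140));               % placeholder orientation image (radians)
deg   = theta(:)*180/pi;                    % orientations in degrees
edges = -90:10:100;                         % bin edges, 10 degrees apart
h     = histc(deg, edges);
h     = h(1:19);                            % 19-bin orientation histogram (feature vector)
hBlur = conv2(h, [1;4;6;4;1]/16, 'same');   % blur so neighbouring bins sense each other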
3.8 OPERATION:
Step 1
The first thing for the program to do is to read the image database. A for loop is used to read an entire folder of
images and store them in MATLAB's memory. The folder is selected by the user from menus: a menu first pops
up asking whether to run the algorithm on the test or training sets, and a second menu then pops up for the user
to choose which ASL sign to use.
Step 2
Resize all the images that were read in Step 1 to 150x140 pixels. This size seems optimal for offering
enough detail while keeping the processing time low.
Step 3
The next thing to do is to find the edges. As mentioned before, two 3-tap derivative filters are used: for the x
direction x = [0 -1 1], and for the y direction its transpose.
Step 4
Divide the two resulting matrices (images), dy by dx, element by element and then take the arctangent (atan).
This gives the gradient orientation.
Step 5
The MATLAB function im2col is then called to rearrange the image blocks into columns. This is not a
necessary step, but it has to be done to display the orientation histogram. The function rose creates an angle
histogram, which is a polar plot showing the distribution of values grouped according to their numeric range;
each group is shown as one bin. Some examples are shown below. While developing the algorithm, these
histograms are the fastest way of getting a good idea of how well the detection works.
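A small sketch of this step, with a placeholder orientation image, is given below; im2col with 1x1 blocks simply lays all pixel orientations out in a row, and rose draws the polar angle histogram.
% Minimal sketch of Step 5 with a placeholder orientation image.
theta = atan(randn(150,140));             % placeholder orientation image (radians)
cols  = im2col(theta, [1 1], 'distinct'); % rearrange 1x1 blocks into columns (all pixels)
figure;
rose(cols(:), 19);                        % polar angle histogram with 19 bins
title('Orientation histogram');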
Figure 10: Orientation histograms of a_1 (left) and a_2 (right)
Here we can see the original images that generated the histograms above in the same order –
Figure 11
3.9 Algorithm
1. Read Text Files from Disk:
For multidimensional arrays, an RGB image is considered. To access a sub-image, the syntax is
subimage = RGB(20:40,50:80,:);
2. Determine Optimal Number of Neurons:
For the optimal number of neurons, the larger of the input and output dimensions is always taken. Since five
test images are used in this project, the number of targets to be matched is 5.
3. Pre-process Image:
Pre-processing of the image is done to enhance it and to obtain results with minimum error. In the proposed
algorithm the image is pre-processed using the RGB color model; the three primary colors red (R), green (G)
and blue (B) are used. The main advantage of this color space is its simplicity.
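For illustration, the sketch below reads one of the training images named in the appendix and extracts a rectangular sub-image across all three colour planes; the crop coordinates follow the indexing syntax quoted above, and the file name is only an example.
% Minimal sketch of the sub-image indexing used above.
% The file name follows the appendix ('train\A\1.tif'); the crop is illustrative.
RGB      = imread('train\A\1.tif');   % read an RGB training image
subimage = RGB(20:40, 50:80, :);      % rows 20-40, columns 50-80, all colour planes
imshow(subimage);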
4. Initialize Learning Layer:
This is a method for initialization of weights of neural networks to reduce the training time.
5. Train Perceptron:
A perceptron learns to distinguish patterns by modifying its weights. The most common form of learning in the
perceptron is to adjust the weights by the difference between the desired output and the actual output.
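The appendix trains the perceptron with the older initp/trainp/simup functions; as a rough modern equivalent (an assumption, not what the report itself uses), current releases of the Neural Network Toolbox would express the same step as:
% Sketch with the newer Neural Network Toolbox API (not the report's initp/trainp).
% The data here are placeholders.
P   = [0 0 1 1; 0 1 0 1];   % input vectors
T   = [0 0 0 1];            % target outputs
net = perceptron;           % create a perceptron network
net = train(net, P, T);     % weights adjusted from the target/output difference
Y   = net(P);               % outputs after training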
6. Plot Error:
This step plots the error vector with error bars. The error is calculated by subtracting the output A from the
target T.
7. Select Test Set:
Test images are selected so that they can be matched with the trained images to obtain the desired
output.
8. Display Output:
Finally, the output is displayed, showing the similarity or difference between the trained and test images in
terms of their orientation histograms.
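As an illustration of this final comparison, the sketch below plots a trained and a test orientation histogram side by side and reports a simple distance score; the histogram values are placeholders.
% Minimal sketch of Step 8: compare trained and test orientation histograms.
% The histogram values are placeholders.
trainHist = rand(19,1);                 % stored histogram of a training image
testHist  = rand(19,1);                 % histogram of a test image
figure;
bar([trainHist testHist]);              % grouped bars, one pair per orientation bin
legend('trained image', 'test image');
xlabel('orientation bin'); ylabel('count');
diffScore = norm(trainHist - testHist); % smaller value means more similar gestures
disp(diffScore);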
CHAPTER 4
RESULTS & DISCUSSION
4.1 RESULT:
4.2 DISCUSSION:
CONCLUSION:
We proposed a simple hand gesture recognition algorithm that follows several steps. In pre-processing, the
image is converted into the RGB color model so that varying lighting conditions do not cause problems, and
smudge elimination is performed to obtain the finest image. These pre-processing steps are as important as any
other step. After pre-processing, the second step is to determine the orientation of the image; only horizontal
and vertical orientations are considered here, and images with a uniform background are used.
The strengths of this approach include its simplicity and ease of implementation, and it does not require any
significant amount of training or post-processing since rule-based learning is used. It provides a high
recognition rate with minimum computational time.
The weakness of this method is that certain parameters and threshold values are chosen experimentally, that is,
it does not follow any systematic approach for gesture recognition, and many parameters in this algorithm are
based on assumptions made after testing a number of images.
In this system we have only considered static gestures, but in real time one needs to extract the gesture from a
video or moving scene.
To realize the ultimate goal of humans interfacing with machines on their own natural terms, gestures are
expressive, meaningful body motions involving physical movements of the fingers, hands, arms, head, face, or
body, with the intent of conveying meaningful information and interacting with the environment. Gesture
recognition is an extensively developed technology designed to identify human position, action, and
manipulation, and gestures are used to facilitate communication with digital applications. Among the various
forms of gesture recognition, such as hand, face and body gesture recognition, hand gesture recognition is an
efficient technique for recognizing human gestures due to its simplicity and greater accuracy.
FUTURE SCOPE:
The future scope lies in making this algorithm applicable for various orientations of hand gestures, also
different classification scheme can be applied. Gesture recognition could be used in many settings in the future.
The algorithm can be improved so that images with non uniform background can also be used, this will
enhance the human computer interaction.
Visually impaired people can make use of hand gestures for human-computer interaction, for example
controlling a television, in games, and in gesture-to-speech conversion.
Georgia Institute of Technology researchers have created the Gesture Panel System to replace
traditional vehicle dashboard controls. Drivers would change, for example, the temperature or sound-
system volume by maneuvering their hand in various ways over a designated area. This could increase
safety by eliminating drivers' current need to take their eyes off the road to search for controls.
During the next few years, according to Gartner's Fenn, gesture recognition will probably be used
primarily in niche applications, because making mainstream applications work with the technology will
take more effort than it is worth.
Hand gesture recognition systems can be useful in many fields such as robotics and human-computer
interaction, so extending this offline system to work in real time will be future work.
Support Vector Machines can be modified to reduce complexity; reduced complexity means less
computation time, so the system can be made to work in real time.
Facial gesture recognition methods could be used in vehicles to alert drivers who are about to fall
asleep.
APPENDIX I
COMMANDS:
1. echo on: The commands in a script M-file are not automatically displayed in the Command Window; echo on
makes them visible as the script executes.
2. clc: Clears the Command Window, removing all previous input and output from the display and giving a clear
screen.
3. pause: Each time MATLAB reaches a pause statement, it stops executing the M-file until the user presses a
key. Pauses should be placed after important comments, after each graph, and after critical points where the
script generates numerical output; they allow the viewer to read and understand the results.
6. fid = fopen('train.txt','rt'): Opens the file with the type of access specified by the permission string; 'rt'
means read in text mode.
7. fscanf (file scan format): Reads formatted data from an open file.
8. P1 = fscanf(fid,'%f',[19,inf]): The general form A = fscanf(fid,'format',sizeA) reads data from the file and
converts it according to the format, which is a C-language conversion specification. A conversion specification
involves the % character and the conversion characters d, i, o, u, x, X, f, e, E, g, G, c and s.
inf: read as many columns as the file contains (the default).
%f: Denotes floating-point numbers. Floating-point fields can contain any of the following: inf, -inf, NaN or
-NaN.
[19]: The number of rows to read; each column read from the file is then a 19-element orientation histogram,
one element per bin.
12. Determine optimal number of neurons: For multidimensional arrays, an RGB image is considered, e.g.
subimage = RGB(20:40,50:80,:);
For the optimal number of neurons, the larger of the input and output dimensions is always taken, so we have
S1 = 85
APPENDIX II
CODING:
echo on
clc
pause
clc
fid=fopen('train.txt','rt');      % training feature vectors (19-bin orientation histograms)
P1=fscanf(fid,'%f',[19,inf]);
P=P1;
%%Open some text file using code to write and fetch the required information about the image
fid=fopen('testA.txt','rt');      % test feature vectors
TS1=fscanf(fid,'%f',[19,inf]);
fid=fopen('target8.txt','rt');    % target vectors
T=fscanf(fid,'%f',[8,inf]);
%%It has been found that the optimal number of neurons for the hidden layer is 85
S1=85;
S2=5;
[W1,b1]=initp(P,S1);
[W2,b2]=initp(S1,T);
pause
A1=simup(P,W1,b1); %First layer is used to preprocess the input vectors
TP=[1:500];
pause
clf reset
figure(gcf)
setfsize(600,300);
[W2,b2,epochs,errors]=trainp(W2,b2,A1,T,TP);
pause
clc
ploterr(errors);
pause
if M==1
TS=TS1;
else
disp('Wrong Input');
a1=simup(TS,W1,b1);
a2=simup(a1,W2,b2);
echo off
%%Create a Menu
clc
if F==1
if K==1
loop=5;
for i=1:loop
string=['test\A\' num2str(i) '.tif'];
Rimages{i}=imread(string);
end
end
end
end;
%%For training
if F==2
if L==1
for i=1:loop
string=['train\A\' num2str(i) '.tif'];
Rimages{i}=imread(string);
end
end
end
T{i}=imresize(Timages{i},[150,140]);
dx{i}=convn(T{i},x,'same');
dy{i}=convn(T{i},y,'same');
gradient{i}=dy{i}./dx{i};
theta{i}=atan(gradient{i});
cl{i}=im2col(theta{i},[1,1],'distinct');
N{i}=(cl{i}*180)/3.14159265359;
C1{i}=(N{i}>0)&(N{i}<10);
S1{i}=sum(C1{i});
C2{i}=(N{i}>10.0001)&(N{i}<20);
S2{i}=sum(C2{i});
C3{i}=(N{i}>20.0001)&(N{i}<30);
S3{i}=sum(C3{i});
C4{i}=(N{i}>30.0001)&(N{i}<40);
S4{i}=sum(C4{i});
C5{i}=(N{i}>40.0001)&(N{i}<50);
S5{i}=sum(C5{i});
C6{i}=(N{i}>50.0001)&(N{i}<60);
S6{i}=sum(C6{i});
C7{i}=(N{i}>60.0001)&(N{i}<70);
S7{i}=sum(C7{i});
C8{i}=(N{i}>70.0001)&(N{i}<80);
S8{i}=sum(C8{i});
C9{i}=(N{i}>80.0001)&(N{i}<90);
S9{i}=sum(C9{i});
C10{i}=(N{i}>90.0001)&(N{i}<100);
S10{i}=sum(C10{i});
C11{i}=(N{i}>-89.9)&(N{i}<-80);
S11{i}=sum(C11{i});
C12{i}=(N{i}>-80.0001)&(N{i}<-70);
S12{i}=sum(C12{i});
C13{i}=(N{i}>-70.0001)&(N{i}<-60);
S13{i}=sum(C13{i});
C14{i}=(N{i}>-60.0001)&(N{i}<-50);
S14{i}=sum(C14{i});
C15{i}=(N{i}>-50.0001)&(N{i}<-40);
S15{i}=sum(C15{i});
C16{i}=(N{i}>-40.0001)&(N{i}<-30);
S16{i}=sum(C16{i});
C17{i}=(N{i}>-30.0001)&(N{i}<-20);
S17{i}=sum(C17{i});
C18{i}=(N{i}>-20.0001)&(N{i}<-10);
S18{i}=sum(C18{i});
C19{i}=(N{i}>-10.0001)&(N{i}<-0.001);
S19{i}=sum(C19{i});
D{i}=[S1{i} S2{i} S3{i} S4{i} S5{i} S6{i} S7{i} S8{i} S9{i} S10{i} S11{i} S12{i} S13{i} S14{i} S15{i} S16{i} S17{i} S18{i} S19{i}];
close(W);