
Hand Gesture Recognition System

CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION
In our day-to-day life, communication plays an important role in conveying information from
one person to another. However, it is very difficult for people who are deaf and dumb to
communicate with normal people. Sign language is the only way to communicate with them, but
normal people are generally unaware of sign language. The solution, therefore, is to convert
sign language into text and speech and vice versa; this is known as sign recognition. Sign language
is a combination of body language, hand gestures and facial expressions. Among these, hand
gestures provide the majority of the information, and hence the majority of research focuses on
decoding hand gestures.

Normal people can communicate their thoughts and ideas to others through speech. For the
hearing-impaired community, the only means of communication is sign language. The hearing-impaired
community has developed its own culture and methods of communicating among themselves and with
ordinary people using sign gestures. Instead of conveying their thoughts and ideas acoustically,
they convey them by means of sign patterns.

Sign gestures form a non-verbal visual language, different from the spoken language but
serving the same function. It is often very difficult for the hearing-impaired community to
communicate their ideas and creativity to normal humans. This system was inspired by this
special group of people who have difficulty communicating in verbal form, and it is designed to
be easy to use for deaf or hearing-impaired people.

The use of sign language is not limited to individuals with impaired hearing or speech
communicating with each other or with non-sign-language speakers; it is often considered a
prominent medium of communication in its own right. Instead of acoustically conveyed sound
patterns, sign language uses manual communication to convey meaning. It combines hand gestures
and facial expressions with movements of other body parts such as the eyes, legs, etc.

Some of the challenges experienced by speech-impaired and hard-of-hearing people while
communicating with normal people are social interaction, communication disparity, education,
behavioral problems, mental health, and safety concerns. The usual ways of interacting with a
computer are devices such as a keyboard and mouse or audio signals; the former always needs
physical contact and the latter is prone to noise and disturbance. A physical action carried out
by the hand, the eyes, or any other part of the body can be considered a gesture.

The essential aim of building a hand gesture recognition system is to create natural interaction
between human and computer, where the recognized gestures can be used to convey meaningful
information. Sign language is a very important means of communication for deaf and dumb people.
In sign language each gesture has a specific meaning, so complex meanings can be expressed by
combining various basic elements. It is essentially a non-verbal, gesture-based language that
deaf and dumb people use to communicate more effectively with each other and with normal people.

Sign language has its own rules and grammar for expressing meaning effectively. There are
two main sign language recognition approaches: image-based and sensor-based. Most research
focuses on image-based approaches because, unlike sensor-based approaches, they do not require
the user to wear devices such as data gloves or helmets. Gesture recognition is gaining
importance in many application areas such as human-computer interfaces, communication,
multimedia and security. Sign recognition is typically treated as an image-understanding
problem and contains two phases: sign detection and sign recognition. Sign detection extracts
the features of an object with respect to certain parameters.

This project proposes a system for recognizing the signs used in ASL and interpreting them.
American Sign Language (ASL) is a natural language that serves as the predominant sign
language of deaf communities. Each sign in ASL is composed of a number of distinctive
components, generally referred to as parameters, and a sign may use one hand or both.
Figure 1.1 shows the American Sign Language signs.

Figure 1.1: American Sign Language

1.2 OVERVIEW OF IMAGE PROCESSING

The basic definition of image processing refers to the processing of a digital image, i.e.,
removing the noise and any kind of irregularity present in an image using a digital computer. The
noise or irregularity may creep into the image either during its formation or during transformation,
etc. For mathematical analysis, an image may be defined as a two-dimensional function f(x, y), where
x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is
called the intensity or gray level of the image at that point. When x, y, and the intensity values of
f are all finite, discrete quantities, we call the image a digital image. Note that a
digital image is composed of a finite number of elements, each of which has a particular location
and value. These elements are called picture elements, image elements, pels, or pixels; pixel is
the most widely used term to denote an element of a digital image.

Various techniques have been developed in image processing during the last four to five
decades. Image processing systems are becoming popular due to the easy availability of powerful
personal computers, large memory devices, graphics software, etc. Image processing is used
in various applications such as remote sensing, medical imaging, non-destructive evaluation,
forensic studies, textiles, material science, the military, the film industry, document processing,
graphic arts and the printing industry.

The term digital image processing generally refers to the processing of a two-dimensional
picture by a digital computer. In a broader context, it implies digital processing of any two-
dimensional data. A digital image is an array of real numbers represented by a finite number of
bits. The principal advantages of digital image processing methods are versatility, repeatability
and the preservation of the original data precision.

Figure 1.2: Block Diagram of an Image Processing System (blocks: Image Acquisition, Preprocessing, Segmentation, Representation and Description, Recognition and Interpretation, Knowledge Base)

As shown in Figure 1.2, the first step in the process is image acquisition by an imaging
sensor in conjunction with a digitizer that digitizes the image. The next step is preprocessing,
where the image is improved before being fed as input to the other processes. Preprocessing
typically deals with enhancing the image, removing noise, isolating regions, etc. Segmentation
partitions an image into its constituent parts or objects. The output of segmentation is usually
raw pixel data, consisting of either the boundary of a region or the pixels of the region themselves.

Representation is the process of transforming the raw pixel data into a form useful for
subsequent processing by the computer. Description deals with extracting the features that are basic
to differentiating one class of objects from another. Recognition assigns a label to an object based
on the information provided by its descriptors, and interpretation involves assigning meaning to an
ensemble of recognized objects. Knowledge about the problem domain is incorporated into the
knowledge base, which guides the operation of each processing module and also controls the
interaction between the modules. Not all modules need necessarily be present for a specific
function; the composition of the image processing system depends on its application. The frame
rate of the image processor is normally around 25 frames per second.

The various image processing techniques, illustrated in Figure 1.3, are the following:

Figure 1.3: Image Processing Techniques

1.2.1 Image Enhancement:

Image enhancement operations improve the qualities of an image, for example by improving its
contrast and brightness characteristics, reducing its noise content, or sharpening its details.
Enhancement only presents the same information in a more understandable form; it does not add
any new information to the image.
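
As an illustration, the following MATLAB sketch applies typical enhancement operations; it assumes the Image Processing Toolbox is available, and the file name hand.jpg is only a placeholder, not part of the project dataset.

    % Illustrative enhancement sketch (file name is a placeholder).
    I  = imread('hand.jpg');        % captured RGB image
    G  = rgb2gray(I);               % work on intensity values
    G1 = imadjust(G);               % stretch contrast over the full range
    G2 = medfilt2(G1, [3 3]);       % reduce salt-and-pepper noise
    G3 = imsharpen(G2);             % sharpen edge details
    imshowpair(G, G3, 'montage');   % compare original and enhanced images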

1.2.2 Image Restoration:

Image restoration, like enhancement, improves the qualities of an image, but all of its operations
are based on known or measured degradations of the original image. Image restoration is used to
restore images with problems such as geometric distortion, improper focus, repetitive
noise, and camera motion; that is, it corrects images for known degradations.
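
For example, a known camera-motion degradation can be corrected with Wiener deconvolution, as in the MATLAB sketch below; the blur length, angle and noise-to-signal ratio are assumed values chosen only for illustration.

    % Illustrative restoration of a simulated motion blur (parameters assumed).
    I        = im2double(rgb2gray(imread('hand.jpg')));
    PSF      = fspecial('motion', 15, 45);        % known degradation model
    blurred  = imfilter(I, PSF, 'conv', 'circular');
    restored = deconvwnr(blurred, PSF, 0.01);     % Wiener deconvolution
    imshowpair(blurred, restored, 'montage');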

1.2.3 Image Analysis:



Image analysis operations produce numerical or graphical information based on the
characteristics of the original image. They break an image into objects and then classify those
objects, relying on image statistics. Common operations are the extraction and description of
scene and image features, automated measurements, and object classification. Image analysis is
mainly used in machine vision applications.
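
A small MATLAB sketch of such an analysis step is given below; the threshold choice and the minimum blob size of 50 pixels are assumptions made only for illustration.

    % Illustrative analysis: label objects and report measurements.
    I     = rgb2gray(imread('hand.jpg'));
    bw    = imbinarize(I);                     % split foreground/background
    bw    = bwareaopen(bw, 50);                % discard tiny spurious blobs
    stats = regionprops(bw, 'Area', 'Centroid', 'BoundingBox');
    fprintf('Detected %d object(s)\n', numel(stats));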

1.2.4 Image Compression:

Image compression and decompression reduce the data content necessary to describe an
image. Most images contain a lot of redundant information, and compression removes these
redundancies. Because the size is reduced, the image can be stored or transmitted more
efficiently. Lossless compression preserves the exact data of the original image, whereas lossy
compression does not represent the original image exactly but provides much higher compression.
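
The difference between lossless and lossy storage can be seen with MATLAB's imwrite, as sketched below; the file names and the JPEG quality value of 60 are placeholders.

    % Lossless (PNG) versus lossy (JPEG) storage of the same image.
    I = imread('hand.jpg');
    imwrite(I, 'hand_lossless.png');              % exact pixel values preserved
    imwrite(I, 'hand_lossy.jpg', 'Quality', 60);  % smaller file, approximate data
    d1 = dir('hand_lossless.png');  d2 = dir('hand_lossy.jpg');
    fprintf('PNG: %d bytes, JPEG: %d bytes\n', d1.bytes, d2.bytes);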

1.2.5 Image Synthesis:

Image synthesis operations create images from other images or from non-image data. They
generally create images that are either physically impossible or impractical to acquire.
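
A simple example is generating a test pattern from non-image data, as in the MATLAB sketch below; the pattern and image size are arbitrary choices.

    % Synthesising an image from a mathematical function.
    [x, y] = meshgrid(linspace(-1, 1, 256));
    I = uint8(255 * mat2gray(sin(5*pi*x) .* cos(5*pi*y)));
    imshow(I);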

1.3 STAGES INVOLVED IN HAND GESTURE RECOGNITION METHOD

The proposed system has the following steps for hand gesture recognition:
1. Image pre-processing
2. Segmentation
3. Feature extraction
4. Training and testing phase
5. Classification
6. Sign recognition
1.3.1 Image Pre-Processing
The input image is captured through a web camera; it can either be stored in the dataset for
training or be used as the input image whose character is to be recognized. The captured image is
stored in any supported format specified by the device.
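
A possible capture step is sketched below in MATLAB; it assumes the MATLAB Support Package for USB Webcams is installed, and the dataset path is only a placeholder.

    % Capture one frame from the web camera and store it (placeholder path).
    cam = webcam(1);                   % open the first connected camera
    img = snapshot(cam);               % grab a single RGB frame
    imwrite(img, 'dataset/A_01.png');  % save as a dataset/input image
    clear cam                          % release the camera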

Pre-processing is required on every image to enhance the effectiveness of the subsequent
image processing. Captured images are in RGB format, and both the pixel values and the
dimensionality of the captured images are very high. Since images are matrices, mathematical
operations performed on images are mathematical operations on matrices. The RGB image is
therefore converted into a gray image using the “rgb2gray” function, and the gray image is then
converted into a binary image. An image segmentation technique is applied to the binary image
to detect the hand region.

Figure 1.4: RGB to gray conversion
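
The conversion chain described above can be written in MATLAB as follows; the file name is a placeholder, and Otsu's method (graythresh) is assumed for the binarization threshold.

    % RGB -> grayscale -> binary conversion (threshold method assumed).
    rgbImg  = imread('dataset/A_01.png');
    grayImg = rgb2gray(rgbImg);                          % RGB to gray
    bwImg   = imbinarize(grayImg, graythresh(grayImg));  % gray to binary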

1.3.2 Segmentation
Segmentation divides the image into two regions: the background and the foreground
containing the region of interest, i.e., the hand region. In the segmented image the hand region
has the pixel value ‘1’ and the background ‘0’. This image is then used as a mask to obtain the
hand region from the RGB image by multiplying the black-and-white (binary) image with the
original RGB image. The image is resized to reduce the size of the matrix used in the recognition
process, and the images are then converted into column matrices for feature extraction.

Figure 1.5: Gray to binary conversion
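
A sketch of the masking and reshaping steps is given below, continuing from the preprocessing code above; the 64 x 64 working size is an assumed choice.

    % Mask the RGB image with the binary hand region, resize and flatten.
    mask      = repmat(bwImg, [1 1 3]);            % replicate mask for 3 channels
    handOnly  = rgbImg .* uint8(mask);             % keep only the hand pixels
    handSmall = imresize(rgb2gray(handOnly), [64 64]);
    colVec    = double(handSmall(:));              % column matrix for one image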



1.3.3 Feature Extraction

Feature extraction is the most significant step of the recognition stage, as the dimensionality
of the data is reduced here. Features are extracted using Principal Component Analysis (PCA),
which yields eigenvalues and eigenvectors. To compute them, the column matrices of all the
images are concatenated into a single matrix, the mean of this matrix is calculated, and the mean
is subtracted for normalization. The mean of the matrix is calculated as:

μ = (1/M) ∑_{n=1}^{M} T_n

where M is the number of column vectors (training images) and T_n is the n-th column vector.

The mean is then subtracted from each column vector T_i of the database as:

Temp = A − μ

where A is the column matrix and μ is the mean.
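
These two steps, together with the eigen-decomposition used to obtain the eigenvectors, can be sketched in MATLAB as below; the random matrix A only stands in for the real data matrix, which would hold one preprocessed image per column (64 x 64 pixels and 3 training images are assumed here), and the names mu, Temp and eigVecs are illustrative.

    % Mean subtraction and eigenvector computation for PCA (placeholder data).
    A    = rand(64*64, 3);      % data matrix: one image per column (placeholder)
    mu   = mean(A, 2);          % mu = (1/M) * sum of the M columns
    Temp = A - mu;              % mean-subtracted data (implicit expansion, R2016b+)
    C    = Temp' * Temp;        % small M-by-M surrogate covariance matrix
    [V, D]  = eig(C);           % eigenvalues and eigenvectors of C
    eigVecs = Temp * V;         % corresponding eigenvectors in image space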

1.3.4 Training and testing phase

Detecting the human hand against a white background enhances the performance of image pre-
processing in terms of both accuracy and speed. Sign recognition has two phases: the training
phase and the testing (recognition) phase.

Training phase: The training phase is the first phase, in which the dataset is generated and
stored. The number of images stored per character is directly proportional to the accuracy of
the system. While generating the training set, the images are preprocessed. A column matrix is
generated for each image of the dataset and the columns are concatenated into a single matrix.
From this matrix the eigenvectors are calculated and used to obtain the feature vectors, which
are then used in the classification process.

Testing phase: The testing phase is the second phase and requires the input gesture that is to
be recognized. The input image is normalized with the mean calculated from the dataset, and the
eigenvectors are used to project the input image onto the dataset. The maximum matching score is
computed using the Euclidean distance, the gesture is recognized, and the character corresponding
to the input gesture is displayed.
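
The projection used in both phases can be sketched as follows, reusing mu, Temp and eigVecs from the feature-extraction sketch above; the test image file name and the 64 x 64 size are assumptions.

    % Feature vectors: project dataset images (training) and the input (testing).
    featTrain = eigVecs' * Temp;                  % one feature vector per dataset image
    testImg   = imresize(rgb2gray(imread('test.png')), [64 64]);
    testVec   = double(testImg(:)) - mu;          % normalize with the dataset mean
    featTest  = eigVecs' * testVec;               % feature vector of the input gesture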

1.3.5 Classification

The Euclidean distance is used to classify the hand gestures. The feature vectors calculated for
the dataset and the feature vector calculated for the input image are used to determine the
character corresponding to the input gesture. Classification is based on the maximum matching
score, i.e., the minimum distance between the input gesture and the dataset image of the
corresponding character:

Ed(j) = √( ∑_{i=1}^{n} (q_i − p_{i,j})² ),  j = 1, 2, …, m

where n is the total number of pixels in a single image, m is the total number of images in the
dataset, q is the vector of the input gesture and p_{i,j} is the i-th element of the vector of the
j-th dataset image.
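
A minimal MATLAB sketch of this nearest-neighbour decision is given below, continuing from the projection sketch above; the label list is a placeholder with one entry per dataset image.

    % Classify the input gesture by minimum Euclidean distance (placeholder labels).
    labels   = {'A', 'B', 'C'};                          % one label per dataset image
    dists    = sqrt(sum((featTrain - featTest).^2, 1));  % Ed to every dataset image
    [~, idx] = min(dists);                               % max matching score = min distance
    fprintf('Recognized sign: %s\n', labels{idx});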

1.3.6 Sign Recognition

Once the feature vector of the input gesture has been classified, the character associated with
the best-matching dataset image is taken as the recognized sign. The recognized character is then
presented to the user as text output along with the corresponding audio output.

1.4 MOTIVATION

Sign language is widely used by individuals with hearing impairment to communicate with
each other conveniently using hand gestures. However, non-sign-language speakers find it very
difficult to communicate with people with speech or hearing impairment, since interpreters are
not readily available at all times. This motivated us to work on a system that can help non-sign-
language speakers recognize gestures.

At present there are many hardware-sensor-based hand gesture recognition techniques. The
problem with most of them is that they rely on wearable hardware sensors, which makes the system
hardware-dependent and prone to wear and tear, so the life of such gadgets is limited. We propose
a similar concept, but based on image processing instead. Moreover, most of the technology that
exists today is built for able-bodied people, whereas our solution is targeted towards the
differently abled members of society.

1.5 PROBLEM STATEMENT

To design and implement an efficient automatic system that converts sign language to speech
and text to sign language.

Input: Image of a hand gesture (sign).

Processing:

 The processing consists of determining the key features of the image, followed by training.
 The Euclidean distance is used to classify the hand gestures.

Output: Text output and audio output for a given sign.

1.6 SCOPE OF THE PROJECT

The scope of the project is as follows:

 As a translating device for mute people.



 The system can be used at public places such as airports, railway stations, bank counters,
hotels, etc.

 System requires low power.

1.7 OBJECTIVES

The objectives of the project are:

 To develop an automatic system to convert sign language to speech in a real-time environment.

 To propose a system that allows recognition of one-hand or two-hand sign language gestures.

 To propose a system that can be trained to work for alphabets and words.

 To develop a response system that converts text to sign language, complementing the sign-
language-to-speech system.

 To create analysis graphs based on the performance results obtained, such as one-hand and
two-hand recognition efficiency.


1.8 ADVANTAGES

 Most existing technology is targeted at able-bodied people; this system is for differently
abled people.

 It is a solution with minimal hardware involvement, since it is targeted at differently abled
people such as the deaf and dumb.

 It is a real-time processing system.

 It is an affordable and easy-to-use system, implemented in MATLAB.



1.9 DISADVANTAGES

 An irrelevant object might overlap with the hand, and wrong object extraction occurs if that
object is larger than the hand.
 Ambient light affects the color detection threshold.

1.11 ORGANIZATION OF THE REPORT

The report is organized into the following chapters.

Chapter 1 - Introduction: This chapter presents a brief description of the Hand Gesture
Recognition System for the deaf and dumb.

Chapter 2 - System Requirement Specification: As the name suggests, this chapter describes
the specific requirements, including the software and hardware requirements used in this
project, and ends with a summary.

Chapter 3 - High Level Design: This chapter contains the design considerations, the architecture
of the proposed system, and the use case diagram.

Chapter 4 - Detailed Design: This chapter explains the detailed functionality and description
of each module.

Chapter 5 - Implementation: This chapter explains the implementation requirements, the
programming language selection, and the code written in the chosen language for the project.

Chapter 6 - Testing: The unit test cases for each module are discussed in this chapter, along
with the referenced snapshots.

Chapter 7 - Results and Discussions: This chapter gives the experimental results and the
analysis of the proposed system and process.

Chapter 8 - Conclusion and Future Work: This chapter gives the conclusion of the project and
the future enhancements.

1.12 SUMMARY

The first chapter describes the hand gesture recognition method in image processing. The
motivation for the project is discussed in Section 1.4, and the problem statement is explained in
Section 1.5. The scope of the project is described in Section 1.6 and the objectives are presented
in Section 1.7. The advantages of the project are discussed in Section 1.8 and its drawbacks in
Section 1.9. Finally, Section 1.10 details the literature survey and reviews the important papers
referred to.
