
IMPLEMENTATION OF VIRTUAL MOUSE

A PROJECT

Submitted in partial fulfillment of the

requirements for the award of the degree

of

BACHELOR OF TECHNOLOGY

By

Chhavi Kaushik 09518


Jasleen Kaur 10509
Vatsala Sharma 10513
Suresh 10557

Under the guidance

Of

Mr. Rajeev Kumar

COMPUTER SCIENCE AND ENGINEERING DEPARTMENT

NATIONAL INSTITUTE OF TECHNOLOGY

HAMIRPUR – 177005 (INDIA)

DECEMBER, 2013

 
COPYRIGHT © NIT HAMIRPUR (HP), INDIA, 2013

 
NATIONAL INSTITUTE OF TECHNOLOGY, HAMIRPUR
HIMACHAL PRADESH

CANDIDATE’S DECLARATION

We hereby certify that the work which is being presented in the project titled
“Implementation of Virtual Mouse”, in partial fulfillment of the requirements for the
award of the Degree of Bachelor of Technology and submitted in the Computer Science
and Engineering Department, National Institute of Technology Hamirpur, is an authentic
record of our own work carried out during the period from August 2013 to December
2013 under the supervision of Mr. Rajeev Kumar (Assistant Professor), Computer
Science and Engineering Department, National Institute of Technology, Hamirpur.

The matter presented in this project report has not been submitted by us for the
award of any other degree of this or any other Institute/University.

Chhavi Kaushik Jasleen Kaur Vatsala Sharma Suresh


(09518) (10509) (10513) (10557)

This is to certify that the above statement made by the candidates is correct to the
best of my knowledge.

Date : (Mr. Rajeev Kumar)


Assistant Professor

The project Viva-Voce Examination of this group has been held on:

Signature of Supervisor(s) Signature of External Examiner

ACKNOWLEDGEMENT

This project was a collective effort which came to fruition as a result of the endeavors
put in by every person involved with it. This project work would remain incomplete
without acknowledging all those who have contributed towards it.

We would like to take this opportunity to express our heartiest gratitude and
sincere thanks to Mr. Rajeev Kumar, Assistant Professor, Computer Science and
Engineering Department, National Institute of Technology Hamirpur (HP), for his
revered guidance throughout the project work, which made this task a pleasant job,
and for facilitating us with the departmental resources that proved essential in
exercising all aspects of the project.

The project work has given us a lot of experience, and we hope that the
suggestions provided will help us use this experience in our future projects.

Chhavi Kaushik 09518

Jasleen Kaur 10509

Vatsala Sharma 10513

Suresh 10557

ABSTRACT

Our project work implements a real-time hand gesture recognition based “Virtual
Mouse” that uses only a web camera and image processing techniques. The
applications of real-time hand gesture recognition in the real world are numerous,
since it can be used almost anywhere we interact with computers. Gesture recognition
is a topic in computer science and language technology with the goal of interpreting
human gestures via mathematical algorithms. It provides an interface to computers
using gestures of the human body, typically hand movements.

Hand gesture recognition is one obvious way to create a useful, highly adaptive interface
between machines and their users.  Hand gesture recognition technology would allow for
the operation of complex machines using only a series of finger and hand movements,
eliminating the need for physical contact between operator and machine.

We mark up the important features of the hand (fingertips and palm center) by
computational geometry calculations, and provide real-time interaction between the
gesture and the system. The most important and challenging part is determining,
using only camera input, where our hand is and what action is being performed, such
as a left click.

TABLE OF CONTENTS

Candidate’s Declaration
Acknowledgement
Abstract
Table of Contents

Chapter 1: Introduction
    1.1 Overview
    1.2 Project Description
    1.3 System Description

Chapter 2: Literature Review
    2.1 Software Information

Chapter 3: Design, Setup and Methodology
    3.1 Preprocessing and Convex Hull Methods
        3.1.1 (a) Preliminary Processing Using Background-Foreground Segmentation
        3.1.1 (b) Preliminary Processing Using Skin Color Detection
        3.1.2 Convex Hull
            Convex Polygon
            Definition of Convex Hull
            Three-Coin Algorithm
            Convexity Defect
    3.2 Palm Tracking and Fingertip Detection
        3.2.1 Minimum Enclosing Circle
            Find Circle through Three Points
        3.2.2 Fingertip Detection
        3.2.3 Distance Threshold for Fingertips
    3.3 Mouse Input Handling
        About Mouse Input
        Mouse Cursor
        Mouse Capture
        Mouse Messages

Chapter 4: Results and Discussions
    1. Background-Foreground Segmentation Based
    2. Skin-Color Detection Based

Chapter 5: Conclusion and Future Scope

References
LIST OF FIGURES

1.1 System Architecture Overview
3.1.1 Stereo picture of HSV color model
3.1.2 Dilation operation: (a) set A (b) square structuring element B (c) dilation of A by B
3.1.3 Erosion operation: (a) set A (b) square structuring element B (c) erosion of A by B
3.1.4 Mask obtained after applying erosion and dilation to binary image
3.1.5 Position of p1, p2 and p3 relative to the start point
3.1.6 Move to p1 and change direction
3.1.7 Move to p2
3.1.8 Move to p3
3.1.9 Rotate when p1, p2 and p3 are none
3.1.10 Termination on an isolated pixel: (a) p1, p2 and p3 are none (b) first rotation (c) second rotation (d) third rotation
3.1.11 (a) Not a convex polygon (b) Convex polygon
3.1.12 An example of convex hull
3.1.13 Any intersection of convex sets is also a convex set
3.1.14 (a) Convex polygon (b) Not a convex polygon
3.1.15 Graham’s sorting
3.1.16 An example of input points
3.1.17 Place three coins
3.1.18 Move the end coin next to the front coin
3.1.19 Move the end coin next to the front coin
3.1.20 Remove the middle point
3.1.21 Move the end coin next to the front coin
3.1.22 (a-c) Move the end coin next to the front coin
3.1.23 Remove the middle point
3.1.24 Terminates when the start point is visited again with a right turn
3.1.25 Convex hull of the input points
3.1.26 Yellow parts represent the convexity defects
3.1.27 Starting and ending points of a convexity defect
3.1.28 Convexity defect area
3.1.29 There are three convexity defects
3.1.30 The depth point of a convexity defect
3.2.1 Center of circle formed by three points
3.2.2 Angle of a contour point
4.1 Decay of the foreground mask when the hand is static
4.2 Background regions of similar color have been detected
CHAPTER 1: INTRODUCTION

1.1 Overview

Ever since the conception of computer systems, making human-machine
interaction possible and efficient has been an important field of study. With the
development of other fields of computer science, alternatives to the traditional
hardware used as input devices are being researched. One of these areas of research
is gesture recognition, where hand movements are used to control computers. The
key problem here is how to make computers understand these gestures.

Currently available approaches for this are divided into “Data-Glove based”
and “Vision Based” approaches. The Data-Glove based methods use specialized
sensor devices placed on the hand, usually forming a glove of sorts (hence the name).
These sensors digitize hand and finger motions into multi-parametric data, and the
extra sensors also make it easy to capture hand configuration and movement. The
results acquired this way show high accuracy, but the cost of these devices is very
high. In contrast, the Vision Based methods require only a camera, thus realizing
natural interaction between humans and computers without the use of any extra
devices. These systems tend to complement biological vision with artificial vision
systems implemented mostly in software. This approach is the cheapest and the most
lightweight.

1.2 Project Description

In our project, we implemented software using the second approach, “Computer
Vision”: it handles the 2-D real-time video from a webcam and analyzes it frame by
frame to recognize the hand gesture in each frame. We used image processing
techniques to detect hand poses by passing each image through many filters, and
finally applied our own calculations on the binary image to detect the gesture.
The software must satisfy several conditions, including real-time operation. This is
needed for full interactivity and intuitiveness of the interface. It is measured through
fps (frames per second), which essentially describes the refresh rate of the
application. If the refresh interval is long, there is a delay between the actual event
and its recognition; if gestures are performed in rapid succession, an event may not
be recognized at all. This is why real-time operation is crucial.

Another required condition is flexibility: how well the software integrates with new
applications as well as existing ones. In order to be a candidate for practical
applications, the software must be able to accommodate external programs easily,
for the benefit of both the application developer and the user. Finally, the software
must be reasonably accurate to be put to practical use.

To achieve all of these requirements, we took a large number of training sets
of hand gestures from different people and in different environments, and applied
several conditions to them in order to detect the correct gesture. We also had to
choose a suitable programming language to deal with real-time tracking of the hand;
after a long search on this topic, we decided to implement the image processing part
of our software with the help of Intel’s open-source Computer Vision (OpenCV)
library in a C/C++ environment using Microsoft Visual Studio 2010, due to its
advantages in real-time image processing.
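
As a minimal sketch of this setup (the window title and camera index are arbitrary
illustrative choices, not fixed by the project), the basic OpenCV capture-and-process
loop looks like:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);               // default webcam
        if (!cap.isOpened()) return -1;

        cv::Mat frame;
        for (;;) {
            cap >> frame;                      // grab the next frame
            if (frame.empty()) break;

            // per-frame processing (segmentation, contour, hull, fingertips)
            // goes here, as described in Chapter 3

            cv::imshow("Virtual Mouse", frame);
            if (cv::waitKey(30) == 27) break;  // exit on Esc
        }
        return 0;
    }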

1.3 System Description

Our system and its computation are based on the contour of the hand and fore-arm.
We assume that the input contour is the contour of the hand and fore-arm; we do not
perform any hand or arm recognition. The original input data is the RGB image
captured by a web camera. We perform some preliminary processing to generate a
binary image which provides enough information about the hand contour.
The binary image is used to calculate the contour and the convex hull of the
contour. The palm position can be initially estimated from the information extracted
from the convex hull, and the fingertip positions can be detected from the hand
contour. The system then refines the initial estimate to make the results more
accurate. The computational cost of our system is low, since only computational
geometry algorithms are performed; there is no need to apply a classifier, and no
pre-trained database is needed.


 
Figure 1.1 below shows the system architecture.

Figure 1.1: System Architecture Overview


 
CHAPTER 2: LITERATURE REVIEW

Hand-held devices such as laptops, phones and tablets are becoming increasingly
popular, yet traditional input devices such as the mouse and keyboard are not present
on many of them, so alternatives have been adopted.
A virtual technology based on “Vision Based” gesture recognition is one possible
alternative for future devices. There are many possible approaches to developing this
virtual technology, and it is a very active field of study. Some of the approaches that
can be adopted are:

1. Skin Based Detection


This is the simplest approach and works well in a static environment. The
system is given a range of colors that constitute the color of skin, and a binary
mask of the pixels whose color values belong to this range is obtained. With a
plain background, this system can clearly recognize a user’s hand, but it is
highly susceptible to changes in illumination and lighting conditions. This
system is not very dynamic.

2. Foreground-Background Segmentation
In this approach we first determine which pixels belong to the background. This
is done by comparing subsequent frames of the video feed: the pixels that are
static over a series of consecutive frames belong to the background, and in this
way a background image is determined. The varying pixels form the foreground,
and a binary mask is obtained from them for further processing.

3. Using Haar Classifier Technique


This is a pattern recognition technique where we first train a classifier to
recognize a hand. The Haar classifier, based on Haar-like features, is commonly
used in face recognition systems but can also be used for gesture recognition.

In this project, the first two of these techniques have been implemented to
develop the ‘Virtual Mouse’ system.


 
2.1 Software Information

Microsoft Visual Studio C/C++ environment.

Microsoft Visual Studio is an integrated development environment (IDE) from
Microsoft. Visual Studio includes a code editor supporting IntelliSense as well as code
refactoring. The integrated debugger works both as a source-level debugger and a
machine-level debugger. Visual Studio supports different programming languages by
means of language services, which allow the code editor and debugger to support (to
varying degrees) nearly any programming language, provided a language-specific
service exists. Built-in languages include C/C++ (via Visual C++), VB.NET (via
Visual Basic .NET) and C# (via Visual C#). Support for other languages such as
Python and Ruby, among others, is available via language services installed
separately.

OpenCV 
OpenCV stands for Open Source Computer Vision Library. It is a collection of C
functions and a few C++ classes that implement popular algorithms of image
processing and computer vision. OpenCV is a cross-platform, middle-to-high-level
API that consists of a few hundred (>300) C functions. It does not rely on external
numerical libraries, though it can make use of some of them at runtime.


 
CHAPTER 3: DESIGN, SETUP AND METHODOLOGY

3.1 PRE-PROCESSING AND CONVEX-HULL METHODS


We will now introduce how to extract the area of interest from an image. We use a
single web camera to capture a series of images. The area of interest can be extracted
by two methods: background subtraction, or skin color detection using the HSV color
space. A binary image is produced and morphology processing is performed on it:
erosion eliminates the noise, while dilation fills up the defects to smooth the contour
of the area of interest.
Theo Pavlidis’ algorithm is used to obtain the contour sequence. Instead of using
connected components, we can determine the contour we are interested in by
choosing the longest sequence. The convex hull can be found by the three-coin
algorithm; it is a set of points which wraps the contour like a rubber band. By
comparing the convex hull area and the contour area, the convexity defect areas can
be calculated. A convexity defect is an area which is inside the convex hull but not
inside the contour. Convexity defects help us find the features of a contour; the
depth of a defect area is the most important feature in this work. The palm position
can be detected from a set of defect depths. The start and end points also help us
find the convex points in the contour, but they can be invalid in some conditions of
hand gesture estimation.

3.1.1 (a) Preliminary Processing Using Background-Foreground Segmentation

The following are the steps taken to detect and track the hand:

1. Background Construction:
Given the feed from the camera, the first thing to do is to remove the background.
We use a running average over a sequence of images to get the average image, which
will also be the background.

So as and when we keep receiving images, we construct the new background average
image as:

    CurBG[i][j] = α·CurBG[i][j] + (1 - α)·CurFrame[i][j]


 
This equation works because of the assumption that the background is mostly static.
For stationary items, the pixels are not affected by this weighted averaging, since
αx + (1 - α)x = x. Pixels that are constantly changing are not part of the background
and get weighed down, while the stationary background pixels become more and more
prominent with each iteration. Thus, after a few iterations, the average image contains
only the background. In this case, even the user’s face becomes part of the background
if it is kept still, as the system needs to detect only the hands.
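
A minimal sketch of this running-average construction with OpenCV’s
accumulateWeighted, meant to run inside the per-frame loop shown in Chapter 1
(the learning rate 0.02 is an illustrative value, not the one tuned for the project):

    #include <opencv2/opencv.hpp>

    // Running-average background; 'frame' is the current camera frame.
    cv::Mat frameF, background;                // declared outside the loop in practice
    frame.convertTo(frameF, CV_32F);
    if (background.empty())
        frameF.copyTo(background);             // initialize on the first frame
    // background = (1 - alpha)*background + alpha*frame, with alpha = 0.02
    cv::accumulateWeighted(frameF, background, 0.02);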

2. Background Subtraction

A simple method to start with is to subtract the pixel values directly. However, this
will result in negative values and values greater than 255, the maximum value of an
8-bit pixel. Instead, we use an inbuilt background subtractor based on a Gaussian
mixture-based background/foreground segmentation algorithm. Background
subtraction involves calculating a reference image, subtracting each new frame from
this image, and thresholding the result, which yields a binary segmentation of the
image highlighting the regions of non-stationary objects. This reference image is the
constructed background image.
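
OpenCV ships such a Gaussian mixture-based subtractor; a hedged sketch of wiring
it in (default parameters are used; the threshold at 128 only drops the shadow label
127 that MOG2 emits):

    #include <opencv2/opencv.hpp>

    // Gaussian mixture-based background/foreground segmentation (OpenCV 2.x).
    cv::BackgroundSubtractorMOG2 subtractor;   // declared once, outside the loop
    cv::Mat foregroundMask;
    subtractor(frame, foregroundMask);         // update the model, emit the mask
    // MOG2 marks shadows as 127; keep only true foreground (255).
    cv::threshold(foregroundMask, foregroundMask, 128, 255, cv::THRESH_BINARY);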

3.1.1 (b) Preliminary Processing Using Skin Color Detection

The image captured from a web camera is composed of RGB values, but these are
influenced by light very easily, so we must convert the RGB color space into
another color space which is not sensitive to light variation. In order to make the
system achieve real-time processing and adapt to most environments, we choose a
color model with simple conversion formulas and low sensitivity. Here we choose
the HSV color space to extract human skin areas. The skin color areas are
represented in a binary image, on which we then perform morphology processing,
including erosion and dilation. Noise is eliminated by erosion and the contour of the
skin color area is smoothed by dilation.


 
1. Skin color detection using HSV color space

The HSV model comprises three components of a color: hue, saturation, and
value. It separates the value from the hue and saturation. A hue represents the
gradation of a color within the optical (visible) spectrum of light. Saturation is the
intensity of a specific hue, based on the purity of the color. The practicability of the
HSV model rests on two main factors: 1) the value is separated from the color
information, and 2) both the hue and saturation relate to human vision. Hence, for
developing a human vision-based system, the HSV model is an ideal tool. The
chromatic components (hue and saturation) and the luminance component (value) of
the HSV color model are defined by the stereo picture shown in Figure 3.1.1. All
colors composed of the three so-called primary colors (red, green, and blue) lie
inside the chromatic circles. The hue H is the angle of a vector with respect to the
red axis; when H = 0 the color is red. The saturation is the degree to which a color
has not been diluted with white; it is proportional to the distance from the location to
the center of the circle. The longer the distance from the center of a chromatic
circle, the more saturated the color.
The value V is measured along the line which is orthogonal to the chromatic
circle and passes through its center. The color tends to white along this center line
toward the apex.
The relationship between the HSV and RGB models is expressed below (a standard
formulation, with R, G and B normalized to [0, 1] and V computed first):

    V = max(R, G, B)

    S = (V - min(R, G, B)) / V        (S = 0 when V = 0)

    H = 60° × (G - B) / (V - min(R, G, B))          if V = R
    H = 60° × (2 + (B - R) / (V - min(R, G, B)))    if V = G
    H = 60° × (4 + (R - G) / (V - min(R, G, B)))    if V = B

Figure 3.1.1: Stereo picture of HSV color model

We distinguish the skin color from non-skin color regions by setting upper and
lower bound thresholds.
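
A hedged sketch of this thresholding with OpenCV (the bounds below are
illustrative placeholders, not the tuned values from our experiments; OpenCV’s
8-bit HSV uses H in [0, 179] and S, V in [0, 255]):

    #include <opencv2/opencv.hpp>

    // Convert the frame to HSV and keep only pixels inside the skin range.
    cv::Mat hsv, skinMask;
    cv::cvtColor(frame, hsv, CV_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 40, 60),      // lower H, S, V bounds (assumed)
                cv::Scalar(20, 150, 255),        // upper H, S, V bounds (assumed)
                skinMask);                       // 255 where skin-colored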

Morphology Processing

After generating the binary image, we further perform morphological operations on
it to smooth the hand segments; this also eliminates some noise. The two basic
morphological operations are dilation and erosion, described below.

 
For two sets A and B in Z², the dilation of A by B, denoted A ⊕ B, is defined as:

    A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ }

This equation is based on obtaining the reflection B̂ of B about its origin and shifting
this reflection by z. The dilation of A by B is then the set of all displacements z such
that B̂ and A overlap by at least one element. Therefore, the above equation may be
rewritten as:

    A ⊕ B = { z | [ (B̂)_z ∩ A ] ⊆ A }

Set B is commonly referred to as a structuring element in the dilation operation, as
well as in other morphological ones. Figure 3.1.2 illustrates a simple result of
dilation. Figure 3.1.2 (a) shows a simple set, and Figure 3.1.2 (b) shows a structuring
element and its reflection. Figure 3.1.2 (c) shows the original set for reference, and
the solid line indicates the limit beyond which any further displacement of the origin
of B̂ would cause the intersection of B̂ and A to be empty. Therefore, all points
inside this boundary constitute the dilation of A by B.

Figure 3.1.2: Dilation operation (a) set A (b) square structuring element B (c)
dilation of A by B

Given two sets A and B in Z², the erosion of A by B, denoted A ⊖ B, is defined as:

    A ⊖ B = { z | (B)_z ⊆ A }

In words, the above equation states that the erosion of A by B is the set of all points z
such that B, translated by z, is contained in A. Figure 3.1.3 demonstrates a simple
result of erosion, where Figure 3.1.3 (c) shows the original set for reference, and the
solid line stands for the limit beyond which any further displacement of the origin of
B would cause B to cease being completely contained in A. In consequence, all
points inside this boundary constitute the erosion of A by B.

Figure 3.1.3: Erosion operation (a) set A (b) square structuring element B (c)
erosion of A by B

The erosion and dilation operations are both conventional and popular methods used
to clean up anomalies in objects, and the combination of the two can achieve good
performance. Therefore, the moving object region is morphologically eroded (once)
and then dilated (twice). The mask obtained after applying erosion and dilation to
the binary image is shown in Figure 3.1.4.

Figure 3.1.4: Mask obtained after applying erosion and dilation to the binary image
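
A small sketch of this erode-once, dilate-twice cleanup in OpenCV, reusing
skinMask from the sketch above (the 3×3 rectangular element is an assumed choice):

    #include <opencv2/opencv.hpp>

    // Erode once to remove noise, then dilate twice to fill small defects
    // and smooth the contour of the detected region.
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::erode(skinMask, skinMask, element);
    cv::dilate(skinMask, skinMask, element, cv::Point(-1, -1), 2);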

Contour Finding

After the previous processing, we should be able to obtain a binary image whose
white pixels represent skin color regions. The next step is contour finding. A
contour is a sequence of points which are the boundary pixels of a region. The
contours of these regions are found so that we can disregard the small areas and
focus on the fore-arm area we are interested in. This can be done by comparing the
lengths of their contours: the longest one is the one we are looking for. Only
boundary pixels are visited. We use Theo Pavlidis’ algorithm to find the contour; it
works very well on 4-connected contours, which the fore-arm contour tends to be.

The algorithm begins with a start point. Locating a start point can be done by
scanning the rows of pixels from the bottom left corner; the scan of each row starts
from the leftmost pixel, proceeding to the right. When a white pixel is encountered,
we declare that pixel the start pixel. After deciding the start point, there are three
pixels we are interested in, P1, P2 and P3, shown in Figure 3.1.5: P1 is the upper-left
pixel with respect to the current pixel, P2 is the pixel directly above it, and P3 is the
upper-right pixel.

Figure 3.1.5: Position of p1, p2 and p3 relative to the start point

Standing on the current pixel, there are four cases. The first thing to do is to
check P1. If P1 is a white pixel, declare P1 to be the next start pixel and move one
step forward followed by one step to the current left to land on P1, as shown in
Figure 3.1.6. The next P1, P2 and P3 change according to the new location and
orientation.

Figure 3.1.6: Move to p1 and change direction

Only if P1 is a black pixel do we check P2. If P2 is a white pixel, we declare
P2 to be the next start pixel and move one step forward to land on P2, as shown in
Figure 3.1.7. The next P1, P2 and P3 change according to the new location and
orientation.

Figure 3.1.7: Move to p2

If P1 and P2 are both black pixels, we check P3. If P3 is a white pixel, we declare
P3 to be the next start pixel and move one step to the right followed by one step to
the current left to land on P3, as shown in Figure 3.1.8. The next P1, P2 and P3
change according to the new location and orientation.

Figure 3.1.8: Move to p3

If none of P1, P2 and P3 is a white pixel, the orientation is rotated 90 degrees
clockwise while standing on the same pixel, as shown in Figure 3.1.9. Afterwards
we repeat the same procedure.

Figure 3.1.9: Rotate when p1, p2 and p3 are none

If the orientation has been rotated three times in a row without finding any white
pixel in P1, P2 or P3, it means that we are standing on an isolated pixel that is not
connected to any other white pixel. This situation causes the algorithm to terminate,
as shown in Figure 3.1.10.

Figure 3.1.10: Termination on an isolated pixel (a) p1, p2 and p3 are none
(b) first rotation (c) second rotation (d) third rotation

When the search is finished, several contours may have been found in the image.
If the longest contour in the image is longer than a certain threshold, we choose it as
the one we are interested in. Usually there will be only one large contour (or none)
in the image; otherwise the skin color segmentation might not have been effective.
By performing this operation, noise in the image is ignored.
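
The project describes Pavlidis’ algorithm; OpenCV’s findContours, which uses a
different border-following method internally, yields an equivalent contour sequence.
A sketch of selecting the longest contour (the length threshold of 100 is an assumed
value):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Find all contours in the binary mask and keep the longest one,
    // which we take to be the hand/fore-arm contour.
    cv::Mat maskCopy = skinMask.clone();       // findContours modifies its input
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(maskCopy, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);

    size_t longest = 0;
    for (size_t i = 1; i < contours.size(); ++i)
        if (contours[i].size() > contours[longest].size())
            longest = i;

    // 'contour' is reused by the later sketches, if it is long enough.
    std::vector<cv::Point> contour;
    if (!contours.empty() && contours[longest].size() > 100)
        contour = contours[longest];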

3.1.2 Convex Hull

Finding a convex hull is a well-known problem in computational geometry. Imagine
that there are several nails on a wall and a rubber band stretched around them: only
the peripheral nails touch the rubber band, while the nails inside cannot affect it.
The shape of the rubber band is what the convex hull of the nails will look like.
We calculate the convex hull of the fore-arm contour in order to find the desired
information. Usually the arm part has a smooth contour and does not contain any
important information, while the hand part has more convex and concave contours
and usually contains the information we want. After comparing a fore-arm contour
and its convex hull, we find that the convexity defects are around the palm area.
Hence we can find the point which has the longest distance to the convex hull in
each convexity defect. Since the fore-arm is relatively smooth, a neighboring defect
usually will not create a point with the longest distance to the convex hull on the
arm contour. From these points, we can determine the position of the palm even
when the fore-arm contour is included in the contour.
The three-coin algorithm is applied to find the convex hull of a set of points.
After the convexity defects are found, we will discuss the features we can find
through them.

Convex Polygon

Before we explain the convex hull, the concept of a convex polygon needs to be
introduced first. A convex polygon is a polygon which does not contain any concave
part; in Figure 3.1.11 (a) the arrow points to a concave part of a polygon.
Assume that we can choose any two points in a polygon (including the boundary
and the area it covers) and connect the two points with a straight line segment. If
none of the straight lines that any two points inside the polygon can form exceeds
the boundary of the polygon, we say that the polygon is a convex polygon. Figure
3.1.11 shows that we can find two points (A and B) and connect them with a straight
line. We can clearly see that the red part of the line exceeds the boundary, so the
polygon is not a convex polygon.

Figure 3.1.11: (a) Not a convex polygon (b) Convex polygon

Definition of Convex Hull

Once we know the definition of a convex polygon, we can learn the concept of the
convex hull. For a given nonempty set of points in a plane, the convex hull of the set
is the smallest convex polygon which covers all the points in the set. In Figure
3.1.12 there are 10 points; the hexagon in the figure is the convex hull of the set, and
the six points which compose the hexagon are called “hull points”.
A set S in a plane or in space is convex if and only if, whenever points X and Y
are in S, the line segment XY is contained in S.

The intersection of any collection of convex sets is also convex, as shown in Figure
3.1.13. For an arbitrary set W, the convex hull of W is the intersection of all convex
sets containing W. The boundary of the convex hull is a polygon with all its vertices
in W.

Figure 3.1.13: Any intersection of convex set is also a convex set.

Three-Coin Algorithm

The three-coin algorithm was developed by Graham and Sklansky to find the
convex hull of a given set of points. Suppose there is a convex polygon: if we take a
walk along the polygon’s edges, we only make right turns (or only left turns). If it is
not a convex polygon, a turn in the opposite direction will be encountered.

Figure 3.1.14: (a) Convex polygon (b) Not a convex polygon

In Graham’s algorithm, the input is a set of points in a plane, while Sklansky’s
algorithm was designed for a simple polygon in a plane. Graham’s sort assigns a
number to each point, and the algorithm processes the points in that order. Sklansky
assumes the input to be a polygon: once the starting point is decided, the algorithm
follows the edges, which is just like following the sorted order. Since the fore-arm
contour we found is a series of points, all stored in a sequence structure, we already
have an order over all the points, so we can skip Graham’s sorting. The two
algorithms are otherwise the same.

Figure 3.1.15: Graham’s sorting

The fore-arm contour is the input for Sklansky’s algorithm. The name “three-coin”
means that three points are labeled at a time; they are represented by coins of
different colors (for example, black, red and grey). The following figures illustrate
the algorithm. Let us take the input points in Figure 3.1.16 as an example.

Figure 3.1.16: An example of input points

First of all, we choose a point to be the starting point and mark it with a black
coin. The starting point must be a convex vertex; we can choose the leftmost point.
The point after the black coin is marked with a red coin, and the point after the red
coin is marked with a grey coin. We also call them the end coin, middle coin and
front coin according to their order, regardless of their color.

Figure 3.1.17: Place three coins

Let us check the path from the end coin (currently the black coin) through the
middle coin (currently the red coin) to the front coin (currently the grey coin). The
path forms a right turn. Whenever we encounter a right turn, we move the end coin
to the point next to the front coin. So now the black coin is the front coin, the grey
coin is the middle coin, and the red coin is the end coin.

Figure 3.1.18: Move the end coin next to front coin

Once again we check the path of the three coins. It forms a right turn again, so
we move the end coin to the point next to the front coin. The front coin is now the
red coin, the middle coin is the black coin, and the grey coin becomes the end coin.

Figure 3.1.19: Move end coin next to front coin

We check the path of the three coins again; this time it forms a left turn.
Whenever we encounter a left turn, we delete the point where the current middle
coin stands, and then move the middle coin to the point before the end coin
(currently the grey coin). The front coin remains the same (the red coin), but the
middle coin is now the grey coin and the black coin is the end coin.

Figure 3.1.20: Remove middle point

Let us check the path again. It is a right turn this time, so we move the end coin
to the point next to the front coin.

Figure 3.1.21: move end point next to front point

It forms a right turn again, so we move the end coin to the point next to the front
coin. The next check still gives a right turn, so we do the same procedure again, and
once more after that.

Figure 3.1.22: (a-c) move end point next to front point

This time the path forms a left turn, so we delete the point where the middle coin
currently stands (the red coin) and move the middle coin to the point before the end
coin (currently the grey coin).

Figure 3.1.23: Remove middle point


It forms a right turn. As we move the end coin to the point next to the front
coin, we find that we have come back to the starting point, so the whole procedure
terminates.

Figure 3.1.24: Terminates when the start point is visited again with a right turn

Linking all the remaining points gives the convex hull of the set of points, as shown
in Figure 3.1.25.

Figure 3.1.25: Convex hull of the input points


We can successfully generate the convex hull of the fore-arm contour by applying
the three-coin algorithm, as shown in Figure 3.1.25. By observing the contour and
its convex hull, we notice areas which the convex hull contains but which are not
included in the contour; these are called the convexity defects. Convexity defects
give us useful information about the shape of a contour.
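
A compact sketch of the three-coin idea for an ordered contour (an illustrative
rendering, not the project’s exact code: the sign convention of the cross product
depends on traversal direction, and the final wrap-around pass over the start point
is omitted for brevity):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Cross product of (b-a) x (c-b); its sign tells right from left turns.
    static long long turn(const cv::Point& a, const cv::Point& b, const cv::Point& c) {
        return (long long)(b.x - a.x) * (c.y - b.y) -
               (long long)(b.y - a.y) * (c.x - b.x);
    }

    // Three-coin style scan: on a right turn the coins advance; on a left
    // turn the middle point is removed and the coins step back.
    std::vector<cv::Point> threeCoinHull(const std::vector<cv::Point>& contour) {
        std::vector<cv::Point> hull;
        for (size_t i = 0; i < contour.size(); ++i) {
            while (hull.size() >= 2 &&
                   turn(hull[hull.size() - 2], hull.back(), contour[i]) > 0)
                hull.pop_back();           // left turn: remove the middle point
            hull.push_back(contour[i]);
        }
        return hull;                       // hull points in contour order
    }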

Convexity Defect
We calculate the convex hull of the fore-arm contour in order to get the
convexity defects of the contour. A convexity defect provides very useful
information for understanding the shape of a contour, and many characteristics of
complicated contours can be represented by convexity defects.
Figure 3.1.26 illustrates the convexity defects of a star-like shape. The green
lines represent the convex hull of the star-like shape. As can be seen in the figure,
the yellow areas are contained in the convex hull but not in the star. Those areas are
the so-called convexity defects.

Figure 3.1.26: Yellow parts represent the convexity defects

The previous section tells us that the points which form the convex hull must be
part of the contour. Our first step in searching for a convexity defect is finding the
starting point of the defect on the contour. The starting point of a convexity defect
is a point on the contour which is included in the convex hull points while the next
point on the contour is not. Figure 3.1.27 illustrates the starting point of a convexity
defect: the red point is the first point which is included in the convex hull while the
next point is not, so we mark it as a starting point.

Figure 3.1.27: Starting and ending points of a convexity defect

Once we know about the starting point, the ending point is similar: we define the
ending point to be the point on the contour which is included in the convex hull
points while the point before it is not. As we can see in the figure, the purple point
is the ending point of a convexity defect.
By connecting the starting point, the ending point and the points on the contour
between them, we get a convexity defect area, as shown in Figure 3.1.28.

Figure 3.1.28: Convexity defect area

When all the points in the contour have been searched, we may find several
convexity defects, as shown in Figure 3.1.29. Each convexity defect is composed of
a starting point, an ending point, and the points between them.

Figure 3.1.29: There are three convexity defects

Besides the starting point and ending point, the other useful information we can
obtain from a convexity defect is the depth of the defect and the depth point. The
depth of the defect is the longest distance from any point in the defect to the convex
hull edge of the defect, and the point in the defect which has that longest distance is
the depth point.

Figure 3.1.30: The depth point of a convexity defect

As shown in Figure 3.1.30, there is only one convexity defect. The purple point is
obviously the point with the longest distance to the defect’s convex hull edge, and
the length of the purple line is the depth of the purple point. Since the depth of the
purple point is the longest, it is also the depth of the convexity defect, and the purple
point is the depth point.

3.2 PALM TRACKING AND FINGERTIP DETECTION

Now we focus on fingertip and palm detection. We are able to extract the point
which has the longest depth in each convexity defect. Convexity defects with a
depth larger than a certain threshold tend to appear around the palm: the shape of the
fore-arm part is smooth, so it is not easy to generate a convexity defect with a long
depth there. Meanwhile, the space between fingers tends to have a long shape and
its depth point lies on the edge of the palm, because the palm is usually wider than
the arm.
Gathering the depth points gives us useful information to decide where the palm
position is. There are two ways to determine the palm center from the depth points.
One is to enclose all the depth points within a minimum enclosing circle, which
calculates the smallest circle that covers all the depth points. The other is to
calculate the average position of all the depth points. Both methods have their own
advantages and disadvantages; we decided to use both, to mix their advantages and
reduce their disadvantages.
Another issue is finding the fingertips. Since the fore-arm contour that we extract
is actually a series of points, we need a definition of a fingertip in the contour: it will
be the points at the end of a sharp shape. We applied an algorithm proposed by Wen
and Niu to distinguish the fingertip points from a series of contour points.
When all the fingertips have been extracted, we know the palm position and the
fingertip positions. When the gesture is a fist, there are no fingers, and the depth
points will sometimes be the two points on the left and right sides of the wrist,
which causes a wrong palm position estimate.
Also, the palm position we estimate will shake due to the constantly changing
hand contour. Even if the contour changes only slightly, it will sometimes affect
some depth points, and since there are not many depth points, changes in only a few
of them can still change the estimated palm position.

3.2.1 Minimum Enclosing Circle

The minimum enclosing circle problem is to find the smallest circle that
completely contains a set of points. Formally, given a set of points S in a plane, find
the circle C with the smallest radius such that all points in S are contained in C or on
its boundary.
The minimum enclosing circle is often used in planning the location of a shared
facility; a hospital or a post office is a good example. If we consider each home in
the community as a point in a plane, the center of the minimum enclosing circle will
be a good place for the shared facility. We apply this idea to finding the palm
position because the situations are similar: since all the depth points are around the
hand palm and share the same palm center, the minimum enclosing circle of the
depth points gives a good palm position estimate.
The minimum enclosing circle can be found in O(n) time using prune-and-search
techniques; this is one of the most impressive results in the field of computational
geometry. Assuming the total number of points is N, Megiddo’s algorithm discards
at least N/16 of the points at each iteration without affecting the resulting minimum
enclosing circle. This procedure is applied repeatedly until a base case is reached.
The total time is proportional to N(1 + 15/16 + (15/16)² + …) = 16N, which is linear.
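
OpenCV provides this operation directly; a sketch of estimating the palm from the
depthPoints gathered in the defect sketch above:

    #include <opencv2/opencv.hpp>
    #include <vector>

    // Palm estimate: the smallest circle covering all defect depth points.
    cv::Point2f palmCenter;
    float palmRadius = 0.f;
    if (!depthPoints.empty())
        cv::minEnclosingCircle(depthPoints, palmCenter, palmRadius);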

Find Circle through Three Points

It is obvious that three different (non-collinear) points in a plane form a unique
circle. The equation of a circle in a plane has three unknown variables, so when
there are three points in a plane, we can substitute their x and y coordinates into the
equation to generate three equations in three variables. That yields a unique
solution, which also shows that three such points form a unique circle. Instead of
solving the simultaneous equations, we can first calculate the center of the circle
from geometric properties; once the center is known, it is easy to get the radius.
Let the circle be formed by three points P1, P2 and P3. Two lines can be formed
through two pairs of the three points: line A passes through P1 and P2, and line B
passes through P2 and P3.

Figure 3.2.1: Center of circle formed by three points

The equations of these two lines are:

    y_a = m_a (x - x1) + y1
    y_b = m_b (x - x2) + y2

where m is the slope of each line, given by:

    m_a = (y2 - y1) / (x2 - x1)
    m_b = (y3 - y2) / (x3 - x2)

The center of the circle is the intersection of the two lines which are perpendicular
to line A and line B and pass through the midpoints of P1P2 and P2P3. The
perpendicular of a line with slope m has slope -1/m, thus the equations of the lines
perpendicular to line A and line B which pass through the midpoints of P1P2 and
P2P3 are:

    y'_a = -(1/m_a) (x - (x1 + x2)/2) + (y1 + y2)/2
    y'_b = -(1/m_b) (x - (x2 + x3)/2) + (y2 + y3)/2

The two lines intersect at the center point, so the coordinates of the center are the
common solution of the two equations. Solving for the x-coordinate of the center
gives:

    x = [ m_a m_b (y1 - y3) + m_b (x1 + x2) - m_a (x2 + x3) ] / [ 2 (m_b - m_a) ]

The denominator 2(m_b - m_a) will be zero only when the two lines are parallel,
which implies that the three points are collinear. Since we apply the calculation only
when the three input points form a proper circle, we avoid that situation. Once the
center point has been calculated, the calculation of the radius is simple: just compute
the distance between the center point and P1, P2 or P3.
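
A direct transcription of this derivation as code (a sketch: segments that are
vertical or horizontal, where a slope is zero or undefined, are assumed to be
handled separately):

    #include <opencv2/opencv.hpp>
    #include <cmath>

    // Circle through three non-collinear points, following the derivation above.
    static void circleFrom3(cv::Point2f p1, cv::Point2f p2, cv::Point2f p3,
                            cv::Point2f& center, float& radius) {
        float ma = (p2.y - p1.y) / (p2.x - p1.x);   // slope of line A (P1P2)
        float mb = (p3.y - p2.y) / (p3.x - p2.x);   // slope of line B (P2P3)
        center.x = (ma * mb * (p1.y - p3.y) + mb * (p1.x + p2.x)
                    - ma * (p2.x + p3.x)) / (2.f * (mb - ma));
        // y from the perpendicular bisector of P1P2.
        center.y = -(center.x - (p1.x + p2.x) / 2.f) / ma + (p1.y + p2.y) / 2.f;
        radius = std::sqrt((center.x - p1.x) * (center.x - p1.x) +
                           (center.y - p1.y) * (center.y - p1.y));
    }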

3.2.2 Fingertip Detection

Here we apply the method proposed by Wen and Niu. The algorithm marks up the
fingertips on the contour by computing the angle at each point. From a mathematical
point of view, the problem is to find the most aculeate (sharpest) points in a contour,
which might be the fingertips. In their proposal, the algorithm was experimented
with on static images instead of real-time video, and the hand contour included only
the palm and fingers, the fore-arm part being covered up by clothing. We apply
their method in our real-time system, with the contour of the fore-arm included.

To identify the most aculeate points in a contour, the angle at a point needs to be
defined. This is done by the following equation:

    Fingertip-Angle(X) = (X - X1) · (X - X2) / ( |X - X1| |X - X2| )

X1 and X2 are separated from X by the same distance along the contour: X1 is the
point a certain number of points before X, and X2 is the point the same number of
points after X. The distance is 20 contour points in our experiment. The symbol (·)
denotes the dot product. The value of Fingertip-Angle(X) lies between -1 and 1,
since it is the cosine of the angle X1-X-X2; values close to 1 indicate a sharp point.
Figure 3.2.2 illustrates the angle of a contour point.

Figure 3.2.2: Angle of a contour point
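
A sketch of this test over the contour points (the 20-point spacing matches the text;
the cosine cutoff of 0.7 is an assumed value):

    #include <opencv2/opencv.hpp>
    #include <vector>
    #include <cmath>

    // Mark contour points whose angle is sharp enough to be a fingertip.
    std::vector<cv::Point> fingertips;
    const int k = 20;                  // spacing along the contour
    const double cutoff = 0.7;         // cos(angle) threshold (assumed)
    int n = (int)contour.size();
    for (int i = 0; i < n && n > k; ++i) {
        cv::Point v1 = contour[i] - contour[(i - k + n) % n];
        cv::Point v2 = contour[i] - contour[(i + k) % n];
        double cosang = (v1.x * v2.x + v1.y * v2.y) /
                        (std::sqrt((double)(v1.x * v1.x + v1.y * v1.y)) *
                         std::sqrt((double)(v2.x * v2.x + v2.y * v2.y)) + 1e-9);
        if (cosang > cutoff)
            fingertips.push_back(contour[i]);   // candidate fingertip
    }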

3.2.3 Distance Threshold for Fingertips

When the fingers are bent or the gesture is a closed hand, the contour can be
unsmooth, so some unexpected fingertips will be marked up. We therefore set a
distance threshold for fingertips. The inner circle is the estimated palm circle and
the outer circle is the threshold distance for fingertips: the distance between a
fingertip and the palm center must be longer than the threshold, otherwise the
fingertip is not shown. The outer circle also covers up all the unexpected fingertips
when the gesture is a closed hand.
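
Expressed as code, reusing fingertips, palmCenter and palmRadius from the
sketches above (the 1.5× palm-radius factor is an assumed choice):

    #include <opencv2/opencv.hpp>
    #include <vector>
    #include <cmath>

    // Keep only fingertip candidates well outside the palm circle.
    std::vector<cv::Point> accepted;
    for (size_t i = 0; i < fingertips.size(); ++i) {
        double dx = fingertips[i].x - palmCenter.x;
        double dy = fingertips[i].y - palmCenter.y;
        if (std::sqrt(dx * dx + dy * dy) > 1.5 * palmRadius)
            accepted.push_back(fingertips[i]);   // passes the distance threshold
    }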

3.3 MOUSE INPUT HANDLING

By applying vision technology and controlling the mouse with natural hand
gestures, we simulate the mouse system. We employ several image processing
techniques and our own calculations to implement this. In this application we need
to track one hand only, and the Windows API is used to implement the mouse
events.

About Mouse Input


The mouse is an important, but optional, user-input device for applications. A well-
written application should include a mouse interface, but it should not depend solely
on the mouse for acquiring user input. The application should provide full keyboard
support as well.

Mouse Cursor

When the user moves the mouse, the system moves a bitmap on the screen called the
mouse cursor. The mouse cursor contains a single-pixel point called the hot spot, a
point that the system tracks and recognizes as the position of the cursor. When a
mouse event occurs, the window that contains the hot spot typically receives the
mouse message resulting from the event. The window need not be active or have the
keyboard focus to receive a mouse message.

In this system, the calculated palm center is used to provide the current hot spot
location.

Mouse Capture

The system typically posts a mouse message to the window that contains the cursor
hot spot when a mouse event occurs. An application can change this behavior by using
the SetCapture function to route mouse messages to a specific window. The window
receives all mouse messages until the application calls the ReleaseCapture function
or specifies another capture window, or until the user clicks a window created by
another thread.

Only the foreground window can capture mouse input. When a background
window attempts to capture mouse input, it receives messages only for mouse events
that occur when the cursor hot spot is within the visible portion of the window.

In the application developed, one important fact is that the hand tracking is done
only inside a particular window, which shows the camera input and the
computational processing results, while the motion of the mouse cursor has to span
the entire screen. For this, an appropriate scaling operation has to be performed on
the calculated palm center to determine the final cursor position.
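
A hedged sketch of this scaling and cursor placement with the Windows API
(frameWidth and frameHeight stand for the camera frame size; the names are
assumptions for illustration):

    #include <windows.h>

    // Map the palm center from camera coordinates to screen coordinates.
    int screenW = GetSystemMetrics(SM_CXSCREEN);
    int screenH = GetSystemMetrics(SM_CYSCREEN);
    int x = (int)(palmCenter.x * screenW / frameWidth);
    int y = (int)(palmCenter.y * screenH / frameHeight);
    SetCursorPos(x, y);                        // move the cursor hot spot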

Mouse Messages
The mouse generates an input event when the user moves the mouse, or presses or
releases a mouse button. The system converts mouse input events into messages and
posts them to the appropriate thread's message queue. When mouse messages are
posted faster than a thread can process them, the system discards all but the most
recent mouse message.

A window receives a mouse message when a mouse event occurs while the
cursor is within the borders of the window, or when the window has captured the
mouse. Mouse messages are divided into two groups: client area messages and non-
client area messages. Typically, an application processes client area messages and
ignores non-client area messages.

In the Virtual Mouse system developed, based on the number of fingers at the
current instant of time, a message for a left click is delivered to the foreground
application, i.e., the application where the hot spot (the mouse cursor) is located.
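
A sketch of synthesizing that click once the finger-count condition is met
(SendInput is one standard way to post such events; the project’s exact mechanism
may differ):

    #include <windows.h>

    // Synthesize a left click at the current cursor position.
    void leftClick() {
        INPUT in[2] = {};
        in[0].type = INPUT_MOUSE;
        in[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
        in[1].type = INPUT_MOUSE;
        in[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
        SendInput(2, in, sizeof(INPUT));
    }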

CHAPTER 4: RESULTS AND DISCUSSIONS

The first and most basic step in developing any vision-based system using gesture
recognition is recognizing the hand and calculating an accurate mask for it. There
are many approaches that can be adopted to achieve this; in this project, two of them
were tried out:

1. Background-Foreground Segmentation Based


This approach tries to determine the pixels that belong to the foreground (in this
case, our hand) on the basis of changes in their values. The pixels that change their
values over time are concluded to be in motion, and these pixels are collectively
considered to form the hand; the constant pixels are the background.
The main observations while implementing the virtual mouse system using this
approach are:
 A normal webcam usually captures images with a lot of noise. While
determining the constant background pixels, this can produce erroneous pixels
that the system concludes belong to the foreground while they actually belong
to the background. Appropriate noise reduction techniques can be employed to
improve the results somewhat.

 If we keep our hand static for some duration of time, the foreground
mask slowly starts decaying and becoming part of the background. To recover
from this, a long period without the hand in the frame is needed so that the
background image can be re-learned.

Figure 4.1: Decay of the foreground mask when the hand is static

 In case the background and foreground are very similar in color,
illumination etc., this segmentation does not work very accurately.
The biggest drawback of this approach is that the background has to be completely
static throughout. Even the slightest changes in the background result in a very
erratic foreground mask, and the mouse system becomes completely inaccurate in
such a situation.

2. Skin-Color Detection Based


In this approach the system determines which pixels belong to the hand using
the color of the pixels. A range of colors is provided to the system, and the
pixels falling within this range are used to build the binary mask image.
The main observations while implementing this approach are:
 The first observation is very clear: the range of skin colors varies very widely
among people, who can be extremely fair to extremely dark in complexion.
The system will be able to track only a narrow range of colors at one time. If
we try to widen our range to include a wider variety of colors, this also leads
to high inaccuracy, as the probability of including background objects of
similar color while tracking the hand increases.
 The background, again, should not contain any objects that are similar to the
color of skin, or the system inaccurately classifies them as belonging to the
hand. The same problem arises if any body part of the user is visible in the
frame, such as the face, the other hand, and in some cases even the forearm.

Figure 4.2: Background objects of similar color have been detected

 Similarly, whenever illumination and lighting conditions change, the apparent
skin color varies accordingly. This again makes our system inaccurate.
 The use of the HSV color space instead of the usual RGB format does provide
better results, as colors described in HSV space are less sensitive to changes
in illumination.
 The results produced using this approach in a static setting with a dark
background are accurate to a great extent.

Once a mask for the hand pixels is obtained, further processing is done to calculate
attributes like the palm center and the number of fingers, which provide the basis
for the mouse input.

CHAPTER 5: CONCLUSION AND FUTURE SCOPE

In the Virtual Mouse system developed, the overall accuracy and efficiency are
directly dependent on how correctly the system recognizes the hand and how well it
can separate it from the background. Many different methods can be adopted to do
this, each having its own advantages and disadvantages; for more accuracy, a
combination of these methods can be adopted.
Future work on this project can include implementing other approaches and trying
to maximize the accuracy of such virtual systems. Other systems that can be
implemented using this virtual technology are a standard 26-character keyboard, a
calculator, a Gameboy-style remote control, etc.

REFERENCES

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., 2008.

[2] http://docs.opencv.org/

[3] T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science
Press, Rockville, Maryland, 1982.

[4] R. L. Graham, “An efficient algorithm for determining the convex hull of a finite
planar set”, Information Processing Letters, 7:175–180, 1972.

[5] J. Sklansky, “Measuring Concavity on a Rectangular Mosaic”, IEEE Transactions
on Computers, 21, p. 1355, 1972.

[6] M. Soriano, S. Huovinen, B. Martinkauppi, and M. Laaksonen, “Using the skin
locus to cope with changing illumination conditions in color-based face tracking”, in
Proceedings of the IEEE Nordic Signal Processing Symposium, pp. 383–386, 2000.

[7] http://en.wikipedia.org/wiki/Dot_product

[8] http://msdn.microsoft.com/en-us/library/windows/desktop/ms645601(v=vs.85).aspx

[9] http://en.wikipedia.org/wiki/Microsoft_Visual_Studio
