
HUMAN COMPUTER INTERFACE BASED ON

FACE TRACKING FOR PHYSICALLY


CHALLENGED USERS

A PROJECT REPORT

Submitted by

ABDUL ASIM A.

AFSHAN S.

ANAND R.

in partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY

in

INFORMATION TECHNOLOGY

B.S.ABDUR RAHMAN CRESCENT ENGINEERING COLLEGE,


VANDALUR
ANNA UNIVERSITY: CHENNAI 600 025
APRIL 2009
ANNA UNIVERSITY : CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING
FOR PHYSICALLY CHALLENGED USERS” is the bonafide work of “ABDUL ASIM A. (40405205001),
AFSHAN S. (40405205005) & ANAND R. (40405205008)” who carried out the project work under
my supervision.

SIGNATURE                                 SIGNATURE

Dr. T. R. RANGASWAMY                      Dr. ANGELINA GEETHA

HEAD OF THE DEPARTMENT                    SUPERVISOR

                                          Professor

Department of Information Technology      Department of Computer Science
B.S.A CRESCENT ENGINEERING COLLEGE        B.S.A CRESCENT ENGINEERING COLLEGE
SEETHAKATHI ESTATE                        SEETHAKATHI ESTATE
G.S.T. Road, Vandalur,                    G.S.T. Road, Vandalur,
Chennai - 600 048, India                  Chennai - 600 048, India

ANNA UNIVERSITY : CHENNAI 600 025

VIVA VOCE EXAMINATION

The viva-voce examination of the following students who have submitted the
project work “HUMAN COMPUTER INTERFACE BASED ON FACE TRACKING
FOR PHYSICALLY CHALLENGED USERS” is held on _____________

ABDUL ASIM A. (40405205001)


AFSHAN S. (40405205005)
ANAND R. (40405205008)

INTERNAL EXAMINER                         EXTERNAL EXAMINER

ACKNOWLEDGEMENT

We are grateful to our Principal, Dr. V. M. PERIASAMY, B.S.A. Crescent Engineering
College, for providing us with an excellent environment to carry out our course
successfully.

We are deeply indebted to our beloved Head of the Department, Dr. T. R. RANGASWAMY,
Department of Information Technology, who moulded us both technically and morally for
achieving greater success in life.

We express our thanks to our project coordinator Ms. R. REVATHY, Senior Lecturer,
Department of Information Technology, for her valuable suggestions at every stage of
our project.

We record our sincere thanks to our guide Dr. ANGELINA GEETHA, Professor,
Department of Computer Science, for being instrumental in the completion of our
project with her exemplary guidance.

We thank all the staff members of our department for their valuable support and
assistance at various stages of our project development.

TABLE OF CONTENTS

CHAPTER NO.    TITLE

               ABSTRACT
               LIST OF TABLES
               LIST OF FIGURES
               LIST OF ABBREVIATIONS

1.             INTRODUCTION
               1.1 Feature Detection
               1.2 Face Detection
               1.3 Algorithms on Face Detection
               1.4 Human Computer Interaction for the Physically Challenged
               1.5 HCI Based on Mouse Movements
               1.6 Related Work

2.             PROBLEM DEFINITION

3.             DEVELOPMENT PROCESS
               3.1 Requirement Analysis and Specifications
                   3.1.1 Input Requirements
                   3.1.2 Output Requirements
                   3.1.3 Functional Requirements
               3.2 Resource Requirements
                   3.2.1 Hardware
                   3.2.2 Software
               3.3 Design
                   3.3.1 System Architecture
                   3.3.2 Detailed Design
                         3.3.2.1 User Interface
                         3.3.2.2 Module Description
               3.4 Implementation
               3.5 Testing

4.             APPLICATION AND FUTURE ENHANCEMENTS

5.             CONCLUSION

               APPENDIX A – SCREENSHOTS
               REFERENCES

ABSTRACT

Physically challenged people find it difficult to use a computer because information is
presented to them in an inaccessible form. Though many forms of computer access are
available for disabled people, these systems are expensive and require sophisticated
hardware support. In this context, this system focuses on helping quadriplegic and
non-verbal users. The challenge is to develop a Human Computer Interface for such users
which is inexpensive and easy to implement. Human Computer Interface is a discipline
concerned with the design, evaluation and implementation of interactive computing systems
for human use and with the study of major phenomena surrounding them. We propose an
interface for people with severe disabilities based on face tracking. Body features like
the eyes and the lips may also be used for implementing a human computer interface, but
with some limitations. In eye tracking, the motion of the pupil is hard to track with a
web camera, which would be the primary mode of input in the proposed system. For a
physically challenged user, moving the face itself demands great effort, and hence finer
movements such as those of the eyeballs and lips cannot be considered. The system depends
on a web camera for input and hence would be affordable for the target users. User
friendliness is enhanced as the system is devoid of any sophisticated hardware
requirement.

LIST OF TABLES

S.No    Table Name

1.      Hardware Resource requirement table
2.      Software Resource requirement table

LIST OF FIGURES

Figure No.    Figure Name

1.1           Head tracking system
3.1           Architecture diagram
3.2           System flow diagram
3.3           Code snippet for webcam capture
3.4           Code snippet for face detection
3.5           Code snippet for mouse pointer movement
3.6           Code snippet for playing video clips
3.7           Message Board
3.8           Algorithm flow diagram

LIST OF ABBREVIATIONS

S.No    Acronym     Expansion

1.      CAMSHIFT    Continuously Adaptive Mean Shift
2.      HCI         Human Computer Interface
3.      SDLC        Software Development Life Cycle
4.      GUI         Graphical User Interface
5.      MFC         Microsoft Foundation Classes
6.      CLR         Common Language Runtime
7.      ATL         Active Template Library
8.      OpenCV      Open Source Computer Vision
9.      COM         Component Object Model

1. INTRODUCTION

1.1 Feature Detection

Feature detection is a process by which specialized nerve cells in the brain respond
to specific features of a visual stimulus, such as lines, edges, angles, or movement. The
nerve cells fire selectively in response to stimuli that have specific characteristics.
Feature detection was discovered by David Hubel and Torsten Wiesel of Harvard
University.

In computer vision and image processing the concept of feature detection refers to
methods that aim at computing abstractions of image information and making local
decisions at every image point whether there is an image feature of a given type at that
point or not. The resulting features will be subsets of the image domain, often in the form
of isolated points, continuous curves or connected regions.

Feature detection is a low-level image processing operation. That is, it is usually
performed as the first operation on an image, and examines every pixel to see if there is a
feature present at that pixel. If this is part of a larger algorithm, then the algorithm will
typically only examine the image in the region of the features. As a built-in prerequisite
to feature detection, the input image is usually smoothed by a Gaussian kernel in a
scale-space representation and one or several feature images are computed, often expressed
in terms of local derivative operations. Occasionally, a higher level algorithm may be used
to guide the feature detection stage, so that only certain parts of the image are searched
for features.
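
The smoothing and derivative steps described above can be illustrated with a minimal sketch. It is not part of the project code: the `GrayImage` type, the function names and the 3x3 binomial kernel (used here as a small approximation of a Gaussian) are assumptions chosen for illustration. Each pixel receives a local decision, namely whether its gradient magnitude exceeds a threshold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major grayscale image. Type and function names are illustrative only.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;  // size = width * height

    // Read a pixel, clamping coordinates so borders reuse the nearest pixel.
    float at(int x, int y) const {
        if (x < 0) x = 0;
        if (x >= width) x = width - 1;
        if (y < 0) y = 0;
        if (y >= height) y = height - 1;
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Smooth with the 3x3 binomial kernel [1 2 1; 2 4 2; 1 2 1] / 16,
// a small approximation of a Gaussian kernel.
GrayImage smoothGaussian3x3(const GrayImage& in) {
    static const int k[3] = {1, 2, 1};
    GrayImage out{in.width, in.height, std::vector<float>(in.pixels.size())};
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += k[dy + 1] * k[dx + 1] * in.at(x + dx, y + dy);
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum / 16.0f;
        }
    return out;
}

// Feature image: gradient magnitude from central differences.
GrayImage gradientMagnitude(const GrayImage& in) {
    GrayImage out{in.width, in.height, std::vector<float>(in.pixels.size())};
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x) {
            float gx = (in.at(x + 1, y) - in.at(x - 1, y)) * 0.5f;
            float gy = (in.at(x, y + 1) - in.at(x, y - 1)) * 0.5f;
            out.pixels[static_cast<std::size_t>(y) * in.width + x] =
                std::sqrt(gx * gx + gy * gy);
        }
    return out;
}

// Local decision at every pixel: is there an edge feature at this point?
std::vector<bool> detectEdges(const GrayImage& in, float threshold) {
    GrayImage g = gradientMagnitude(smoothGaussian3x3(in));
    std::vector<bool> isEdge(g.pixels.size());
    for (std::size_t i = 0; i < g.pixels.size(); ++i)
        isEdge[i] = g.pixels[i] > threshold;
    return isEdge;
}
```

For an image whose left half is dark and whose right half is bright, only the pixels on either side of the boundary are marked, which is the "subset of the image domain" that later stages would examine.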

Once features have been detected, a local image patch around the feature can be
extracted. This extraction may involve quite considerable amounts of image processing.
The result is known as a feature descriptor or feature vector.

Types of tracking:

Eye Tracking:

Eye tracking is the process of measuring either the point of gaze or the
motion of an eye relative to the head. An eye tracker is a device for measuring eye
positions and eye movements. There are a number of methods for measuring eye
movements. The most popular variant uses video images from which the eye position is
extracted. Other methods use search coils or are based on the electro-oculogram. Two
general types of eye tracking techniques are used: Bright Pupil and Dark Pupil. Their
difference is based on the location of the illumination source with respect to the optics. If
the illumination is coaxial with the optical path, then the eye acts as a retro-reflector as
the light reflects off the retina creating a bright pupil effect similar to red eye. If the
illumination source is offset from the optical path, then the pupil appears dark because
the retro-reflection from the retina is directed away from the camera.

Head Tracking:

Head tracking technology consists of a device transmitting a signal from atop the
computer monitor and tracking a reflector placed on the user's head or eyeglasses. It
serves as a mouse alternative, as it allows the person to control the mouse cursor by
moving his/her head. Once calibrated, the movement of the user's head determines the
direction in which the onscreen cursor travels. An example of a head tracking system is
given in Figure 1.1.

Figure 1.1: Head tracking system

1.2 Face Detection

Face detection is a computer technology that determines the locations and sizes of
human faces in arbitrary (digital) images. It detects facial features and ignores anything
else, such as buildings, trees and bodies.

Face detection can be regarded as a more general case of face localization. In face
localization, the task is to find the locations and sizes of a known number of faces
(usually one). In face detection, one does not have this additional information.

Early face-detection algorithms focused on the detection of frontal human faces,
whereas newer algorithms attempt to solve the more general and difficult problem of
multi-view face detection, which is the detection of faces that are either rotated along
the axis from the face to the observer (in-plane rotation), or rotated along the vertical
or left-right axis (out-of-plane rotation), or both.

Face detection is used in biometrics, often as a part of (or together with) a facial
recognition system. It is also used in video surveillance, human computer interface and
image database management. Some recent digital cameras use face detection for
autofocus. Also, face detection is useful for selecting regions of interest in photo
slideshows that use a pan-and-scale effect.

1.3 Algorithms on Face Detection

Neural Network-Based Face Detection by Rowley, Baluja and Kanade:

This is a neural network-based algorithm to detect upright, frontal views of faces in
gray-scale images. The algorithm works by applying one or more neural networks
directly to portions of the input image, and arbitrating their results. Each network is
trained to output the presence or absence of a face. The algorithms and training methods
are designed to be general, with little customization for faces. Many face detection
researchers have used the idea that facial images can be characterized directly in terms of
pixel intensities. These images can be characterized by probabilistic models of the set of
face images or implicitly by neural networks or other mechanisms. The parameters for
these models are adjusted either automatically from example images or by hand.

Algorithm by Henry Schneiderman and Takeo Kanade

This algorithm is a statistical method for three-dimensional object detection. The
statistics of both object appearance and non-object appearance are represented using
histograms. Each histogram represents the joint statistics of a subset of wavelet
coefficients and their position on the object. This approach uses many such histograms to
represent a wide variety of visual attributes. The algorithm is the first of its kind to
reliably detect human faces with out-of-plane rotation.

CAMSHIFT Algorithm

CAMSHIFT stands for "Continuously Adaptive Mean Shift". It combines the basic Mean Shift
algorithm with an adaptive region-sizing step. The kernel is a simple step function
applied to a skin-probability map. The skin probability of each image pixel is based on
color, using a method called histogram back projection. Color is represented as Hue from
the HSV color model. CAMSHIFT is a very fast and simple method of tracking, but because it
tracks the center and size of the probability distribution of an object, it is only as
good as the probability distribution that is produced for the object.
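
The two steps named above, mean shift followed by adaptive region sizing, can be sketched as follows. This is a simplified illustration, not the reference implementation and not the code used in this project. It assumes the skin-probability map has already been computed (for example by histogram back projection) with values in [0, 1]. The `ProbMap` and `Window` types, the function names and the resizing rule (side = 2 * sqrt(M00), a square window) are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Window { int x, y, w, h; };  // top-left corner and size, in pixels

struct ProbMap {
    int width, height;
    std::vector<float> p;  // skin probability per pixel, assumed in [0, 1]
    float at(int x, int y) const {
        return p[static_cast<std::size_t>(y) * width + x];
    }
};

// Clip the window so that it lies entirely inside the map.
Window clip(Window win, const ProbMap& m) {
    win.w = std::max(1, std::min(win.w, m.width));
    win.h = std::max(1, std::min(win.h, m.height));
    win.x = std::max(0, std::min(win.x, m.width - win.w));
    win.y = std::max(0, std::min(win.y, m.height - win.h));
    return win;
}

// One tracking update: mean shift until the window stops moving,
// then resize the window from the probability mass it contains.
Window camshiftStep(const ProbMap& m, Window win, int maxIter = 20) {
    double m00 = 0.0;
    win = clip(win, m);
    for (int iter = 0; iter < maxIter; ++iter) {
        double m10 = 0.0, m01 = 0.0;
        m00 = 0.0;
        // Step kernel: every pixel inside the window contributes equally.
        for (int y = win.y; y < win.y + win.h; ++y)
            for (int x = win.x; x < win.x + win.w; ++x) {
                double v = m.at(x, y);
                m00 += v;
                m10 += v * x;
                m01 += v * y;
            }
        if (m00 <= 0.0) return win;  // nothing to track inside the window
        // Re-centre the window on the centroid of the probability mass.
        int cx = static_cast<int>(std::lround(m10 / m00));
        int cy = static_cast<int>(std::lround(m01 / m00));
        Window next = clip(Window{cx - win.w / 2, cy - win.h / 2, win.w, win.h}, m);
        bool converged = (next.x == win.x && next.y == win.y);
        win = next;
        if (converged) break;
    }
    // Adaptive region sizing (assumed rule for this sketch).
    int side = std::max(1, static_cast<int>(std::lround(2.0 * std::sqrt(m00))));
    int cx = win.x + win.w / 2;
    int cy = win.y + win.h / 2;
    return clip(Window{cx - side / 2, cy - side / 2, side, side}, m);
}
```

Calling `camshiftStep` once per video frame, with the previous result as the starting window, gives the tracking loop. The sketch also shows why the method is only as good as the probability map: a wrong map shifts the centroid and the mass, and with them the window.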

1.4 Human Computer Interaction for the Physically Challenged

Human–computer interaction (HCI) is the study of interaction between people
(users) and computers. It is often regarded as the intersection of computer science,
behavioral sciences, design and several other fields of study. Interaction between users
and computers occurs at the user interface (or simply interface), which includes both
software and hardware, for example, general-purpose computer peripherals and
large-scale mechanical systems, such as aircraft and power plants.

Persons with severe motion impairment, such as diplegics and quadriplegics, face
difficulty in accessing computer-based systems since they cannot use conventional
computer access devices such as a mouse or keyboard. Alternate computer interfaces based
on tracking of body features need to be developed for these users. The challenge lies in
designing a system which would serve as a general interface between computers and
physically challenged users.

1.5 HCI Based on Mouse Movements

Pointing devices like the mouse and trackball enable users to control a pointer and
interact with a graphical user interface. The current human-computer interaction mode,
based primarily on the keyboard and the mouse, has seen little change since the
advent of modern computing. Computers now commonly come with cameras as standard
equipment. Hence it is desirable to employ them in designing next-generation human
computer interaction devices. The feasibility of interfaces based on speech-driven input
has also been extensively investigated.

Relying on input based on human features has opened up the possibility of
developing interfaces for people who cannot use the keyboard or mouse due to severe
disabilities. Such systems make use of human features such as the head, eyes, lips or face
for tracking the movement of the user and translating the movements into mouse
movements on the screen. The purpose of this project is to develop an interface for
quadriplegic and non-verbal users.

1.6 Related Work

In the works of James Gips, Margrit Betke and Peter Fleming (2000), preliminary
investigations have been carried out for the design of a human computer interface for
quadriplegic and non-verbal users. The system has been broken down into two main
components. The first component is the Vision Computer which receives real-time input
from a camera mounted on the monitor. The second component is the User’s Computer
which runs a special driver program in the background to translate the user’s movement
from the input device into mouse movements on the screen.

A camera mouse system was developed by James Gips, Margrit Betke and Peter
Fleming (2002). The system makes use of body features like the tip of the user’s nose or
finger or face to track the position of the mouse. Various body features are examined for
tracking reliability and user convenience. The visual tracking algorithm used in this
system is based on cropping an online template of the tracked feature from the current
image frame and testing where this template correlates in the subsequent frame. The
location of the highest correlation is interpreted as the new location of the feature in the
subsequent frame. Our system adopts part of this algorithm for regularly updating the
image frames.
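
The template-correlation idea described above can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the camera mouse authors' code: the `Gray` and `Match` types, the function names, the use of the normalized correlation coefficient and the square search neighbourhood are all choices made for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Gray {
    int width, height;
    std::vector<float> px;  // row-major intensities
    float at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * width + x];
    }
};

struct Match { int x, y; double score; };

// Crop a w x h template whose top-left corner is (x0, y0).
Gray crop(const Gray& src, int x0, int y0, int w, int h) {
    Gray out{w, h, std::vector<float>(static_cast<std::size_t>(w) * h)};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out.px[static_cast<std::size_t>(y) * w + x] = src.at(x0 + x, y0 + y);
    return out;
}

// Normalized correlation coefficient between the template and the patch
// of `frame` whose top-left corner is (ox, oy). The value lies in [-1, 1].
double correlation(const Gray& frame, const Gray& tmpl, int ox, int oy) {
    const double n = static_cast<double>(tmpl.width) * tmpl.height;
    double meanF = 0.0, meanT = 0.0;
    for (int y = 0; y < tmpl.height; ++y)
        for (int x = 0; x < tmpl.width; ++x) {
            meanF += frame.at(ox + x, oy + y);
            meanT += tmpl.at(x, y);
        }
    meanF /= n;
    meanT /= n;
    double num = 0.0, varF = 0.0, varT = 0.0;
    for (int y = 0; y < tmpl.height; ++y)
        for (int x = 0; x < tmpl.width; ++x) {
            double df = frame.at(ox + x, oy + y) - meanF;
            double dt = tmpl.at(x, y) - meanT;
            num += df * dt;
            varF += df * df;
            varT += dt * dt;
        }
    double denom = std::sqrt(varF * varT);
    return denom > 0.0 ? num / denom : 0.0;  // flat patches carry no information
}

// Search a square neighbourhood around the previous location and return
// the position with the highest correlation.
Match trackTemplate(const Gray& frame, const Gray& tmpl,
                    int prevX, int prevY, int radius) {
    Match best{prevX, prevY, -2.0};  // -2 is below any valid score
    for (int oy = prevY - radius; oy <= prevY + radius; ++oy)
        for (int ox = prevX - radius; ox <= prevX + radius; ++ox) {
            if (ox < 0 || oy < 0 || ox + tmpl.width > frame.width ||
                oy + tmpl.height > frame.height)
                continue;  // patch would fall outside the frame
            double s = correlation(frame, tmpl, ox, oy);
            if (s > best.score) best = Match{ox, oy, s};
        }
    return best;
}
```

In a tracking loop, the template would be cropped from the current frame at the tracked location, and `trackTemplate` would be applied to the subsequent frame. Restricting the search to a neighbourhood keeps the cost per frame low.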

We study the working of the CAMSHIFT algorithm proposed by Gary R. Bradski
(1998) to develop a Perceptual User Interface. Perceptual interfaces are the ones in which
the computer is given the ability to sense and produce analogs of the human senses. The
CAMSHIFT algorithm is a modification of the mean shift algorithm, which is based on
probability distributions. The Continuously Adaptive Mean Shift (CAMSHIFT) algorithm
deals with dynamically changing color probability distributions derived from video
frames. Since CAMSHIFT relies on color distribution alone, errors in color will cause
errors in tracking.

A face detection algorithm based on skin color has been proposed by Sanjay Singh,
D.S. Chauhan, Mayank Vatsa and Richa Singh (2003). The authors have discussed
various algorithms based on skin color. Three main color spaces, RGB, YCbCr and
HSI, have been combined to get a new skin color based face detection algorithm which
achieves higher accuracy. Our system uses the face localization approach discussed in
this publication.
In the works of Rajesh Kumar and Anupam Kumar (2008), alternate input systems
to replace the traditional mouse and keyboard are discussed. The authors have developed
an input system which uses the head and eyes to track the movements of the user. The
algorithm is based upon image matching using correlation coefficients. The system
comprises an image tracer module, and the cursor position is determined by calculating
the correlation coefficient of the tracing window in image space.

Ian R. Fasel and Javier R. Movellan (2002) have conducted a comprehensive
analysis of some techniques used in neurally inspired face detectors. Algorithms such as
SNoW, AdaBoost and Bootstrap have been studied. The AdaBoost algorithm is based on
active sampling of images whereas its counterparts use random sampling. It has been
shown experimentally that AdaBoost delivers consistent performance under various
conditions.

In the works of Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu
Kiriyama, Tetsushi Koide and Hans Juergen Mattausch (2005), a face detection system
has been proposed based on Haar-like features. The detection technique is based on the
idea of the wavelet template that defines the shape of an object in terms of a subset of the
wavelet coefficients of the image. The object in this case is the human face.

Our system makes use of the Haar face detection algorithm to recognize and track
faces from real time video input. The main tasks involved are webcam capture, face
detection and translation of facial movements into mouse movements. A web camera is a
low-resolution capture device. The Haar face detection algorithm processes the video
feed using a large number of evaluations called classifiers to localize faces. This helps in
achieving a high degree of accuracy.

2. PROBLEM DEFINITION

People with severe disabilities resulting from birth or accidents or from
degenerative diseases, and bedridden patients, have been excluded from access to
computers and even lack proper means of communication with fellow human beings.
Information is presented in an inaccessible form to them. They are unable to speak and
have very little or no voluntary muscle control. In most cases, these people are able to
move only their heads. Their level of mental functioning might not be known because of
their inability to communicate. People with severe physical disabilities often are isolated,
spending hours in bed or in a wheelchair at home or in an institutional setting.

Computer and communication technology can make all the difference in the world
for people with profound physical disabilities. Our approach is to develop a computer
interface for the disabled using facial tracking. The challenge is to develop a low cost
system devoid of any sophisticated hardware for input. The system should be free from
any special hardware to track the desired feature as this may cause inconvenience to the
user.

The facial movements of the user are captured using a webcam and translated into
mouse pointer movements after preprocessing and applying a face detection algorithm.
Thus by moving the face, the user would be able to control the mouse. The interface
contains options for raising an alarm, summoning a nurse and playing audio and video for
entertainment. An on-screen message board has also been provided to enable the user to
communicate effectively.

3. DEVELOPMENT PROCESS

A software development process is a structure imposed on the development of a
software product. The activities concerned with the development of software are
collectively known as the Software Development Life Cycle (SDLC). SDLC is any logical
process used by a systems analyst to develop an information system, including
requirements, validation, training, and user ownership. An SDLC should result in a high
quality system that meets or exceeds customer expectations, reaches completion within
time and cost estimates, and works effectively and efficiently in the current and planned
information technology infrastructure.

3.1 Requirement Analysis and Specifications

The requirement engineering process consists of feasibility study, requirements
elicitation and analysis, requirements specification, requirements validation and
requirements management. Requirements elicitation and analysis is an iterative process
that can be represented as a spiral of activities, namely requirements discovery,
requirements classification and organization, requirements negotiation and requirements
documentation.

3.1.1 Input Requirements

The input for the human computer interface will be obtained from a web camera.
Since the interface would solely depend on the camera, care should be taken in choosing
the computer camera. A web camera is chosen over other mediums of video capture for
two reasons. First, a web camera is less expensive than other visual input devices,
and this makes the system affordable to every individual. Second, the web camera does not
require any specialized drivers or software support, and this makes it easy for the
developer to access real-time video feeds.

3.1.2 Output Requirements


The output will be the movement of the mouse pointer on the interface. The video
stream from the camera will be displayed at the center of the interface along with the
tracking of the face.

3.1.3 Functional Requirements

The facial movements of the user are captured through the camera in Visual C++.
The live video stream is fed to the face detection algorithm. The detected face is given as
input to the tracker module which translates the facial movements into mouse pointer
movements. This can then be used to access the user interface.

3.2 RESOURCE REQUIREMENTS

Software requirements is a sub-field of software engineering that deals with the
elicitation, analysis, specification, and validation of requirements for software.
Requirements analysis in systems engineering and software engineering encompasses
those tasks that go into determining the needs or conditions to meet for a new or altered
product, taking account of the possibly conflicting requirements of the various
stakeholders, such as beneficiaries or users. Requirements analysis is critical to the
success of a development project. Requirements must be actionable, measurable, testable,
related to identified business needs or opportunities, and defined to a level of detail
sufficient for system design.

3.2.1 Hardware

The minimum hardware requirements for this project are listed in Table 1.

Table 1: Hardware Requirements

Hardware                        Requirement

Processor                       Intel Pentium IV or AMD – 1.8 GHz
Memory                          1 GB RAM
Hard Disk                       1 GB
Video Capture Device (Input)    Logitech or Microsoft web camera

3.2.2 Software

The minimum software requirements for this project are listed in Table 2.

Table 2: Software Requirements

Software            Requirement

Operating System    Windows 2000/XP
Runtime Package     Microsoft Visual C++, Intel OpenCV
Webcam Drivers      Logitech/Microsoft SDK

3.3 DESIGN

Software design is a process of problem-solving and planning for a software
solution. After the purpose and specifications of software are determined, software
developers will design or employ designers to develop a plan for a solution. It includes
low-level component and algorithm implementation issues as well as the architectural
view.

3.3.1 System Architecture

Figure 3.1: Architecture Diagram

The architecture of the system is represented in Figure 3.1. The system receives
real time input from the user via a web camera. The video stream is accessed via the
webcam capture module. The vendor-supplied webcam software cannot be used for
interfacing the webcam and the face detection module.

The input from the camera is given to the face detection module. The core of the
face detection module contains the algorithm which works on localizing the facial
segments from the rest of the image. The algorithm is adapted to detect faces from
streaming video feeds.

After the face has been detected in the video stream, the movements of the face are
translated into mouse cursor movements on the screen and updated accordingly in real
time. The position of the face is converted into onscreen coordinates and this is mapped
into mouse pointer coordinates in the tracker module. Hence, when the user moves his/her
face, the mouse cursor is moved correspondingly. This tracking module is interfaced with
the Graphical User Interface (GUI). Using the mouse movements, the user can interact
with the application interface.

3.3.2 Detailed Design

Our system provides an efficient way for bedridden people to interact with a
computer and also provides an efficient communication system. The main tasks to be
accomplished in the development of the proposed system are as follows:

• Accessing the video stream from the video camera in real time

• Detecting the facial motion from the captured video

• Development of the user interface to aid the target users

• Translating the facial motion into an input format which can be used to
manipulate the user interface

• Triggering of control signals based on the translated input format

3.3.2.1 User Interface

The system has been developed in Microsoft Visual C++. The system can be
executed by running the project executable file. The web camera has to be set up and
initialized before executing the system. The system will automatically detect the web
camera provided there is only one active camera at execution time.

The web camera must be fixed and focused on the facial region of the target user.
Care should be taken to align the camera in this way. The system analyses the video
captured by the web camera and detects the face region. As the video stream progresses,
the facial movement is detected by applying the algorithm. Once face detection has been
established, control passes to the mouse pointer and the user is able to move the mouse
pointer by moving his/her face.

At the center of the interface is a display window which shows the real time video
stream from the web camera. It displays the detected face which is updated constantly in
real-time. The interface has buttons to invoke various functions. The user is able to raise
an alarm, summon a nurse or play audio and video for entertainment. An
onscreen message board can also be invoked for communication purposes. The invoked
function can be stopped using the stop button and the application can be closed using the
exit button provided in the interface.

3.3.2.2 Module Description

The basic flow of the system is represented in Figure 3.2. The Human computer
interface for physically challenged users is made possible by the video feed from the web
camera. The modules of the proposed system are as follows:

1. Webcam Capture module

2. Face Detector

3. Tracker module

4. Application Interface

Figure 3.2: System Flow diagram

Webcam Capture module:

The input for the system is captured using the web camera. Lighting conditions
should also be favourable. The bundled software supplied with the camera can be used to
capture images and video, but it cannot be interfaced with the application to be
developed. Thus we capture the video stream from the camera in Visual C++ using
Microsoft DirectShow. Microsoft DirectShow is a part of the Microsoft DirectX SDK.
DirectX is a set of low-level application programming interfaces for creating games and
other high performance multimedia applications. DirectShow automatically detects and uses
audio and video acceleration whenever available. The captured video stream is displayed
at the center of the user interface. The video stream is given as input to the face detection
module. The code for webcam capture is given in Figure 3.3.

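// Open the camera (-1 selects any available device)
capture = cvCaptureFromCAM(-1);
// Grab the next frame from the camera, then decode it into an IplImage
// (cvRetrieveFrame returns the frame captured by the preceding cvGrabFrame)
cvGrabFrame( capture );
frame = cvRetrieveFrame( capture );
// Allocate frame_copy once, with the same size, depth and channels as the frame
if( !frame_copy )
    frame_copy = cvCreateImage( cvSize(frame->width, frame->height),
                                IPL_DEPTH_8U, frame->nChannels );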

Figure 3.3 : Code snippet for webcam capture

Face Detector module:

The facial movements of the user are captured from the web camera and given to
the face detector module. The algorithm used in our system is the Multi-view Face
Detection and Recognition Algorithm using Haar-like Features. Haar-like features are
digital image features used in object recognition. They owe their name to their intuitive
similarity with Haar wavelets. The feature set considers rectangular regions of the image
and sums up the pixels in each region. This sum is used to categorize images. We could
thus place all images whose Haar-like feature in a given rectangular region falls within a
certain range of values in one category and those falling outside this range in another.
This might roughly divide the set of images into those containing faces and those not
containing faces. Once the face has been detected, a coloured box is drawn around the face
to localize it. The algorithm constantly localizes the face in the dynamic video stream.
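
The rectangular pixel sums described above are usually computed with a summed-area table (integral image), so that any rectangle sum costs four table lookups. The sketch below is a minimal illustration and not the OpenCV implementation used in this project. The `IntegralImage` class, the function names and the particular two-rectangle layout (lower half minus upper half) are assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Summed-area table with one extra row and column of zeros, so that
// rectangle sums need no special handling at the image border.
class IntegralImage {
public:
    IntegralImage(const std::vector<std::uint8_t>& pixels, int width, int height)
        : width_(width),
          table_(static_cast<std::size_t>(width + 1) * (height + 1), 0) {
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                cell(x + 1, y + 1) =
                    pixels[static_cast<std::size_t>(y) * width + x] +
                    cell(x, y + 1) + cell(x + 1, y) - cell(x, y);
    }

    // Sum of the pixels in the rectangle with top-left corner (x, y),
    // width w and height h, using four table lookups.
    long long sum(int x, int y, int w, int h) const {
        return cell(x + w, y + h) - cell(x, y + h) - cell(x + w, y) + cell(x, y);
    }

private:
    long long& cell(int x, int y) {
        return table_[static_cast<std::size_t>(y) * (width_ + 1) + x];
    }
    long long cell(int x, int y) const {
        return table_[static_cast<std::size_t>(y) * (width_ + 1) + x];
    }

    int width_;
    std::vector<long long> table_;
};

// Two-rectangle Haar-like feature: sum of the lower half of the region
// minus the sum of the upper half.
long long twoRectFeature(const IntegralImage& ii, int x, int y, int w, int h) {
    int half = h / 2;
    long long upper = ii.sum(x, y, w, half);
    long long lower = ii.sum(x, y + half, w, half);
    return lower - upper;
}

// Categorize by range, as described in the text: values inside [lo, hi]
// fall into one category, all others into the other.
bool featureInRange(long long value, long long lo, long long hi) {
    return value >= lo && value <= hi;
}
```

A region whose upper half is darker than its lower half (as is often the case for the eye band above the cheeks) gives a large positive value. A practical detector combines many such range tests, with ranges learned from training images rather than chosen by hand.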

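// Pre-trained frontal face cascade file
const char* cascade_name = "haarcascade_frontalface_alt.xml";
// Create a new working image based on the input image, reduced by the scale factor
IplImage* temp = cvCreateImage( cvSize(img->width/scale, img->height/scale), 8, 3 );
// Detect the faces: window scale step 1.1, at least 2 neighbouring detections,
// Canny pruning enabled, minimum face size 40x40 pixels
CvSeq* faces = cvHaarDetectObjects( img, cascade, storage, 1.1, 2,
                                    CV_HAAR_DO_CANNY_PRUNING, cvSize(40, 40) );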

Figure 3.4 : Code snippet for face detection

Tracker module:

The face detector module draws a square around the localized face. The
coordinates of the top-left corner of the square are multiplied by a scaling factor in
order to amplify mouse movement and are passed to the SetCursorPos function. This
enables the mouse pointer to move when the user moves his/her face. Mouse clicking is
implemented using a time delay: when the mouse pointer hovers over a button for a
specified time, the button gets clicked. The code snippet for mouse control is
given in Figure 3.5.

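// Corners of the detected face rectangle in image coordinates
pt1.x = r->x*scale;
pt2.x = (r->x+r->width)*scale;
pt1.y = r->y*scale;
pt2.y = (r->y+r->height)*scale;
// Amplify the top-left corner by the scaling factor and move the cursor there
pt3.x = pt1.x*7;
pt3.y = pt1.y*7;
SetCursorPos(pt3.x, pt3.y);

// Mouse clicking: simulate a left button press followed by a release
mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, GetMessageExtraInfo());
mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, GetMessageExtraInfo());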

Figure 3.5 : Code snippet for mouse pointer movement
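
The hover-to-click behaviour described for the tracker module can be sketched independently of the Windows API. The following is a hypothetical illustration, not the project's code. `faceToScreen` maps the centre of the face rectangle to a screen position by scaling the camera resolution to the screen resolution and clamping to the screen bounds, which is an alternative to the fixed factor of 7 used in Figure 3.5. `DwellClicker` reports a click once the cursor has stayed over a button for a given time. All names and the millisecond timestamps are assumptions.

```cpp
#include <algorithm>

struct Point { int x, y; };
struct Rect { int x, y, width, height; };

// Map the centre of the face rectangle (camera pixels) to a screen position.
// The result is clamped so that the cursor never leaves the screen.
Point faceToScreen(const Rect& face, int camW, int camH, int screenW, int screenH) {
    int cx = face.x + face.width / 2;
    int cy = face.y + face.height / 2;
    int sx = cx * screenW / camW;
    int sy = cy * screenH / camH;
    return Point{std::max(0, std::min(sx, screenW - 1)),
                 std::max(0, std::min(sy, screenH - 1))};
}

// Dwell-click detector: signals a click when the cursor has remained inside
// the same button rectangle for at least dwellMs milliseconds.
class DwellClicker {
public:
    explicit DwellClicker(long dwellMs) : dwellMs_(dwellMs) {}

    // Call once per frame with the cursor position, the button under test
    // and the current time in milliseconds. Returns true once per dwell;
    // the cursor must leave the button before another click can fire.
    bool update(const Point& cursor, const Rect& button, long nowMs) {
        bool inside = cursor.x >= button.x && cursor.x < button.x + button.width &&
                      cursor.y >= button.y && cursor.y < button.y + button.height;
        if (!inside) {
            hovering_ = false;
            fired_ = false;
            return false;
        }
        if (!hovering_) {  // cursor has just entered the button
            hovering_ = true;
            fired_ = false;
            enteredMs_ = nowMs;
        }
        if (!fired_ && nowMs - enteredMs_ >= dwellMs_) {
            fired_ = true;
            return true;
        }
        return false;
    }

private:
    long dwellMs_;
    long enteredMs_ = 0;
    bool hovering_ = false;
    bool fired_ = false;
};
```

In an application of this kind, the point returned by `faceToScreen` would be passed to SetCursorPos, and a `true` result from `update` would trigger the pair of mouse_event calls. Firing only once per dwell avoids repeated clicks while the user rests on a button.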

Application Interface:

The user interface is a Microsoft Foundation Classes (MFC) dialog-based
application built in Visual C++. A face tracking display is present at the center of the
user interface to display the facial movements of the user. The following function buttons
are present around the face tracking display.

Emergency button - raises an alarm when clicked

Video - plays a small video clip as entertainment. The code snippet for playing videos
is given in Figure 3.6.

clock1 = MCIWndCreate( GetSafeHwnd(), AfxGetInstanceHandle(),
                       WS_VISIBLE | WS_CHILD, "globe.avi" );

Figure 3.6 : Code snippet for playing video clips

Audio - plays a small audio clip as entertainment

Message board - enables the user to display small messages to express their
needs. A screenshot is provided in Figure 3.7.

Figure 3.7 : Message board

Stop - stops the currently invoked function

Exit - used to exit the application

3.4 IMPLEMENTATION

Software implementation involves compilation and execution of the designed
system. Modular and subsystem code is written during this stage. Unit testing and module
testing are done in this stage by the developers. This stage is intermingled with the next
in that individual modules will need testing before integration into the main project.
Planning in the software life cycle involves setting goals, defining targets, establishing
schedules, and estimating budgets for an entire software project.

Microsoft Visual C++

Microsoft Visual C++ 2005 provides a powerful and flexible development
environment for creating Microsoft Windows–based and Microsoft .NET–based
applications. It can be used as an integrated development system, or as a set of individual
tools. Visual C++ is comprised of these components:

The Visual C++ 2005 compiler tools - The compiler has new features supporting
developers that target virtual machine platforms like the Common Language Runtime
(CLR). There are now compilers to target x64 and Itanium. The compiler continues to
support targeting x86 machines directly, and optimizes performance for both platforms.

The Visual C++ 2005 Libraries - This includes the industry-standard Active
Template Library (ATL), the MFC libraries, and standard libraries such as the Standard
C++ Library and the C Runtime Library, which has been extended to provide
security-enhanced alternatives to functions known to pose security issues. A new
library, the C++ Support Library, is designed to simplify programs that target the CLR.

The Visual C++ 2005 Development Environment - Although the C++ compiler
tools and libraries can be used from the command-line, the development environment
provides powerful support for project management and configuration (including better
support for large projects), source code editing, source code browsing, and debugging
tools. This environment also supports IntelliSense, which makes informed, context-
sensitive suggestions as code is being authored.

In addition to conventional graphical user-interface applications, Visual C++
enables developers to build Web applications, smart-client Windows-based applications,
and solutions for thin-client and smart-client mobile devices. C++ is the world's most
popular systems-level language, and Visual C++ gives developers a world-class tool with
which to build software.

Intel OpenCV Library

The Intel Open Source Computer Vision (OpenCV) library is a computer vision
library originally developed by Intel. It is free for commercial and research use under a
BSD license. The library is cross-platform and runs on Windows, Mac OS X, Linux,
PSP, VCRT (a real-time OS on smart cameras) and other embedded devices. It focuses
mainly on real-time image processing; if it finds Intel's Integrated Performance
Primitives on the system, it uses these commercial optimized routines to accelerate
itself. Officially launched in 1999, the OpenCV project was initially an Intel Research
initiative to advance CPU-intensive applications, part of a series of projects that included
real-time ray tracing and 3D display walls. The library is mainly written in C, which
makes it portable to specific platforms such as digital signal processors. Wrappers for
languages such as C# and Python have been developed to encourage adoption by a wider
audience. Our system uses some of the functions in this library in the form of DLLs.

Microsoft DirectShow:

DirectShow, codenamed Quartz, is a multimedia framework and API produced by
Microsoft for software developers to perform various operations with media files or
streams. It is the replacement for Microsoft's earlier Video for Windows technology.
Based on the Microsoft Windows Component Object Model (COM) framework,
DirectShow provides a common interface for media across many programming
languages, and is an extensible, filter-based framework that can render or record media
files on demand at the request of the user or developer. The DirectShow development
tools and documentation were originally distributed as part of the DirectX SDK.
Currently, they are distributed as part of the Windows SDK. DirectShow's counterparts
on other platforms include Apple's QuickTime framework and various Linux multimedia
frameworks such as GStreamer or Xine.

Working of the Algorithm:

The algorithm used in our system is the Multi-view Face Detection and
Recognition Algorithm using Haar-like Features. This algorithm is designed for still
images. It has been modified to detect faces from streaming video feeds.

The working of the algorithm is as follows:

[Flow diagram. Block labels recovered from the figure: Input Image, Rectangle Node
Selection, Rectangular Sum Calculation, Scaling Pixel, Haar-like Feature Calculation,
Haar-like Feature Comparison, Haar-like Features in Database, Scaling, Face Detection]

Figure 3.8: Algorithm Flow Diagram

The overall algorithm is depicted in Figure 3.8. The detection technique is based
on the idea of a wavelet template that defines the shape of an object in terms of a subset
of the wavelet coefficients of the image.

The input image is scanned across location and scale using a scaling factor of 1.1.
At each location an independent decision is made regarding the presence of a face.
This leads to a large number of classifier evaluations. Each classifier is a simple function
of rectangular sums followed by a threshold.
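
A classifier of this kind can be evaluated cheaply with a summed-area table (integral image), where any rectangular sum costs four lookups regardless of the rectangle size. The following is a minimal sketch under stated assumptions, not the code used in our system (which calls the OpenCV detector): the names IntegralImage, twoRectFeature and weakClassify, the left-minus-right feature layout and the polarity convention are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table: entry (x, y) holds the sum of all pixels strictly above
// and to the left of (x, y), so the table is one larger than the image.
class IntegralImage {
public:
    IntegralImage(const std::vector<int>& pixels, int width, int height)
        : w_(width), h_(height),
          sums_(static_cast<std::size_t>(width + 1) * (height + 1), 0L) {
        assert(width > 0 && height > 0);
        assert(pixels.size() == static_cast<std::size_t>(width) * height);
        for (int y = 0; y < h_; ++y)
            for (int x = 0; x < w_; ++x)
                sums_[idx(x + 1, y + 1)] = pixels[y * w_ + x]
                    + sums_[idx(x, y + 1)] + sums_[idx(x + 1, y)]
                    - sums_[idx(x, y)];
    }

    // Sum of the w-by-h rectangle whose top-left pixel is (x, y).
    long rectSum(int x, int y, int w, int h) const {
        return sums_[idx(x + w, y + h)] - sums_[idx(x, y + h)]
             - sums_[idx(x + w, y)] + sums_[idx(x, y)];
    }

private:
    std::size_t idx(int x, int y) const {
        return static_cast<std::size_t>(y) * (w_ + 1) + x;
    }
    int w_, h_;
    std::vector<long> sums_;
};

// Two-rectangle Haar-like feature: left half minus right half of the window.
long twoRectFeature(const IntegralImage& ii, int x, int y, int w, int h) {
    int half = w / 2;
    return ii.rectSum(x, y, half, h) - ii.rectSum(x + half, y, half, h);
}

// Weak classifier: a threshold on the feature value. polarity is +1 or -1 and
// selects the direction of the inequality.
bool weakClassify(long feature, long threshold, int polarity) {
    return polarity * feature < polarity * threshold;
}
```

Because the table is built once per frame, the many windows produced by scanning across location and scale all share it, which is what keeps the large number of classifier evaluations affordable.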

In each round of boosting, the one feature with the lowest weighted error is
selected. In subsequent rounds incorrectly labeled examples are given a higher weight
while correctly labeled examples are given a lower weight. In order to reduce the false
positive rate while preserving efficiency, classification is divided into a cascade of
classifiers. The input is passed from one classifier to the next as long as each classifier
classifies the window as a face.
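
One round of this feature selection and re-weighting can be sketched as follows. This is a minimal sketch that assumes the AdaBoost update used in the Viola-Jones detector (correctly labeled examples are scaled by beta = err / (1 - err) and the weights are then renormalized); the report does not state the exact rule, and the names boostRound and normalize and the precomputed 0/1 prediction table are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rescale the weights so that they sum to 1.
void normalize(std::vector<double>& weights) {
    double total = 0.0;
    for (double w : weights) total += w;
    assert(total > 0.0);
    for (double& w : weights) w /= total;
}

// One boosting round. predictions[j][i] is the 0/1 output of candidate
// feature j on training example i; labels[i] is the true 0/1 label.
// Returns the index of the selected feature and updates weights in place.
std::size_t boostRound(const std::vector<std::vector<int>>& predictions,
                       const std::vector<int>& labels,
                       std::vector<double>& weights) {
    assert(!predictions.empty() && labels.size() == weights.size());
    normalize(weights);

    // Select the feature with the lowest weighted error.
    std::size_t best = 0;
    double bestErr = 2.0;  // larger than any possible weighted error
    for (std::size_t j = 0; j < predictions.size(); ++j) {
        double err = 0.0;
        for (std::size_t i = 0; i < labels.size(); ++i)
            if (predictions[j][i] != labels[i]) err += weights[i];
        if (err < bestErr) { bestErr = err; best = j; }
    }

    // Shrink the weights of correctly labeled examples; after renormalizing,
    // the incorrectly labeled ones carry a relatively higher weight.
    if (bestErr > 0.0 && bestErr < 1.0) {
        double beta = bestErr / (1.0 - bestErr);
        for (std::size_t i = 0; i < labels.size(); ++i)
            if (predictions[best][i] == labels[i]) weights[i] *= beta;
        normalize(weights);
    }
    return best;
}
```

With four equally weighted examples and a selected feature that mislabels one of them, the weighted error is 0.25, so beta is 1/3 and the mislabeled example ends the round holding half of the total weight. The next round is therefore pushed towards features that classify that example correctly.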

An input window is evaluated on the first classifier of the cascade. If that
classifier returns false, computation on that window ends and the detector returns false.
If the classifier returns true, the window is passed on to the next classifier in the
cascade, which evaluates the window in the same way. The more a window looks like a
face, the more classifiers are evaluated on it and the longer it takes to classify the
window.
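
The early-exit behaviour of the cascade can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the OpenCV implementation: the Window type carries a single hypothetical faceLikeness score instead of pixel data, and the names Stage, CascadeResult, runCascade and makeToyCascade are introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an image window; a real detector would hold
// pixel data or a reference into an integral image.
struct Window { double faceLikeness; };

// A stage returns true if the window may still be a face.
using Stage = std::function<bool(const Window&)>;

struct CascadeResult {
    bool isFace;
    std::size_t stagesEvaluated;
};

// Run the stages in order and stop at the first rejection.
CascadeResult runCascade(const std::vector<Stage>& stages, const Window& win) {
    assert(!stages.empty());
    std::size_t evaluated = 0;
    for (const Stage& stage : stages) {
        ++evaluated;
        if (!stage(win)) return {false, evaluated};  // early rejection
    }
    return {true, evaluated};  // every stage accepted the window
}

// Toy cascade for illustration: each stage applies a stricter threshold.
std::vector<Stage> makeToyCascade(const std::vector<double>& thresholds) {
    std::vector<Stage> stages;
    for (double t : thresholds)
        stages.push_back([t](const Window& w) { return w.faceLikeness >= t; });
    return stages;
}
```

With thresholds 0.2, 0.5 and 0.8, a window scoring 0.1 is rejected after one stage, while a window scoring 0.6 passes two stages and is rejected only by the third. This mirrors the observation above that face-like windows cost more to classify than obvious background.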

3.5 TESTING

Testing is the process of evaluating the correctness, the quality and the
completeness of the system developed. Our system was tested with a variety of
users. It was found that the system was able to detect faces successfully in all cases.
The application is also able to pick out faces from considerably large distances. The user
requires some training in order to move the mouse efficiently. Face detection is found to
be efficient even with a normal web camera and under ordinary lighting conditions.
However, care should be taken to align the web camera with the facial region of the user
for optimum face detection.

4. APPLICATIONS AND FUTURE ENHANCEMENT

Our system is mainly targeted towards physically disabled people who are
quadriplegic and non-verbal, and towards bedridden patients. However, this human
computer interface has other applications as well. It can be used as an alternative to the
traditional mouse and keyboard. It can be used to control the entire computer, browse the
internet, prepare documents, etc. As the system is relatively inexpensive, it can be
installed in hospitals as a communication system for patients. The system may also be
used as a hands-free navigation device to access a computer, which facilitates
multitasking. For example, a doctor performing a surgery can make use of this system to
issue commands to a computer.

The system can be enhanced with high-resolution or infrared cameras to improve
face detection. It can be interfaced with external mobile devices to enhance the
communication part. The system can also be extended for use in biometric security
systems.

5. CONCLUSION

The objective of this project is to provide an automated system which will capture
the facial movements of the target user and correlate them with mouse pointer movements
on the screen. The developed interface will enable quadriplegic and non-verbal users to
access a computer.

A system has been developed for use by disabled people and bedridden patients. A
webcam interface captures the facial movements of the user. A face detection algorithm
is implemented and integrated with mouse movements on the screen. The system has
been integrated with four functions to aid physically challenged people. An emergency
button is provided for raising an alarm. Clicking on the audio button plays audio files for
entertainment. The video button is used to play videos for entertainment. An onscreen
message board has been provided for communication purposes. It helps the users to
display short messages to express their needs. The future focus is on enabling the system
to incorporate certain hardware-based interfaces, such as moving a robot.

Appendix A – Screenshots

MAIN INTERFACE

FACE TRACKING 1

FACE TRACKING 2

FACE TRACKING 3

FACE TRACKING 4

FACE TRACKING FOR A BED RIDDEN USER

PLAYING VIDEO

MESSAGE BOARD

REFERENCES

1. Gary R. Bradski (1998), “Computer Vision Face Tracking for Use in a Perceptual
User Interface”, Intel Technical Journal Q2 ’98, Microcomputer Research Lab,
Santa Clara, CA, Intel Corporation.

2. James Gips, Margrit Betke and Peter Fleming (), “The Camera Mouse: Preliminary
Investigation of Automated Visual Tracking For Computer Access”, Computer
Science Department, Boston College, Chestnut Hill, MA 02467.

3. James Gips, Margrit Betke and Peter Fleming (2002), “The Camera Mouse: Visual
Tracking of Body Features to Provide Computer Access for People With Severe
Disabilities”, IEEE Transactions on Neural Systems and Rehabilitation
Engineering, Vol. 10, No. 1.

4. Rajesh Kumar, Anupam Kumar (2008), “Black Pearl: An Alternative for Mouse
and Keyboard”, ICGST-GVIP, ISSN 1687-398X, Volume (8), Issue (III).

5. Sanjay Kr. Singh, D. S. Chauhan, Mayank Vatsa, Richa Singh (2003), “A Robust
Skin Color Based Face Detection Algorithm”, Tamkang Journal of Science and
Engineering, Vol. 6, No. 4, pp. 227-234.

6. Zhaomin Zhu, Takashi Morimoto, Hidekazu Adachi, Osamu Kiriyama, Tetsushi
Koide and Hans Juergen Mattausch (2005), “Multi-View Face Detection and
Recognition using Haar-like Features”, Research Center for Nanodevices and
Systems, Hiroshima University.
