Project Report
on
Virtual Mouse Using Hand Gesture and Voice Assistant
submitted in partial fulfillment for the award of
BACHELOR OF TECHNOLOGY DEGREE
SESSION 2022-23
May, 2023
DECLARATION
We hereby declare that this submission is our own work and that, to the best of our knowledge
and belief, it contains no material previously published or written by another person nor material
which to a substantial extent has been accepted for the award of any other degree or diploma of
the university or other institute of higher learning, except where due acknowledgment has been
made in the text.
Signature: Signature:
Signature:
Roll No.:1900290110056
CERTIFICATE
Certified that the Project Report entitled “Gesture Control Virtual Mouse” submitted by Hrithik
Chandok (1900290110045), Manas Khare (1900290110056), and Huzaifa Ansari
(1900290110047) is their own work and has been carried out under my supervision. It is
recommended that the candidates may now be evaluated for their project work by the University.
Date: Supervisor
(Professor)
ACKNOWLEDGEMENT
We wish to express our heartfelt gratitude to all the people who have played a crucial role in
the research for this project; without their active cooperation, the preparation of this project
could not have been completed within the specified time limit.
We are also thankful to our project guide, Prof. Vinay Kumar, who supported us throughout
this project with utmost cooperation and patience and helped us in completing it.
Date:
Signature: Signature:
Signature:
Roll No.:1900290110056
ABSTRACT
Gesture-controlled laptops and computers have recently gained a lot of traction; this
technique is known as leap motion. Simple gestures of our hand in front of our
computer or laptop allow us to manage its operations. Unfortunately, employing these
techniques is more complicated: in the dark, such devices are difficult to see, and
manipulating them causes the presentation to be disrupted. Hand gestures are the most
natural and effortless manner of communicating. The camera's output will be displayed
on the monitor.
The user will be able to see their image and gestures in a window for better accuracy.
The concept is to use a simple camera instead of a classic or standard mouse to control
mouse cursor functions. The Virtual Mouse provides an infrastructure between the user
and the system using only a camera.
It allows users to interface with machines without the use of mechanical or physical
devices, and even control mouse functionalities. This study presents a method for
controlling the cursor's position without the need for any electronic equipment, while
actions such as clicking and dragging are carried out using various hand gestures. In
addition, functionality such as volume and brightness control is provided to the user,
which creates additional motivation to use this version of the mouse. As an input
device, the suggested system will require only a webcam.
The suggested system will require OpenCV and MediaPipe, the Python programming
environment, and several other libraries and tools. The Python dependencies used for
implementing this system are NumPy, math, PyAutoGUI, pycaw, MessageToDict,
screen_brightness_control and others.
In this report, we present a singular approach for human-computer interaction (HCI) in
which cursor motion is controlled using a real-time camera: a way to control the
position of the cursor with the bare hands, without using any digital tool, while
operations like clicking and dragging of objects are accomplished with special
hand gestures.
The proposed system will require a webcam as an input tool. The software required
to implement the proposed system consists of OpenCV and Python. The camera's output
will be displayed on the system's screen so that it can be further calibrated by the user.
TABLE OF CONTENTS Page
No.
DECLARATION……………………………………………………………………. ii
CERTIFICATE……………………………………………………………………… iii
ACKNOWLEDGEMENTS…………………………………………………………. iv
ABSTRACT………………………………………………….…………………….... v
LIST OF FIGURES………………………………………………………………….. x
LIST OF ABBREVIATIONS……………………………….………………………. xii
CHAPTER 1 (INTRODUCTION)………………………………………………….. 13
CHAPTER 3 (HARDWARE AND SOFTWARE REQUIREMENTS) ......................... 25
CHAPTER 7 (RESULTS AND ANALYSIS) …………………….................................. 59
REFERENCES 62
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
Gesture recognition has been a very interesting problem in the computer vision community
for a long time. Hand gestures are an aspect of body language that can be conveyed
through the center of the palm, the finger positions and the shape constructed by the hand.
Hand gestures can be classified as static or dynamic. As its name implies, a static
gesture refers to a stable shape of the hand, whereas a dynamic gesture comprises a
series of hand movements such as waving. There is a variety of hand movements within
a gesture; for example, a handshake varies from one person to another and changes
according to time and place. The main difference between posture and gesture is that
posture focuses more on the shape of the hand, whereas gesture focuses on the hand
movement.
Computer technology has grown tremendously over the past decade and has become a
necessary part of everyday life. The primary computer accessory for Human Computer
Interaction (HCI) is the mouse. The mouse is not suitable for HCI in some real-life
situations, such as Human Robot Interaction (HRI). There has been much research on
alternative methods to the computer mouse for HCI. The most natural and intuitive
technique for HCI, and a viable replacement for the computer mouse, is the use of hand
gestures.
Our vision was to develop a virtual mouse system that uses a web camera to
communicate with the device in a more user-friendly way, as an alternative to using a
touch screen or a physical mouse. In order to harness the full potential of a webcam, it
can be used for vision-based cursor control, which effectively tracks the hand and
predicts the gesture on the basis of its label.
The software enables the user to control the complete functionality of a physical mouse, just
by using easy symbols and gestures. It utilizes a digital camera and computer vision
technology to control numerous mouse activities and is capable of performing every task
that the physical computer/laptop mouse can.
The major motivation for this project arose amidst the spread of COVID-19. We wanted to
build a solution that enables users to use their device without physically touching it.
It also reduces e-waste and helps to reduce the cost of hardware.
The proposed AI virtual mouse system can be used to overcome problems in the real
world, such as situations where there is no space to use a physical mouse, and also for
persons who have problems with their hands and are not able to control a physical mouse.
Also, amidst the COVID-19 situation, it is not safe to use devices by touching them,
because doing so may spread the virus, so the proposed AI virtual mouse can be used to
overcome these problems, since hand gesture and hand-tip detection are used to control
the PC mouse functions through a webcam or a built-in camera.
The current system comprises a generic mouse and trackpad for monitor control, and
lacks a hand gesture control system. Using a hand gesture to access the monitor screen
from a distance is not possible. Even though some systems are primarily attempting to
implement this, the scope is simply limited in the virtual mouse field.
The existing virtual mouse control systems consist of simple mouse operations using a
hand recognition system, in which we can control the mouse pointer, left click, right
click, drag, and so on; hand recognition is not taken any further. Even though there are
a variety of systems for hand recognition, the approach they use is static hand
recognition, which is simply recognition of the shape made by the hand and the
definition of an action for each shape made. This is limited to a few defined actions
and causes a lot of confusion. As technology advances, there are more and more
alternatives to using a mouse.
The following are some of the techniques that were employed:
1.1.1 Head Control
A special sensor (or built-in webcam) can track head movement to move the mouse
pointer around on the screen. In the absence of a mouse button, the software's dwell-delay
feature is usually used. Clicking can also be accomplished with a well-placed
switch.
1.1.2 Eye Control
The cost of modern eye-gaze systems is decreasing. These enable users to move the
pointer on the screen solely by moving their eyes. Instead of mouse buttons, a dwell-delay
feature, blinks, or a switch is used. The Tobii PCEye Go is a peripheral eye tracker that
lets you use your eyes to control your computer as if you were using a mouse.
1.1.3 Touch Screens
Touch screens, which were once seen as a niche technology used primarily in special
education schools, have now become mainstream. Following the success of smartphones
and tablets, touch-enabled Windows laptops and all-in-one desktops are becoming more
common. Although this is a welcome new technology, the widespread use of touch
screens has resulted in a new set of touch accessibility issues.
However, each of the methods above has its own set of disadvantages. Regularly using
the head or eyes to control the cursor can be hazardous to one's health and can
lead to a number of health problems. When using a touch screen, the user must
always maintain their focus on the screen, which can cause drowsiness. By comparing
these techniques, we aim to create a new project that will not harm the user's
health.
This project promotes an approach to Human Computer Interaction (HCI) in which
cursor movement is controlled using a real-time camera. It is an alternative to
current methods, including manual input via buttons or changing the position of a physical
computer mouse. Instead, it utilizes a camera and computer vision technology to control
various mouse events and is capable of performing every task that the physical computer
mouse can.
We first use MediaPipe to recognize the hand and the hand key points. MediaPipe
returns a total of 21 key points for each detected hand. A Palm Detection Model and a Hand
Landmark Model are used by MediaPipe to detect the hand. First the palm is detected, as this
is an easier process than detecting the full hand with landmarks. Then the Hand Landmark
Model performs precise key-point localization of 21 3D hand-knuckle coordinates inside the
detected hand regions via regression, that is, direct coordinate prediction.
We detect which finger is up using the tip ID of the respective finger, found
using MediaPipe, together with the coordinates of the fingers that are up; according
to that, the particular mouse function is performed.
Then we apply formulas such as distance calculation to track the gestures being made
by the user; for example, if the distance between the index finger and middle finger
becomes (nearly) zero, we perform a single-click operation by calling the single-click
function.
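A minimal sketch of this distance rule follows, assuming normalized coordinates; the threshold value and the names `CLICK_THRESHOLD` and `is_click` are illustrative choices, not the project's actual constants, since in practice the measured distance never reaches exactly zero.

```python
import math

CLICK_THRESHOLD = 0.05   # assumed normalized-coordinate threshold

def fingertip_distance(p1, p2):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def is_click(index_tip, middle_tip, threshold=CLICK_THRESHOLD):
    """True when the two fingertips are close enough to count as a click."""
    return fingertip_distance(index_tip, middle_tip) < threshold
```

When `is_click` fires, the single-click function (e.g. PyAutoGUI's `click()`) would be invoked.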
Similarly, we have defined and deployed various other gestures: sliding up with the
thumb and index finger increases the volume of the system; keeping a constant distance
between the index and middle finger keeps the system in a stable state, i.e., a neutral
gesture; moving the joined fingers up or down performs a scrolling action; and movement
of the palm performs the drag gesture. The software can also detect multiple hands,
but it is deployed in such a way that only the gestures of one hand are active at a time,
so that multiple hands do not cause ambiguity for the system and the user.
The proposed project will help avoid the spread of COVID-19 by removing human
contact with shared devices used to control the computer. Amidst the COVID-19
situation, it is not always safe to use devices by touching them, because this can
result in a possible spread of the virus; the proposed AI virtual mouse can be used
to overcome these problems, since hand gesture detection is used to control the mouse
functions through a webcam or built-in camera of the user's computing device, be it a
PC, laptop, workstation, etc.
Hence, in public hotspots like cyber cafes, offices, academic institutes, etc., a
person can operate a computer with no physical contact with it. This, in turn,
will reduce the spread of viruses, bacteria and communicable diseases.
Home and office automation is a large discipline in which gesture recognition is being
employed. For example, smart TVs can sense finger movements and hand gestures, offering
touchless control over lighting and audio systems.
Also, the project aims to reduce e-waste, which is a very common concern with physical
hardware devices. The use of this software can eliminate the need for TV remotes,
buttons on smart appliances, and mice for personal computers, laptops and
workstations.
The project also reduces users' costs, whether the upfront cost of buying hardware with
new machines or the replacement cost on old machines: when a mouse is unusable or no
longer working, we can simply use this software to control the mouse functions of the
system.
It is fair to say that the virtual mouse will soon be substituting for the traditional
physical mouse in the near future, as people are aiming towards a lifestyle in which
every technological gadget can be controlled and interacted with remotely, without the
use of peripheral devices such as remotes, keyboards, and so on. It does not just give
convenience; it is cost-effective as well.
The software's functionality can also be extended to augmented-reality applications such
as gaming, VR/AR headsets, gesture-based games, etc. Apart from these, mobile
applications for Android phones and smart TVs can be implemented, so that they can
be operated in wireless mode.
CHAPTER 2
LITERATURE REVIEW
Different approaches have been employed to enable users to interact via speech. Three
of the most commonly used are natural language, dialog trees, and commands.
Dialog-tree systems reduce the difficulties in recognition by breaking the activity down
into a sequence of choice points at which the user selects. The disadvantage of this
approach is that the user will be unable to directly access the parts of a domain that
are of immediate interest.
Regardless of the difficulty of constructing such systems on the designer's side, they
simplify the interaction as well as lessen the need for user training.
2.1.3 Cursor Control and Text Entry using Speech.
In various research areas, both academic and commercial, voice-based cursor control
has been proposed to enable control of a cursor using speech input.
Lohr and Brugge proposed two approaches for simulating cursor control using speech:
target-based and direction-based navigation. Direction-based mouse emulation is done
by having the cursor move in the direction uttered by the user. In their research,
Sears et al. implemented movement of the cursor with commands like "move down",
"move up", "move right" or "move left", stopping when "stop" is uttered. However, this
has a precision bottleneck when the stopping command "stop" is uttered: the cursor
keeps moving while the speech recognizer is still processing the command. In the study
by Igarashi and Hughes, mouse control was achieved by uttering the direction followed
by a non-verbal vocalization; the cursor moves as long as the vocalization lasts,
e.g., "move down". Using non-verbal vocalization has the advantage of high precision.
Harada et al. used a similar approach. In their study they assigned each direction a
specific sound: the user utters the vowel sound corresponding to one of the desired
directions ("a" for up, "e" for right, "i" for down, and "o" for left). The speed of the
cursor starts out slow and gradually increases with time. The cursor is stopped by
uttering the same vowel again, and a click is performed by uttering a two-vowel
command ("a-e"). The advantage of this system is that it offers immediate processing of
vocal input.
Target-based mouse emulation involves the definition of specific targets on the screen
and the assignment of speakable identifiers, which are displayed close to each target.
Uttering an identifier will place the cursor within the corresponding target. For
example, uttering a widget's name (with the widget used as a target) causes the mouse
cursor to be placed over that button. However, this method suffers from layout and
usability issues if the number of widgets is high.
[Figure: (a) Vowel sounds (shown using International Phonetic Alphabet symbols) as a
function of their dominant articulatory configurations. (b) Mapping of vowel sound to
direction in the 8-way mode Vocal Joystick; in 4-way mode, only the vowels along the
horizontal and vertical axes are used.]
In order to counteract this effect, the number of targets might be restricted by, for
instance, dividing the screen into a coarse-grained grid of named cells. Nevertheless,
a number of commands will be needed per task.
CHAPTER 3
HARDWARE AND SOFTWARE REQUIREMENTS
For the purpose of detecting hand gestures and hand tracking, the MediaPipe framework
is used, and the OpenCV library is used for computer vision. The algorithm makes use of
machine learning concepts to track and recognize the hand gestures and hand tips.
3.1 HARDWARE REQUIREMENTS
The following describes the hardware needed in order to execute and develop the
Virtual Mouse application.
3.1.1 Computer Desktop or Laptop
A computer desktop or laptop will be utilized to run the software and display what the
webcam has captured. A notebook, which is a small, lightweight and inexpensive laptop
computer, is proposed to increase mobility.
The system will use: Processor: Core 2 Duo; Main memory: 4 GB RAM; Hard disk:
320 GB; Display: 14" monitor.
3.1.2 Webcam
The webcam is utilized for image processing; it will continuously capture images in
order for the program to process them and find pixel positions.
3.2 SOFTWARE REQUIREMENTS
3.2.1 Python
Rather than building all of its functionality into its core, Python was designed to be highly
extensible via modules. This compact modularity has made it particularly popular as a
means of adding programmable interfaces to existing applications. Van Rossum's vision
of a small core language with a large standard library and easily extensible interpreter
stemmed from his frustrations with ABC, which espoused the opposite approach.
Python strives for a simpler, less-cluttered syntax and grammar while giving developers
a choice in their coding methodology. In contrast to Perl's "there is more than one way
to do it" motto, Python embraces a "there should be one—and preferably only one—
obvious way to do it" philosophy. Alex Martelli, a Fellow at the Python Software
Foundation and Python book author, wrote: "To describe something as 'clever' is not
considered a compliment in the Python culture."
3.2.2 OpenCV
OpenCV (Open Source Computer Vision Library) is a huge open-source library for
computer vision, machine learning, and image processing; it is a cross-platform library
with which we can develop real-time computer vision applications. OpenCV supports a wide
variety of programming languages like Python, C++, Java, etc. It mainly focuses on
image processing and on video capture and analysis to identify objects, faces, or even
the handwriting of a human. It can be installed using "pip install opencv-python".
OpenCV was built to provide a common infrastructure for computer vision applications
and to accelerate the use of machine perception in commercial products. Being a
BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the
code.
Computer Vision can be defined as a discipline that explains how to reconstruct,
interpret, and understand a 3D scene from its 2D images, in terms of the properties of
the structure present in the scene. It deals with modeling and replicating human vision
using computer software and hardware.
Computer Vision overlaps significantly with the following fields:
Image Processing − It focuses on image manipulation.
Pattern Recognition − It explains various techniques to classify patterns.
Photogrammetry − It is concerned with obtaining accurate measurements from images.
Image processing deals with image-to-image transformation. The input and output of
image processing are both images.
Computer vision is the construction of explicit, meaningful descriptions of physical
objects from their images. The output of computer vision is a description or an
interpretation of structures in a 3D scene.
Applications of Computer Vision:
Here we have listed some of the major domains where Computer Vision is heavily used.
Robotics Applications
1. Localization − Determine robot location automatically
2. Navigation
3. Obstacle avoidance
4. Assembly (peg-in-hole, welding, painting)
5. Manipulation (e.g. PUMA robot manipulator)
6. Human Robot Interaction (HRI) − Intelligent robotics to interact with and
serve people
3.2.3 MediaPipe
A single-shot detector model is used for detecting and recognizing a hand or palm in real
time; this model is used by MediaPipe [3]. In the hand detection module, a palm
detection model is trained first, because it is easier to train on palms. Furthermore,
non-maximum suppression works significantly better on small objects such as palms or
fists. The hand landmark model then locates 21 joint or knuckle coordinates in the hand
region.
Figure 2.4 – Co-ordinates or Landmarks on Hand
3.2.4 PyAutoGUI
PyAutoGUI is a cross-platform GUI automation Python module. It is used to
programmatically control the mouse and keyboard; in other words, it facilitates
automating mouse and keyboard movement to establish interaction with other
applications using a Python script. It can be installed with "pip install pyautogui".
PyAutoGUI has several features:
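As a sketch of how such PyAutoGUI features might be wired to recognized gestures, the dispatch below uses only documented PyAutoGUI calls (`click`, `rightClick`, `doubleClick`, `scroll`). The gesture names and the injectable `gui` argument are our own choices made for testability, not the project's actual interface.

```python
def perform_action(gesture, gui):
    """Dispatch a recognized gesture name to the GUI automation backend.

    `gui` is normally the pyautogui module itself; passing it in explicitly
    lets a fake object be substituted in tests.
    """
    actions = {
        "left_click":   gui.click,
        "right_click":  gui.rightClick,
        "double_click": gui.doubleClick,
        "scroll_up":    lambda: gui.scroll(120),
        "scroll_down":  lambda: gui.scroll(-120),
    }
    handler = actions.get(gesture)
    if handler is None:
        return False          # unrecognized gesture: do nothing
    handler()
    return True
```

In the real application this would be called as `perform_action("left_click", pyautogui)` once the gesture classifier fires.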
3.2.5 Math
This module provides access to the mathematical functions defined by the C standard.
These functions cannot be used with complex numbers; use the functions of the same
name from the cmath module if you require support for complex numbers. The distinction
between functions which support complex numbers and those which don't is made because
most users do not want to learn quite as much mathematics as is required to understand
complex numbers. Receiving an exception instead of a complex result allows earlier
detection of an unexpected complex number used as a parameter, so that the programmer
can determine how and why it was generated in the first place.
The following functions are provided by this module. Except when explicitly noted
otherwise, all return values are floats.
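A small illustration of the point above, showing that `math.sqrt` raises on a negative argument while the same-named `cmath` function returns a complex result:

```python
import math
import cmath

# math.sqrt refuses a negative input and raises ValueError immediately...
try:
    math.sqrt(-1)
    real_result = None
except ValueError:
    real_result = "ValueError"

# ...whereas cmath.sqrt returns the complex square root.
complex_result = cmath.sqrt(-1)

print(real_result)     # ValueError
print(complex_result)  # 1j
```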
3.2.6 PyCaw
PyCaw (Python Core Audio Windows library) allows Python to control the system's audio
devices, such as getting and setting the master volume; this project uses it for the
volume-control gestures. It can be installed with "pip install pycaw". Protocol buffers
(Protobuf) are a language-agnostic data serialization format developed by Google;
Protobuf uses a compact binary format with backward-compatible serialization, and
MediaPipe's output messages are Protobuf messages, which is why the MessageToDict
helper is among the dependencies.
3.2.7 ENUM
Enum is a class in Python for creating enumerations, which are a set of symbolic names
(members) bound to unique, constant values. The members of an enumeration can be
compared by these symbolic names, and the enumeration itself can be iterated over. An
enum has the following characteristics:
• Enums have an evaluable string representation of the object, also called repr().
• The name of an enum member is displayed using the 'name' attribute.
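These characteristics can be seen in a short example; the `Gest` enumeration and its members are illustrative names, not the project's actual gesture codes.

```python
from enum import Enum

# A small enumeration: symbolic names bound to unique constant values.
class Gest(Enum):
    PALM = 0
    FIST = 1
    PINCH = 2

print(Gest.FIST.name)    # FIST  <- the 'name' attribute
print(Gest.FIST.value)   # 1
print(repr(Gest.FIST))   # evaluable representation of the member

# Enumerations are iterable, and members compare by identity.
assert list(Gest) == [Gest.PALM, Gest.FIST, Gest.PINCH]
assert Gest(1) is Gest.FIST
```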
3.2.8 Screen_Brightness_Control
A Python tool for controlling the brightness of your monitor. It supports Windows and
most flavors of Linux. We can install this library with "pip install
screen-brightness-control".
CHAPTER 4
METHODOLOGY
We all use new technological developments in our day-to-day life, including our devices.
When we talk about technology, the best example is the computer. Computers have
evolved from very limited machines and advanced significantly over the decades since
they originated. However, we still use the same setup, which includes a mouse and
keyboard. Though technology has made many changes in the development of computers,
such as the laptop, where the camera is now an integrated part of the machine, we still
have a mouse which is either integrated or an external device.
This is how we came across the implementation of a new technology for the mouse,
where we can control the computer with our fingertips; this system is known as Hand
Gesture Movement. With the aid of our fingers, we will be able to guide our cursor.
For this project we have used Python as the base language, as it is open source, easy to
understand and environment friendly. Anaconda is a packaged Python IDE that ships with
tons of important packages and provides a friendly environment. The packages required
here are PyAutoGUI and OpenCV: PyAutoGUI is a Python module for programmatically
controlling the mouse and keyboard, and OpenCV is the library through which we process
the camera input. Red, yellow and blue are the three colors used for the fingertips.
The program uses image processing to extract the required data and then feeds it to the
computer's mouse interface according to predefined notions. It is written in Python,
uses the cross-platform image processing module OpenCV, and implements the mouse
actions using the Python-specific library PyAutoGUI. Real-time video captured by the
webcam is processed, and only the three colored fingertips are extracted.
Their centers are measured using the method of moments, and the action to be taken is
determined based on their relative positions.
The first goal is to use the function cv2.VideoCapture(). This function is used to
capture the live video stream from the camera; OpenCV provides a very easy interface
for this. To capture images we need to create a video-capture object. We then convert
each captured image into HSV format. The second goal is to use the function
calibrateColor(): using this function, the user is able to calibrate the color ranges
for the three fingers individually.
The third goal is to use the function cv2.inRange(). In this function, depending on
the calibrations, only the three fingers are extracted. We remove the noise from the
feed using two morphological steps: erosion and dilation.
The next goal is to find the center and radius of each fingertip so that we can start
moving the cursor; chooseAction() is used in the code to do this. The performAction()
method then uses the PyAutoGUI library to perform the chosen action: free cursor
movement, left click, right click, drag/select, scroll up, scroll down, and so on.
Events like redraw, resizing, input events, etc. are processed by HighGUI, so we call
the waitKey function, even with a 1 ms delay [4].
ML Pipeline:
MediaPipe Hands utilizes an ML pipeline consisting of multiple models working
together: A palm detection model that operates on the full image and returns an oriented
hand bounding box. A hand landmark model that operates on the cropped image region
defined by the palm detector and returns high-fidelity 3D hand keypoints. This strategy
is similar to that employed in our MediaPipe Face Mesh solution, which uses a face
detector together with a face landmark model.
Providing the accurately cropped hand image to the hand landmark model drastically
reduces the need for data augmentation (e.g. rotations, translation and scale) and instead
allows the network to dedicate most of its capacity towards coordinate prediction
accuracy. In addition, in our pipeline the crops can also be generated based on the hand
landmarks identified in the previous frame, and only when the landmark model could no
longer identify hand presence is palm detection invoked to relocalize the hand.
The pipeline is implemented as a MediaPipe graph that uses a hand landmark tracking
subgraph from the hand landmark module, and renders using a dedicated hand renderer
subgraph. The hand landmark tracking subgraph internally uses a hand landmark
subgraph from the same module and a palm detection subgraph from the palm detection
module.
Palm Detection Model:
To detect initial hand locations, we designed a single-shot detector model optimized for
mobile real-time uses in a manner similar to the face detection model in MediaPipe Face
Mesh. Detecting hands is a decidedly complex task: our lite model and full model have
to work across a variety of hand sizes with a large scale span (~20x) relative to the image
frame, and be able to detect occluded and self-occluded hands.
Figure 4.2 Neutral Gesture
Whereas faces have high contrast patterns, e.g., in the eye and mouth region, the lack of
such features in hands makes it comparatively difficult to detect them reliably from their
visual features alone. Instead, providing additional context, like arm, body, or person
features, aids accurate hand localization.
Our method addresses the above challenges using different strategies. First, we train a
palm detector instead of a hand detector, since estimating bounding boxes of rigid objects
like palms and fists is significantly simpler than detecting hands with articulated fingers.
In addition, as palms are smaller objects, the non-maximum suppression algorithm
works well even for two-hand self-occlusion cases, like handshakes. Moreover, palms can
be modelled using square bounding boxes (anchors in ML terminology) ignoring other
aspect ratios, and therefore reducing the number of anchors by a factor of 3-5.
Second, an encoder-decoder feature extractor is used for bigger scene-context awareness,
even for small objects (similar to the RetinaNet approach). Lastly, we minimize the focal
loss during training to support the large number of anchors resulting from the high scale
variance.
MAX_NUM_HANDS
Maximum number of hands to detect. Defaults to 2.
MODEL_COMPLEXITY
Complexity of the hand landmark model: 0 or 1. Landmark accuracy as well as inference
latency generally go up with the model complexity. Defaults to 1.
MIN_DETECTION_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the hand detection model for the detection
to be considered successful. Defaults to 0.5.
MIN_TRACKING_CONFIDENCE
Minimum confidence value ([0.0, 1.0]) from the landmark-tracking model for the hand
landmarks to be considered tracked successfully; otherwise hand detection will be
invoked automatically on the next input image. Setting it to a higher value can increase
robustness of the solution, at the expense of higher latency. Ignored if
static_image_mode is true, where hand detection simply runs on every image. Defaults to
0.5.
Output
Naming style may differ slightly across platforms/languages.
MULTI_HAND_LANDMARKS
Collection of detected/tracked hands, where each hand is represented as a list of 21 hand
landmarks and each landmark is composed of x, y and z. x and y are normalized to [0.0,
1.0] by the image width and height respectively. z represents the landmark depth with
the depth at the wrist being the origin, and the smaller the value the closer the landmark
is to the camera. The magnitude of z uses roughly the same scale as x.
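Because x and y are normalized, a small helper (hypothetical name) is typically needed to convert a landmark into pixel coordinates before drawing it or measuring distances on the frame:

```python
def landmark_to_pixels(x_norm, y_norm, frame_width, frame_height):
    """Convert MediaPipe's normalized [0.0, 1.0] x/y to integer pixel coordinates."""
    return int(x_norm * frame_width), int(y_norm * frame_height)

# A landmark at (0.5, 0.25) on a 640x480 frame lands at pixel (320, 120).
print(landmark_to_pixels(0.5, 0.25, 640, 480))
```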
MULTI_HAND_WORLD_LANDMARKS
Collection of detected/tracked hands, where each hand is represented as a list of 21 hand
landmarks in world coordinates. Each landmark is composed of x, y and z: real-world 3D
coordinates in meters with the origin at the hand’s approximate geometric center.
MULTI_HANDEDNESS
Collection of handedness of the detected/tracked hands (i.e. is it a left or right hand).
Each hand is composed of label and score. label is a string of value either "Left" or
"Right". score is the estimated probability of the predicted handedness and is always
greater than or equal to 0.5 (and the opposite handedness has an estimated probability of
1 - score).
4.4.3 Volume Controls
The system volume is increased and decreased using hand gestures.
Figure 3.5 – Increase Volume Gesture
We can now construct a straight line between landmarks 4 and 8 and compute its length,
which will be proportional to the volume. Two points need care. First, the length of this
line may not be 0 even when the fingers are touching, because the landmark points are not
on the fingertips' edges, and we do not know the distance in pixels when they are farthest
apart; we therefore print the measured length and choose an UPPER_BOUND and
LOWER_BOUND based on it. Second, the volume range exposed by our package may be 0 to
100, but it can also be something else; in either case we need to map the
[LOWER_BOUND, UPPER_BOUND] interval onto [MIN_VOL, MAX_VOL].
Finally, to make it look prettier, we draw a circle at the midpoint that changes color
when both fingers are very close to each other, and a volume bar on the left.
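The interval mapping described above can be sketched as a simple linear interpolation. The LOWER_BOUND/UPPER_BOUND pixel values below are hypothetical and would have to be calibrated by printing the measured lengths on a given setup:

```python
def map_length_to_volume(length, lower=30.0, upper=200.0,
                         min_vol=0.0, max_vol=100.0):
    """Linearly map a thumb-index gap length in pixels onto a volume range.

    lower/upper are empirically chosen pixel bounds (calibrate per setup);
    lengths outside the bounds are clamped so the volume stays in range.
    """
    length = max(lower, min(upper, length))
    return min_vol + (length - lower) * (max_vol - min_vol) / (upper - lower)

print(map_length_to_volume(30))    # fingers touching -> 0.0
print(map_length_to_volume(115))   # midpoint of the bounds -> 50.0
print(map_length_to_volume(500))   # clamped at the upper bound -> 100.0
```

The returned value would then be handed to whatever volume backend the system uses.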
4.4.4 Scrolling Commands
Dynamic gestures are used for horizontal and vertical scrolling. The scroll speed is
proportional to the distance the pinch gesture moves from its start point; vertical and
horizontal scrolls are controlled by vertical and horizontal pinch movements respectively.
Both scroll-up and scroll-down commands are implemented.
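The "speed proportional to distance from the start point" behavior can be sketched as below. The gain and deadzone values are illustrative, not tuned, and the returned amount would be fed to a scrolling call such as pyautogui.scroll:

```python
def scroll_amount(start_y, current_y, gain=2.0, deadzone=10):
    """Scroll speed proportional to the pinch's vertical displacement.

    A small deadzone around the start point prevents jitter; moving the
    pinch up (smaller y in image coordinates) scrolls up (positive amount).
    """
    dy = current_y - start_y
    if abs(dy) < deadzone:
        return 0
    return int(-dy * gain)

print(scroll_amount(200, 200))  # inside deadzone -> 0
print(scroll_amount(200, 150))  # pinch moved up 50 px -> scroll up by 100
```

A symmetric helper with x-coordinates would give the horizontal scroll.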
We already have a hand tracking module, so suppose we want to control the computer's
volume by moving the thumb and index finger closer together and further apart. From
before, we know the thumb tip is landmark number 4 and the index tip is
landmark number 8.
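The thumb-index distance can then be computed from those two landmarks. The sketch below uses a tiny stand-in landmark type, since the real objects would come from MediaPipe's multi_hand_landmarks:

```python
import math
from collections import namedtuple

# Stand-in for a MediaPipe landmark (normalized coordinates).
Landmark = namedtuple("Landmark", "x y")

THUMB_TIP, INDEX_TIP = 4, 8

def finger_distance(landmarks, frame_width, frame_height):
    """Euclidean pixel distance between thumb tip (4) and index tip (8)."""
    t, i = landmarks[THUMB_TIP], landmarks[INDEX_TIP]
    return math.hypot((i.x - t.x) * frame_width, (i.y - t.y) * frame_height)

# 9 placeholder landmarks; only indices 4 and 8 matter for this measurement.
lms = [Landmark(0.0, 0.0)] * 9
lms[THUMB_TIP] = Landmark(0.50, 0.50)
lms[INDEX_TIP] = Landmark(0.50, 0.25)
print(finger_distance(lms, 640, 480))  # 0.25 * 480 = 120.0
```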
CHAPTER 5
SOURCE CODE
CHAPTER 6
SCREENSHOTS
Figure 6.2 – Mouse Click
Figure 6.3 – Neutral Gesture
Figure 6.4 – Neutral Gesture
Figure 6.5 – Neutral Gesture
Figure 6.6 – Neutral Gesture
Figure 6.7 – Scroll Up and Down
Figure 6.8 – Low Brightness Gesture
Figure 6.9 – High Brightness Gesture
Figure 6.10 – Increase Volume Gesture
Figure 6.11 – Decrease Volume Gesture
CHAPTER 7
In the proposed AI virtual mouse system, the concept of advancing human-computer
interaction using computer vision is presented.
Cross-comparison of the AI virtual mouse system's test results is difficult because only a
limited number of datasets are available. The hand gestures and fingertip detection have
been tested under various illumination conditions and at different distances from the
webcam. An experimental test was conducted to summarize the results shown in Table 1. The
test was performed 25 times each by 4 persons, resulting in 600 manually labelled
gestures, under different light conditions and at different distances from the screen:
each person tested the AI virtual mouse system 10 times in normal light conditions, 5
times in faint light conditions, 5 times at close distance from the webcam, and 5 times
at long distance from the webcam.
The purpose of this project was to make the machine interact with and respond to human
behavior, and to make the technology accessible and compatible with any standard
operating system.
The proposed system controls the mouse pointer by detecting a human hand and placing the
cursor at the center of the hand. The system handles mouse activities as simple as left
click, cursor drag and movement.
The tracker finds the hand by skin detection and follows it continuously with the
movement of the cursor; when the angle between the fingers of the hand closes, the
process performs the function of the left click.
CHAPTER 8
CONCLUSION
8.2 APPLICATIONS
The virtual mouse system is useful for many applications: it can reduce the space needed
for an actual mouse, and it can be used in situations where a physical mouse cannot be
used. The system eliminates the need for extra devices and improves human-computer
interaction.
Further applications of the proposed system are:
• The proposed model has an accuracy far greater than that of other proposed
virtual mouse models, and it has many applications.
• In the COVID-19 scenario, it is not safe to use devices by physically
touching them, as this can spread the virus; the proposed virtual mouse can
control PC mouse functions without using a physical mouse.
• The system can be used to control robots and systems without additional devices.
• It can be used to play augmented reality games and use AR applications.
• Persons with certain disabilities will be able to use the mouse functions.