
20th International Multitopic Conference (INMIC'17)

CMSWVHG–Control MS Windows via Hand Gesture


Hafiz Mohsin Abdul-Rashid, Lehmia Kiran, M. Danish Mirrani, M. Noman Maraaj
hmohsin478@gmail.com, lehmia.kiran@nu.edu.pk, Danish_mirrani@hotmail.com, numanmaraaj@gmail.com
National University of Computer and Emerging Sciences, FAST-NU, Pakistan

Abstract—CMSWVHG (Control MS Windows via Hand Gesture) provides users with the facility to perform, through hand gestures, numerous Windows actions that are normally performed with a keyboard and mouse. The proposed system recognizes different hand gestures to enable natural human-computer interaction without any external device. This gesture-controlled system opens the technology to users who are uncomfortable with input devices such as the keyboard or mouse. Building on the detection and recognition techniques that already exist in the literature, this methodology provides a device-less environment for controlling MS Windows. The individual techniques used for single tasks, such as hand detection and gesture recognition, are combined to give a broad view of the technology in the form of the gesture-controlled application CMSWVHG. The application takes input from the internal or external camera of the computer with the help of OpenCV [4]. The gestures used for controlling mouse and keyboard actions are kept very simple so that humans can interact with the computer easily, which is the basic purpose of this system and research.

Keywords—Detection, Gesture, K nearest neighbors, OpenCV, Posture, Recognition, True positive, True negative, False positive, False negative.

I. INTRODUCTION

Keyboard and mouse are the major input devices used to interact with computers. Many interaction techniques, such as speech recognition [1], and many wired and wireless devices exist to make interaction easier. Among these techniques, the use of hands as input is an attractive method for establishing natural and fast interaction. Using hand gestures, a user can communicate naturally and in less time; interaction with the computer becomes easier, and there is no need for a physical connection between user and machine. Such a system can also help people who suffer from RSI [5] caused by keyboard usage.

Considering this need in human-computer interaction, we present a desktop application that uses a camera as the primary source of input: from the camera's live feed, we detect hand gestures and associate Windows actions with them.

With hand gestures as an input method, this system can replace all the functionality of mouse control and some frequently used keyboard shortcuts. The user can also give input from a distance of 5-6 feet.

The system consists of a number of small modules that perform different tasks to achieve the required output: a hand detection and posture recognition module, a gesture recognition module, and a module for performing the Windows activity. The system takes input from the camera using OpenCV; after processing the image frames, it detects the hand, recognizes the posture, then recognizes the performed gesture, and finally performs the corresponding mouse or keyboard action.

II. LITERATURE REVIEW

A lot of work has been done in this field, but most of it relates to hand recognition, gesture recognition [3], real-time finger tracking [2], gesture recognition of alphabet characters [12], and comparisons of algorithms for these tasks. There is little work on applying these gestures to control a system. Our system advances and builds on this previous work, which is the need of the time. There are many automated systems controlled through hand gestures, such as home automation [8] and human-vehicle interaction [9][16]; similarly, our system controls actions of the MS Windows operating system with gestures as input [4].

In [13] the authors present a real-time hand gesture recognition system for dynamic applications that can be used in virtual reality, gaming, and sign language, which is a case of communicative gestures. Hand gesture recognition as an input method is not limited to gaming and other virtual reality fields but also covers medical applications [15]. Gesture recognition systems can work as an input method between medical instruments and the human body, as proposed in [15], where the author describes touch-less communication between medical machines and the human body using hand and foot gesture recognition. Hand gesture recognition systems are also used to control robots [17].

Major work has been done on the core usage of gesture recognition techniques for human-computer interaction. The cursor control system proposed in [18] discusses controlling the mouse cursor using gestures as an input method; the authors used algorithms such as SVM [21] and a skin color model [7].

Real-time human-computer interaction using face and hand gestures has also been explored; for example, a music player has been controlled using real-time gesture recognition techniques [14]. Multiple similar systems have been developed, such as HANDY: A Configurable Gesture Recognition System [10] and a system to recognize dynamic hand gestures using a Hidden Markov Model to control Windows applications [11]. But our system presents a simpler way to control Windows.

Similar systems proposed only one or two activities, or focused solely on mouse tracking; other systems used simple gestures with no similarity between performing an activity from the actual input channel (mouse or keyboard) and performing it by gesture. Our system covers about 19 activities, including copy, paste, reverse, mouse tracking, zoom in, zoom out, close, switch between open applications (Alt+Tab), back, forward, minimize, maximize, redo, undo, click, drag and drop, double click, lock pointer, and open context menu. These gestures are defined in Appendix A with figures.



Fig. 1. Copy Gesture    Fig. 2. Paste Gesture

Fig. 3. Minimize Gesture    Fig. 4. Maximize Gesture

Considering gesture recognition as a baseline technique for input, it covers many fields: robot interaction, computer interaction [14], interaction with vehicles [9][16], interaction with images in virtual reality and gaming input methods [13], and interaction with machines and instruments used in medicine [15]. The next section explains the detailed architecture of CMSWVHG.

III. SYSTEM ARCHITECTURE

The architecture diagram of the proposed system is shown in figure 5. The user interacts with the user interface and starts the camera. The camera starts taking input with the help of OpenCV and sends the input frames to the controller for further image processing. In the controller, the frames are first sent to the hand detection module to detect the hand. For posture recognition, the controller then sends the frames to the posture recognition module. After that, the gesture formed by the movement of the hand is recognized using the gesture recognition module. Finally, there is a Windows activity defined against each combination of posture and gesture; to perform that activity, the controller sends a call to the Windows API.

Fig. 5. System Architecture

The hand detection module is implemented using a cascade classifier, with a palm data-set taken from previous work. For posture recognition, a bagged decision tree has been used; a complete data-set of different postures in many tilted positions is available at handy-gest.webnode.com. In the gesture recognition module, the KNN classification algorithm has been used. The controller contains the whole business logic and the mappings of Windows activities against combinations of a posture and a gesture. After the hand is detected, the controller recognizes the posture, then identifies the gesture by tracking the hand, and finally sends a call to the Windows API to perform the activity defined against the performed posture and gesture. The layered view of this architecture is represented in figure 6.

Fig. 6. Layered Architecture
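Since the paper includes no source code, the following is a minimal sketch, in Python with OpenCV, of how the pipeline just described could be wired together. Every module function passed in is a hypothetical placeholder for the corresponding CMSWVHG module, not the authors' implementation.

```python
# Minimal sketch of the controller pipeline (hypothetical structure): frames
# flow from OpenCV through hand detection, posture recognition, and gesture
# recognition to a Windows action.
import cv2

def run_controller(detect_hand, recognize_posture, recognize_gesture, perform_action):
    cap = cv2.VideoCapture(0)          # internal or external camera
    track = []                         # hand positions accumulated for gesture recognition
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        hand = detect_hand(frame)      # e.g. a cascade classifier (Section IV-B)
        if hand is not None:
            posture = recognize_posture(hand)    # e.g. bagged decision tree (IV-C)
            track.append(hand)                   # hand trajectory so far
            gesture = recognize_gesture(track)   # e.g. KNN over the trajectory (IV-D)
            if gesture is not None:
                perform_action(posture, gesture) # Win32 API call (IV-E)
                track.clear()
        if cv2.waitKey(1) == 27:       # Esc stops the feed
            break
    cap.release()
```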

IV. IMPLEMENTATION

A. Live Feed

The first step is to stream the live feed. The camera is used to take input, and OpenCV is used for streaming it. The user can start and stop the live feed with options on the GUI. Figure 7 shows the output of the live feed module.

Fig. 7. Live Feed
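For illustration only (the authors' GUI code is not shown), streaming a live feed with OpenCV can look roughly like this:

```python
# Hypothetical sketch of the live-feed step: grab frames from the default
# camera with OpenCV and display them until the user presses Esc.
import cv2

cap = cv2.VideoCapture(0)              # 0 selects the default camera
while cap.isOpened():
    ok, frame = cap.read()             # one frame of the live feed
    if not ok:
        break
    cv2.imshow("CMSWVHG live feed", frame)
    if cv2.waitKey(1) == 27:           # Esc key stops streaming
        break
cap.release()
cv2.destroyAllWindows()
```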
B. Hand Detection

A cascade classifier [19][20] has been used for hand detection. The cascade classifier is available in OpenCV; it takes only an .xml file of the data-set for training. The training data-set includes thousands of positive images of the object to be detected and many negative images of other objects of the same size. After training, the classifier can detect the trained object. After training on the hand detection data-set, we detect the hand and track its movement so that the gesture drawn by the hand motion can be recognized.

Fig. 8. Hand Detection
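As a hedged sketch of this step (the palm cascade file name is an assumption, standing in for the pre-trained data-set the paper mentions), detection with OpenCV's CascadeClassifier looks roughly like:

```python
# Illustrative hand detection with OpenCV's cascade classifier.
import cv2

hand_cascade = cv2.CascadeClassifier("palm.xml")  # hypothetical trained palm cascade
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale returns (x, y, w, h) bounding boxes of detected hands
    hands = hand_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in hands:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)  # mark the hand
cap.release()
```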
C. Posture Detection

In a bagged decision tree [22], bootstrap aggregating (bagging) is applied to decision trees. Bagging helps machine learning algorithms improve accuracy and stability, and it also helps to avoid overfitting. For these reasons, a bagged decision tree [22] is used for posture detection in this project.
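A minimal sketch of a bagged-decision-tree posture classifier, here using scikit-learn rather than whatever toolkit the authors used, with made-up feature vectors and labels standing in for the posture data-set:

```python
# Illustrative posture classifier using bagged decision trees; feature
# extraction from hand images is assumed already done.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# X: one feature vector per posture image, y: posture labels (all hypothetical)
X = [[0.1, 0.8, 0.3], [0.7, 0.2, 0.5], [0.2, 0.9, 0.1], [0.8, 0.1, 0.6]]
y = ["one_finger", "pinch", "one_finger", "pinch"]

# Bootstrap aggregating: each of the 50 trees trains on a random resample of
# the data and their votes are combined, which improves stability and reduces
# overfitting compared with a single decision tree.
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)
model.fit(X, y)
print(model.predict([[0.15, 0.85, 0.2]]))  # -> likely "one_finger"
```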
D. Gesture Recognition

The data-set for the different gestures was created manually using MS Paint on Windows machines and tablets, and the KNN algorithm has been used to classify the gestures: it finds the similarity of a newly drawn gesture to every class and assigns it to the class with maximum similarity. There is flexibility in the gesture data-set, as the user can give a variety of inputs: a user may draw a gesture with a different hand motion, with a curvy motion, or with a tilted hand, and these different motions result in many possible input gestures. This scenario has been handled by training the algorithm on a variety of possible inputs. In KNN, a character is classified on the basis of its k nearest neighbors.

Fig. 9. Gesture Recognition
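An illustrative KNN gesture classifier in the same spirit (scikit-learn again; the flattened-trajectory feature representation is an assumption, since the paper does not specify its features):

```python
# Illustrative gesture classification with KNN. Each training sample is a
# drawn gesture trajectory resampled to a fixed number of (x, y) points and
# flattened; the class of the majority of nearest neighbors wins.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 4-point trajectories, flattened to [x1, y1, ..., x4, y4]
X = [
    [0, 0, 1, 1, 2, 0, 3, 1],   # zig-zag stroke, labelled 'V'
    [0, 1, 1, 0, 2, 0, 3, 1],   # curved stroke, labelled 'C'
    [0, 0, 1, 1, 2, 0, 3, 2],   # another 'V' variant (tilted)
    [0, 1, 1, 0, 2, 1, 3, 2],   # another 'C' variant (curvy)
]
y = ["V", "C", "V", "C"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[0, 0, 1, 1, 2, 0, 3, 1]]))  # classified by its 3 nearest neighbors
```

Training on several tilted and curvy variants per class, as the authors describe, is what lets the nearest-neighbor vote absorb the variability in how users draw a gesture.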
E. Windows Activity

After a gesture is detected, its type is recognized to check whether it is a mouse-controlling gesture or a keyboard shortcut. Particular gestures have already been mapped to Windows activities via the Win32 API.
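A minimal sketch of this dispatch step using two real Win32 calls, SetCursorPos and keybd_event, through ctypes; the posture/gesture keys and the mapping fragment below are hypothetical and cover only a sliver of the 19 activities:

```python
# Illustrative gesture-to-action dispatch on Windows (ctypes.windll is
# Windows-only). The ACTIONS table mirrors the controller's mapping of a
# (posture, gesture) pair to a Windows activity.
import ctypes

user32 = ctypes.windll.user32
VK_CONTROL, KEY_C, KEYEVENTF_KEYUP = 0x11, 0x43, 0x0002

def send_ctrl_c():
    # Press and release Ctrl+C to perform the 'copy' activity
    user32.keybd_event(VK_CONTROL, 0, 0, 0)
    user32.keybd_event(KEY_C, 0, 0, 0)
    user32.keybd_event(KEY_C, 0, KEYEVENTF_KEYUP, 0)
    user32.keybd_event(VK_CONTROL, 0, KEYEVENTF_KEYUP, 0)

def move_pointer(x, y):
    user32.SetCursorPos(x, y)          # mouse-tracking gesture moves the cursor

ACTIONS = {("two_fingers", "C"): send_ctrl_c}   # hypothetical fragment of the mapping

def perform_action(posture, gesture):
    handler = ACTIONS.get((posture, gesture))
    if handler:
        handler()
```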


V. RESULTS AND FINDINGS

Accuracy is a good evaluation measure in this case. The accuracy of the hand detection module is its ability to correctly differentiate the hand from other objects. Similarly, the accuracy of the posture recognition module and the gesture recognition module is their ability to differentiate between right and wrong postures and gestures, respectively. Mathematically, this can be stated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where
TP = True Positives, the proportion of positives that are truly classified;
TN = True Negatives, the proportion of negatives that are truly classified;
FP = False Positives, the proportion of positives that are falsely classified;
FN = False Negatives, the proportion of negatives that are falsely classified.

The accuracy of each module, algorithm-wise, is given in Table 1.

TABLE 1. ACCURACY OF EACH ALGORITHM

Module              | Algorithm             | Accuracy
Hand Detection      | Cascade Classifier    | 94.33%
Posture Recognition | KNN                   | 84.88%
Posture Recognition | Weighted KNN          | 85.24%
Posture Recognition | Bagged Decision Tree  | 90.33%
Posture Recognition | Complex Decision Tree | 69.2%
Gesture Recognition | KNN                   | 90.22%
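For concreteness, the per-user accuracies in Tables 2 and 3 below follow directly from this formula; a small illustrative helper:

```python
# Accuracy = (TP + TN) / (TP + TN + FP + FN), as defined above.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# User 1 in Table 2: of 5 correct inputs, 4 were classified correctly (FN = 1);
# of 5 incorrect inputs, 4 were rejected correctly (FP = 1) -> (4 + 4) / 10 = 80%.
print(accuracy(tp=4, tn=4, fp=1, fn=1))   # 0.8
```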

Table 1 also compares the techniques we used with some alternatives. We tried all of these techniques and chose the one with the best accuracy for each module: the cascade classifier for hand detection, the bagged decision tree for posture recognition, and KNN for gesture recognition.

As far as the accuracy of the overall system is concerned, the complete system was tested with the help of 50 users. Each user gave 10 inputs for both types of gestures, those easy and those difficult to draw. Out of each set of 10 inputs, 5 gestures were wrong and 5 were right, so the total size of the test data became 50 * 2 * (5 + 5) = 1000. After training, the users used the system by giving different gestures.

The system performed with 87% accuracy on average for gestures that are simple, like C, V, L, and |, and others that include straight lines in different directions. Ten inputs with such straight-line gestures, which make up 14 of the 19 activities, were given randomly by each user; on average, the system classified 9 out of 10 correctly and performed the corresponding action, misclassifying the input only once per user on average. Results for a sample of 10 of the 50 users are shown in Table 2.

For the gestures that involve curvy hand tracking, which make up 5 of the 19 activities, the system performed with an average accuracy of 70%. Results for a sample of 10 of the 50 users are shown in Table 3.

TABLE 2. ACCURACY OF THE SYSTEM FOR SIMPLE GESTURES

User ID | Inputs by each user (5 correct and 5 incorrect gestures) | Correct gestures classified as correct (TP) | Wrong gestures classified as wrong (TN) | Accuracy
1  | 10 | 4 | 4 | 80%
2  | 10 | 4 | 5 | 90%
3  | 10 | 4 | 5 | 90%
4  | 10 | 5 | 5 | 100%
5  | 10 | 4 | 4 | 80%
6  | 10 | 4 | 4 | 80%
7  | 10 | 4 | 5 | 90%
8  | 10 | 4 | 4 | 80%
9  | 10 | 3 | 5 | 80%
10 | 10 | 4 | 5 | 90%

TABLE 3. ACCURACY OF THE SYSTEM FOR COMPLEX GESTURES

User ID | Inputs by each user (5 correct and 5 incorrect gestures) | Correct gestures classified as correct (TP) | Wrong gestures classified as wrong (TN) | Accuracy
1  | 10 | 3 | 5 | 80%
2  | 10 | 2 | 4 | 60%
3  | 10 | 4 | 4 | 80%
4  | 10 | 3 | 4 | 70%
5  | 10 | 3 | 4 | 70%
6  | 10 | 3 | 4 | 70%
7  | 10 | 3 | 5 | 80%
8  | 10 | 3 | 3 | 60%
9  | 10 | 3 | 5 | 80%
10 | 10 | 3 | 4 | 70%

If we take the weighted average of both cases, [(87 * 14) + (70 * 5)] / 19, we get 82.52% accuracy for the overall system. Results for all 50 users are given in Appendices B and C.

This system can be used in a presentation or a meeting room where the audience is sitting away from the input devices. A second application could be an environment where the input devices are not available or are out of order; in an emergency with no backup input devices available, this system can help a great deal. Thirdly, the system can help patients with RSI to some extent. Without such a system, human-computer interaction carries the added cost of input devices along with dependency on them.

VI. CONCLUSION AND FUTURE WORK

The system, consisting of simple modules implemented with simple but effective techniques, can help people use a computer easily.

A. Advantages

1) There is no need for any external costly device; the internal or external camera of a laptop can be used to give input to the system.
2) Using this system, you can control your laptop from a distance. This feature can help you control the computer in a conference room during a presentation.
3) The system can help people who suffer from repetitive strain injury: they can rest their hands while performing common keyboard shortcuts.
4) The modular nature of the system gives it the power of future enhancement.

B. Limitations

1) Distance is limited by the quality and range of the camera.
2) Background noise can affect the accuracy and performance of the system.

In the future, the system can be enhanced by integrating it with a mobile phone camera over Wi-Fi or Bluetooth: the user would give input via the mobile camera and control the computer, which is one solution to the distance limitation of the current system. Another possible enhancement is actual file sharing using gesture technology: the user could perform a gesture to copy a file on one computer and paste it on another computer connected through Bluetooth, Wi-Fi, or LAN.
REFERENCES

[1] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[2] R. M. Gurav, P. K. Kadbe, "Real time finger tracking and contour detection for gesture recognition using OpenCV," International Conference on Industrial Instrumentation and Control (ICIC), College of Engineering Pune, India, May 28-30, 2015.
[3] A. B. Godbehere, A. Matsukawa, K. Goldberg, "Visual tracking of human visitors under variable-lighting conditions for a responsive audio art installation," American Control Conference, 2012.
[4] opencv.org, "OpenCV," 2017. [Online]. Available: http://opencv.org/. [Accessed: 12-Aug-2016].
[5] C. C. Tat, "HCI design and RSI," Asia-Pacific Institute of Information Technology.
[6] S. A. Anas, A. Ehsan, J. Ahmed, "Sign language recognition using Hidden Markov Model," Thesis Report, FAST-NU Lahore Campus, Dept. of Computer Science, June 2014.
[7] ilab.cs.ucsb.edu, "Hand tracking with skin color segmentation," 2017. [Online]. Available: https://ilab.cs.ucsb.edu/index.php/component/content/article/12/31. [Accessed: 12-Mar-2017].
[8] U. V. Solanki, N. H. Desai, "Hand gesture based remote control for home appliances," World Congress on Information and Communication Technologies, 2011.
[9] C. A. Pickering, K. J. Burnham, M. J. Richardson, L. Rover, "A research study of hand gesture recognition technologies and applications for human vehicle interaction," Technical Research, Jaguar Cars, Engineering Centre, Whitley, Coventry; Coventry University, UK.
[10] M. Teimourikia, H. Saidinijad, S. Comai, "HANDY: A configurable gesture recognition system," The Seventh International Conference on Advances in Computer-Human Interactions, 2014.
[11] J. R. Pansare, M. Bansal, S. Saxsena, D. Desale, "Gestuelle: A system to recognize dynamic hand gestures using Hidden Markov Model to control Windows applications," International Journal of Computer Applications, vol. 62, no. 17, January 2013.
[12] M. Elmezain, A. A. Hamadi, G. Krell, S. E. Etriby, B. Michaelis, "Gesture recognition for alphabet characters from fingertip motion trajectory using LRB Hidden Markov Models," 7th IEEE International Symposium on Signal Processing and Information Technology, 2007.
[13] S. S. Rautaray, A. Agarwal, "Real time hand gesture recognition system for dynamic applications," International Journal of UbiComp (IJU), vol. 3, no. 1, January 2012.
[14] R. Azed, B. Azad, N. B. Khalifa, S. Jamali, "Real-time human-computer interaction based on face and hand gesture recognition," International Journal in Foundations of Computer Science & Technology (IJFCST), vol. 4, no. 4, July 2014.
[15] X. Li, "Hand gesture recognition technology in human-computer interaction," School of Computer Science, University of Birmingham, 2016.
[16] F. Khan, S. K. Leem, S. H. Cho, "Hand-based gesture recognition for vehicular applications using IR-UWB radar," Department of Electronics and Computer Engineering, Hanyang University, 2017.
[17] M. Hassanuzaman, T. Zhang, V. Ampornaramveth, M. A. Bhuiyan, Y. Shirai, H. Ueno, "Gesture recognition for human-robot interaction through a knowledge based software platform," International Conference on Image Analysis and Recognition, 2004.
[18] P. R. Futane, R. V. Dharaskar, V. M. Thakre, "Cursor controlling of computer system through gestures," International Journal of Innovative Research in Computer and Communication Engineering, December 2013.
[19] mathworks.com, "Cascade classifier," 2017. [Online]. Available: https://www.mathworks.com/help/vision/ref/traincascadeobjectdetector.html. [Accessed: 12-Jan-2017].
[20] docs.opencv.org, "Cascade classifier," 2017. [Online]. Available: http://docs.opencv.org/3.0beta/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html. [Accessed: 05-Jan-2017].
[21] N. B. Abdul-Hamid, N. B. Amir, "Handwritten recognition using SVM, KNN and neural network," University Technology Malaysia, Kuala Lumpur, Malaysia, 2017.
[22] en.wikipedia.org, "Bootstrap aggregating," 2017. [Online]. Available: https://en.wikipedia.org/wiki/Bootstrap_aggregating. [Accessed: 14-Mar-2017].

APPENDIX A

Mouse Control
(G1.1) User shall move the pointer by moving one finger parallel to the camera.
(G1.2) User shall click anywhere on the screen by tapping down his/her finger.
(G1.3) User shall double-click anywhere on the screen by tapping down his/her finger twice.
(G1.4) User shall use 'drag & drop' functionality by moving a pinch.
(G1.5) User shall open the context menu anywhere on the screen by drawing a circle.
(G1.6) User shall scroll by moving two fingers up/down (vertical scroll) or left/right (horizontal scroll).
(G1.7) User shall be able to lock the pointer movement by spreading the thumb along with the index finger.

Keyboard Shortcuts
(G2.1) User shall copy by drawing 'C' with two fingers.
(G2.2) User shall paste by drawing 'V' with two fingers.
(G2.3) User shall undo by moving two fingers down and then left.
(G2.4) User shall redo by moving two fingers down and then right.
(G2.5) User shall maximize the current window by moving three fingers up.
(G2.6) User shall minimize the current window by moving three fingers down.
(G2.7) User shall close the current window by drawing α with two fingers.
(G2.8) User shall use 'back' functionality by drawing an opening angle bracket < with two fingers.
(G2.9) User shall use 'forward' functionality by drawing a closing angle bracket > with two fingers.
(G2.10) User shall zoom in by moving pinches on two different hands closer.
(G2.11) User shall zoom out by moving pinches on two different hands away.
(G2.12) User shall switch between windows by moving three fingers left/right.

Fig. 10. (G1.1)    Fig. 11. (G1.2)    Fig. 12. (G1.3)
Fig. 13. (G1.4)    Fig. 14. (G1.5)    Fig. 15. (G1.6)
Fig. 16. (G1.7)    Fig. 17. (G2.1)    Fig. 18. (G2.2)
Fig. 19. (G2.3)    Fig. 20. (G2.4)    Fig. 21. (G2.5)
Fig. 22. (G2.6)    Fig. 23. (G2.7)    Fig. 24. (G2.8)
Fig. 25. (G2.9)    Fig. 26. (G2.10)   Fig. 27. (G2.11)
Fig. 28. (G2.12)

APPENDIX B. RESULTS OF ALL 50 USERS FOR SIMPLE GESTURES

User ID | Inputs by each user (5 correct and 5 incorrect gestures) | Correct gestures classified as correct (TP) | Wrong gestures classified as wrong (TN) | Accuracy
1  | 10 | 4 | 4 | 80%
2  | 10 | 4 | 5 | 90%
3  | 10 | 4 | 5 | 90%
4  | 10 | 5 | 5 | 100%
5  | 10 | 4 | 4 | 80%
6  | 10 | 4 | 4 | 80%
7  | 10 | 4 | 5 | 90%
8  | 10 | 4 | 4 | 80%
9  | 10 | 3 | 5 | 80%
10 | 10 | 4 | 5 | 90%
11 | 10 | 4 | 5 | 90%
12 | 10 | 4 | 5 | 90%
13 | 10 | 5 | 4 | 90%
14 | 10 | 4 | 4 | 80%
15 | 10 | 4 | 5 | 90%
16 | 10 | 4 | 4 | 80%
17 | 10 | 4 | 5 | 90%
18 | 10 | 4 | 5 | 90%
19 | 10 | 4 | 5 | 90%
20 | 10 | 4 | 5 | 90%
21 | 10 | 3 | 4 | 70%
22 | 10 | 4 | 5 | 90%
23 | 10 | 4 | 5 | 90%
24 | 10 | 5 | 4 | 90%
25 | 10 | 4 | 5 | 90%
26 | 10 | 4 | 4 | 80%
27 | 10 | 4 | 5 | 90%
28 | 10 | 4 | 5 | 90%
29 | 10 | 4 | 5 | 90%
30 | 10 | 4 | 5 | 90%
31 | 10 | 4 | 4 | 80%
32 | 10 | 5 | 5 | 90%
33 | 10 | 4 | 5 | 90%
34 | 10 | 4 | 5 | 90%
35 | 10 | 4 | 5 | 90%
36 | 10 | 4 | 4 | 80%
37 | 10 | 4 | 5 | 90%
38 | 10 | 4 | 5 | 90%
39 | 10 | 4 | 5 | 90%
40 | 10 | 3 | 5 | 80%
41 | 10 | 4 | 3 | 70%
42 | 10 | 4 | 5 | 90%
43 | 10 | 4 | 5 | 90%
44 | 10 | 4 | 5 | 90%
45 | 10 | 3 | 5 | 80%
46 | 10 | 4 | 4 | 80%
47 | 10 | 4 | 5 | 90%
48 | 10 | 4 | 5 | 90%
49 | 10 | 4 | 5 | 90%
50 | 10 | 4 | 4 | 80%

APPENDIX C. RESULTS OF ALL 50 USERS FOR COMPLEX GESTURES

User ID | Inputs by each user (5 correct and 5 incorrect gestures) | Correct gestures classified as correct (TP) | Wrong gestures classified as wrong (TN) | Accuracy
1  | 10 | 3 | 5 | 80%
2  | 10 | 2 | 4 | 60%
3  | 10 | 4 | 4 | 80%
4  | 10 | 3 | 4 | 70%
5  | 10 | 3 | 4 | 70%
6  | 10 | 3 | 4 | 70%
7  | 10 | 3 | 5 | 80%
8  | 10 | 3 | 3 | 60%
9  | 10 | 3 | 5 | 80%
10 | 10 | 3 | 4 | 70%
11 | 10 | 2 | 4 | 60%
12 | 10 | 3 | 4 | 70%
13 | 10 | 2 | 4 | 60%
14 | 10 | 3 | 4 | 70%
15 | 10 | 3 | 4 | 70%
16 | 10 | 3 | 4 | 70%
17 | 10 | 3 | 3 | 60%
18 | 10 | 4 | 4 | 80%
19 | 10 | 3 | 4 | 70%
20 | 10 | 3 | 4 | 70%
21 | 10 | 3 | 4 | 70%
22 | 10 | 3 | 4 | 70%
23 | 10 | 3 | 4 | 70%
24 | 10 | 2 | 4 | 60%
25 | 10 | 3 | 4 | 70%
26 | 10 | 3 | 5 | 80%
27 | 10 | 3 | 4 | 70%
28 | 10 | 3 | 4 | 70%
29 | 10 | 3 | 4 | 70%
30 | 10 | 4 | 4 | 80%
31 | 10 | 3 | 4 | 70%
32 | 10 | 3 | 4 | 70%
33 | 10 | 3 | 3 | 60%
34 | 10 | 3 | 4 | 70%
35 | 10 | 3 | 4 | 70%
36 | 10 | 3 | 4 | 70%
37 | 10 | 3 | 4 | 70%
38 | 10 | 3 | 4 | 70%
39 | 10 | 3 | 4 | 70%
40 | 10 | 4 | 4 | 80%
41 | 10 | 3 | 5 | 80%
42 | 10 | 3 | 4 | 70%
43 | 10 | 3 | 4 | 70%
44 | 10 | 3 | 4 | 70%
45 | 10 | 2 | 4 | 60%
46 | 10 | 3 | 4 | 70%
47 | 10 | 3 | 4 | 70%
48 | 10 | 3 | 4 | 70%
49 | 10 | 3 | 5 | 80%
50 | 10 | 3 | 4 | 70%

