


This project aims to develop a toolkit that aids in the analysis of finger movement in a video
stream. The toolkit will consist of a hardware setup that includes a camera and an input
device such as an iPad, together with a software side that includes a program to analyse the
videos recorded.

The toolkit will be beneficial as a research tool for academics such as PhD students, as it
helps them to better understand the movement habits of the intended target user by producing
an analysis result containing the finger path, coordinates and angle. Computer vision
libraries, such as OpenCV, will be used to analyse the recorded video streams.

The aim of this project is to use this research toolkit to record, analyse and produce a
suitable result sheet for any experiment that involves finger movement within a video
stream.


Thanks to Jason Alexander, my project supervisor for providing me with feedback and

Chapter 1


1.1 Chapter Overview

In this chapter, we will look at the proposed aims and motivations behind this project.
In addition, there will be an explanation of the structure of the paper and how everything has
come together for this project.

1.2 Introduction
Today there are almost 2.53 billion smartphone users in the world, and this market for touch
screen devices is growing, with an increase of 8.3% since 2017 [9]. For a long time, research
on human-computer interaction was restricted to techniques based on the use of a
graphics display, a keyboard and a mouse [2]. Nowadays users want to get as close to the
software as possible, and the way users interact with their phones and other electronic devices
has changed drastically since the 2007 iPhone release [9]. In line with these changes, it is
only natural that the way user interaction is recorded and analysed should also adapt; this is
what this project hopes to achieve.

With this growing use of touchscreens, multi-touch interaction techniques have become
widely available recently [3]. This project is relevant to the current growth of how users
interact with technology and, with this continued growth, we need to analyse how users are
approaching new technology and how we can adapt devices to be easier to use, with human-
centred requirements i.e. low system latency [1].

The main challenge facing this project, and similar projects, is the ability to be accurate
in the analysis. There are a large number of factors to be considered when looking into user
interaction analysis, and these factors can be a hindrance when accurately analysing human-
computer interaction. Producing a final qualitative result of these user input analysis
evaluations can be challenging. This paper will break down the steps needed for the
development of a research toolkit that aids in the analysis of finger movements captured on
video, and explain how and why certain features were chosen to be part of this toolkit. The
research included looks at a range of hand and finger movement analyses, from pianists to
simple bare-hand finger tracking, as well as some open-source toolkits that are currently
available [4, 1, 6].

A research toolkit such as this would be useful to a researcher for three main reasons. Firstly,
the toolkit includes both software and hardware aspects, making it easy for the researcher to
conduct their tests: everything the researcher needs is in one place and works well together.
In addition, the toolkit is adaptable to meet the researcher's needs, so they can swap the
hardware or add elements onto the software. Secondly, the hardware is easy to construct and
portable, while the software should be user friendly and easy to use. Lastly, the research
toolkit should be adaptable to the intended needs of the researcher, with a modular structure
that constraints can be added on top of.

The aim for this research toolkit is to have the functionality to process and detect hand
positions without prior knowledge of their existence [3]. The difficulty with this task is
determining the background and foreground of the video to make the data intake easier,
while also ensuring that the lighting and environment do not hinder the results.
Distinguishing the foreground (the object in question) from the background (everything else
in a frame) enables the progression of data analysis from the video; this process is called
segmentation [5].

Furthermore, to detect an individual's fingers on the hand and track those fingers as they
move, we will need a high-resolution RGB camera to observe a path. The intended goal is to
produce useful information that would be required by a researcher. Areas of analysis will
include 'hit' locations, 'hit' times and 'hit' angles.

Finally, a suitable output of data in various forms, graphs and charts would be available for
the researchers to choose from, and the toolkit would produce an appropriate data sheet. The
toolkit itself will be evaluated on how easy it is to use, how responsive the program is and,
most importantly, how intuitive the whole sequence of gathering a library of data, analysing
it and publishing a final report is.

1.3 Aims and Objectives

The aim of this project is to develop a simple and easy-to-use toolkit that analyses the finger
movement path and hit detection, the time between hits, the angle and the errors for a user and
their particular device, with the aim of recording at least two of the initial criteria suggested.
The toolkit should be able to analyse a large library of video data recorded on a high-
resolution RGB camera and should be simple to use while presenting the data collected in a
manner useful to the researcher.

My aims for this project are:

• To build a toolkit, which includes hardware that should be quick and easy to
assemble, with minimal equipment.
• To record a library of 20 - 40 video streams of participants undergoing various tasks,
using the RGB camera.
• To extract information/data from the library gathered, including 'hit'
time, 'hit' angle, 'hit' locations, time taken for tasks to complete etc.
• To evaluate the toolkit against the researcher's needs and wants (this may
include 'hit' area, 'hit' time, 'hit' angle etc.), and to make sure it is
capable of doing everything that was intended of it for this project.
• For the user to be able to carry out a Fitts's law study while finger movement is
analysed by the toolkit.

1.4 Report Overview

The rest of this report will be structured as follows, outlining the key areas of
research and design.

• Chapter 2 – Background: The initial research and findings from past works will be
discussed and analysed. The results of similar projects were examined, and the
implementations of those designs were used to aid the design of this project.
• Chapter 3 – Design: After condensing the background research, the design chapter
looks at a high-level construct of the project. This chapter gives examples of equipment
that could possibly be used, along with a structure of how the project should be built.
• Chapter 4 – Implementation: An overview of the program written, with examples
from the code itself and how the implementation of the program is structured.
• Chapter 5 – Process Description: A breakdown of the processes that take place
behind the code, along with a manual breakdown of what happens when analysing the
video stream.
• Chapter 6 – Testing and Evaluation: A critical review of the program with rigorous
testing and examples of results, along with an evaluation of whether the aims were achieved.
• Chapter 7 – Conclusions: A full review of the project, with a reflection on the
aims and objectives from the introduction, the project as a whole, challenges faced,
lessons learned and what would be changed if the project were conducted again.

Chapter 2

Background

This section will highlight findings and research prior to the design and implementation
phase; it will look at areas of research such as segmentation techniques, contouring and
Haar cascades.

2.1 OpenCV Introduction

In this project, computer vision libraries will be needed to help break down and analyse the
multiple video streams collected. Thus the use of OpenCV, a library of programming
functions mainly aimed at real-time computer vision, is vital for this project.

OpenCV (Open Source Computer Vision), as stated before, is a library of programming
functions mainly aimed at computer vision; this library of functions can be used for
image and video processing. The library has more than 2500 optimised algorithms, including
a comprehensive set of computer vision and machine learning algorithms. These algorithms
can be used to detect and recognise faces, identify objects, classify human actions in videos,
track camera movements, track moving objects, extract 3D models of objects, produce 3D
point clouds from stereo cameras, etc. [17]

OpenCV is a cornerstone of computer vision analysis and has been used for countless
projects over the years. Yeo, Lee and Lim [9] exclusively use features from the OpenCV
library, along with a Kinect, to create a hand/finger tracking and gesture recognition system
for human-computer interaction using low-cost hardware. On the other hand, Godbehere,
Matsukawa and Goldberg [10] developed a tracking algorithm within their own computer
vision system that is said to demonstrate significant performance improvement over existing
methods in OpenCV 2.1. For this project, the functions provided in the OpenCV
library are of great aid in deconstructing a video stream.

2.2 Background/Foreground Segmentation

Once the video has been captured, it is necessary for the software to break down the video to
help with analysis. One technique that can be used to break down the video is
background/foreground segmentation to help the software distinguish between the subject
that is being analysed and everything else in the background that is irrelevant.

The basic problem lies in the extraction of information from vast amounts of data. The most
important part of the project for the researchers is the results and how and why they are
relevant. The goal of segmentation is to decrease the amount of image information by
selecting areas of interest. Typically, hand segmentation techniques are based on stereo
information, colour, contour detection, connected component analysis and image
differencing [2].

Similarly, Moscheni et al. [18] highlight the intimate relationship between
background/foreground segmentation and the computation of global motion. By estimating
the global motion (background), the local motion performed by the foreground can be
isolated; the foreground can thus be assumed to be the displacement of the object in a
scene. This approach of tracking or clustering pixels works well at determining the
foreground and background. This paper was helpful as it encouraged approaching the project
in a layered structure, building on existing concepts and adapting and adding to them to gain
more accurate results. The disadvantages of background/foreground segmentation found by
Von Hardenberg and Bérard included "unrealisable cluttered backgrounds,
sensitive to changes in the overall illumination, sensitive to shadows, prone to segmentation
errors caused by objects with similar colours" [2]. Letessier and Bérard developed new
techniques to overcome these disadvantages: Image Differencing Segmentation (IDS) is
insensitive to shadows, and the Fast Rejection Filter (FRF) is insensitive to finger
orientation [1].

2.3 Haar Cascade

Haar Cascades are used in computer vision software such as OpenCV to help with
detection [13]. In the OpenCV Python environment, we are able to train a cascade by
providing the program with thousands of positive and negative images. The program takes
each positive image and matches it against the negative images; when the
process is complete, the cascade should be able to distinguish between a positive object
and its negative environment.

This concept is perfectly demonstrated in Padilla et al.'s face detection paper [19]. Faces,
much like fingers and hands, come in various shapes and sizes. The four steps suggested in
this paper are similar to the suggestions made in the Python programming tutorial [13]; these
consist of: face detection, image pre-processing, feature extraction and matching. This, along
with varying the image colour, lighting, rotation and quantity of training images, gives a more
precise detection of the frame. Ultimately, to gain an accurate representation of the subject of
interest, it is vital to do everything possible to make the object (the finger, in this case) as
easy as possible for the system to detect, with the highest precision.

2.4 Chapter Summary

The research conducted identified several ways to carry out this project. The following
steps will be to undertake each method and evaluate which gains the most precise result
required while fitting our timeline and resources. Ultimately, the background research that
has been conducted will determine the shape and structure of the project's design.

Chapter 3

Design (high level system design)

3.1 Chapter Overview
Throughout this chapter, the various design decisions that were made will be discussed,
prior to the final implementation of the toolkit. The reasoning behind these decisions will
also be explained.

3.2 Interview with PhD student
As the intended target users for this research toolkit are academics such as PhD students,
to aid them in their research, a logical start was to find out what researchers are actually
looking for in a product such as this and what they intend its design to be like.

The conversations with PhD students were helpful as they brought a different perspective to
this project: the PhD students looked at this project as a stepping stone, considering further
study and how this toolkit could be adapted and evolved to make it better and usable in other
areas of work such as medicine and physiology. Additionally, it allowed the toolkit to be
visualised in a broader picture, showing how it could possibly be used in multiple other ways
as a small part of a much larger project as well. This added significance to the results that
would be achieved from this toolkit. The questioning opened with "What analytical
information would you like to gain from tracking hand/finger movement in a video stream?"
[Ref_phD_Interview]. The higher-level and general answer given by most students was that
the requirements for projects vary, and a useful toolkit should accommodate multiple results
but also be open to adapting and changing for the user's desired output. This indicated the
need to publish the source code for other users to be able to modify and adapt the code for
their specific needs. As a default, the most important results that should be produced by the
toolkit include the time taken to track and analyse the finger through the video stream, as well
as the "pause" time taken by the individual between the tasks they carry out. An additional
add-on would be the ability to calculate errors during the experiment.

Following on from the initial talk, the next question posed to the PhD students was "Would
a toolkit which analyses finger movement in a video stream be useful to your work?". As
previously addressed, the feedback from the students depended on their work. Most HCI
(Human-Computer Interaction) students said that part of their work would benefit
from this toolkit. One particular student stated that she was working on a medical-based
project and that this kind of toolkit would be extremely useful in that line of work. The
toolkit could help track and analyse a surgeon's hand movements during surgery; this
information could then be used to work towards getting robots to replicate the work done by
the surgeon. With the use of movement tracking software, if robots are able to replicate
precise human tasks, this could lead to safer medical operations that would eliminate human
error and increase precision. Technology such as this is still a far distance in the future but is,
and should be, slowly worked towards.

The final questions expanded on the idea of how a movement tracking toolkit could be
adapted and used in the future for more extensive research. For a wider range of uses, the
PhD students advised looking outside of computer science to find some common ground with
departments such as physiology. Finger and hand movement tracking could also be expanded
to analyse facial features and detect facial expressions, which would be very useful for
website builders and content producers, as they use individuals' physiological reactions to
certain content. This would include gaining information on hot spots on interfaces, how to
achieve an action most efficiently, and gauging emotional reactions (happy, sad) etc.

In summary, there are three key areas that the HCI PhD students are looking for. Firstly,
adaptability: as projects differ and the results needed are different, it is important that a
research toolkit should be adaptable to the researcher's needs. This could include things such
as being able to code specific reading requirements that are needed from the software.
Secondly, a clear representation of results is extremely useful for future analysis; this
includes producing the output data in a CSV or Excel format. Lastly, the whole toolkit,
which includes the hardware and software, should be easy and quick to use and set up.

3.3 Software Resources

As discussed in the prior research and in the project brief, the OpenCV library provides a
vast set of tools capable of gathering all the information required for this toolkit.
Furthermore, the functions available in the library aid in analysing this information. This
includes being able to convert video files to grayscale and using that information to further
analyse the video clips. Accompanying the OpenCV library, Python was the language of
choice for this toolkit. The simplicity of the Python language, along with its consistent syntax
and large standard library, complements OpenCV's functions in a way that enables them to
work fluently together. These were the tools available to undertake this project, but to get a
high-level overview of the design required by the intended target audience, interviews were
carried out with PhD students to ask what results would be helpful for them in their line of
work.

3.4 Video Clip Capture

A large part of this project is being able to capture video from tests that will be run through
the software for analysis. It is important that the videos captured are suited to what the
software is capable of analysing, which is why it is important to capture video clips using
one of the methods explained below.

For any analysis work to take place, a large quantity of raw data needs to be on hand to put
through the toolkit's software so that the results of the analysis can be evaluated. As stated in
the proposal of this project, the video capture setup needs to be simple, easy to assemble and
portable. The camera for capturing the footage may vary, but the video should be of good
quality, with a minimum of 1080p; this resolution of 1,920x1,080 pixels allows the edges of
the fingers in the video to be clear and crisp, making them easier for the software to detect
and resulting in more accurate results. In addition, 60 fps is a standard for most webcams and
cameras, so that will be suitable for the recording. Anything less than 60 fps, such as 24 fps
or 30 fps, could result in movement being missed between frames, and the footage will not
be as smooth as that collected from a 60 fps camera. This is especially important when there
are fast-moving fingers in a video stream.

An iPad will be used as the input device for this project. The camera will be placed directly
above the iPad, giving a bird's eye view, and will capture videos of participants while they
attempt the tests for the experiment. The iPad (or any equivalent device) must be portable
and easy to set up; it should be able to load any tasks and should be simple to use. An iPad
was the ideal device for this project, but this can be altered depending on the researcher's
preference.

The iPad should be placed on a flat surface with the brightness of the screen adjusted for the
camera. The camera will be connected to a tripod, allowing it to gain a bird's eye view of the
iPad. The camera's intention is to capture the iPad screen and to minimise recording of the
surroundings; therefore, the video should only have footage of the iPad screen in its frames,
with the brightness adjusted so that any writing on the iPad is clearly readable and the
exposure of the screen is balanced. The camera should be connected to a laptop or any other
portable device that is able to host and execute the program. This is the final setup to take in
data/video clips [reference_picture_of_setu].

Once everything is in place, the experiment will begin and the participant will be asked to
undertake various tasks; these could involve typing a sentence, navigating a website, doing
the Fitts's law test, etc. The video stream is then uploaded onto the laptop and saved.
Following this, the video file is run through the finger movement program, which should
analyse the time, position and angle of the subject's finger movement in the video stream.
This information will be represented in the form of a spreadsheet or document that the
researcher can then use further in their work.

3.5 Camera Capture Types

Figure 1: Samsung S9 [29]   Figure 2: Logitech C922 [30]   Figure 3: iPhone X [31]

Figures 1, 2 and 3 show the different cameras that are able to capture the intended video
input, including smartphones. Ultimately, the chosen device should be able to precisely
capture the video clips while staying in focus and clear, and file transfer from it should be
easy. A webcam works well, as the video captured can be directly saved onto a laptop and
conveniently inputted into the program. On the other hand, a smartphone also works well, as
you don't need to connect it to a laptop straight away: multiple video streams can be recorded
and saved, then transferred to a laptop at a later date for analysis. The choice of camera is at
the discretion of the researcher.

3.6 System Constraints

As discussed by Yu et al. [33], the problems of background subtraction include the resulting
image being filled with noise, requiring a process of edge-preserving filtering to remove this
noise and make the system more accurate. This is a very costly process and would not be
suitable for this project. However (as explained in the implementation chapter), other
techniques such as thresholding and contouring would need to be used to produce an
accurate result in this project.

Furthermore, foreground constraints include achieving accurate detection in the presence of
colour noise and colour tracking errors. One solution would be to track a specific colour
range, similar to the approach used by Ghorkar et al. [34]. Another solution would be
grayscaling the images, eliminating the need for colour tracking altogether.
When using contouring to analyse the shape of the finger, the software goes through the
frames and images to get the most accurate shape it can find, but this function only allows
you to gauge an approximation of the shape and not the actual shape [28], which ultimately
has an effect on the accuracy of the results.

Another option with regards to obtaining the shape of the finger would be to use Song et
al.'s finger shape tracking [35]. Similar to implementing Haar Cascades with a predefined
image of a shape, this system requires a very complicated process of checking certain
colours, and the extraction of fingertips when the full hand is not included (if the hand is out
of frame) will cause an issue with the system.

3.7 Desired Output and System Run-Through

The overall desired output for this system is for a program to accurately track the finger
movements within a video stream and produce these results in a useful Excel spreadsheet
format. The system should be able to output various results depending on the researcher's
preference. This can be implemented in HCI analysis of various input devices. The system
should be adaptable, with functional add-ons that any user is able to code on top of it.
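The spreadsheet output can be sketched with Python's standard csv module. The column names used here ("frame", "time_s", "x", "y", "angle_deg") are illustrative placeholders, not the toolkit's actual schema.

```python
import csv

# Hypothetical 'hit' records, as the analysis might produce them.
hits = [
    {"frame": 12, "time_s": 0.20, "x": 315, "y": 410, "angle_deg": 47.5},
    {"frame": 55, "time_s": 0.92, "x": 128, "y": 233, "angle_deg": 52.1},
]

# Write one row per hit, with a header row, to a CSV file the
# researcher can open in Excel.
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["frame", "time_s", "x", "y", "angle_deg"])
    writer.writeheader()
    writer.writerows(hits)
```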

Figure: system run-through – (1) Set up the hardware, with camera and input device. (2) Take
video recordings of the hand movements and upload them onto a computer. (3) Run the video
files through the software on the computer. (4) Review the result file that is produced with the
details of the experiment.

3.8 Chapter Summary

In this chapter the high-level design features of this project have been established. This will
be broken-down further in the implementation phase.

Chapter 4

Implementation


4.1 Chapter Overview

In this chapter, the concepts involved in the project will be looked at further, to gain a
better understanding of what occurs in the background of the toolkit's software. These
concepts include video capture, grayscale conversion, binarization, contouring, polygon
approximation, morphology, the law of cosines, etc.

Figure: processing pipeline – Segmentation → Contouring → Finger Position → Angle.

4.2 Video Capture

A simple and easy-to-use command is a must for this toolkit, as it is intended to be quick
and efficient for the intended research purpose.

The software is initially run using the command "python <full_path_of_video_file>"; the
"cv2.VideoCapture" function then sets the reader to read from the video file input. This
"VideoCapture" instance is able to capture the video frame-by-frame until it is released.

4.3 Grayscale Conversion

After the video stream is read into the program frame-by-frame, the images are converted
into grayscale. Grayscale conversion simply reduces complexity: from a 3D pixel value
(R, G, B) to a 1D value. Grayscale is extremely important in image processing for several
reasons. Firstly, it eliminates noise in coloured images: due to the changes in pixels, hue and
various different colours, it is difficult to identify edges, and the extra colours are just
considered to be noise.

Secondly, colour is complex; unlike the ease with which humans perceive and identify
colour, computers need a lot more processing power, and this results in an increase in the
complexity of the code. On the other hand, grayscale is fairly easy to conceptualise, because
we can think of two spatial dimensions and one brightness value.

Lastly, the speed of processing is a major factor, as coloured images can take a lot longer to
process than grayscale images. As hundreds of these frames are analysed, the time taken to
analyse coloured images is much longer than for grayscale images; thus grayscale images are
used for the next stages of processing.

In the following images, grayscale has been implemented along with foreground/background
subtraction. With this technique, the object that is moving (foreground) is white and the
objects that are still in the frame (the desk, etc.) are black.

4.4 Thresholding (Extracting the finger from the video)

The need to separate the subject, in this case the individual's finger in a video, is
highly important. It is a priority to establish the difference between the background/
irrelevant noise and the specific finger that will be used for analysis. Once the finger is
established, the initial stages of analysis on the finger will be easier to accomplish. There are
many forms of image segmentation, including clustering, compression, edge detection,
region-growing, graph partitioning, watershed, etc. The most basic type, and the one looked
at in this project, is thresholding.

Firstly, the screen area of the video stream is segmented; this is done by thresholding and
looking for bright rectangles/squares. Thresholding works as follows: if a pixel value is
greater than a threshold value, it is assigned one value (e.g. white); otherwise, it is assigned
another value (e.g. black). The function used is cv2.threshold [22].

Looking at the signature of the thresholding function, the first parameter is the source image,
i.e. the image we want to perform thresholding on; this image should be grayscale. The
second parameter, thresh, is the threshold value used to classify the pixel intensities in the
grayscale image. The third parameter, maxval, is the pixel value assigned if a given pixel in
the image passes the thresh test [21]. Finally, the fourth parameter is the thresholding method
to be used; these methods include:

• cv2.THRESH_BINARY_INV – Inverts the colours of cv2.THRESH_BINARY.
• cv2.THRESH_TRUNC – Pixels greater than the supplied threshold are truncated to the
threshold value; pixels not greater than the threshold are left as they are.

4.5 Binarization

In this project, after the video stream is converted into grayscale, it is binarized using
"cv2.THRESH_BINARY": pixels greater than the supplied threshold are set to the maximum
value (white), and all other pixels are set to zero (black).

Figure: "Neutrophils" by Dr Graham Beards – an example image used to illustrate binarization.

The binarization method converts the grayscale image (with up to 256 grey levels) into a
black and white image (0 or 1). Simply put, binarization is the process of converting a pixel
image to a binary image [24]. A high-quality binarized image can give more accuracy in
character recognition compared to the original image, due to the noise present in the original
image [23].

4.6 Morphological Transformations

The binary image is then used to find contours in the image. Once contours are found,
polygon approximation is used to get a rectangle from the contour. This is done to get the
location of the screen so that hand motions are observed only within the screen. To obtain
the hand contour, a couple of steps need to be completed first. These involve looking at
darker objects in the frames, then applying the blurring, thresholding and other morphology
operations that have been looked at. This is all done to obtain a more accurate hand contour.

In linguistics, morphology is the study of the internal structure of words; in computer vision
analysis, morphological transformations are simple operations based on image shapes.
The transformation is normally performed on binary images (gained after binarization). It
needs two inputs: one is the original image, and the second is called the structuring element
or kernel, which decides the nature of the operation. Two basic morphological operators are
erosion and dilation [25].

Erosion works by sliding a small window, called the 'kernel', over the image. In this project
"MORPH_RECT" was used as the kernel: a rectangle with dimensions of 7 pixels by 7 pixels.
There are three shapes that can be used for the kernel:

§ Rectangular box: MORPH_RECT
§ Ellipse: MORPH_ELLIPSE
§ Cross: MORPH_CROSS

Figure 3: MORPH_RECT, shown being used in the code to obtain a rectangle.

The kernel slides through the image and erodes away any white pixel that is not fully
surrounded by white: a pixel stays white (1) only if all the pixels under the kernel are white;
otherwise it is removed (eroded, set to 0). This is useful for removing small white noise,
detaching two connected objects, etc.

Dilation, on the other hand, is the opposite of erosion: it expands the white regions by
pushing out their boundary pixels. Normally, in cases like noise removal, erosion is followed
by dilation: erosion removes the white noise, but it also shrinks the object (as seen in the
images below [26]), so the object is then dilated. Since the noise is gone, it won't come back,
but the object area increases back towards its original size. This is also useful in accurately
determining the whereabouts of the finger/hand in the area of the screen.

Figure 4: Original image Figure 5: Erosion Figure 6: Dilation

4.5 Contour Features

As the most important parts of the screen have been established (the ones where the hand
movement occurs), the next step is to use contour techniques to determine the outline and
position of the finger itself so that tracking can start. There are several contour features,
such as moments, contour area, contour perimeter, contour approximation, convex hull, etc.
The prerequisites for contouring are first to use a binary image, and secondly to apply
thresholding or Canny edge detection; both have already been done, to ensure an accurate
hand contour.

Contours can be explained simply as a curve joining all the continuous points (along the
boundary) having the same colour or intensity [27]. Contours are a useful tool for shape
analysis and object detection and recognition, making them the perfect feature to use for the
finger analysis portion of the software.

Figure 7: Image shows the start of using contouring to find the finger

Starting with the "cv2.findContours()" function (in the picture above): this function allows us
to detect an object in an image. The "findContours" function takes three arguments: first the
source image, second the contour retrieval mode, and third the contour approximation
method; the output is a modified image. The contour retrieval mode "RETR_TREE" retrieves
all the contours and creates a full family hierarchy list, meaning it can detect objects that are
in different locations, or shapes inside other shapes, while still connecting the contours
together.

Moments help you to calculate features like the centre of mass of the object, the area of the
object, etc. From these moments, you can extract useful data such as the area and the
centroid. The centroid is given by the relations [28]: cx = M10/M00 and cy = M01/M00.

This calculation is used to determine the centre of the finger, so that the tracking can follow a
fixed point. The centre of the contour is also calculated to aid in the angle detection further
on in the analysis. The “contourArea” function is used to gain a list of all the contours within
the screen area, so that the maximum contour area can be picked.

Once the hand contour is obtained, we then look for the finger by analysing the position of
the hand and looking for extreme points, which would correspond to pointed fingers. This
allows the “calculate_fingertip” function to be used to calculate the location of the fingertip.
It does this by looking at all the contour points and finding the points on the extreme left
and extreme right. Next, the top-left, top-right, bottom-left and bottom-right corners are
searched for the presence of the hand. Based on where the hand is present, the extreme left
or extreme right point is picked as the fingertip. If the hand is present in the top left, then it
is expected that the point on the extreme left will be the fingertip. This allows us to identify
the fingertip, which we are now able to track.
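The heuristic above can be sketched as follows; `fingertip_from_contour` and its parameters are a hypothetical simplification of the project's `calculate_fingertip`:

```python
import numpy as np

def fingertip_from_contour(points, hand_in_top_left):
    """Pick the extreme left or right contour point as the fingertip.

    `points` is an (N, 2) array of contour (x, y) coordinates, and
    `hand_in_top_left` says whether the hand occupies the top-left corner
    of the screen. A hand in the top left is expected to point with its
    leftmost contour point; otherwise the rightmost point is used.
    """
    points = np.asarray(points)
    leftmost = tuple(points[points[:, 0].argmin()])
    rightmost = tuple(points[points[:, 0].argmax()])
    return leftmost if hand_in_top_left else rightmost
```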

4.6 Haar Cascade
This section will look at an object detection technique called Haar Cascades. This technique
was implemented during an initial iteration of the toolkit but was not used in the final
version. This is because Haar Cascades were not as accurate as required by this project, and
they are very time-consuming to develop, making them an inefficient technique for this
toolkit.

Haar Cascades are a machine learning based approach where a cascade function is trained
from a large set of positive and negative images, and this is then used to detect objects in
other images. For this project, four thousand positive images of hands/fingers and four
thousand images that did not include hands or fingers were used to train a hand detection
Haar Cascade. This proved slow, as training the Haar Cascade took a couple of hours,
sometimes overnight. Furthermore, the detection accuracy using this technique was poor for
fingers, although it worked well for faces and eyes.

Haar Cascades are beneficial when wanting to track a large object that won't necessarily
change shape or orientation. This includes being able to detect eyes and a head, but they can
also be used to detect an object such as a ball or a pen. However, trying to detect finger
movement, which can be small and intricate, is challenging to achieve with a Haar Cascade.
The pictures below demonstrate a trained Haar Cascade detecting a face, eyes and a hand.
As demonstrated, it is not accurate and therefore unsuitable for this project.

4.7 Finger Path
To visualise the finger path being tracked, it was useful for researchers to have a video clip
of the actual tracking taking place. We use pixel distances within the screen to obtain three
lengths a, b and c around the finger. Next, the cosine rule is used to find the angle of the
finger. The cosine rule is given as:

cos(C) = (a² + b² − c²) / (2ab)

In the code this same formula is replicated, and the three lengths that were previously found
are used to find the angle of the finger in comparison to the iPad.

Figure 8: The points obtained and the use of the cosine rule to work out the angle of the finger.

The centre of the hand contour is found, and the angle is measured between the centre and
the pointed finger to get the angle of the finger.
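A minimal sketch of the cosine-rule calculation (the helper name is illustrative):

```python
import math

def finger_angle(a, b, c):
    """Angle C in degrees opposite side c, via the cosine rule:
    cos(C) = (a^2 + b^2 - c^2) / (2ab)."""
    return math.degrees(math.acos((a * a + b * b - c * c) / (2 * a * b)))
```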

The ‘self.draw’ method runs through the points that were calculated and uses the .circle and
.line functions to draw the path of the finger over the video file. The results are stored as
the x and y coordinates of the position on each frame of the video stream. Finally, the
‘write_output’ function publishes the results in a .csv file. That file can be used for further
analysis, by creating graphs to show the diverse experiments that can be obtained from this
research toolkit. The full analysis of the toolkit will be presented in the next section.
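A simplified sketch of such a `write_output` step; the exact column names of the project's .csv are an assumption here:

```python
import csv

def write_output(path, rows):
    """Write per-frame tracking results to a .csv file.

    `rows` is a list of (frame, x, y, angle) tuples; the header names
    below are illustrative rather than the project's actual headers.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "x", "y", "angle"])
        writer.writerows(rows)
```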

4.8 Chapter Review
In this chapter, a clear understanding of the stages required for accomplishing this project's
aims has been established. The techniques mentioned in this implementation chapter are able
to track the path of a finger whilst providing the locations of the finger with respect to the
screen as well as the angle; this meets the requirements brought forward in the aims.

Chapter 5
System in Operation and Process Description
5.1 Chapter Overview

This section will look at the process of conducting an experiment using this research toolkit.
The section will be broken down into the software handling and hardware handling, as well
as a general user guide or manual.

5.2 Process Description

This following section is a high-level run-through of the manual in Appendix A.

Figure 9: The desired setup: camera with a bird's-eye view of the iPad

As with any experiment, it is key to have a goal and to understand what you are trying to
achieve; this should be the basis of the experiment. When using this particular finger
tracking research toolkit, the hardware will need to be installed/set up first, to gather data to
analyse later. As described in the design chapter, the set-up includes an overhead camera
with a bird's-eye view of an input device (iPad).

The goal of this experiment is to determine whether an input delay exists among the results
of the toolkit. The experiment begins with the particular test (such as a Fitts's law or any
finger tracking test) uploaded onto the iPad, and the camera recording (bird's-eye view on
top of the iPad) is started. The candidate being evaluated will sit down and begin the tests on
the iPad using their pointer finger. The videos should then be uploaded to the computer that
runs the software portion of the toolkit. By default, the toolkit tracks the finger movement by
giving the x and y coordinates of the finger at various sections (frames) in the video stream,
along with recording the time of the experiment and the 'hit' angle of the finger. The video
should be fed into the software, and a .csv file (in the image below) will be produced with
the details of the experiment.

5.4 Chapter review
This chapter has looked into the process taken through this experiment to achieve an
accurate and precise results file of a tracked finger. Furthermore, visual representations of
the final output file and the setup have been given.

Chapter 6
Testing and Evaluation

The final toolkit that was designed will be assessed and evaluated through several different
testing methods. The accuracy of the toolkit will be examined to see if it meets the criteria;
these criteria involve accurately determining the time, location and angle of a finger as it is
traced in a video stream.

6.1 Accuracy
The main criterion researchers look for in a research toolkit is how accurate the results are.
This evaluation will be broken down into 3 categories and will reflect the aims of this
project. The key areas of accuracy that will be looked at are the ‘hit’ time, ‘hit’ angle and
the ‘hit’ locations.

6.11 Time

The reason for conducting this investigation is to determine whether there is a delay between
the raw footage and the results that are produced after the analysis. If there is a delay, a
quantitative result should be evaluated to determine the time offset present in the toolkit.
To evaluate the ‘hit’ time, this experiment will consist of recording a video stream where an
individual is asked to move their hand from one square to another on the iPad. The squares
will be 15cm apart and will be at the same level. The individual will be asked to press a
square (for example square A) and hold their finger there for two seconds, then they will
move their finger onto the next square (square B) and hold for two seconds, finally they will
move their finger back to the original square (square A). This experiment will be repeated
with changes in the duration of the press (there will be a timer running beside the iPad).

After the video is recorded, the times and frames of the video will be looked at and
analysed. The time when the finger initially presses down on a square will be recorded, as
well as the time point when the finger is released; the number of frames between the presses
will also be counted and noted.

Next, the video stream will be run through the software of the toolkit and the results will be
evaluated. In the results, concurrent repeated values of x and y coordinates will be looked
for: if the x and y coordinates are the same for multiple frames/seconds, then the finger
hasn't moved in the 2D plane and can be concluded to be "pressing the square". The time of
the initial repeated coordinates will be noted along with its duration.
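The repeated-coordinate heuristic can be sketched as follows (a simplification, not the toolkit's exact code; the tolerance value is an assumption):

```python
def press_intervals(coords, min_frames, tolerance=2):
    """Find runs of frames where (x, y) stays within `tolerance` pixels.

    `coords` is a list of (x, y) coordinates, one per frame; any run at
    least `min_frames` long is treated as a press. Returns a list of
    (start_frame, end_frame) index pairs.
    """
    intervals, start = [], 0
    for i in range(1, len(coords) + 1):
        moved = (i == len(coords) or
                 abs(coords[i][0] - coords[start][0]) > tolerance or
                 abs(coords[i][1] - coords[start][1]) > tolerance)
        if moved:
            if i - start >= min_frames:
                intervals.append((start, i - 1))
            start = i
    return intervals
```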

Finally, the results gained from the software will be compared to the results recorded when
looking at the video stream. The accuracy of the toolkit can be determined by if the results
match up or if there is a delay or offset in the time taken by the software.

The following explains the results from the experiment:

Figure 10: Screenshot of the initial touch at A at 0:02

Figure 11: Results screen for the first 3 seconds of the video

As shown in figure 10, the initial press onto A was done at 2 seconds; however, in figure 11
the results from the toolkit show the finger was stationary at about 2.5 seconds. This is
determined by the x coordinate only varying by 1 from 2.375 to 3.08 seconds, and the y
coordinate varying by 2 between 2.5 seconds and 3 seconds.

We can conclude that the offset (delay) of this toolkit is around 0.5 seconds, i.e. a delay of
0.5 seconds between the raw video file and the result sheet produced; this is a very
successful result. This experiment was carried out several times, along with a 5 second press;
all the raw results and data are present on the website.

6.12 Angle
To accurately measure the angle, the next experiment involved an individual initially placing
their finger lying flat on the iPad. They lifted their finger up to 90 degrees and then placed it
back down on the other side (as shown in the pictures below).

Figure 12: Side view of the experiment;
the actual recording will be taken from
a bird's-eye view.

This helped evaluate the accuracy of the toolkit. The desired output is for the program to
produce a data set that goes from 0 degrees (at the start of the video) up to 90 degrees and
then to 180 degrees (at the end of the video). The accuracy of this experiment will be
determined by whether the system can consistently and accurately record the change in angle.

The following experiment was carried out multiple times, with all the raw data collected
available on the website.

From the data below figure 13, it is clear to see that at the beginning of the video the angle
recorded is 98.40 degrees, which is not the desired output (the desired output being 0 degrees).

At the midway point of the video, around 3.65 seconds (with the 0.5 seconds added for delay
from the previous section), the angle at 3.67 seconds is 145.46 degrees and at 4.12 seconds is
152.65 degrees; these values are again incorrect, as the desired output should be 90 degrees.
Finally, the end of the video shows the angle at 7.33 seconds as 100.40 degrees; however, it
should be 180 degrees.

Figure 13: Results sheet from the Angles experiment

In conclusion, this toolkit is unable to accurately evaluate the angles of fingers while they are
being tracked. The results showed that the recorded angle stays between 98 degrees and 150
degrees, and this is therefore something that would need to be refined in further work on this
toolkit.
6.13 Location

For this portion of the evaluation, the accuracy of the finger's location was tested. To conduct
this experiment an iPad was used, along with two rulers on either side of the screen to record
the measurements of the iPad screen. To determine where the individual has touched, PPI
(pixels per inch) will be used to calculate where on the iPad a ‘hit’ has occurred. This is done
by measuring the location using the ruler (in inches); with the information that the iPad used
has a PPI of 264, the screen pixels can be calculated [34]. The result sheet data will be
checked to locate the finger at specific times, and the x/y coordinates checked for the same
location. The accuracy will be judged according to how precise the output results are; the
desired output would be to get the same coordinates.
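The PPI conversion above can be expressed as a small sketch (the helper name is illustrative):

```python
PPI = 264  # pixels per inch for the iPad model used in this project [34]

def inches_to_pixels(x_in, y_in, ppi=PPI):
    """Convert a ruler measurement in inches to screen-pixel coordinates."""
    return (x_in * ppi, y_in * ppi)
```

For example, a mark measured at (1.74, 5.52) inches maps to (459.36, 1457.28) in screen pixels.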

The following experiment was carried out and the results are evaluated below, the full data
and links to the video of the experiments being run are available on the website.




The marks from A to E were made at 0.11, 0.15, 0.19, 0.23 and 0.28 seconds respectively;
these would therefore also be the locations of the finger at those specific times. To work out
the locations of the fingers using PPI, the measurements of the marks were taken. For
example, A measured in at (1.74, 5.52) (x and y coordinates, in inches). With the information
that this iPad has a PPI of 264, the location of the finger at A would be 1.74 * 264 =
459.36px and 5.52 * 264 = 1457.28px, concluding that the x/y coordinates for location A are
(459.36, 1457.28). This result was then compared to the result output from the video file,
and large differences were found. Taking into account the 0.5 seconds delay that was found
in 6.11, on line 37 of the results page (figure 14) the coordinates are noted as (610, 347),
thus concluding that these results are not accurate. Mark B was calculated as (459.30,
810.48), but as recorded in figure 15 the coordinates were (600, 128). This experiment was
done with all five marks and no correlation between the data was found; we can therefore
conclude that there was no specific offset number for the location.

Figure 14: Results for B at 15.5 seconds Figure 15: Results for A at 11.5 seconds

6.3 Set Up and Toolkit Run Through by Another Person

For this evaluation, another person was given the equipment, the software and a user manual
for the toolkit. They were asked to set up the toolkit, record a video, upload it onto their
computer, open the software, run the video stream through it, and produce an output file. At
the end of the experiment the individual was asked some questions on how they found the
experience, what they thought worked well, and what could be changed and done better.

The full questionnaire for this experiment is detailed in Appendix B. The user had no prior
knowledge of coding or of this project. As this experiment was intended to evaluate the ease
of use of the toolkit, it was therefore appropriate to choose someone with no technical
knowledge. The user said “it was easy and took a short time to do” when referring to the
hardware setup, as they used the pictures to guide them. In addition, even with no knowledge
of the code, they were able to understand the output of the toolkit and make appropriate
modifications (changing the threshold on line 74) to gain their desired output. They did
notice some glitches with the tracing, as the marker would occasionally register something
other than their fingertip, but overall they were happy with their experiment.

6.4 Tracking Different Object Movement
The final experiment consisted of using different objects in the video stream and evaluating
whether the software is able to recognise a finger specifically, or whether it recognises
anything that is the shape of a finger. The video streams used consisted of using a pen to
mimic the movement of a finger on a keyboard, using different fingers (the pinky finger),
and wearing a glove while doing the experiment (this final experiment was not included, but
footage and results can be found on the website).

The following experiment was carried out, initially using a pen to replicate the movements of
a finger typing.

Figure 16: The start of the video as the pen enters the frame.

Figure 17: Middle of the video with the pen 'typing' Figure 18: Final outcome after the pen has exited the frame.

As represented in figures 16, 17 and 18, the system was able to track the pen at certain points
but was unable to fluidly track its path as it would do with a finger. The desired output of
this experiment would have been for the software not to recognise the pen at all; it can be
concluded that the software was able to partially track the pen, but not to a great degree of
accuracy.

The next experiment was using the little finger or pinky finger to replicate what has
previously been done by a pointer finger.

Figure 19: Initial set-up when starting the video

Figure 20: Mid video at 00:09 seconds Figure 21: End of the video at 00:18 seconds

From the test run, the conclusion can be made that the toolkit does track the pinky finger.
From the initial observations of the output video with the traced routes, it looks like the
software is able to track the pinky with the same accuracy as the pointer finger. In contrast
with the pen, this shows that the system is efficient at determining fingers in comparison to
other objects.

6.5 Chapter Review

This section has seen the evaluation of the toolkit. It can be concluded that the toolkit needs
more work to refine and make the system more accurate. However, it does meet the aim of
tracking the path of a finger in a video stream and produces results with a 0.5 second delay.

Chapter 7

In this chapter the overall project is discussed. The overall success or failure of the project
will be evaluated, and the aims will be reflected upon, as well as the knowledge and
experience gained over the process.

7.1 Conclusion of Aims

Aim 1: To build a toolkit, which includes hardware that should be quick and easy to
assemble, with minimal equipment.

This aim was successfully completed. In section 6.3 of the evaluation, the individual that was
given the user manual along with the toolkit was able to efficiently and quickly set up the
equipment and conduct the experiment. From the feedback, they said it was quick and easy
to run and required minimal equipment, which was good, although it could have benefited
from a more user-friendly interface for the software. In addition, the hardware for the toolkit
was adaptable, allowing the researcher to use whatever device they had available to them,
while the device used during this project was quick to set up and run.

Aim 2: Record a library of 20 - 40 video streams of participants undergoing various tasks
using the RGB camera.

Through the various experiments and investigations, 20 video streams was a good target to
hit; during this project that target was met and exceeded, with nearly 50 video streams
collected and evaluated. The evidence for this can be located on the website, which contains
all the raw footage recorded during this project along with all the outputs produced.

Aim 3: To be able to extrapolate information/data from the library gathered, including ‘hit’
time, ‘hit’ angle, ‘hit’ locations, the time taken for tasks to complete etc.

From the evaluation of the video streams, the data regarding the angle is able to be collected,
along with the location. The ‘hit’ time could be calculated, as shown in 6.11, where the
toolkit was efficiently able to track the finger movement but did experience a 0.5 second
input delay. However, the toolkit wasn't able to evaluate at what stages in the video stream
the subject actually clicked onto the input device; this was only determined by the repeated
x and y values (coordinates), as the finger would have been stationary at those stages.

As evaluated in sections 6.12 and 6.13 the toolkit was not fully accurate when recording the
location and angle. This is due to the values that were calculated not matching up with the
results produced by the toolkit.

Aim 4: To evaluate the toolkit and whether it meets the researchers' needs and wants (this
may include ‘hit’ area, ‘hit’ time, ‘hit’ angle etc.). Evaluate the toolkit to make sure it is
capable of doing everything that was intended of it for this project.

This aim is subject to the situation in which the toolkit is being used. For this project the
three main needs were ‘hit’ time, angle and location. The toolkit was able to accomplish one
of these needs but wasn't able to accurately determine all three. Therefore, the current toolkit
was not capable of doing everything that was intended by the project.

Aim 5: For the user to be able to carry out a Fitts's law study while finger movement is
analysed by the toolkit.

The Fitts's law study was used as one of the testing methods when gathering video streams
for the evaluation of the toolkit. The images below show the Fitts's law study in action;
these were taken during the beginning stages of development of the toolkit, which resulted
in sporadic tracking. This helped to gauge the threshold level that was required for the
toolkit to provide smooth and accurate tracking.

7.2 Project Revision
Although most of the aims were met and the toolkit was able to answer several questions,
there is still room for improvement. These are a few ideas and improvements that would be
added if this project were done again, or if extra time for further development were given.

Firstly, the toolkit's intention was to be able to record the ‘hit’ of the finger onto an input
device, i.e. to record the actual input of the finger. For example, if a finger is typing on a
keyboard, the toolkit should be able to recognise the letters that are being typed. This doesn't
work when the toolkit is used for the Fitts's law experiment, or when the subject is instructed
to browse the web, because the actual input is not reflected by a discrete letter.

Secondly, the user interface of this toolkit is poor. It was developed for HCI researchers and
assumed that they had prior knowledge of compiling and running programs. In reality, this
toolkit might be used by researchers in a different department, for example psychology;
researchers who are unable to use the toolkit or follow the manual will not be able to use it
efficiently.

7.3 Further Work
The most important section of this project may be the further work. This section looks at the
concluded project and the ideas proposed, and considers the need for further research and
work to make the idea bigger and better in the future. It will look at how this idea of finger
tracking in a video stream could be implemented in the future for a variety of different uses
in different sectors. These sectors may include medicine, sign language translation, input
device research and website development.

For this project there are several avenues for improvement. Firstly, the location and angle
detection is very inaccurate and would therefore require a different method of gaining these
results; this could include adding sensor gloves to the toolkit, but that would require more
equipment. In addition, the delay time of 0.5 seconds needs to be reduced, with the goal of
obtaining a delay of 0 seconds. As mentioned in the project revision, the user interface needs
to be more accessible and user-friendly.

7.31 Medical Use
As previously outlined in the interview with the PhD student, a great use for tracking fingers
in a video stream would be in medicine, particularly during surgery: minor surgeries that are
routine and mostly repetitive. A video could record the surgeon's hand movements during a
surgery, outputting the analysed information including the x and y coordinates of their hands
and the angles of their fingers. This information could then be uploaded into a robotic pair
of hands that would be able to replicate the surgery; this would increase accuracy during the
surgery and would also free up the surgeons' time for more complicated work.

This approach works better than the current approach of hand/finger monitoring, which uses
sensors that are clipped onto the individual's hands. The major challenges facing this further
work are, first, the three-dimensional aspects of the surgery: this would probably require a
minimum of two cameras along with a more powerful software toolkit. In addition, the
contrast between the surgeon and the patient might be an issue, but this could be solved with
bright gloves creating a greater contrast. There are more obstacles before the use of finger
tracking can be implemented in the health-care industry, but hopefully this project provides
some ideas for getting started.

7.32 Sign Language Translation
Another idea for further work that would benefit from the ideas proposed in this project
would be a sign language translation system. The basis of the idea is to capitalise on a
unique input method. The idea is that, by recording a video stream of a pair of hands saying
a sentence in sign language, software using computer vision techniques can analyse the
hand/finger movements and relate them to words.

After a library is filled and the software is taught how to translate sign language into words,
individuals could use sign language to type: the video stream analyses the finger movements,
translates them into words, and they are added into a document, email or anywhere else. This
would be a unique input device that could challenge the traditional keyboard, but would also
encourage people to learn sign language, thus helping communication with hearing-impaired
individuals.

7.33 Input Device Research
One of the goals of HCI (human-computer interaction) is to make the communication
between the human and the computer as seamless and effortless as possible. This starts with
the input device; as mentioned before, the sign language translation is a unique take on using
the ideas of this project to create an input device. Another use of this project would be to
monitor the approach taken by individuals when using a keyboard and other input devices.
By analysing the travel time between keys, we can use this toolkit to gain information that
could help with creating a more efficient and easier-to-use keyboard. Such keyboards are
already on the market, but by gathering information from individuals we can determine what
exactly they are looking for, potentially recording individuals for a certain amount of time
and personalising keyboards specifically for their typing habits.

7.34 Website Development
Just as the toolkit can be used to gain information about users' input habits and the devices
they use, this can also be translated to website usage on iPads or smartphones (any
touchscreen devices). The toolkit can be used to track users' habits when using websites and
how they navigate through them; over time, website developers can determine which
buttons/pathways are the most common on the website and can change the layout to be more
user-friendly and efficient. This will help the development of websites and will also help
with user experience, because users will not be frustrated navigating a website (and
everyone has been there).

7.35 Robotic Arm
Finally, another optimistic idea for further research. As proposed in work on Internet-
connected robotic arms, “the movement of the robot arm can be controlled by a computer via
the internet” [32]. This vision can be aided by the research and ideas brought forward in this
project. If an individual's hand movements, for example while playing chess, are able to be
traced and recorded, this data can be transferred over the internet to a robotic arm that is
able to recreate the individual's hand movements, therefore allowing physical chess to be
played over the internet with nothing more than a webcam.

7.4 Reflection
During this project, I believe I have learned a huge amount about computer vision, something
that I previously had very little to no knowledge of. I have been introduced to the world of
OpenCV, with its massive library of functions capable of various operations. Initially, when
the project started, I didn't have a certain idea of the direction I wanted to go in, and I was
therefore able to experiment and learn how to use different concepts such as Haar Cascades
and background subtraction; I was able to understand the advantages and disadvantages of
using various techniques to achieve my goal. This was all before understanding the steps I
needed to take to ensure I was able to create an efficient and accurate toolkit and overall
project.

Going into detail regarding how images can be manipulated and changed to obtain certain
information really fascinated me during my research. I really enjoyed learning how
binarization works and why it works, and how we can essentially manipulate individual
pixels and alter an image/video to analyse and extract specific information, such as the shape
of a finger. Furthermore, I came to understand the intricate relationship between
thresholding, contouring and blurring: how they all work together, and how the order in
which the processes are done is very important, depending on what information you would
like to obtain and how you want to use it.

I enjoyed talking to the HCI PhD students along with my peers. As this toolkit is intended to
be used by others, it was important to me that anyone looking at the user manual or the
instructions is able to understand where I am coming from and why this is of interest. In
particular, I enjoyed talking to the PhD student about future innovations involving HCI:
innovations such as robotic arms conducting their own surgeries, or recordings of your own
hand movements letting you play chess with someone across the world, simply because the
toolkit was able to analyse the movement of your hand and replicate it with a robotic arm,
all without using expensive sensors or high-tech equipment.

Finally, thanks to this project I have been able to learn a wide range of skills. This
experience has taught me how to put together a large report along with research; in
particular, structuring the project, meeting personal deadlines and working largely
independently. It has improved my communication skills, because I needed to be clear and
concise with my points and ideas to get the message across to someone who is unfamiliar
with computer science altogether. In addition, doing the project in Python has aided me
immensely, as I am now more confident with my coding. Python is in high demand, and this
has allowed me to be more confident during assessment days and job interviews while
talking about Python and computer vision. Finally, I believe I have improved my ability to
undertake a large project and to efficiently execute it individually.

7.5 Closing statement
In conclusion, I believe this project was a great success, as it has helped me develop
personally in various ways, including communication, time management, structure and work
management. Whilst I was not able to complete all my desired objectives to the fullest extent
I aspired to, such as accurately calculating the angle, I believe I put in my best effort and I am
proud of the outcome. I was able to pick an area of computer science I was interested in
(HCI), and I was given the opportunity to learn so much more, including computer vision, the
OpenCV library and techniques for altering video frames and images. Furthermore, I have
acquired many invaluable skills which I can now apply to any number of fields. Finally, even
with a challenging project such as this one, with scary hurdles and stressful times, I am
ultimately proud of the work that I was able to achieve.

References

[1] Letessier, J. & Bérard, F., 2004. Visual tracking of bare fingers for interactive surfaces.
Proceedings of the 17th annual ACM symposium on user interface software and
technology, pp.119–122. (Accessed Oct 2018)
[2] Von Hardenberg, Christian, and François Bérard. "Bare-hand human-computer
interaction." Proceedings of the 2001 workshop on Perceptive user interfaces. ACM,
2001 (Accessed Oct 2017)
[3] Hackenberg, G., Mccall, R. & Broll, W., 2011. Lightweight palm and finger tracking for
real-time 3D gesture control. Virtual Reality Conference (VR), 2011 IEEE, pp.19–26.
(Accessed Oct 2017)
[4] Gorodnichy, D.O. & Yogeswaran, A., 2006. Detection and tracking of pianist hands and
fingers. Computer and Robot Vision, 2006. The 3rd Canadian Conference on, p.63.
(Accessed Oct 2017)
[5] Dorfmuller-Ulhaas, K. & Schmalstieg, D., 2001. Finger tracking for interaction in
augmented environments. Augmented Reality, 2001. Proceedings. IEEE and ACM
International Symposium on, pp.55–64.
[6] Gesture-Based Interfaces: Practical Applications of Gestures in Real World Mobile
Settings – Julie Rico, Andrew Crossan, and Stephen Brewster (Accessed Oct 2017)
[7] Straw, Andrew D. & Dickinson, Michael H., 2009. Motmot, an open-source toolkit for
realtime video acquisition and analysis.(Research). Source Code for Biology and
Medicine, 4, p.5. (Accessed Oct 2017)
[8] Popa, D., Gui, V. & Otesteanu, M., 2015. Real-time finger tracking with improved
performance in video sequences with motion blur. 2015 38th International Conference on
Telecommunications and Signal Processing, TSP 2015, pp
[9] Facts, S. (2018). Topic: Smartphones. [online] Available at: [Accessed 7 Oct. 2017].
[10] (2018). Visualising Fitts's Law. [online] Available at: [Accessed 14 Oct. 2017].
[11] (2018). Basic Gesture Recognition with a Kinect and OpenCV – Finding the Hand –
Jack Romo. [online] Available at: finding-the-hand/ [Accessed 9 Dec. 2017].
[12] (2018). Hand Tracking and Recognition with OpenCV. [online] Available at:
recognition-with-opencv/ [Accessed 7 Mar. 2018].
[13] (2018). Python Programming Tutorials. [online] Available at:
-Cascade -face-eye-detection-python-opencv-tutorial/ [Accessed 7 Mar. 2018].
[14] Google Books. (2018). Learning OpenCV. [online] Available at:
f=false [Accessed 7 Mar. 2018].


[21] Rosebrock, A. (2018). Thresholding: Simple Image Segmentation using OpenCV -
PyImageSearch. [online] PyImageSearch. Available at:
[Accessed 7 Mar. 2018].

[22] (2018). OpenCV: Image Thresholding. [online] Available at: [Accessed 7 Mar. 2018].

[23] M. Sezgin, B. Sankur, “Survey over image thresholding techniques and quantitative
performance evaluation”, Journal of Electronic Imaging 13 (1) (2004) 146–168.

[24] (2018). Image Processing - Binarization. [online] Available at: [Accessed 7 Mar. 2018].

[25] (2018). OpenCV: Morphological Transformations. [online] Available at: [Accessed 8 Mar. 2018].

[26] (2018). Eroding and Dilating — OpenCV documentation. [online]
Available at: [Accessed
8 Mar. 2018].

[27] (2018). OpenCV: Contours : Getting Started. [online] Available at: [Accessed 8 Mar. 2018].

[28] (2018). OpenCV: Contour Features. [online] Available at: [Accessed 18 Mar. 2018].

[29] Logitech. (2018). Logitech C922 Pro Stream 1080P Webcam for Game
Streaming. [online] Available at: stream-webcam [Accessed 17 Mar. 2018].

[30] Papitawholesale. (2018). Samsung Galaxy S9 Plus Dual Sim - 64GB, 6GB Ram, 4G LTE- Grey.
[online] Available at:
6gb-ram-4g-lte-grey?variant=11185661804587 [Accessed 17 Mar. 2018].

[31] (2018). [online] Available at: [Accessed 17 Mar. 2018].

[32] Kadir, Samin & Ibrahim, 2012. Internet Controlled Robotic Arm. Procedia
Engineering, 41, pp.1065–1071.

[33] Yu, T., Zhang, C., Cohen, M., Rui, Y., and Wu, Y. “Monocular video
foreground/background segmentation by tracking spatial-color gaussian mixture models.". In
Motion and Video Computing, 2007. 23 Feb. 2007.


[34] Ghotkar, A. S., and Kharate, G. K. "Hand segmentation techniques to hand gesture
recognition for natural human computer interaction." International Journal of Human,
University of Pune, India, 2012.

[35] Song, P., Yu, H., and Winkler, S. "Vision-based 3D finger interactions for mixed reality
games with physics simulation." In Proceedings of the 7th ACM SIGGRAPH International
Conference on Virtual-Reality Continuum and Its Applications in Industry. ACM, 2008,
Singapore, December 08, 2008.

Appendix A – User Manual

Crisanto Da Cunha
Toolkit support for the analysis of finger movement in a video stream

Finger Movement Toolkit
User Manual

Product Description

This toolkit is able to take recordings and run them through software that analyses the finger
movements within the video stream. The threshold (how sensitive the software needs to be in
order to eliminate background noise) can be altered to achieve the best results when
analysing a video sample.

System Requirements

The video camera required to capture the footage for analysis needs to be of high quality,
preferably 1080p at 60 fps or higher; this ensures the highest accuracy possible when
running the footage through the software and helps avoid missed frames.

Hardware setup

[Pictures of the hardware setup]

Download and Install the toolkit software

To begin, download the code that is linked below and copy and paste it into any code editor,
preferably PyCharm.

Open Toolkit

Windows: open the toolkit using the following steps:

• Create a folder called ‘finger movement toolkit’
• Download the code into a code editor and save it in the folder.
• Insert any video file that you want to run through the toolkit into the same folder
• Follow the link and instructions to download Python for Windows:
• Go into the folder, right click and select the option “open command window here”
• While in Command Prompt, enter the following line: python

Mac: open the toolkit using the following steps:

• Create a folder called ‘finger movement toolkit’
• Download the code into a code editor and save it in the folder.
• Insert any videos that are going to be used into the same folder
• Open “terminal” and go into the folder that contains your project
• While in the folder, use the following command to run the code and analyse your
video file: python <full_path_of_video_file>


Customising the toolkit

• The values on lines 46, 72 and 73 can be varied to change the threshold area for hand
detection as well as screen detection.
• Additional modules (methods) can be added to the toolkit to detect or analyse other
features.
• On line 229 you can alter the output that is written to the results file.
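The threshold values mentioned above act as minimum-area filters: detected contours smaller than the cutoff are discarded as background noise. The sketch below illustrates that filtering step; the function name, constant and example values are illustrative, not the toolkit's actual identifiers:

```python
# Hypothetical sketch of the area filtering controlled by the threshold
# values: each detected contour's area is compared against a minimum,
# and anything smaller is treated as background noise.
MIN_HAND_AREA = 3000.0  # illustrative value; tune per camera and setup

def keep_large_contours(areas, min_area=MIN_HAND_AREA):
    """Return only the contour areas large enough to be a hand or screen.
    In the toolkit these areas would come from cv2.contourArea()."""
    return [a for a in areas if a >= min_area]

# Example: three candidate contours; only one is plausibly a hand.
candidates = [120.0, 4500.0, 800.0]
hands = keep_large_contours(candidates)  # small noise contours dropped
```

Raising the minimum area makes the toolkit more tolerant of noise but risks missing small or distant hands; lowering it does the reverse, which is why the values are exposed for tuning.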

Appendix B – 6.3 Questionnaire

1.) Are you from a technical background (i.e. computer science,
engineering, physics etc.)?
No I am not.

2.) Do you have any experience coding?
No, I may have done some coding in ICT at school but I don’t really remember.
Even if I did, it might have been Excel or something very basic.

3.) How did you find the set up process?
So I had the user manual but I mainly looked at the pictures and just tried to
replicate that. It wasn’t too difficult; the hardest part was probably putting the
webcam onto the tripod. Other than that it was simple, and when I plugged the
webcam into the computer it worked straight away. So it was very easy and simple
and took a short time to do.

4.) What experiments did you run?
I wanted to start slow to understand what I was doing so I just started by typing my
name into notes and then I tried to draw a smiley face in notes.

5.) Did you understand the code/ what was going on?
No I didn’t, but I knew what the actual program did and I knew I had to change
one number on line 74 to change how accurate I want the tracking to be, so I took
different videos and changed the number on each one and played the code to see
how it changes.

6.) How accurate is the tracking?
From what I can see the tracking is very accurate; I was very surprised at how
accurate it actually is. However, it does bug out at times and picks up other parts
of my body, like my knuckle, and will get confused, but normally it goes back to
tracking my fingertips.

7.) Can you see this toolkit being used?
Personally I would never use it, but for someone who’s doing research
work or in this field I can see it being an easy-to-assemble tool for getting quick
results, especially if they know coding and are able to understand what’s going on.