Submitted By
SANA ASHFAQ
IQRA MEHDI
[2019 - 2021]
Date: 03 March 2022
Acknowledgement
We are profoundly grateful to Prof. Farhan Ahmed Siddiqui for his expert guidance and
continuous encouragement in seeing, from its commencement to its completion, that this
project reached its target.
The modern world is changing with every pulse. New technologies are entering every sector of
our day-to-day lives, and image processing is one of the pioneers of this change. With a single
click, many things become possible with the help of an image. A text image can be translated from
one language to another without any help from a human interpreter. One can also save time by
sending an image instead of a long message, since a single image explains many things. Images are
also used to identify a person on social media and on many other websites. For this reason, face
detection is getting more popular every day. With the help of face detection it is possible to
identify a person very easily. What if one could also tell what emotional state a person is in? It
would help one to approach that person; for example, if a person is sad, one can do something to
make him or her feel happy. In this project we have investigated whether it is possible to identify a
person's emotional state. We have then also researched how to suggest music on the basis of his or her
emotion.
A facial expression is the visible manifestation of the affective state, cognitive activity, intention,
personality and psychopathology of a person, and it plays a communicative role in interpersonal relations.
It has been studied for a long time, with notable progress in recent decades. Although much
progress has been made, recognizing facial expressions with high accuracy remains difficult.
The system and model were designed and tested in Visual Studio with a Python
environment. The system is capable of recognizing continuous facial expressions in real time, is
implemented on a PC, and displays the participant's video in real time.
TABLE OF CONTENTS:
Abstract
Chapter 1: Introduction
1.1 Introduction
1.2 Motivation
1.3 Rationale of the study
1.4 System design
1.5 Expected output
1.6 Scope and applications
Chapter 2: Background
12.1 Conclusion
12.2 Future work
LIST OF FIGURES:
FIGURE 1: Demo image of face emotion detection
Facial emotion recognition is the process of detecting human emotions from facial expressions.
The human brain recognizes emotions automatically, and software has now been developed that can
recognize emotions as well. This technology is becoming more accurate all the time.
AI can detect emotions by learning what each facial expression means and applying that knowledge to
the new information presented to it. Emotional artificial intelligence, or emotion AI, is a technology that
is capable of reading, imitating, interpreting, and responding to human facial expressions and emotions.
The human face is an important part of an individual's body, and it plays an especially important
role in conveying an individual's behaviour and emotional state. As humans, we classify emotions all
the time without knowing it. Nowadays people spend much of their time at work, and sometimes they
forget that they should also find some time for themselves. If, despite their busyness, they could see
their own facial expression, they might try to do something differently. For example, if people see
that their facial expression is happy, they will try to stay happy; on the other hand, if they see
that their facial expression is sad, they may try to improve their mental condition. Facial expression
plays an important role in detecting human emotion and is a valuable indicator of a person's state:
in a word, an expression sends a message about his or her internal feelings. Facial expression
analysis is an important application of image processing, and a large amount of research in the
field is devoted to it at present. Facial-image-based mood detection techniques provide fast and
useful results for mood detection. The recognition of feelings through facial expression has been an
interesting subject since the time of Aristotle. After 1960 the topic became more popular, when a
list of universal emotions was established and different systems were proposed. With the arrival of
modern technology our expectations have grown without limit, and as a result people try to improve
image-based mood detection in different ways. There are six basic universal emotions for human
beings: happiness, sadness, anger, fear, disgust and surprise. From a human's facial expression we
can easily detect these emotions. In this research we propose a useful way to detect three of them,
happy, sad and angry, from frontal facial images.
Our aim, which we believe we have reached, was to develop a method of face mood detection that is
fast, robust, reasonably simple and accurate, using relatively simple and easy-to-understand
algorithms and techniques.
Virtual learning is increasing day by day, and human-computer interaction is necessary to make virtual
learning a better experience. The emotions of a person play a major role in the learning process; hence
the proposed work detects the emotions of a person from his or her facial expression.
For a facial expression to be detected, the face location and area must be known; therefore, in most
cases emotion detection algorithms start with face detection, taking into account the fact that face
emotions are mostly depicted using the mouth. Consequently, algorithms for eye and mouth detection
and tracking are necessary in order to provide the features for subsequent emotion recognition.
1.2 MOTIVATION:
In earlier times, analysing facial expressions was an essential task for psychologists. Nowadays,
advances in image processing have significantly motivated research work on automatic face mood
detection. There are many depressed people living in our society, as well as many busy people who do
not know their present mental condition. We therefore try to develop an application through which
they will be able to know their present emotional state.
Image processing is a useful method for performing different operations on an image, either to
obtain a better image or to extract useful information from it. Image processing methods normally
treat an image as a two-dimensional signal. Because of this usefulness, our research relies on
image processing. The main aim of the project is to detect humans' facial expressions by applying
image processing techniques and to send them a message about their internal state.
Real-time facial emotion recognition is generally divided into several phases. The first
phase is detecting the general area of the human face; this process includes a tracking system,
which requires the hardware to monitor the general movement of the facial layout. The second phase
is facial landmarking, which marks more accurate facial points to be extracted.
The writers must, of course, choose the most efficient way to compute and perform facial
analysis in order to prevent overloading the computer's memory. The system also needs a
component that can extract the subject's facial expression and then further process it
into an emotion as the result. The writer needs to be sure the computer is able to properly perform
image acquisition, facial extraction, facial landmarking, and the logic computation needed to infer
the emotion of the subject. In short, a working webcam and sufficient computation power on
the subject computer are crucial. The locations of facial features can be represented as
landmarks on the face. The acquired image of a human face can be represented as a
coordinate vector of landmarks and Action Units (AUs). Hence, the facial features offer geometric
information about each feature and the overall shape of the face.
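As an illustrative sketch (not the authors' implementation), the landmark-based geometric representation described above can be reduced to a feature vector of normalised pairwise distances; the landmark coordinates and the choice of the first two points as eye centres are our own assumptions:

```python
import numpy as np

def landmark_features(landmarks):
    """Turn an (N, 2) array of facial landmark coordinates into a
    scale-invariant geometric feature vector of pairwise distances."""
    landmarks = np.asarray(landmarks, dtype=float)
    n = len(landmarks)
    # Pairwise Euclidean distances between all landmark points.
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    # Normalise by the inter-ocular distance (assuming points 0 and 1
    # are the eye centres) so the features are scale-invariant.
    scale = dists[0, 1]
    iu = np.triu_indices(n, k=1)           # one entry per unordered pair
    return dists[iu] / scale

# Toy example: eye centres, nose tip, mouth corners (made-up coordinates).
points = [(30, 30), (70, 30), (50, 55), (35, 75), (65, 75)]
features = landmark_features(points)
```

A real system would obtain the points from a landmark detector per frame; the resulting vector is what a downstream classifier would consume.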
• Users can get rid of their mental depression and also release their tension.
The scope of this system is to tackle problems that can arise in day-to-day life. Some of the
scopes are:
1. The system can be used to detect and track a user's state of mind.
2. The system can be used in mini-marts and shopping centres to view the feedback of customers in
order to enhance the business.
3. The system can be installed at busy places like airports, railway stations or bus stations to detect
the faces and facial expressions of each person and flag any face that appears suspicious.
4. The system can also be used for educational purposes, for example to get feedback on how an
audience responds to a lecture.
5. The system can be used for lie detection among criminal suspects during interrogation.
6. The system can help people in emotion-related research to improve the processing of emotion data.
7. Clever marketing is feasible using the emotional knowledge of a person, which can be identified by this
system.
CHAPTER 2: BACKGROUND
2.1 PLANNING:
In the planning phase, a study of reliable and effective algorithms was carried out. In parallel,
data were collected and preprocessed for finer and more accurate results. Since a huge amount of
data was needed for better accuracy, we collected the data by surfing the internet. Being new
to this kind of project, we decided to use Python and to implement the algorithms using the
OpenCV framework.
Producing a neural network model capable of accurately classifying emotions from facial expressions is
not easily achieved without extensive and representative training data. Images and videos captured
for this application area may suffer from poor lighting conditions, varying angles, and proximity to the
face; factors such as gender, age, and ethnicity can influence the expression of emotions. Meanwhile,
some facial expressions remain subjective, identified differently by separate individuals, and unsuitable
to place into a single class. As a result of these issues, numerous methods of data collection and
modelling have been implemented by researchers to obtain the best results. Kuo et al. describe the
challenges of data collected 'in-the-wild', such as lighting conditions and head poses, which can lead to
model overfitting. Therefore, to produce a robust deep learning model capable of facial expression
recognition, training included the use of various types of data sets. The first, referred to as Set A, were
images obtained from laptop webcams capturing three angles. Participants were shown a selection of
videos, aimed to produce arousal of the seven common expressions, and afterwards annotated their
own images. This is an interesting method as it is likely to be more accurate compared to an observer
labelling the displayed emotion. Set B was a collection of images obtained from a Google search of
keywords, such as ‘angry/neutral expression’. These were annotated depending on the keyword
attached to each image and included a variety of head angles and partial faces. The last set was a
grouping of images from movies and television, more complex than set B due to stronger image
contrast. This work is illustrative of the different methods that can be used to obtain a wide variety of
data. As Deep Learning (DL) techniques burgeon, the variety of areas in which they can be implemented
continues to grow. These advancements are making headway into schools and the workplace to
enhance experiences and manage peoples’ well-being. Bobade and Vani present various strategies
where Machine Learning (ML) and DL can be used to detect stress levels of people using data obtained
from wearable sensing technology. This work used the publicly available data set WESAD (Wearable
Stress and Affect Detection) to create deep learning classification models to detect stress.
Electrocardiogram (ECG), body temperature, and Blood Volume Pulse (BVP) are some examples of the
data in WESAD, and their values fall into one of three classes: amusement, neutral, and stress. A
variety of machine learning and deep learning methods, including
Random Forest (RF) and Support Vector Machine (SVM), and an Artificial Neural Network (ANN), were
compared for performance. It was found that the machine learning methods reached an accuracy of up
to 84.32%, compared to the ANN, which achieved 95.21%. This work shows the generalisability of
trained ML algorithms and DL networks to real world problems, such as detecting the physiological
responses to stressful situations. Innovative research combining ML and DL techniques in FER systems
have reached high levels of accuracy and generalised well to unseen data. Ruiz-Garcia et al. produce
a hybrid emotion recognition system for a socially assistive robot that makes use of a Deep CNN (Deep
Convolutional Neural Network) and an SVM to achieve an average accuracy score across the 7 cardinal
emotions of 96.26%. This specific combination of algorithms achieved the highest score, compared to an
average of 95.58% for Gabor filters and SVM; 93.5% for Gabor filters and MLP (Multi-Layer Perceptron);
and 91.16% for a combination of CNN and MLP. Achieving accuracy rates north of 90% for facial
expression recognition can be considered significant, considering the difficulty of the task. This work
shows the advantages in performance where a combination of techniques is used. A further useful
application of such techniques is the monitoring of driver concentration levels in vehicles. Naito et al.
aimed to identify when a driver was drowsy from their facial expressions, using recordings obtained from
a 3D camera. Participants used a driving simulator for five hours each whilst in a dark room, during
which they were wearing an Electroencephalograph (EEG) to measure frequency of brain wave activity.
This was performed to categorise two states: drowsy, and awake. The 3D camera captured 78 points on
participants’ faces at 30 Frames Per Second (FPS), and these were used to visually support the labelling
of each state from data captured by the EEG. Using K-Nearest Neighbour (KNN) algorithm, the work
achieved 94.4% accuracy. This work is demonstrative of the supporting evidence facial expressions can
provide when using other types of data to train machine learning models. Goodfellow et al. proposed
a zero-sum game concept in order to sufficiently train generator algorithms to reproduce similar data to
an existing training set. The framework was coined a Generative Adversarial Network (GAN), and it
works in the following way. A Generator model takes a fixed-length random vector, drawn from
a Gaussian distribution, as input and produces a sample within this domain. A normal classification
model, known as the Discriminator, classifies each image as fake or real (0 or 1), and its weights are
updated during training to improve its classification performance. The Generator is updated depending
on how well its generated samples have fooled the Discriminator; when the Discriminator successfully
classifies the generated samples, the Generator is penalised with large parameter updates. The current
SOTA GAN architecture performance can be attributed to the work by Karras et al. The research
implements improvements to address some issues identified with images produced by StyleGAN. For
example, smudge-like areas could be seen on all generated images above 64x64 resolution; the authors
attribute this to the use of Adaptive Instance Normalisation (AdaIN), where mean and variance values of
each feature map are normalised, causing a destruction of data. To combat such issues, the work
re-engineered generator normalisation with regularisation, and progressive growing. The work achieved
SOTA performance, such as an FID score of 2.32 on the LSUN Car data set (Yu et al.). Johnston et al.
identified the standardised process for the creation of Intelligent User Interfaces (IUIs) as an
under-researched area. The work
aims to improve user experience universally through provision of a framework for the development of
IUIs. Specifically, it looks to combine the dynamic, adaptive, and intelligent components of intelligent
interface development. Dynamic elements are responsible for providing basic user experience, and are
referred to as being understanding of the user, along with the device and environment. Secondly,
adaptive elements allow for recognition of the user’s activity pipeline, and cover usability and
accessibility aimed at enhancing user experience. Machine Learning algorithms are
implemented to develop intelligent components of IUIs; they are used to interact specifically
with the user’s interests, preferences and ease their workload. Intelligent components have an
understanding of the user’s end goal whilst using an application. The work argues that introducing a
standardised framework to this field would inevitably improve the journey through an application, thus
reducing cognitive load for users. The authors also discuss methods of testing when systems are built
using a standardised framework, such as gaze and eye tracking with sensor technology, and
electroencephalography to analyse brain activity. This work also identifies that using ML to assist the
adaptability of user interfaces is a limited area of research. Liu et al. developed an Adaptive User
Interface (AUI) that is based on Episode Identification and Association (EIA). This concept refers to the
interface recognition of a user’s actions prior to them being carried out. This is achieved through trace
and analysis of action sequences created by the system observing interaction between user and
application. Episodes are derived from sets of actions to create classes, such as typing, menu selection,
pressing of buttons. This allows for recognition in user behaviour patterns, meaning that the system is
able to provide assistance preemptively. To test their proposed work, the study developed an AUI using
applied EIA under an existing, sophisticated application, Microsoft Word. The interface provides
assistance in two forms: firstly, it boasts a phrase association, where words inputted by the user are
treated as parts of an episode, storing up to five words. This enables word and phrase suggestion to the
user. Secondly, it offers assistance to help with paragraph construction by identifying commonly used
configurations, such as changing the font size, or changing text to bold. The system was tested by
nontechnical university staff and students, and the research recorded acceptance rates by user for
phrase association at 75%, and format automation at 86%. Feedback was gathered through completion
of a questionnaire for participants to rate their experience with the system; on a scale of 1 (very poor)
and 7 (very good), the phrase association was rated at 5.78, and for format automation, 5.69, for quality
measure. The study also concluded that usage of the system increased individuals’ productivity, for a
test and control group comparatively. Stumpf et al. conducted a study that analysed the effect that
real-time user feedback via keyboard input had on training of a Neural Network, particularly in cases
where training data is limited. Forty-three English-speaking university students who were proficient in
using email were asked to classify 1151 emails into appropriate folders given their content. The purpose of the
work was to investigate the concept of Programming by Demonstration, whereby the end-user of a
system is able to teach a machine patterns of behaviour through demonstration. The work developed an
email application that is demonstrative of how interaction between ML and end users can improve
predictions; the classifier ‘explains’ its reasoning for outputting a prediction, alongside providing ways
that the user can give feedback. It makes use of a ‘feedback panel’, where users can update the words
found in emails they want the classifier to consider as keywords, remove words from the keyword
watch-list, and attach a weighted value to each word using a slider function. The authors liken this
approach to User Co-training, seen in semi-supervised learning, where two classifiers use different
features of the same data to classify samples. Labelled data is used at first for training, then they are
tested on unseen data. The samples classified with the highest confidence are appropriately labelled and
appended to the training set for further training. In this study, the user acts as a classifier which is
responsible for labelling data for the alternate classifier, the Naive Bayes algorithm. The work concluded
that providing the user with feedback from the classifier is beneficial to them, and providing the
classifier with rich user feedback aids to improve accuracy considerably. Research into the emotional
well-being of employees in the workplace remains a wellpopulated field of research. Technology and AI
can play significant parts in the capture and analysis of real-time data to better understand the working
system), which uses ML to detect real-time emotions, and a messaging system to make employees
aware of their Electronics 2022, 11, 118 5 of 26 overall emotional well-being during the working day.
Data was captured through employee webcams and the CNN was engineered to classify six emotions,
namely happiness, sadness, surprise, fear, disgust, and anger. The work also developed a GUI to display
information to the end user. For example, members of staff can navigate the interface and view the
consolidated percentage of emotions expressed by a specific employee. Yan et al. carried out
research into mental illness among employees in health care settings, and developed a solution to aid
self-assessment of mental status, and encourage early detection of work-related stress and mental
illness. In their research, 352 medical staff completed psychological assessments, such as the Emotional
Labour and Mental Health questionnaire. This was designed to assess employees in two major areas:
firstly, their ability to deal with any external changes at work, and secondly, their internal resilience and
endurance to hardship at work. When an employee scores a high probability in the various categories of
questions, a classifier outputs one of four classes, and the employee is then prompted to see a specialist
for help if the classification is of a certain class. Walambe et al. discuss that mental health and
well-being remains
a neglected part of peoples’ lives, even though the impact can be so significant. Although it is crucial,
identifying stress levels and pinpointing the trends is challenging, and often relies on many factors.
Therefore, a better solution is needed; however, the work explains that solutions implementing
Machine and Deep Learning for this task are very few.
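The adversarial training dynamic described earlier in this chapter (a Generator penalised when the Discriminator catches its samples) can be sketched in miniature. The toy NumPy version below learns to mimic a 1-D Gaussian with a linear generator and a logistic discriminator; all parameters and learning rates are illustrative, nothing like the StyleGAN-scale models discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Target "real" distribution: N(4, 1). Generator: g(z) = a*z + b on noise
# z ~ N(0, 1). Discriminator: D(x) = sigmoid(w*x + c).
a, b = 1.0, 0.0            # generator parameters
w, c = rng.normal(), 0.0   # discriminator parameters
lr = 0.01

for step in range(2000):
    z = rng.normal(size=32)
    fake = a * z + b
    real = rng.normal(loc=4.0, scale=1.0, size=32)

    # Discriminator update: push D(real) -> 1, D(fake) -> 0
    # (gradients of binary cross-entropy w.r.t. w and c).
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * ((d_real - 1) * real + d_fake * fake).mean()
    c -= lr * ((d_real - 1) + d_fake).mean()

    # Generator update: fool D, i.e. push D(fake) -> 1
    # (chain rule through fake = a*z + b).
    d_fake = sigmoid(w * fake + c)
    a -= lr * ((d_fake - 1) * w * z).mean()
    b -= lr * ((d_fake - 1) * w).mean()

samples = a * rng.normal(size=1000) + b   # draws from the trained Generator
```

Real GANs replace these scalar linear maps with deep networks operating on images, but the alternating update pattern is the same.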
The objectives of the work will be achieved through the implementation of various concepts. Firstly, a
CNN will be trained on a vast database of images to detect the differences between facial features for
various emotions. The network will be optimised using several strategies, with a wide range of
implementations experimented with to achieve the best possible results. The optimised model, reaching
the best performance metrics, and its weights will be saved and deployed to capture and classify facial
expressions via live webcam. A timer will start when a face is detected in the camera frame, and if
specified emotions are detected by the classifier, after a time threshold, a Graphical User Interface (GUI)
will be shown to the user. This will enable browsing of images intended to improve mood. The images
are false data, generated prior to this by a GAN. The system architecture is illustrated in Figure 1. The
work experiments with three different Models, whereby the number of classes is altered for each.
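As a rough sketch of the timer-then-GUI trigger described above (the class name, target emotions and threshold are our own illustrative assumptions, not values from the work):

```python
import time

class EmotionTrigger:
    """Signal that the GUI should be shown once a target emotion has been
    detected continuously for `threshold` seconds."""
    def __init__(self, target_emotions=("sad", "angry"), threshold=5.0,
                 clock=time.monotonic):
        self.target_emotions = set(target_emotions)
        self.threshold = threshold
        self.clock = clock        # injectable for testing
        self.started_at = None    # when the current run of detections began

    def update(self, emotion):
        """Feed one classifier prediction per frame; return True when the
        mood-improvement GUI should be shown."""
        if emotion in self.target_emotions:
            if self.started_at is None:
                self.started_at = self.clock()
            return self.clock() - self.started_at >= self.threshold
        self.started_at = None    # emotion changed or face left frame: reset
        return False
```

In the live loop, `update` would be called once per webcam frame with the CNN's predicted label, and a `True` return would open the image-browsing GUI.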
• Camera
3.3 TRAINING:
The paper experimented with three Models in relation to CNN training, using the same
data but dividing it into a different number of classes for each. This was done to maximise
experimentation and investigate the best method to apply to the task. Additionally, this field of
research is typically dominated by classifying expressions into seven classes. That is necessary when
all classes are relevant to the application; however, the objectives of this research enable
flexibility in this approach.
The prototype system for emotion recognition is divided into three stages: face detection, feature
extraction and emotion classification. After locating the face with a face detection
algorithm, knowledge of the symmetry and formation of the face, combined with image
processing techniques, was used to process the enhanced face region and determine the feature
locations. These feature areas were further processed to extract the feature points required for the
emotion classification stage. From the feature points extracted, distances among the features are
calculated and given as input to the neural network to classify the emotion
contained. The neural network was trained to recognize the six universal emotions.
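A minimal sketch of this final stage, taking distance features as input; the layer sizes and random weights below are placeholders for illustration, not the trained network:

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "disgust", "surprise", "fear"]

def mlp_forward(x, w1, b1, w2, b2):
    """One hidden layer with tanh, softmax output over the six emotions."""
    h = np.tanh(x @ w1 + b1)
    logits = h @ w2 + b2
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
n_features = 10                         # e.g. pairwise landmark distances
w1, b1 = rng.normal(size=(n_features, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, len(EMOTIONS))), np.zeros(len(EMOTIONS))

probs = mlp_forward(rng.normal(size=n_features), w1, b1, w2, b2)
predicted = EMOTIONS[int(np.argmax(probs))]
```

In the actual system the weights would be learned from the labelled training images rather than drawn at random.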
The prototype system offers two methods for face detection. Though various knowledge-based and
template-based techniques can be developed for face location determination, we opted for a
feature-invariant approach based on skin color as the first method, due to its flexibility and
simplicity. When locating the face region with skin color, several algorithms can be found for
different color spaces. After experimenting with a set of face images, a threshold condition was
developed on H and S, the hue and saturation in the HSV color space. For accurate identification of
the face, the largest connected area which satisfies the condition is selected and further
refined. In refining, the center of the area is selected and the densest area of skin-colored
pixels is retained.
The second method is the implementation of the face detection approach of Nilsson et al., using
their classifier to detect the face. This classifier gives more accurate face detection than the hue-
and saturation-based method mentioned earlier. Moreover, within the prototype system the user also has
the ability to specify the face region with the use of the mouse.
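Since the exact thresholds of the hue-saturation condition are not reproduced here, the sketch below uses commonly cited illustrative H and S ranges; the real system's values would come from its own experiments:

```python
import numpy as np

def skin_mask(hsv):
    """Return a boolean mask of likely skin pixels from an HSV image.
    `hsv` is an (H, W, 3) array with hue in [0, 360) and saturation and
    value in [0, 1]. The thresholds below are illustrative only."""
    h, s = hsv[..., 0], hsv[..., 1]
    hue_ok = (h < 50) | (h > 340)       # skin hues cluster near red/orange
    sat_ok = (s > 0.1) & (s < 0.7)      # moderate saturation
    return hue_ok & sat_ok

# Tiny synthetic image: one skin-like pixel, one blue pixel.
img = np.zeros((1, 2, 3))
img[0, 0] = [20, 0.4, 0.9]    # skin-like
img[0, 1] = [240, 0.8, 0.9]   # blue
mask = skin_mask(img)
```

In practice the largest connected component of this mask would be taken as the candidate face region, matching the refinement step described above.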
In the feature extraction stage, the face detected in the previous stage is further processed to
identify the eye, eyebrow and mouth regions. Initially, the likely y coordinates of the
eyes were identified with the use of the horizontal projection. Then the areas around those y
coordinates were processed to identify the exact regions of the features. Finally, a corner point
detection algorithm was used to obtain the required corner points from the feature regions.
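A horizontal projection simply sums pixel intensities along each row; rows containing dark features such as the eyes show up as minima. A minimal sketch (the "face" here is a synthetic array, not a real image):

```python
import numpy as np

def horizontal_projection(gray):
    """Sum a grayscale image along each row; dark feature bands
    (eyes, mouth) appear as low-valued rows."""
    return np.asarray(gray, dtype=float).sum(axis=1)

def likely_eye_row(gray):
    """Return the row index with the lowest projection value,
    i.e. the darkest horizontal band."""
    return int(np.argmin(horizontal_projection(gray)))

# Synthetic 'face': bright image with one dark band at row 3.
face = np.full((10, 8), 200.0)
face[3, :] = 20.0              # dark 'eye' band
row = likely_eye_row(face)
```

On a real face crop one would search for minima only in the upper half of the image, where the eyes are expected.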
The extracted feature points are processed to obtain the inputs for the neural network. The neural
network has been trained so that the emotions happiness, sadness, anger, disgust, surprise and
fear are recognized. 525 images from a facial expressions and emotion database were taken to
train the network. However, we are unable to present the results of classification, since the
network is still being tested. Moreover, we hope to classify emotions with the use of the
Human beings communicate through facial emotions in day-to-day interactions with others. Humans
perceiving the emotions of fellow humans is natural and inherently accurate. Humans can express their
state of mind through emotions, and many times emotions indicate that a human needs help.
Computer-based emotion detection offers a method for the physically disabled and for those who are
unable to express their requirements by voice or by other means, especially those who are confined
to bed. Human emotion can be detected through facial actions or through biosensors. Facial emotions
are imaged through still or video cameras; from still images taken at discrete times, the changes
in the eye and mouth areas can be exposed. Measuring and analysing such changes leads to the
determination of human expression.
CHAPTER 4: FACE DETECTION APPROACHES:
These methods aim to find structural features that exist even when the pose, viewpoint or
lighting conditions vary, and then use these to locate faces. These methods are designed mainly for
face localization.
4.2 TEXTURE:
Human faces have a distinct texture that can be used to separate them from other
objects. The textures are computed using second-order statistical features on sub-images of 16x16
pixels. Three types of features are considered, skin, hair and others, and the presence of a face is
inferred from the texture labels. One advantage of this approach is that it can detect faces which
are not upright or which have features such as beards and glasses.
Human skin color has been used and proven to be an effective feature in many applications,
from face detection to hand tracking, although different people have different skin colors. Several
studies have shown that the major difference lies largely in intensity rather than
chrominance. Several color spaces have been utilized to label pixels as skin, including RGB and
normalized RGB.
The functional requirements for a system describe what the system should do. These
requirements depend on the type of software being developed and the expected users of the software.
They are statements of the services the system should provide and of how the system should react to
particular inputs.
Nonfunctional requirements are requirements that are not directly concerned with the
specific functions delivered by the system. They may relate to emergent system properties such as
reliability, response time and store occupancy. Some of the nonfunctional requirements related to
this system are:
1. RELIABILITY:
Reliability of this system is defined by its evaluation results: correct
identification of the facial expressions and a maximum evaluation rate of facial expression.
The system is simple and user-friendly, with a graphical user interface implemented, so anyone can
use this system.
Before starting the project, a feasibility study is carried out to measure the viability of the system.
A feasibility study is necessary to determine whether creating a new or improved system is friendly
with respect to cost, benefits, operation, technology and time. The feasibility study is given below:
Technical feasibility is one of the first studies that must be conducted after the project has been
identified. The technical feasibility study covers the hardware and software requirements. The
required technologies (Python) existed.
Operational feasibility is a measure of how well a proposed system solves the problem and takes
advantage of the opportunities identified during scope definition. The following points were considered:
Schedule feasibility is a measure of how reasonable the project timetable is. The system is found
schedule-feasible because it is designed in such a way that it will finish in the prescribed time.
• OpenCV
• Numpy
• Keras
• Matplotlib
The Haar cascade is a machine-learning-based algorithm used to identify faces in an image or a real-time video. This method was proposed by Paul Viola and Michael Jones in their paper. A Haar cascade classifier is trained with a large number of positive and negative images:
• POSITIVE IMAGES:
These contain the object which we want our classifier to identify.
• NEGATIVE IMAGES:
Images of everything else, which do not contain the object we want to detect.
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_default.xml')
• EXPLANATION:
Haar cascade detection is one of the oldest yet most powerful face detection algorithms invented. It has been around since long before deep learning became famous. Haar features were used not only to detect faces, but also eyes, lips, license number plates, etc. The models are stored on GitHub and can be loaded with OpenCV methods.
The algorithm uses edge or line detection features proposed by Viola and Jones in their research paper "Rapid Object Detection using a Boosted Cascade of Simple Features", published in 2001. The algorithm is trained on a large set of positive images containing faces and negative images containing no faces. The model created from this training is available in the OpenCV GitHub repository, which stores the models as XML files that can be read with OpenCV methods. These include models for face detection, eye detection, upper- and lower-body detection, license plate detection, etc. Below are some of the concepts proposed by Viola and Jones in their research.
The darker areas in a Haar feature are pixels with value 1, and the lighter areas are pixels with value 0. Each feature is responsible for finding one particular structure in the image, such as an edge, a line, or any region where there is a sudden change of intensity. For example, a Haar feature can detect a vertical edge with darker pixels on its right and lighter pixels on its left.
The objective is to find the sum of all the image pixels lying under the darker area of the Haar feature and the sum of all the pixels lying under the lighter area, and then to take their difference. If the image has an edge separating dark pixels on the right from light pixels on the left, the Haar value will be close to 1; we say an edge is detected when the Haar value is close to 1, and no edge is present when it is far from 1. This is just one representation of a particular Haar feature, detecting a vertical edge. There are other Haar features as well, which detect edges in other directions and other image structures. To detect an edge anywhere in the image, the Haar feature needs to traverse the whole image.
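The sum-difference computation above can be written as a small NumPy sketch. The helper name, the normalization and the toy windows are ours for illustration; a real detector sums many such features over integral images:

```python
import numpy as np

def haar_value(window: np.ndarray) -> float:
    """Normalized difference between the right (dark) and left (light)
    halves of a window, for a vertical-edge Haar-like feature."""
    h, w = window.shape
    left = window[:, : w // 2].sum()
    right = window[:, w // 2 :].sum()
    # Normalize so a perfect dark-right / light-left edge gives 1.0.
    return float((right - left) / (h * (w // 2) * 255))

# A toy 4x4 window: light (0) on the left, dark (255) on the right.
edge = np.array([[0, 0, 255, 255]] * 4, dtype=np.float64)
flat = np.full((4, 4), 128, dtype=np.float64)

print(haar_value(edge))  # close to 1 -> edge detected
print(haar_value(flat))  # close to 0 -> no edge
```

Sliding this window across the whole image is what the traversal described above amounts to.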
The features considered are those most related to human expressions. They are taken as a subset of an existing marker-less system for landmark identification and localization, which provides 66 2D facial features. Landmark positions in the image space are used to define two sets of features: eccentricity features and linear features.
8.4 FLOWCHART:
In this section a flow chart is used to elaborate the face emotion recognition process, which is trained on images of different facial expressions. The system includes a training and a testing phase, comprising image acquisition, face detection, image preprocessing, feature extraction and classification. Face detection and feature extraction are carried out on face images, which are then classified into six classes.
The input video of an e-learning student is acquired using an image acquisition device and stored in a database. The video is extracted and fragmented into several frames to detect the emotions of the e-learning student and thereby improve the virtual learning environment. The video acquisition feature records and registers the ongoing emotional changes in the eye and lip regions. The videos are recorded into a database before processing, making it possible to analyse the changes of emotion for a particular subject or during a particular time of day.
9.2 IMAGE ACQUISITION:
Images used for facial expression recognition are static images or image sequences.
Face detection locates the facial region in an image. It is carried out on the training dataset using a Haar classifier, the Viola-Jones face detector, implemented through OpenCV. Haar-like features encode the difference in average intensity between different parts of the image and consist of black and white connected rectangles, in which the value of the feature is the difference of the sums of the pixel values in those rectangles.
Two separate eye maps are built, one from the chrominance component and the other from the luminance component. These two maps are then combined into a single eye map. The eye map from the chrominance is based on the fact that high Cb and low Cr values are found around the eyes.
Eyes usually contain both dark and bright pixels in the luminance component, so grayscale operators can be designed to emphasize brighter and darker pixels around the eye regions. Such operators are dilation and erosion; we use grayscale dilation and erosion with a suitable structuring element. The eye map from the chrominance is then combined with the eye map from the luminance, and the resulting eye map is dilated and normalized to brighten the eyes and suppress other facial areas. Then, with an appropriate choice of threshold, we can track the location of the eye region.
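As a hedged sketch, the chrominance eye map can be written following Hsu, Abdel-Mottaleb and Jain's formulation, EyeMapC = (1/3)(Cb² + (255−Cr)² + Cb/Cr); the exact normalization choices below are illustrative, not taken from this project:

```python
import numpy as np

def eye_map_chroma(cb: np.ndarray, cr: np.ndarray) -> np.ndarray:
    """Chrominance eye map: bright where Cb is high and Cr is low."""
    cb = cb.astype(np.float64)
    cr = cr.astype(np.float64)
    cb2 = (cb / 255.0) ** 2
    crn2 = ((255.0 - cr) / 255.0) ** 2
    ratio = cb / np.maximum(cr, 1.0)
    ratio /= ratio.max()          # scale the ratio term to [0, 1]
    return (cb2 + crn2 + ratio) / 3.0

# Toy pixels: an eye-like pixel (high Cb, low Cr) vs. a skin-like pixel.
cb = np.array([[200, 100]])
cr = np.array([[80, 160]])
m = eye_map_chroma(cb, cr)
print(m[0, 0] > m[0, 1])  # the eye-like pixel is brighter in the map
```

Thresholding a map like this is what isolates candidate eye regions.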
9.5 MOUTH DETECTION:
To locate the mouth region we use the fact that it contains stronger red components and weaker blue components than other facial regions (Cr > Cb), so a mouth map is constructed from the Cr and Cb components.
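One common formulation is Hsu et al.'s mouth map, MouthMap = Cr² · (Cr² − η·Cr/Cb)², which we assume here; the scaling and the toy pixel values in this sketch are ours:

```python
import numpy as np

def mouth_map(cb: np.ndarray, cr: np.ndarray) -> np.ndarray:
    """Mouth map following Hsu et al.'s formulation (assumed here):
    bright where Cr is strong relative to Cb."""
    cr = cr.astype(np.float64) / 255.0
    cb = cb.astype(np.float64) / 255.0
    cr2 = cr ** 2
    ratio = cr / np.maximum(cb, 1e-6)
    # eta balances the two terms over the face region
    eta = 0.95 * cr2.mean() / ratio.mean()
    return cr2 * (cr2 - eta * ratio) ** 2

# Toy pixels: a mouth-like pixel (strong Cr, weak Cb) vs. a skin-like one.
cb = np.array([[90, 120]])
cr = np.array([[180, 140]])
m = mouth_map(cb, cr)
print(m[0, 0] > m[0, 1])  # the mouth-like pixel scores higher
```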
Image pre-processing includes the removal of noise and normalization against variations in illumination. It comprises:
a) Color normalization
b) Histogram normalization
The pre-processed face image is then used for extracting the important features. The inherent problems related to image classification include scale, pose, translation and variations in illumination level.
The important features are extracted using the LBP algorithm, which is described below:
LBP is a feature extraction technique. The original LBP operator labels the pixels of an image with decimal numbers, called LBPs or LBP codes, which encode the local structure around each pixel. Each pixel is compared with its eight neighbors in a 3 × 3 neighborhood by subtracting the center pixel value. Negative results are encoded with 0 and the others with 1. For each pixel, a binary number is obtained by concatenating these bits in a clockwise direction, starting from the top-left neighbor. The decimal value of the resulting binary number is then used to label the given pixel. The derived binary numbers are referred to as LBP codes.
Local Binary Pattern (LBP) is a very simple and robust technique for feature extraction and is independent of illumination differences in the images. A 3 × 3 matrix is taken and each pixel in the matrix is assigned a binary value depending on the center pixel value. These binary values, excluding the center pixel, form an 8-bit binary number, which is finally converted to a decimal value [6]. The LBP code for a pixel at (xc, yc) is given by [6]:
LBP_P,R(xc, yc) = Σ_{p=0}^{P−1} s(g_p − g_c) · 2^p, where s(x) = 1 if x ≥ 0, and 0 otherwise.
Figure 6 shows an input image block with center pixel value 7. Each of the surrounding 8 pixel values is reassigned by comparing it to the intensity of the center pixel: pixel values greater than or equal to the center pixel are assigned 1, otherwise 0. Figure 7 shows the binary values assigned to all 8 pixels, which, combined in clockwise direction, give an 8-bit binary number. Converting this 8-bit number to decimal gives the LBP code for the center pixel.
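The procedure above can be sketched in Python. The neighbor values in this block are illustrative, not taken from Figure 6; only the center value 7 matches the text's example:

```python
import numpy as np

def lbp_code(block: np.ndarray) -> int:
    """LBP code for the center pixel of a 3x3 block: neighbors >= center
    map to 1, others to 0, read clockwise from the top-left neighbor."""
    c = block[1, 1]
    # Clockwise order starting at the top-left neighbor.
    neighbors = [block[0, 0], block[0, 1], block[0, 2],
                 block[1, 2], block[2, 2], block[2, 1],
                 block[2, 0], block[1, 0]]
    bits = ''.join('1' if n >= c else '0' for n in neighbors)
    return int(bits, 2)

# A block with center value 7, as in the text's example.
block = np.array([[9, 5, 8],
                  [4, 7, 7],
                  [6, 12, 3]])
print(lbp_code(block))  # prints 180 (binary 10110100)
```

A histogram of these codes over the face image then serves as the feature vector for classification.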
10.1.2 FRAMEWORK:
10.1.2.1 OPENCV:
OpenCV is a huge open-source library for computer vision, machine learning, and image processing, and it now plays a major role in real-time operation, which is very important in today's systems. Using it, one can process images and videos to identify objects, faces, or even human handwriting. When integrated with libraries such as NumPy, Python is capable of processing the OpenCV array structure for analysis. To identify image patterns and their various features, we use vector spaces and perform mathematical operations on these features.
The first OpenCV version was 1.0. OpenCV is released under a BSD license and hence is free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. When OpenCV was designed, the main focus was computational efficiency for real-time applications. Everything is written in optimized C/C++ to take advantage of multi-core processing.
• APPLICATIONS OF OPENCV:
There are many applications which are solved using OpenCV; some of them are listed below:
• Face recognition
• Object recognition
• People counting (foot traffic in a mall, etc.)
• Vehicle counting on highways, along with their speeds
• Interactive art installations
• Anomaly (defect) detection in the manufacturing process (the odd defective products)
• Street-view image stitching
• Video/image search and retrieval
• Robot and driverless car navigation and control
10.1.2.2 NUMPY:
Python NumPy is a general-purpose array processing package which provides tools for handling n-dimensional arrays, along with mathematical functions and linear algebra routines. NumPy provides both the flexibility of Python and the speed of well-optimized compiled C code. Its easy-to-use syntax makes it highly accessible and productive.
NumPy is a package that creates arrays. It lets you make arrays of numbers with different precisions. Python by itself only has floats, integers, and complex numbers, but NumPy expands what Python can do because it handles:
• 32-bit numbers
• Big numbers
• Signed numbers
• Unsigned numbers
But that's not the only reason to use NumPy. It's designed for efficiency and scale, making it the workhorse for large machine learning (ML) libraries like TensorFlow. NumPy is a module for Python; the name is an acronym for "Numeric Python" or "Numerical Python". It is pronounced /ˈnʌmpaɪ/ (NUM-py) or, less often, /ˈnʌmpi/ (NUM-pee). It is an extension module for Python, mostly written in C, which ensures that the precompiled mathematical and numerical functions of NumPy run fast.
Furthermore, NumPy enriches Python with powerful data structures implementing multi-dimensional arrays and matrices. These data structures guarantee efficient calculations with matrices and arrays. The implementation even targets huge matrices and arrays, better known under the heading of "big data". Besides that, the module supplies a large library of high-level mathematical functions to operate on these matrices and arrays.
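As a small illustration of the fixed-precision dtypes listed above (the arrays themselves are sample data):

```python
import numpy as np

# Fixed-precision arrays: signed, unsigned and floating-point dtypes.
signed = np.array([-3, 1, 4], dtype=np.int32)       # 32-bit signed
unsigned = np.array([3, 1, 4], dtype=np.uint8)      # 8-bit unsigned
floats = np.array([0.5, 1.5], dtype=np.float64)     # double precision

print(signed.dtype, unsigned.dtype, floats.dtype)   # int32 uint8 float64

# Vectorized math runs in compiled C code, not a Python loop.
print((signed * 2).tolist())                        # [-6, 2, 8]
```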
10.1.2.3 KERAS:
Keras is an open-source neural network library written in Python that runs on top of Theano or TensorFlow. It is designed to be modular, fast and easy to use. It was developed by François Chollet, a Google engineer. Keras doesn't handle low-level computation; instead, it delegates it to another library, the backend. Keras is a high-level API wrapper for the low-level API, capable of running on top of TensorFlow, CNTK, or Theano. The Keras high-level API handles the way we make models, define layers, or set up multiple input-output models. At this level, Keras also compiles our model with loss and optimizer functions and runs the training process with the fit function. Keras does not handle low-level operations such as building the computational graph or making tensors and other variables, because these are handled by the backend engine.
Keras runs on top of open-source machine learning libraries like TensorFlow, Theano or the Cognitive Toolkit (CNTK). Theano is a Python library used for fast numerical computation tasks. TensorFlow is the most famous symbolic math library used for creating neural networks and deep learning models; it is very flexible, and its primary benefit is distributed computing. CNTK is a deep learning framework developed by Microsoft which can be used from Python, C#, C++ or as a standalone machine learning toolkit. Theano and TensorFlow are very powerful libraries but difficult to understand when creating neural networks.
Keras is based on a minimal structure that provides a clean and easy way to create deep learning models on top of TensorFlow or Theano, and it is designed to quickly define deep learning models.
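As a minimal sketch, a Keras model for emotion classification could be defined as below. The 48×48 grayscale input, the layer sizes and the six-class output are illustrative assumptions, not necessarily this project's exact architecture:

```python
from tensorflow.keras import layers, models

# Illustrative CNN for 48x48 grayscale face images and six emotion classes.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),  # six emotion classes
])

# Keras compiles the model with a loss and an optimizer; training is
# then a single call to model.fit(x_train, y_train, ...).
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)
```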
• FEATURES:
Keras leverages various optimization techniques to make the high-level neural network API easier and more performant. It supports the following features:
• Consistent, simple and extensible API.
• Minimal structure - easy to achieve results without frills.
• Support for multiple platforms and backends.
• A user-friendly framework that runs on both CPU and GPU.
• High scalability of computation.
• BENEFITS:
Keras is a highly powerful and dynamic framework and comes with the following advantages:
• Large community support.
• Easy to test.
• Keras neural networks are written in Python, which makes things simpler.
• Keras supports both convolutional and recurrent networks.
• APPLICATIONS OF KERAS:
• Keras is used for creating deep models which can be productized on smartphones.
• Keras is also extensively used in deep learning competitions to create and deploy working models quickly.
10.1.2.4 MATPLOTLIB:
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. It is one of the most powerful plotting libraries in Python: a cross-platform library that provides various tools to create 2D plots from data in lists or arrays. It also provides an object-oriented API that enables embedding static plots in applications using various Python GUI toolkits (Tkinter, PyQt, etc.). It lets a user visualize data using a variety of plot types (scatter plots, histograms, bar charts, etc.) to make the data understandable.
• FEATURES:
• It provides the simplest and most common way to plot data in Python.
• It provides tools to create publication-standard plots and figures in a variety of export formats and environments (PyCharm, Jupyter notebook) across platforms.
• It provides a procedural interface called Pylab, which is designed to make it work like MATLAB, a programming language used by scientists and researchers. MATLAB is paid application software and is not open source.
• It is similar to plotting in MATLAB, as it allows users full control over fonts, lines, colors, styles, and axes properties.
• Matplotlib along with NumPy can be treated as the open-source equivalent of MATLAB.
• It provides an excellent way to produce quality static visualizations that can be used for publications and professional presentations.
• It is compatible with various third-party libraries and packages that extend its functionality, for example seaborn and ggplot, which provide more plotting features, and basemap and cartopy, which are used to plot geospatial data.
• Matplotlib is a cross-platform library that can be used in Python scripts, any Python shell (IDLE, PyCharm, etc.), IPython shells (console, Jupyter notebook), web application servers (Django, Flask), and various GUI toolkits (Tkinter, PyQt, WxPython).
• It is clear that matplotlib, with its various compatible third-party libraries, provides the user with powerful tools to visualize a variety of data.
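As a brief illustration, a bar chart of predicted emotions could be produced as below; the emotion labels and counts are made-up sample data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Illustrative per-emotion prediction counts (sample data, not results).
emotions = ["Happy", "Sad", "Angry", "Calm"]
counts = [12, 5, 3, 8]

fig, ax = plt.subplots()
ax.bar(emotions, counts)
ax.set_xlabel("Emotion")
ax.set_ylabel("Predictions")
ax.set_title("Sample distribution of predicted emotions")
fig.savefig("emotions.png")  # export to a static image file
```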
10.2 SYSTEM TESTING:
10.4 ADVANTAGES:
• Recruitment: interviewers can measure how candidates react to certain questions. This information can be used to optimize the interview structure for future candidates and streamline the application process. Using Sightcorp's technology, attention can also be measured through head orientation/pose analysis.
• Advertising: brands can test marketing campaigns, which helps to ensure that they evoke the right reactions before launching to the market. Using facial expression recognition software such as DeepSight, advertisers can see which ads receive high engagement and positive emotional responses from viewers, and can run tests at scale on different target audiences to ensure that the campaigns with the highest impact are selected.
• Education: institutions can track the student journey and improve it where necessary, assessing course materials, teaching styles, structure and layout by way of emotional feedback as students go through each module in real time, and using true facial responses and engagement levels to find points of interest.
• Healthcare: emotion recognition software can help decide when patients need medicine, assess their emotional responses in clinical trials, or help physicians decide how best to triage their patients.
• Gaming: games are designed to elicit a particular behavior and set of emotions from users. During the testing phase, users are asked to play the game for a given period and their feedback is incorporated to make the final product. Facial emotion recognition can aid in understanding which emotions a user is experiencing in real time.
• Automotive: car manufacturers around the world are increasingly focusing on making cars more personal and safe to drive. In their pursuit of smart car features, it makes sense that manufacturers use AI to help them understand human emotions. Using facial emotion detection, smart cars can alert the driver when he is feeling drowsy and in turn help decrease road casualties.
Using a webcam, a similar experiment with the same three subjects was performed and compared. An existing API was used for comparison; it took an input image as well as a box detecting the face. Basic emotions are communicated cross-culturally and universally via the same facial expressions, and these are identified by the Emotion API. A picture of the subject depicting the expression was taken and then processed. For the sake of comparison, the same attributes were considered for all subjects.
For this project, Firebase is used as the database where data is stored. It is a cloud-hosted realtime database that stores data in JSON tree format. Using machine learning together with PCA and MPCA, facial mood expressions can easily be analyzed. After analyzing the facial expression, the system detects the face mood, provides a roughly 75.7% accurate result and suggests music based on the detected mood.
First, the user takes an image as input. To improve lost contrast, histogram equalization is used to remap the brightness values of the image. Then the face boundary is detected and the eye and lip regions are cropped using PCA and MPCA. The image is then sent to the machine learning kit (ML Kit), recently developed by Google, which ships with trained models; this is why machine learning has become so popular nowadays. The ML Kit SDK can recognize text, detect faces, recognize landmarks, scan barcodes and label images. In this project we use ML Kit for detecting the face mood; it can detect a happiness percentage. By applying some conditions on top of ML Kit, we derive four facial expressions (Happy, Sad, Calm and Angry), and based on the detected expression the system suggests music from the database developed in Firebase.
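The histogram-equalization step mentioned above can be sketched in NumPy. The helper name and the synthetic low-contrast test image are ours for illustration:

```python
import numpy as np

def equalize_histogram(gray: np.ndarray) -> np.ndarray:
    """Remap brightness values of an 8-bit grayscale image so its
    intensity histogram becomes approximately uniform."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Classic histogram-equalization remapping formula.
    lut = np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# A low-contrast image squeezed into [100, 120] spreads to [0, 255].
rng = np.random.default_rng(0)
low = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
high = equalize_histogram(low)
print(low.min(), low.max(), high.min(), high.max())
```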
The human face is an important organ of an individual's body, and it plays an especially important role in the extraction of an individual's behavior and emotional state. Manually segregating a list of songs and generating an appropriate playlist based on an individual's emotional features is a very tedious, time-consuming, labor-intensive and uphill task. We use three types of algorithms in our development: PCA, MPCA and machine learning. We work with the eyes and mouth for emotion detection and have tested many images to detect human emotions. The accuracy of our research work is 80%.
6.3 CONCLUSIONS:
We have successfully completed our work and the following are the outputs available in the current system. The system thus aims at providing Android users with a cheap, free, user-friendly and accurate emotion detection system, which is really helpful to the users. For changing the user's mood, our app is really helpful. The main advantage of our app is that it detects human emotions accurately and also suggests music and jokes for changing the user's mood.
12.1 CONCLUSION:
Face detection and emotion recognition are very challenging problems. They require a heavy effort to enhance the performance of face detection and emotion recognition. The area of emotion recognition is gaining attention owing to its applications in various domains such as gaming, software engineering, and education. This paper presented a detailed survey of the various approaches, and the techniques implemented for those approaches, to detect human facial expressions for identification of emotions. Furthermore, a brief discussion of the combination of steps involved in a machine-learning-based approach and a geometric-based approach for face detection and emotion recognition, along with classification, was given. While compiling this survey, a comparison of accuracies was made for the databases that were used as datasets for training and testing. Different kinds of databases were described in detail to give a brief outline of how the datasets were made: whether they were posed or spontaneous, static or dynamic, captured in lab or non-lab conditions, and how diverse they are. A conclusion derived from this survey of databases is that RGB databases lack intensity labels, making them less convenient for such experiments.
The future scope of the system would be to design a mechanism that automatically plays music or videos based on the user's facial mood. This system would also be helpful in music therapy treatment, providing the music therapist the help needed to treat patients suffering from disorders like mental stress, anxiety, acute depression and trauma.
An algorithm for real-time emotion recognition using virtual markers through an optical flow algorithm has been developed to create a real-time emotion recognition system with low computational complexity (execution time, memory) using facial expressions and EEG signals. The algorithm works effectively under uneven lighting and subject head rotation (up to 25°), different backgrounds, and various skin tones. The system aims to help physically disabled people (deaf, mute, and bedridden), in addition to helping children with autism recognize the feelings of others. Moreover, it can drive business outcomes by judging the emotional responses of an audience, and it also has clear benefits in personalized e-learning, helping to maximize learning. The system can recognize six emotions in real time from facial landmarks and in offline settings from EEG raw data. Users of the system need to wear an EPOC+ headset and sit in front of a camera so that the EEG raw data can be recorded wirelessly and the EEG signals collected. The sessions used to collect the data were performed at Kuwait University. It was difficult to collect data from many subjects due to the students' schedules and timings, so only a few subjects participated.
For future work, the system's precision and accuracy can be improved by collecting more data from more subjects. Additionally, further techniques can be used to extract more features from the EEG signals. In addition to improving the system techniques, putting the subjects in real situations to express exact feelings can help to improve the system's accuracy for the EEG.