HITSCOE 2015-19
FACIAL EXPRESSION RECOGNITION USING MACHINE LEARNING
CHAPTER 1
1.1 INTRODUCTION
Facial expression is the most effective form of non-verbal communication, providing cues about emotional state, mindset, and intention. Facial expressions can not only change the flow of a conversation but also give listeners a way to communicate a wealth of information to the speaker without uttering a single word. When a facial expression does not match the spoken words, the information conveyed by the face tends to dominate the interpretation. Image processing is the field of signal processing in which both the input and output signals are images, and facial expression recognition is one of its most important applications. Our emotions are revealed by the expressions on our face, and facial expressions play an important role in interpersonal communication.
Facial expression is a non-verbal gesture that appears on our face according to our emotions. Automatic recognition of facial expressions plays an important role in artificial intelligence and robotics, and is thus increasingly in demand. Related applications include personal identification and access control, videophones and teleconferencing, forensics, human-computer interaction, automated surveillance, cosmetology, and so on. The objective of this project is to develop an Automatic Facial Expression Recognition System which takes human facial images containing some expression as input and recognizes and classifies them into eight expression classes:
1. Neutral
2. Anger
3. Disgust
4. Fear
5. Contempt
6. Happiness
7. Sadness
8. Surprise
1. Anger: involves three main features: bared teeth, eyebrows pulled down and tightened on the inner side, and squinting eyes. The function is clear: preparing for attack. The teeth are ready to bite and threaten enemies, and the eyes and eyebrows squint to protect the eyes without closing entirely, in order to keep the enemy in sight.
2. Disgust: involves a wrinkled nose and mouth, and sometimes even the tongue coming out. This expression mimics a person who tasted bad food and wants to spit it out, or who smelled something foul.
3. Fear: involves widened eyes and sometimes an open mouth. The function: opening the eyes so wide is supposed to help increase the visual field (though studies show that it doesn't actually do so) and speed up eye movement, which can help in spotting threats. Opening the mouth enables quiet breathing, so as not to be given away to the enemy.
4. Surprise: very similar to the expression of fear, perhaps because a surprising situation can frighten us for a brief moment, before we determine whether the surprise is a good or a bad one. The function is therefore similar.
5. Sadness: involves a slight pulling down of the lip corners, while the inner sides of the eyebrows rise. Darwin explained this expression as suppressing the urge to cry. The control over the upper lip is greater than the control over the lower lip, and so the lower lip drops. When a person screams while crying, the eyes close to protect them from the blood pressure that accumulates in the face. So, when we have the urge to cry and want to stop it, the eyebrows rise to prevent the eyes from closing.
6. Contempt: involves the lip corner rising on only one side of the face; sometimes only one eyebrow rises. This expression can look like half surprise, half happiness. It can signal to the person receiving the look that we are surprised by what he said or did (not in a good way) and that we are amused by it. This is an offensive expression that leaves the impression that one person feels superior to another.
7. Happiness: usually involves a smile: both corners of the mouth rise, the eyes squint, and wrinkles appear at the eye corners. The initial functional role of the smile, which represents happiness, remains a mystery. Some biologists believe that the smile was initially a sign of fear: monkeys and apes clenched their teeth to show predators that they were harmless. A smile encourages the brain to release endorphins that help lessen pain and produce a feeling of well-being.
8. Neutral: The good feelings that a smile can produce can help in dealing with fear. A smile can also produce positive feelings in someone who witnesses it, and might even get him to smile too. Newborn babies have been observed to smile involuntarily, without any external stimulus, while they are sleeping. A baby's smile helps his parents connect with him and become attached to him. It makes sense that, for evolutionary reasons, an involuntary smile of a baby helps create positive feelings in the parents, so they would not abandon their offspring.
CHAPTER 2
2.1.1 MOTIVATION
Significant debate has arisen in the past regarding the emotions portrayed in the world-famous masterpiece the Mona Lisa. The British weekly "New Scientist" has stated that she is in fact a blend of many different emotions: 83% happy, 9% disgusted, 6% fearful, and 2% angry.
We have also been motivated by the needs of people with hearing and speech impairments. If a fellow human or an automated system can understand their needs by observing their facial expressions, it becomes much easier for them to communicate those needs.
Human facial expressions can be easily classified into 7 basic emotions: happy, sad,
surprise, fear, anger, disgust, and neutral. Our facial emotions are expressed through activation of
specific sets of facial muscles. These sometimes subtle, yet complex, signals in an expression
often contain an abundant amount of information about our state of mind. Through facial
emotion recognition, we are able to measure the effects that content and services have on the
audience/users through an easy and low-cost procedure. For example, retailers may use these
metrics to evaluate customer interest. Healthcare providers can provide better service by using
additional information about patients' emotional state during treatment. Entertainment producers
can monitor audience engagement in events to consistently create desired content.
Humans are well-trained in reading the emotions of others; in fact, at just 14 months old, babies can already tell the difference between happy and sad. But can computers do a better job than us in assessing emotional states? To answer this question, we designed a deep learning neural network that gives machines the ability to make inferences about our emotional states. In other words, we give them eyes to see what we can see.
An automatic facial expression analysis system typically performs three steps:
1. Locating faces in the scene (e.g., in an image; this step is also referred to as face detection),
2. Extracting facial features from the detected face region (e.g., detecting the shape of facial components or describing the texture of the skin in a facial area; this step is referred to as facial feature extraction),
3. Analyzing the motion of facial features and/or the changes in their appearance, and classifying this information into facial-expression-interpretative categories such as facial muscle activations like smile or frown, emotion (affect) categories like happiness or anger, attitude categories like (dis)liking or ambivalence, etc. (this step is also referred to as facial expression interpretation).
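These three stages can be sketched as a minimal pipeline. The function bodies below are hypothetical stand-ins (a real system would use an actual face detector and learned features), shown only to make the data flow concrete:

```python
# Minimal sketch of the three-stage pipeline described above.
# The detection and feature logic here is a hypothetical stand-in,
# not the actual system described in this report.

def detect_face(image):
    """Step 1: locate the face region (here: assume it fills the image)."""
    return image  # a real system would crop to a detected bounding box

def extract_features(face):
    """Step 2: derive a compact feature vector from the face region."""
    # Stand-in: mean intensity per row as a crude shape/texture summary.
    return [sum(row) / len(row) for row in face]

def classify(features, labelled_examples):
    """Step 3: map features to an expression category (nearest example)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled_examples, key=lambda ex: dist(ex[0], features))[1]

# Tiny usage example with fabricated 2x2 "images".
examples = [([0.0, 0.0], "neutral"), ([1.0, 1.0], "happy")]
face = detect_face([[0.9, 1.0], [1.0, 0.9]])
print(classify(extract_features(face), examples))  # prints "happy"
```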
Several projects have already been done in this field, and our goal is not only to develop an Automatic Facial Expression Recognition System but also to improve its accuracy compared to other available systems.
Each person's expressions of emotions can be highly idiosyncratic, with particular quirks and facial cues. There can also be a wide variety of orientations and positions of people's heads in the photographs to be classified. For these reasons, FER is more difficult than most other image classification tasks. However, well-designed systems can achieve accurate
results when constraints are taken into account during development. For example, higher
accuracy can be achieved when classifying a smaller subset of highly distinguishable
expressions, such as anger, happiness, and fear. Lower accuracy is achieved when classifying
larger subsets, or small subsets with less distinguishable expressions, such as anger and disgust.
The study of human facial expressions has many aspects, from computer analysis, emotion recognition, lie detection, and airport security to nonverbal communication and even the role of expressions in art. Improving the skill of reading expressions is an important step towards successful relationships.
2.3.1 Expressions and Emotions
A facial expression is a gesture executed with the facial muscles which conveys the emotional state of the subject to observers. An expression sends a message about a person's internal feelings. In Hebrew, the word for "face" has the same letters as the word for "within" or "inside"; that similarity hints at the facial expression's most important role: being a channel of nonverbal communication. Facial expressions are a primary means of conveying nonverbal information among humans, though many animal species display facial expressions too. Although humans have developed a wide range of powerful verbal languages, the role of facial expressions in interactions remains essential, and sometimes even critical. Expressions and emotions go hand in hand, i.e. special combinations of facial muscular actions reflect a particular emotion. For certain emotions it is very hard, and maybe even impossible, to avoid the corresponding facial expression. For example, a person who is trying to ignore his boss's annoying offensive comment by keeping a neutral expression might nevertheless show a brief expression of anger. This phenomenon of a brief, involuntary facial expression shown on a human face according to the emotions experienced is called a 'micro expression'.
Micro expressions express the seven universal emotions: happiness, sadness, anger, surprise, contempt, fear, and disgust. However, Paul Ekman, a Jewish American psychologist who was a pioneer in the study of emotions and their relation to facial expressions, expanded the list of classical emotions, adding nine more: amusement, shame, embarrassment, excitement, pride, guilt, relief, satisfaction, and pleasure. A micro expression lasts only 1/25 to 1/15 of a second. Nonetheless, capturing it can illuminate one's real feelings, whether he wants it or not. That is exactly what Paul Ekman did. Back in the 80's, Ekman was already known as a specialist in the study of facial expressions when a psychiatrist approached him, asking whether he could detect liars. The psychiatrist wanted to determine whether a patient who was threatening suicide was lying. Ekman watched a tape of the patient over and over again, looking for a clue, until he found a split second of desperation, meaning that the patient's threat wasn't empty. Since then, Ekman has found those critical split seconds in almost every liar's documentation. The leading character in the TV series "Lie to Me" is based on Paul Ekman himself, the man who dedicated his life to reading people's expressions: the "human polygraph".
The research of facial expressions and emotions began many years before Ekman's work. Charles Darwin published his book "The Expression of the Emotions in Man and Animals" in 1872. This book was dedicated to nonverbal patterns in humans and animals and to the source of expressions. Darwin's two earlier books, "The Descent of Man, and Selection in Relation to Sex" and "On the Origin of Species", presented the idea that man did not come into existence in his present condition, but through a gradual process: evolution. This was, of course, a revolutionary theory, since in the middle of the 19th century no one believed that man and animal obey the same rules of nature. Darwin's work attempted to find parallels between behaviors and expressions in animals and humans. Ekman's work supports Darwin's theory about the universality of facial expressions, even across cultures.
The main idea of "The Expression of the Emotions in Man and Animals" is that the
source of nonverbal expressions of man and animals is functional, and not communicative, as we
may have thought. This means that facial expressions creation was not for communication
purposes, but for something else. An important observation was that individuals who were born
blind had similar facial expressions to individuals who were born with the ability to see.
This observation was intended to contradict the idea of Sir Charles Bell (a Scottish surgeon, anatomist, neurologist, and philosophical theologian who influenced Darwin's work), who claimed that human facial muscles were created to provide humans the unique ability to express emotions, that is, for communicational reasons. According to Darwin, there are three "chief principles" of expression:
1. The first one is called "principle of serviceable habits". He described it as a habit that was
reinforced at the beginning and then inherited by offspring. For example: he noticed a
serviceable habit of raising the eyebrows in order to increase the vision field. He connected it
to a person who is trying to remember something, while performing those actions, as though
he could "see" what he is trying to remember.
2. The second principle is called "antithesis". Darwin suggested that some actions or habits might not be serviceable themselves, but are carried out only because they are opposite in nature to a serviceable habit. I have found this principle very interesting, and I will go into more detail later on.
3. The third principle is called "the principle of actions due to the constitution of the nervous system". This principle is independent of will and, to a certain extent, of habit. For example, Darwin noticed that animals rarely make noises, but in special circumstances, like fear or pain, they respond by making involuntary noises.
The expression of helplessness involves shrugging the shoulders, opening the hands, and raising the eyebrows. Darwin explained the features of this expression using the antithesis principle. He observed that all of these movements are the opposite of the movements of a man who is ready to face something. A person who is preparing himself for something will look like this: hands and fingers closed (as if preparing for a fight, for example), hands close to the body for protection, and the neck raised and tight; as for the face, the eyebrows are low (as in a mode of attack or firmness) and the upper lip might reveal teeth. In a situation of helplessness, by contrast, the shrugging of the shoulders releases the neck.
The functional source of the antithesis can be explained by investigating muscles, and to be precise, the antagonist muscles. Every muscle has an antagonist muscle that performs the opposite movement: spreading the fingers is done by some muscles, and closing the fingers is done by the antagonist muscles. For some expressions we can't always tell, just by looking, what the opposite expression is, but if we look at the muscles involved in the process it becomes very clear.
An interesting explanation of the antithesis's functional source relies on inhibition. If a person or an animal is trying to prevent itself from doing a particular action, one way is to use the antagonistic muscles. In fact, when a stimulus signal is sent to a muscle, an inhibitory signal is automatically sent to the antagonist muscle. Facial expressions that can be explained by antithesis usually relate to aggression and avoiding it.
The experiment examined whether people recognize expressions better or worse when seeing only some facial parts, and whether the time needed for recognition changes. 20 men and 20 women took part in my experiment.
2.7 My Assumption
It's no secret that women are considered to be more intuitive than men. More often, women are considered to be more compassionate and empathetic toward their surroundings. The gift of interpreting facial expressions is therefore usually attributed to women. I believe that my experiment's results will support this assumption.
2.7.1 Common expression analysis components
Like most image classification systems, FER systems typically use image preprocessing
and feature extraction followed by training on selected training architectures. The end result of
training is the generation of a model capable of assigning emotion categories to newly provided
image examples.
The image preprocessing stage can include image transformations such as scaling,
cropping, or filtering images. It is often used to accentuate relevant image information, like
cropping an image to remove a background. It can also be used to augment a dataset, for
example to generate multiple versions from an original image with varying cropping or
transformations applied. The feature extraction stage goes further in finding the more descriptive
parts of an image. Often this means finding information which can be most indicative of a
particular class, such as the edges, textures, or colors. The training stage takes place according to
the defined training architecture, which determines the combinations of layers which feed into
each other in the neural network. Architectures must be designed for training with the
composition of the feature extraction and image preprocessing stages in mind.
This is necessary because some architectural components work better when applied separately, and others when applied together. For example, certain types of feature extraction are not useful in conjunction with deep learning algorithms: both find relevant features in images, such as edges, so using the two together is redundant. Applying feature extraction prior to a deep learning algorithm is not only unnecessary, but can even negatively impact the performance of the architecture.
Once any feature extraction or image preprocessing stages are complete, the training
algorithm produces a trained prediction model. A number of options exist for training FER
models, each of which has strengths and weaknesses making them more or less suitable for
particular situations. In this article we will compare some of the most common algorithms used
in FER:
The algorithms explored are all deep neural networks, which perform better under those circumstances. Convolutional Neural Networks (CNNs) are currently considered the go-to neural networks for image classification, because they pick up on patterns in small parts of an image, such as the curve of an eyebrow. CNNs apply kernels, which are matrices smaller than the image, to chunks of the input image.
The idea of this approach is to capture the transitions between facial patterns over time,
allowing these changes to become additional data points supporting classification. For example,
it is possible to capture the changes in the edges of the lips as an expression goes from neutral to
happy by smiling, rather than just the edges of a smile from an individual image frame.
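The kernel mechanism described above can be illustrated with a small pure-Python convolution. The image and the edge kernel below are fabricated for illustration; a real CNN learns its kernel values during training and uses optimized tensor libraries:

```python
# Sketch of the CNN building block described above: a small kernel
# (matrix) slid over chunks of a larger image.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Elementwise product of the kernel with one image chunk.
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge kernel responds where intensity changes left to right,
# e.g. along the contour of an eyebrow.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
print(convolve2d(image, edge_kernel))  # strongest response at the edge
```

The output map peaks exactly where the dark-to-bright transition sits, which is how a learned kernel localizes a facial feature.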
CHAPTER 3
3.1 METHODOLOGY
We use these approaches: Support Vector Machine, Artificial Neural Network and K
Nearest Neighbors (KNN). In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN
for short) is a non-parametric method used for classification and regression. In both cases, the
input consists of the k closest training examples in the feature space. In preprocessing, the image is processed using these techniques; feature extraction then follows. The feature extraction technique uses some predefined positions as facial features. In the next step the features are selected and then classified.
The images used for facial expression recognition are static images. To capture images of people's expressions, a Panasonic camera (model DMC-LS5) with a focal length of 5 mm is used. The image format is 24-bit color JPEG with a resolution of 4320 × 3240 pixels. The distance between the camera and the person was four feet, and images of six basic expressions of each person were taken.
3.1.2 Image Preprocessing
The image preprocessing procedure is a very important step in the facial expression recognition task. The objective of the preprocessing phase is to obtain images which have normalized intensity and uniform size and shape, and which represent only a face expressing a certain emotion.
The preprocessing procedure should also reduce the effects of illumination and lighting. Expression representation can be sensitive to translation, scaling, and rotation of the head in a picture. To counter the effect of these irrelevant variations, the facial image may be geometrically normalized before classification.
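As a sketch of the normalization described above, the following pure-Python example crops a face region and rescales pixel values to [0, 1]. The crop coordinates are assumptions for illustration; a real system would obtain them from face detection:

```python
# Sketch of two preprocessing steps mentioned above: cropping to the
# face region and normalizing intensity to a fixed range.

def crop(image, top, left, height, width):
    return [row[left:left + width] for row in image[top:top + height]]

def normalize_intensity(image):
    """Linearly rescale pixel values to [0, 1] to reduce lighting effects."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on flat images
    return [[(p - lo) / span for p in row] for row in image]

raw = [[10, 20, 30],
       [20, 40, 60],
       [30, 60, 90]]
face = crop(raw, 0, 0, 2, 2)      # hypothetical 2x2 face region
print(normalize_intensity(face))  # values now span [0, 1]
```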
Feature extraction is the most important stage in developing an accurate facial expression recognition system. Unprocessed facial images hold vast amounts of data, and feature extraction is required to reduce them to smaller sets of data called features. Feature extraction converts pixel information into a higher-level representation of the color, shape, motion, texture, and spatial configuration of the face or its components. The extracted representation is used for further expression categorization. Feature extraction ordinarily reduces the dimensionality of the data; the reduction procedure should retain essential information with high discriminative power and high stability.
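The dimensionality-reduction idea can be sketched as follows. The three features computed here (mean intensity and horizontal/vertical gradient energy) are illustrative stand-ins, not the feature set used in this project:

```python
# Sketch of feature extraction as dimensionality reduction: a whole
# image is summarized by a short feature vector.

def extract_feature_vector(image):
    h, w = len(image), len(image[0])
    n = h * w
    mean = sum(p for row in image for p in row) / n
    # Gradient energy: how much neighbouring pixels differ (edge-like info).
    gx = sum(abs(image[i][j + 1] - image[i][j])
             for i in range(h) for j in range(w - 1))
    gy = sum(abs(image[i + 1][j] - image[i][j])
             for i in range(h - 1) for j in range(w))
    return [mean, gx / n, gy / n]

image = [[0, 0, 1, 1],
         [0, 0, 1, 1]]
print(extract_feature_vector(image))  # 8 pixels reduced to 3 numbers
```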
3.1.5 Classification
The last step of a facial expression recognition system is to recognize the facial expression based on the extracted features. Classification refers to an algorithmic approach for recognizing a given expression as one of a given number of expressions. We use the K-Nearest Neighbor classifier for classification. The K-Nearest Neighbor algorithm is a non-parametric method used for classification and regression. The input consists of the k closest training examples in the feature space; the output is a class membership. An object is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors.
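The classification step described above can be sketched as a minimal k-NN implementation; the 2-D feature vectors below are fabricated stand-ins for extracted facial features:

```python
# Minimal sketch of k-Nearest Neighbor classification: find the k
# closest training examples and take a majority vote over their labels.

from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label); query: feature vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Sort training examples by distance to the query, keep the k nearest.
    nearest = sorted(train, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [([0.1, 0.1], "sad"), ([0.2, 0.0], "sad"),
         ([0.9, 1.0], "happy"), ([1.0, 0.8], "happy"), ([0.8, 0.9], "happy")]
print(knn_classify(train, [0.85, 0.95], k=3))  # prints "happy"
```

Because the method is non-parametric, there is no training phase beyond storing the examples; all the work happens at query time.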
CHAPTER 4
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
learning to play a game against a human opponent. Other specialized algorithms in machine
learning include topic modeling, where the computer program is given a set of natural
language documents and finds other documents that cover similar topics.
Machine learning algorithms can be used to find the unobservable probability density
function in density estimation problems. Meta learning algorithms learn their own inductive
bias based on previous experience. In developmental robotics, robot learning algorithms generate
their own sequences of learning experiences, also known as a curriculum, to cumulatively
acquire new skills through self-guided exploration and social interaction with humans. These
robots use guidance mechanisms such as active learning, maturation, motor synergies, and
imitation.
4.3 History of Machine Learning
Arthur Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term "machine learning" in 1959 while at IBM. As a scientific endeavour, machine learning grew out of the quest for artificial intelligence. Already in the early days of AI as an academic discipline, some researchers were interested in having machines learn from data. They attempted to approach the problem with various symbolic methods, as well as what were then termed "neural networks"; these were mostly perceptrons and other models that were later found to be reinventions of the generalized linear models of statistics.[9] Probabilistic reasoning was also employed, especially in automated medical diagnosis.
However, an increasing emphasis on the logical, knowledge-based approach caused a rift
between AI and machine learning. Probabilistic systems were plagued by theoretical and
practical problems of data acquisition and representation. By 1980, expert systems had come to
dominate AI, and statistics was out of favor. Work on symbolic/knowledge-based learning did
continue within AI, leading to inductive logic programming, but the more statistical line of
research was now outside the field of AI proper, in pattern recognition and information retrieval. Neural networks research had been abandoned by AI and computer science around the same time. This line, too, was continued outside the AI/CS field, as "connectionism", by researchers from other disciplines including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the reinvention of backpropagation.
Machine learning, reorganized as a separate field, started to flourish in the 1990s. The
field changed its goal from achieving artificial intelligence to tackling solvable problems of a
practical nature. It shifted focus away from the symbolic approaches it had inherited from AI,
and toward methods and models borrowed from statistics and probability theory. It also benefited
from the increasing availability of digitized information, and the ability to distribute it via
the Internet.
4.3.1 Relation to data mining
Machine learning and data mining often employ the same methods and overlap
significantly, but while machine learning focuses on prediction, based on known properties
learned from the training data, data mining focuses on the discovery of
(previously) unknown properties in the data (this is the analysis step of knowledge discovery in
databases). Data mining uses many machine learning methods, but with different goals; on the
other hand, machine learning also employs data mining methods as "unsupervised learning" or as
a preprocessing step to improve learner accuracy. Much of the confusion between these two
research communities (which do often have separate conferences and separate journals, ECML
PKDD being a major exception) comes from the basic assumptions they work with: in machine
learning, performance is usually evaluated with respect to the ability to reproduce
known knowledge, while in knowledge discovery and data mining (KDD) the key task is the
discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an
uninformed (unsupervised) method will easily be outperformed by other supervised methods,
while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
4.3.2 Relation to optimization
Machine learning also has intimate ties to optimization: many learning problems are
formulated as minimization of some loss function on a training set of examples. Loss functions
express the discrepancy between the predictions of the model being trained and the actual
problem instances (for example, in classification, one wants to assign a label to instances, and
models are trained to correctly predict the pre-assigned labels of a set of examples). The
difference between the two fields arises from the goal of generalization: while optimization
algorithms can minimize the loss on a training set, machine learning is concerned with
minimizing the loss on unseen samples.
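This loss-minimization view can be made concrete with a one-parameter example: gradient descent on the mean squared error over a training set. The data and learning rate are illustrative choices:

```python
# Sketch of "learning as loss minimization": fit a single weight w so
# that predictions w * x match targets y, by gradient descent on the
# mean squared error over the training set.

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, lr=0.1, steps=100):
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = train(data)
print(round(w, 3))  # converges to 2.0, the weight that minimizes the loss
```

Minimizing the loss on this training set recovers the underlying function; whether the same w generalizes to unseen samples is the machine learning question, beyond pure optimization.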
4.3.3 Relation to statistics
Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term data science as a placeholder for the overall field. Leo Breiman distinguished two statistical modeling paradigms: the data model and the algorithmic model, wherein "algorithmic model" means more or less machine learning algorithms like random forest. Some statisticians have adopted methods from machine learning, leading to a combined field that they call statistical learning.
4.4 Theory of Machine Learning
A core objective of a learner is to generalize from its experience. Generalization in this
context is the ability of a learning machine to perform accurately on new, unseen examples/tasks
after having experienced a learning data set. The training examples come from some generally
unknown probability distribution (considered representative of the space of occurrences) and the
learner has to build a general model about this space that enables it to produce sufficiently
accurate predictions in new cases.
The computational analysis of machine learning algorithms and their performance is a
branch of theoretical computer science known as computational learning theory. Because
training sets are finite and the future is uncertain, learning theory usually does not yield
guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance
are quite common. The bias–variance decomposition is one way to quantify generalization error.
For the best performance in the context of generalization, the complexity of the hypothesis
should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfit the data. If the complexity of the model is increased in response, then the training error decreases. But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer. In addition to
performance bounds, learning theorists study the time complexity and feasibility of learning. In
computational learning theory, a computation is considered feasible if it can be done
in polynomial time. There are two kinds of time complexity results. Positive results show that a
certain class of functions can be learned in polynomial time. Negative results show that certain
classes cannot be learned in polynomial time.
HITSCOE 2015-19
FACIAL EXPRESSION RECOGNITION USING MACHINE LEARNING
4.5 Types of learning algorithms
The types of machine learning algorithms differ in their approach, the type of data they
input and output, and the type of task or problem that they are intended to solve.
Supervised learning
Unsupervised learning
Semi-supervised learning
Reinforcement learning
4.5.1 Supervised learning
Supervised learning algorithms build a mathematical model of a set of data that contains
both the inputs and the desired outputs. The data is known as training data, and consists of a set
of training examples. Each training example has one or more inputs and a desired output, also
known as a supervisory signal. In the case of semi-supervised learning algorithms, some of the
training examples are missing the desired output. In the mathematical model, each training
example is represented by an array or vector, and the training data by a matrix.
Through iterative optimization of an objective function, supervised learning algorithms learn a
function that can be used to predict the output associated with new inputs. An optimal function
will allow the algorithm to correctly determine the output for inputs that were not a part of the
training data. An algorithm that improves the accuracy of its outputs or predictions over time is
said to have learned to perform that task.
Supervised learning algorithms include classification and regression. Classification algorithms are used
when the outputs are restricted to a limited set of values, and regression algorithms are used when the
outputs may have any numerical value within a range. Similarity learning is an area of
supervised machine learning closely related to regression and classification, but the goal is to
learn from examples using a similarity function that measures how similar or related two objects
are. It has applications in ranking, recommendation systems, visual identity tracking, face
verification, and speaker verification.
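The "iterative optimization of an objective function" described above can be sketched with plain gradient descent on a mean-squared-error objective. The training pairs below are invented for illustration (numpy assumed available):

```python
import numpy as np

# Training pairs (input, desired output) drawn from y = 3x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 3 * X + 1

w, b = 0.0, 0.0   # initial hypothesis y = w*x + b
lr = 0.05         # learning rate, chosen by hand

# Iteratively minimize the mean-squared-error objective.
for _ in range(5000):
    pred = w * X + b
    grad_w = 2 * np.mean((pred - Y) * X)
    grad_b = 2 * np.mean(pred - Y)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # converges near w=3, b=1
```

The learned function then predicts the output for inputs that were not part of the training data, which is the generalization the text describes.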
4.5.2 Unsupervised learning
Unsupervised learning algorithms take a set of data that contains only inputs, and find structure
in the data, like grouping or clustering of data points. The algorithms therefore learn from test
data that has not been labeled, classified or categorized. Instead of responding to feedback,
unsupervised learning algorithms identify commonalities in the data and react based on the
presence or absence of such commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in statistics,[21] though unsupervised
learning encompasses other domains involving summarizing and explaining data features.
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so
that observations within the same cluster are similar according to one or more predesignated
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions on the structure of the data, often defined by
some similarity metric and evaluated, for example, by internal compactness, or the similarity
between members of the same cluster, and separation, the difference between clusters. Other
methods are based on estimated density and graph connectivity.
1. Linear Regression
2. Logistic Regression
3. Decision Tree
4. SVM
5. Naive Bayes
6. kNN
7. K-Means
8. Random Forest
9. Dimensionality Reduction Algorithms
10. Gradient Boosting algorithms
1. GBM
2. XGBoost
3. LightGBM
4. CatBoost
4.6.1 Linear Regression
It is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on
continuous variable(s). Here, we establish a relationship between the independent and dependent
variables by fitting a best line. This best-fit line is known as the regression line and is
represented by the linear equation Y = a*X + b.
The best way to understand linear regression is to relive an experience from childhood. Say
you ask a child in fifth grade to arrange the people in his class in increasing order of weight,
without asking them their weights. What do you think the child will do? He or she would likely
look at (visually analyze) the height and build of people and arrange them using a combination of
these visible parameters. This is linear regression in real life! The child has actually figured
out that height and build are correlated with weight by a relationship, which looks like the
equation above.
In this equation:
Y – Dependent Variable
a – Slope
X – Independent variable
b – Intercept
The coefficients a and b are derived by minimizing the sum of squared differences of distance
between the data points and the regression line. Look at the example below. Here we have
identified the best-fit line with the linear equation y = 0.2811x + 13.9. Now, using this equation,
we can find the weight of a person, knowing their height.
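A minimal sketch of fitting the regression line, using the closed-form least-squares estimates of the slope a and intercept b in Y = a*X + b; the height/weight pairs below are invented for illustration (numpy assumed available):

```python
import numpy as np

# Toy height (cm) / weight (kg) pairs, invented for illustration.
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight = np.array([55.0, 58.0, 63.0, 66.0, 70.0])

# Closed-form least-squares estimates of slope a and intercept b
# in Y = a*X + b (minimizing the sum of squared residuals).
hd = height - height.mean()
wd = weight - weight.mean()
a = np.sum(hd * wd) / np.sum(hd ** 2)
b = weight.mean() - a * height.mean()

print(round(a, 3), round(b, 2))   # fitted slope and intercept
predicted = a * 175 + b           # predict weight for a new height
```

Once a and b are fixed, the weight of any new person follows from their height via the same equation, exactly as the example in the text does with y = 0.2811x + 13.9.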
Instead, it chooses parameters that maximize the likelihood of observing the sample values,
rather than parameters that minimize the sum of squared errors (as in ordinary regression). Now,
you may ask, why take a log? For the sake of simplicity, let's just say that this is one of the
best mathematical ways to replicate a step function. I could go into more detail, but that would
defeat the purpose of this report.
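The "log" in question is the log-odds: the logistic (sigmoid) function squashes a linear score into (0, 1) and behaves like a smooth step function. A minimal sketch:

```python
import math

def sigmoid(z):
    # Smooth approximation of a step function: output in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Far from the decision boundary the sigmoid behaves like a hard step,
# while at the boundary (z = 0) it outputs exactly 0.5.
print(round(sigmoid(-6), 3), round(sigmoid(0), 3), round(sigmoid(6), 3))

# The log of the odds of the output is linear in z:
# log(p / (1 - p)) = z, which is why the model fits a linear
# function to the log-odds rather than to the output directly.
```

This is why logistic regression can be fit with a linear model yet still produce step-like class probabilities.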
Fig 4.6.3.1 Decision Tree
In the image above, you can see that the population is classified into four different groups based
on multiple attributes, to identify 'if they will play or not'. To split the population into
different heterogeneous groups, it uses various techniques like Gini, information gain, Chi-square,
and entropy. The best way to understand how a decision tree works is to play Jezzball, a classic
game from Microsoft (image below). Essentially, you have a room with moving walls and you need to
create walls such that the maximum area gets cleared off without the balls. So, every time you
split the room with a wall, you are trying to create two different populations within the same
room. Decision trees work in a very similar fashion, by dividing a population into groups that are
as different as possible.
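The Gini technique mentioned above can be sketched directly: a split is preferred when the weighted impurity of the child groups falls below the impurity of the parent. The 'Play' counts below are invented for illustration:

```python
def gini(labels):
    # Gini impurity of a set of class labels: 1 - sum over classes of p^2.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# 'Play' outcomes before a split and in the two child groups.
parent = ['yes'] * 6 + ['no'] * 4
left = ['yes'] * 5 + ['no'] * 1    # e.g. one weather condition
right = ['yes'] * 1 + ['no'] * 3   # e.g. the other condition

# Weighted impurity after the split; the tree prefers the split
# that lowers this value the most.
after = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(round(gini(parent), 3), round(after, 3))
```

A pure group (all 'yes' or all 'no') has impurity 0, which is why repeated splitting drives the tree toward homogeneous leaves.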
4.6.4 SVM (Support Vector Machine)
It is a classification method. In this algorithm, we plot each data item as a point in
n-dimensional space (where n is the number of features you have), with the value of each feature
being the value of a particular coordinate. For example, if we only had two features, like height
and hair length of an individual, we would first plot these two variables in two-dimensional
space, where each point has two coordinates. The points lying closest to the separating boundary
are known as support vectors.
Naive Bayes classifies using Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x). Here, P(c|x) is the
posterior probability of class c given predictor x, P(x|c) is the likelihood, P(c) is the prior
probability of the class, and P(x) is the prior probability of the predictor.
Example: Let's understand it using an example. Below is a training data set of weather and the
corresponding target variable 'Play'. Now, we need to classify whether players will play or not
based on the weather conditions. Let's follow the steps below.
Step 1: Convert the data set into a frequency table.
Step 2: Create a Likelihood table by finding the probabilities, like Overcast probability = 0.29
and probability of playing = 0.64.
Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class.
The class with the highest posterior probability is the outcome of the prediction.
Problem: Players will play if the weather is sunny — is this statement correct?
We can solve it using the method discussed above: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) /
P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher probability. Naive Bayes uses
a similar method to predict the probability of different classes based on various attributes. This
algorithm is mostly used in text classification and in problems having multiple classes.
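The calculation above can be reproduced exactly with the frequency counts from the table, using exact fractions rather than rounded decimals:

```python
from fractions import Fraction

# Frequency counts from the weather / 'Play' training table:
# 9 of 14 days were 'yes'; 3 of those 9 were sunny; 5 of 14 days were sunny.
p_sunny_given_yes = Fraction(3, 9)
p_yes = Fraction(9, 14)
p_sunny = Fraction(5, 14)

# Bayes' rule: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(p_yes_given_sunny)   # 3/5, i.e. exactly 0.60
```

Working in fractions shows the answer is exactly 3/5; the rounded decimals in the text give the slightly off 0.33 * 0.64 / 0.36.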
4.6.6. kNN (k- Nearest Neighbors)
It can be used for both classification and regression problems. However, it is more widely used
for classification problems in industry. K nearest neighbors is a simple algorithm that stores
all available cases and classifies new cases by a majority vote of its k neighbors. The case is
assigned to the class most common amongst its K nearest neighbors, measured by a distance
function.
These distance functions can be Euclidean, Manhattan, Minkowski, and Hamming distance. The first
three are used for continuous variables and the fourth (Hamming) for categorical variables. If
K = 1, then the case is simply assigned to the class of its nearest neighbor. At times, choosing K
turns out to be a challenge while performing kNN modeling.
Fig 4.6.6.1 kNN (k- Nearest Neighbors)
KNN can easily be mapped to our real lives. If you want to learn about a person, of whom you
have no information, you might like to find out about his close friends and the circles he moves
in and gain access to his/her information!
Things to consider before selecting kNN:
KNN is computationally expensive.
Variables should be normalized, else higher-range variables can bias it.
More work is needed at the pre-processing stage (e.g., outlier and noise removal) before going for kNN.
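A minimal sketch of the majority-vote rule with Euclidean distance, on invented toy points (Python 3.8+ assumed for `math.dist`):

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    # Majority vote among the k training points closest to the
    # query under Euclidean distance.
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# (feature vector, class) pairs -- invented toy data.
train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'),
         ((5.0, 5.0), 'B'), ((5.2, 4.9), 'B'), ((4.8, 5.1), 'B')]

print(knn_classify(train, (1.1, 0.9)))   # 'A'
print(knn_classify(train, (5.0, 4.8)))   # 'B'
```

Swapping `math.dist` for a Manhattan or Hamming distance function changes the metric without touching the voting logic, which is the point made above about the choice of distance function.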
4.6.7 K-Means
It is a type of unsupervised algorithm which solves the clustering problem. Its procedure
follows a simple and easy way to classify a given data set into a certain number of clusters
(assume k clusters). Data points inside a cluster are homogeneous, and heterogeneous to peer
groups.
Remember figuring out shapes from ink blots? K-means is somewhat similar to this activity. You
look at the shape and spread to decipher how many different clusters/populations are present!
In K-means, we have clusters, and each cluster has its own centroid. The sum of squares of the
differences between the centroid and the data points within a cluster constitutes the
within-sum-of-squares value for that cluster. When the within-sum-of-squares values for all the
clusters are added, the total becomes the total within-sum-of-squares value for the cluster
solution.
We know that as the number of clusters increases, this value keeps decreasing, but if you plot
the result you may see that the sum of squared distances decreases sharply up to some value of k,
and then much more slowly after that. Here, we can find the optimum number of clusters.
4.6.9 Dimensionality Reduction Algorithms
In recent years, organizations have not only been coming up with new data sources but are also
capturing data in great detail. For example, e-commerce companies are capturing more details about
customers, like their demographics, web-crawling history, what they like or dislike, purchase
history, feedback, and many others, to give them personalized attention, more than your nearest
grocery shopkeeper.
As data scientists, the data we are offered also consists of many features. This sounds good for
building a robust model, but there is a challenge: how would you identify the highly significant
variable(s) out of 1000 or 2000? In such cases, dimensionality reduction algorithms help us, along
with various other techniques like Decision Trees, Random Forests, PCA, Factor Analysis,
identification based on the correlation matrix, missing value ratio, and others.
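As a sketch of one such technique, PCA can be computed with a plain SVD. In the invented data below, one of three features is nearly a copy of another, so two components capture almost all the variance (numpy assumed available):

```python
import numpy as np

# 100 samples of 3 features where feature 2 is nearly a copy of
# feature 0 -- the data is effectively 2-dimensional.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + rng.normal(0, 0.01, 100)])

# PCA: center the data and take the top right-singular vectors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()   # fraction of variance per component

print(round(explained[:2].sum(), 3))   # first 2 components hold almost all of it
X_reduced = Xc @ Vt[:2].T              # project 3 features down to 2
```

The same idea scales to picking, say, 50 significant directions out of 2000 raw features: keep the components with the largest explained variance and drop the rest.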
4.6.10. Gradient Boosting Algorithms
4.6.10.1. GBM
GBM is a boosting algorithm used when we deal with plenty of data and want to make a
prediction with high predictive power. Boosting is actually an ensemble of learning algorithms
which combines the predictions of several base estimators in order to improve robustness over a
single estimator. It combines multiple weak or average predictors to build a strong predictor.
These boosting algorithms often work well in data science competitions like Kaggle, AV
Hackathon, and CrowdAnalytix.
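A minimal sketch of the boosting idea, with one-split regression stumps as the weak predictors: each round fits a stump to the current residuals and adds its shrunken prediction to the ensemble (invented one-dimensional data; the 0.5 shrinkage factor is an arbitrary choice; numpy assumed):

```python
import numpy as np

def fit_stump(x, r):
    # One-split regression stump fitted to the residuals r:
    # pick the threshold whose two constant halves minimize the SSE.
    best = None
    for t in x:
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)

# Boosting: repeatedly fit a weak stump to the current residuals
# and add its (shrunken) prediction to the ensemble.
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x)
pred = np.zeros_like(y)
for _ in range(100):
    stump = fit_stump(x, y - pred)
    pred += 0.5 * stump(x)

print(round(float(np.mean((y - pred) ** 2)), 4))   # small residual error
```

No single stump can fit the sine wave, but the sum of a hundred of them can, which is exactly the "weak predictors combined into a strong predictor" idea GBM is built on.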
4.6.10.2. XGBoost
Another classic gradient boosting algorithm, XGBoost is known to be the decisive choice
between winning and losing in some Kaggle competitions. XGBoost has immensely high predictive
power, which makes it a strong choice for accuracy, as it possesses both a linear model and the
tree learning algorithm, and is roughly 10x faster than existing gradient boosting techniques.
The support includes various objective functions, including regression, classification, and
ranking.
One of the most interesting things about XGBoost is that it is also called a regularized
boosting technique. This helps to reduce overfitting, and it has massive support for a range of
languages such as Scala, Java, R, Python, Julia, and C++. It supports distributed and widespread
training on many machines, encompassing GCE, AWS, Azure, and YARN clusters. XGBoost can also be
integrated with Spark, Flink, and other cloud dataflow systems, with built-in cross-validation at
each iteration of the boosting process.
4.6.10.3. LightGBM
LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is
designed to be distributed and efficient with the following advantages:
Faster training speed and higher efficiency
Lower memory usage
Better accuracy
Parallel and GPU learning supported
Capable of handling large-scale data
It is a fast, high-performance gradient boosting framework based on decision tree
algorithms, used for ranking, classification, and many other machine learning tasks. It was
developed under the Distributed Machine Learning Toolkit project of Microsoft. Since LightGBM is
based on decision tree algorithms, it splits the tree leaf-wise with the best fit, whereas other
boosting algorithms split the tree depth-wise or level-wise. So, when growing on the same leaf in
LightGBM, the leaf-wise algorithm can reduce more loss than the level-wise algorithm, and hence
results in better accuracy than level-wise boosting algorithms typically achieve. Also, it is
surprisingly fast, hence the name 'Light'.
4.6.10.4. CatBoost
CatBoost is a recently open-sourced machine learning algorithm from Yandex. It can
easily integrate with deep learning frameworks like Google's TensorFlow and Apple's Core ML.
The best part about CatBoost is that it does not require extensive data preprocessing like other
ML models, and it can work on a variety of data formats, without undermining how robust it can
be. Make sure you handle missing data well before you proceed with the implementation. CatBoost
can automatically deal with categorical variables without showing a type conversion error, which
helps you to focus on tuning your model rather than sorting out trivial errors.
4.8 Feature learning
Several learning algorithms aim at discovering better representations of the inputs
provided during training. Classic examples include principal components analysis and cluster
analysis. Feature learning algorithms, also called representation learning algorithms, often
attempt to preserve the information in their input but also transform it in a way that makes it
useful, often as a pre-processing step before performing classification or predictions. This
technique allows reconstruction of the inputs coming from the unknown data-generating
distribution, while not being necessarily faithful to configurations that are implausible under that
distribution. This replaces manual feature engineering, and allows a machine to both learn the
features and use them to perform a specific task.
Feature learning can be either supervised or unsupervised. In supervised feature learning,
features are learned using labeled input data. Examples include artificial neural
networks, multilayer perceptrons, and supervised dictionary learning. In unsupervised feature
learning, features are learned with unlabeled input data. Examples include dictionary
learning, independent component analysis, autoencoders, matrix factorization and various forms
of clustering.
Manifold learning algorithms attempt to do so under the constraint that the learned
representation is low-dimensional. Sparse coding algorithms attempt to do so under the
constraint that the learned representation is sparse, meaning that the mathematical model has
many zeros. Multilinear subspace learning algorithms aim to learn low-dimensional
representations directly from tensor representations of multidimensional data, without reshaping
them into higher-dimensional vectors. Deep learning algorithms discover multiple levels of
representation, or a hierarchy of features, with higher-level, more abstract features defined in
terms of (or generating) lower-level features. It has been argued that an intelligent machine is
one that learns a representation that disentangles the underlying factors of variation that explain
the observed data.
Feature learning is motivated by the fact that machine learning tasks such as
classification often require input that is mathematically and computationally convenient to
process. However, real-world data such as images, video, and sensory data has not yielded to
attempts to algorithmically define specific features. An alternative is to discover such features or
representations through examination, without relying on explicit algorithms.
4.9 Artificial neural networks
An ANN is based on a collection of connected units, called artificial neurons, which loosely
model the neurons in a biological brain. Each neuron computes a non-linear function of the sum of
its inputs and sends a signal onward only if the aggregate signal crosses a threshold. Typically,
artificial neurons are aggregated into layers. Different layers may perform different kinds of
transformations on their inputs. Signals travel from the first layer (the input layer) to the
last layer (the output layer), possibly after traversing the layers multiple times.
The original goal of the ANN approach was to solve problems in the same way that
a human brain would. However, over time, attention moved to performing specific tasks, leading
to deviations from biology. Artificial neural networks have been used on a variety of tasks,
including computer vision, speech recognition, machine translation, social
network filtering, playing board and video games, and medical diagnosis. Deep learning consists
of multiple hidden layers in an artificial neural network. This approach tries to model the way
the human brain processes light and sound into vision and hearing. Some successful applications
of deep learning are computer vision and speech recognition.[49]
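A minimal sketch of signals travelling through layers: an untrained toy network with random weights mapping 4 input features to 3 output classes (numpy assumed; the layer sizes are arbitrary):

```python
import numpy as np

def relu(z):
    # Simple non-linear transformation applied inside each hidden layer.
    return np.maximum(0, z)

def forward(x, layers):
    # Signals travel from the input layer through each hidden layer
    # to the output layer; every layer applies weights and a bias,
    # then a non-linear transformation.
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    z = W @ x + b
    return np.exp(z) / np.exp(z).sum()   # softmax over output classes

rng = np.random.default_rng(0)
# Untrained toy network: 4 inputs -> 5 hidden units -> 3 outputs.
layers = [(rng.normal(size=(5, 4)), np.zeros(5)),
          (rng.normal(size=(3, 5)), np.zeros(3))]

probs = forward(rng.normal(size=4), layers)
print(probs.shape, round(float(probs.sum()), 6))   # (3,) 1.0
```

Training would adjust the weight matrices so that the output probabilities match the desired classes; only the forward pass is sketched here.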
4.10 Support vector machines
Support vector machines (SVMs), also known as support vector networks, are a set of
related supervised learning methods used for classification and regression. Given a set of training
examples, each marked as belonging to one of two categories, an SVM training algorithm builds
a model that predicts whether a new example falls into one category or the other.
An SVM model is a non-probabilistic, binary, linear classifier, although methods
such as Platt scaling exist to use SVM in a probabilistic classification setting. In addition to
performing linear classification, SVMs can efficiently perform a non-linear classification using
what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature
spaces.
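The kernel trick can be verified on a small case: for 2-D inputs, the polynomial kernel (x·y + 1)² equals an ordinary dot product in an explicit 6-dimensional feature space, which the kernel never has to construct (numpy assumed available):

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for 2-D input: with this mapping,
    # (x . y + 1)^2 = <phi(x), phi(y)>.
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(a, b):
    # The kernel computes the same inner product without ever
    # constructing the high-dimensional features.
    return (np.dot(a, b) + 1.0) ** 2

a, b = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly_kernel(a, b), np.dot(phi(a), phi(b)))   # both equal 4.0
```

This is what "implicitly mapping inputs into high-dimensional feature spaces" means: the SVM only ever evaluates the kernel, yet behaves as if the data lived in the larger space.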
4.11 Bayesian networks
A simple Bayesian network: rain influences whether the sprinkler is activated, and both rain
and the sprinkler influence whether the grass is wet. A Bayesian network, belief network, or
directed acyclic graphical model is a probabilistic graphical model that represents a set
of random variables and their conditional dependencies via a directed acyclic graph (DAG).
For example, a Bayesian network could represent the probabilistic relationships between diseases
and symptoms. Given symptoms, the network can be used to compute the probabilities of the
presence of various diseases.
In 2012, Vinod Khosla, co-founder of Sun Microsystems, predicted that 80% of medical doctors'
jobs would be lost in the next two decades to automated machine learning medical diagnostic
software.
In 2014, it was reported that a machine learning algorithm had been applied in the field of
art history to study fine art paintings, and that it may have revealed previously unrecognized
influences between artists. Although machine learning has been transformative in some fields,
machine-learning programs often fail to deliver expected results. Reasons for this are numerous:
lack of (suitable) data, lack of access to the data, data bias, privacy problems, badly chosen tasks
and algorithms, wrong tools and people, lack of resources, and evaluation problems.
In 2018, a self-driving car from Uber failed to detect a pedestrian, who was killed after
a collision. Attempts to use machine learning in healthcare with the IBM Watson system failed to
deliver even after years of time and billions of dollars of investment.
CHAPTER 5
Those three expressions, fear, disgust, and anger, were recognized better by women, and
we can see that most of the time the difference in the percentage of success between men and
women is significant. An interesting question is: why were those specific emotions recognized
better by women? From an evolutionary point of view, evolutionary psychologists have suggested
that females, due to their role as primary caretakers, are "programmed" to accurately decode and
detect distress in preverbal infants or threatening signals from other adults, to enhance their
chances of survival. Fear, anger, and disgust are indeed situations of distress. The two most
common special features that helped the participants decode the expression and the emotion
behind it were the lips and the eyebrows.
Appendix
## This program first checks whether the face of a person exists in the given image; if it
## exists, it crops the image of the face and saves it to the given directory.
## Importing Modules
import cv2
import os
#################################################################################
def facecrop(image):
    ## Crops the face of a person from any image!
    ## OpenCV XML FILE for Frontal Facial Detection using HAAR CASCADES.
    facedata = "haarcascade_frontalface_alt.xml"
    cascade = cv2.CascadeClassifier(facedata)
    try:
        ## Some downloaded images are of an unsupported type and should be ignored
        ## while raising an Exception, so the try/except block is used for that.
        img = cv2.imread(image)
        minisize = (img.shape[1], img.shape[0])
        miniframe = cv2.resize(img, minisize)
        faces = cascade.detectMultiScale(miniframe)
        for f in faces:
            x, y, w, h = [v for v in f]
            cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            sub_face = img[y:y + h, x:x + w]
            f_name = image.split('/')[-1]
            cv2.imwrite("cropped/" + f_name, sub_face)  ## output directory assumed
    except Exception:
        pass
if __name__ == '__main__':
    directory = "images/"  ## assumed location of the downloaded images
    images = os.listdir(directory)
    for i, img_file in enumerate(images):
        facecrop(directory + img_file)
import argparse
import sys
import time
import numpy as np
import tensorflow as tf
def load_graph(model_file):
    graph = tf.Graph()
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(model_file, "rb") as f:
        graph_def.ParseFromString(f.read())
    with graph.as_default():
        tf.import_graph_def(graph_def)
    return graph

def read_tensor_from_image_file(file_name, input_height=224, input_width=224,
                                input_mean=128, input_std=128):
    ## Decodes the image file and normalizes it into the tensor the graph expects.
    file_reader = tf.read_file(file_name, "file_reader")
    image_reader = tf.image.decode_jpeg(file_reader, channels=3)
    float_caster = tf.cast(image_reader, tf.float32)
    dims_expander = tf.expand_dims(float_caster, 0)
    resized = tf.image.resize_bilinear(dims_expander, [input_height, input_width])
    normalized = tf.divide(tf.subtract(resized, [input_mean]), [input_std])
    sess = tf.Session()
    result = sess.run(normalized)
    return result
def load_labels(label_file):
    label = []
    proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()
    for l in proto_as_ascii_lines:
        label.append(l.rstrip())
    return label
def main(img):
    file_name = img
    model_file = "C:/Users/Rajashekar Reddy/madhav/retrained_graph.pb"
    label_file = "C:/Users/Rajashekar Reddy/madhav/retrained_labels.txt"
    input_height = 224
    input_width = 224
    input_mean = 128
    input_std = 128
    input_layer = "input"
    output_layer = "final_result"

    parser = argparse.ArgumentParser()
    parser.add_argument("--image", help="image to be processed")
    parser.add_argument("--graph", help="graph/model to be executed")
    parser.add_argument("--labels", help="name of file containing labels")
    parser.add_argument("--input_height", type=int, help="input height")
    parser.add_argument("--input_width", type=int, help="input width")
    parser.add_argument("--input_mean", type=int, help="input mean")
    parser.add_argument("--input_std", type=int, help="input std")
    parser.add_argument("--input_layer", help="name of input layer")
    parser.add_argument("--output_layer", help="name of output layer")
    args = parser.parse_args()

    if args.graph:
        model_file = args.graph
    if args.image:
        file_name = args.image
    if args.labels:
        label_file = args.labels
    if args.input_height:
        input_height = args.input_height
    if args.input_width:
        input_width = args.input_width
    if args.input_mean:
        input_mean = args.input_mean
    if args.input_std:
        input_std = args.input_std
    if args.input_layer:
        input_layer = args.input_layer
    if args.output_layer:
        output_layer = args.output_layer

    graph = load_graph(model_file)
    t = read_tensor_from_image_file(file_name,
                                    input_height=input_height,
                                    input_width=input_width,
                                    input_mean=input_mean,
                                    input_std=input_std)
    input_operation = graph.get_operation_by_name("import/" + input_layer)
    output_operation = graph.get_operation_by_name("import/" + output_layer)

    with tf.Session(graph=graph) as sess:
        start = time.time()
        results = sess.run(output_operation.outputs[0],
                           {input_operation.outputs[0]: t})
        end = time.time()
    results = np.squeeze(results)

    top_k = results.argsort()[-5:][::-1]
    labels = load_labels(label_file)
    for i in top_k:
        return labels[i]  ## return the top-scoring label
import cv2
import label_image

size = 4
webcam = cv2.VideoCapture(0)  ## assumed default camera index
classifier = cv2.CascadeClassifier("haarcascade_frontalface_alt.xml")

while True:
    (rval, im) = webcam.read()
    im = cv2.flip(im, 1, 0)  # Flip to act as a mirror
    # Detect faces on a downscaled copy for speed.
    mini = cv2.resize(im, (im.shape[1] // size, im.shape[0] // size))
    faces = classifier.detectMultiScale(mini)
    for f in faces:
        (x, y, w, h) = [v * size for v in f]
        sub_face = im[y:y + h, x:x + w]
        FaceFileName = "test.jpg"  # Saving the current image from the webcam for testing.
        cv2.imwrite(FaceFileName, sub_face)
        text = label_image.main(FaceFileName)  # Getting the result from the label_image file, i.e., the classification result.
        text = text.title()  # Title case looks stunning.
        font = cv2.FONT_HERSHEY_TRIPLEX
        cv2.putText(im, text, (x + w, y), font, 1, (0, 0, 255), 2)
    cv2.imshow("Capture", im)
    if cv2.waitKey(10) == 27:  # Esc key ends the loop
        break