Emotion Detection Through Facial Feature Recognition
James Pao
jpao@stanford.edu
Abstract—Humans share a universal and fundamental set of emotions which are exhibited through consistent facial expressions. An algorithm that performs detection, extraction, and evaluation of these facial expressions will allow for automatic recognition of human emotion in images and videos. Presented here is a hybrid feature extraction and facial expression recognition method that utilizes Viola-Jones cascade object detectors and Harris corner key-points to extract faces and facial features from images, and uses principal component analysis, linear discriminant analysis, histogram-of-oriented-gradients (HOG) feature extraction, and support vector machines (SVM) to train a multi-class predictor for classifying the seven fundamental human facial expressions. The hybrid approach allows for quick initial classification via projection of a testing image onto a calculated eigenvector of a basis that has been specifically computed to emphasize the separation of a specific emotion from the others. This initial step works well for five of the seven emotions, which are easier to distinguish. If further prediction is needed, then the computationally slower HOG feature extraction is performed and a class prediction is made with a trained SVM. Reasonable accuracy is achieved with the predictor, dependent on the testing set and test emotions. Accuracy is 81% with contempt, a very difficult-to-distinguish emotion, included as a target emotion, and the run-time of the hybrid approach is 20% faster than using the HOG approach exclusively.

I. INTRODUCTION AND MOTIVATION

Interpersonal interaction is oftentimes intricate and nuanced, and its success is often predicated upon a variety of factors. These factors range widely and can include the context, mood, and timing of the interaction, as well as the expectations of the participants. For one to be a successful participant, one must perceive a counterpart's disposition as the interaction progresses and adjust accordingly. Fortunately for humans this ability is largely innate, with varying levels of proficiency. Humans can quickly and even subconsciously assess a multitude of indicators such as word choices, voice inflections, and body language to discern the sentiments of others. This analytical ability likely stems from the fact that humans share a universal set of fundamental emotions.

Significantly, these emotions are exhibited through facial expressions that are consistently correspondent. This means that regardless of language and cultural barriers, there will always be a set of fundamental facial expressions that people assess and communicate with. After extensive research, it is now generally agreed that humans share seven facial expressions that reflect the experiencing of fundamental emotions. These fundamental emotions are anger, contempt, disgust, fear, happiness, sadness, and surprise [1][2]. Unless a person actively suppresses their expressions, examining a person's face can be one method of effectively discerning their genuine mood and reactions.

The universality of these expressions means that facial emotion recognition is a task that can also be accomplished by computers. Furthermore, like many other important tasks, computers can provide advantages over humans in analysis and problem-solving. Computers that can recognize facial expressions can find application where efficiency and automation are useful, including in entertainment, social media, content analysis, criminal justice, and healthcare. For example, content providers can determine the reactions of a consumer and adjust their future offerings accordingly.

It is important for a detection approach, whether performed by a human or a computer, to have a taxonomic reference for identifying the seven target emotions. A popular facial coding system, used both by noteworthy psychologists and computer scientists such as Ekman [1] and the Cohn-Kanade [3] group, respectively, is the Facial Action Coding System (FACS). The system uses Action Units that describe movements of certain facial muscles and muscle groups to classify emotions. Action Units detail facial movement specifics, such as the inner or outer brow raising, the nostrils dilating, or the lips pulling or puckering, as well as optional intensity information for those movements. As FACS indicates discrete and discernible facial movements and manipulations in accordance with the emotions of interest, digital image processing and analysis of visual facial features can allow for successful facial expression predictors to be trained.

II. RELATED WORK

As this topic is of interest in many fields spanning both the social sciences and engineering, there have been many approaches to using computers to detect, extract, and recognize human facial features and expressions. For example, Zhang [4] details using both the geometric positions of facial fiducial points and Gabor wavelet coefficients at the same points to perform recognition based on a two-layer perceptron. Significantly, Zhang shows that facial expression detection is achievable with low-resolution images due to the low-frequency nature of expression information. Zhang also shows that most of the
useful expression information is encoded within the inner facial features. This allows facial expression recognition to be performed successfully with relatively low computational requirements.

The feature extraction task, and the subsequent characterization, can be and has been performed with a multitude of methods. The general approach of using Gabor transforms coupled with neural networks, similar to Zhang's approach, is popular. Other extraction methods, such as local binary patterns by Shan [6], histograms of oriented gradients by Carcagni [7], and facial landmarks with Active Appearance Modeling by Lucey [3], have also been used. Classification is often performed using learning models such as support vector machines.

… likely that the nose is more illuminated than the sides of the face directly adjacent, or brighter than the upper lip and nose bridge area. Then, if an appropriate Haar-like feature, such as those shown in Figure 1, is used and the difference in pixel sums for the nose and the adjacent regions surpasses a threshold, a nose is identified. It is to be noted that Haar-like features are very simple and are therefore weak classifiers, requiring multiple passes.
III. METHODOLOGY
The detection and recognition implementation proposed
here is a supervised learning model that will use the one-
versus-all (OVA) approach to train and predict the seven basic
emotions (anger, contempt, disgust, fear, happiness, sadness,
and surprise).
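The one-versus-all scheme described above trains one binary scorer per emotion (target emotion versus everything else) and predicts the class with the highest score. The following is a minimal sketch of that idea, not the paper's own code: it assumes pre-extracted feature vectors, and a regularized least-squares scorer stands in for the SVM used in the paper; all names are illustrative.

```python
import numpy as np

def train_ova(X, y, n_classes, reg=1e-3):
    """Train one linear scorer per class with one-versus-all targets:
    +1 for samples of the target emotion, -1 for all other samples.
    (Regularized least squares is a stand-in for an SVM here.)"""
    n_features = X.shape[1]
    W = np.zeros((n_classes, n_features))
    A = X.T @ X + reg * np.eye(n_features)   # shared normal-equation matrix
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)      # one-versus-all labels
        W[c] = np.linalg.solve(A, X.T @ t)
    return W

def predict_ova(W, X):
    """Score a sample against every class and predict the argmax."""
    return np.argmax(X @ W.T, axis=1)

# Toy example: 7 separable "emotion" clusters, 3 samples each.
X = np.repeat(np.eye(7), 3, axis=0)
y = np.repeat(np.arange(7), 3)
W = train_ova(X, y, 7)
```

The argmax over per-class scores is what makes a set of binary classifiers behave as a single seven-way predictor.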
Fig. 1 Sample Haar-like features for detecting face features.
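The pixel-sum comparison behind Haar-like features can be computed in constant time per rectangle with an integral image (summed-area table). The sketch below, which is illustrative rather than the paper's implementation, evaluates a three-rectangle feature of the kind in Fig. 1: a bright middle strip (e.g. a lit nose) minus its two darker flanking strips; the layout and all names are assumptions.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero border: ii[r, c] = sum of img[:r, :c],
    so any rectangle sum costs only four lookups."""
    return np.pad(img.astype(np.float64).cumsum(0).cumsum(1),
                  ((1, 0), (1, 0)))

def rect_sum(ii, r, c, h, w):
    """Sum of the h-by-w rectangle with top-left corner (r, c)."""
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def three_rect_haar(img, r, c, h, w):
    """Three-rectangle feature: bright middle strip minus the two
    flanking strips of equal size. A large positive response means
    the middle region is brighter than its neighbors."""
    ii = integral_image(img)
    left = rect_sum(ii, r, c, h, w)
    mid = rect_sum(ii, r, c + w, h, w)
    right = rect_sum(ii, r, c + 2 * w, h, w)
    return mid - left - right

# A 6x9 patch with a bright middle strip responds strongly...
bright_nose = np.zeros((6, 9))
bright_nose[:, 3:6] = 10.0
# ...while a flat patch does not, so a threshold separates the two.
```

Because each such feature is a weak classifier, a detector like Viola-Jones cascades many of them, with later stages only run on windows that pass the earlier thresholds.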