
Emotion Recognition Based on Wireless,

Physiological and Audiovisual Signals:


A Comprehensive Survey

Aisha Alabsi(B) , Wei Gong, and Ammar Hawbani

School of Computer Science and Technology, University of Science and Technology of China,
Hefei, China
{weigong,anmande}@ustc.edu.cn

1 Introduction
In this survey, we give attention to the branch of artificial intelligence known as affective computing: computing that relates to, arises from, or influences emotion [1]. Machine emotion recognition is a field of study that forms part of affective computing [2]. The performance reported in emotion recognition papers is hard to compare because of the lack of common databases [3]. Emotions play a fundamental role in human awareness, especially in perception, human interaction, personal intelligence, and decision-making [4]. Emotion recognition is a nascent domain that has attracted much interest from both industry and the research community [5]. Emotion can also be detected from physiological signals, which are more suitable than other modalities because they are generated by the central and autonomic nervous systems and therefore cannot be counterfeited or hidden [6]. The primary physiological signals are (1) the electrocardiogram (ECG), (2) the electroencephalogram (EEG), (3) heart rate variability (HRV), (4) photoplethysmography (PPG), and (5) the galvanic skin response (GSR) [6, 7]. Affective states are commonly characterized by two subjective dimensions, valence and arousal [8]. Valence reflects how pleasant or unpleasant a conscious affective experience is, whereas arousal reflects its degree of activation or deactivation [9]. Approaches built on these dimensions are nevertheless difficult to compare directly because they differ in several ways. Emotions can be elicited by watching emotional movies [10], video clips [11], and video games [12]. The techniques available to identify a person's emotions rely either on audiovisual signals, such as audio clips and images, or on physiological sensors, such as an electrocardiogram (EKG) monitor, and both have restrictions [13, 14]. Audiovisual techniques cannot measure internal feelings; they rely on the external expression of emotions, and a person can be happy even if they are not smiling [15, 16]. The second approach detects emotions by monitoring physiological signals that change with the emotional state. A person's heart rate naturally increases with anger or excitement, and there are also more subtle changes that emerge as variability in the beat-to-beat period. This approach uses body sensors such as EKG monitors [15, 17]. Such work has been used to enhance human-machine interaction in a variety of application areas, including clinical, industrial, military, and entertainment settings [18].

Several ways of detecting emotion have been proposed, and they can be separated into two kinds. The first determines a particular emotion from behavioral traits such as tone of voice, facial expressions, and body gestures. The second identifies emotions through physiological signals, whose activity can be recorded as electrical signals by non-invasive sensors.
Typical sensing modalities are EKG, EEG, and GSR [19]. Traditional multimedia engages only two human senses, hearing and sight. The simultaneous involvement of additional senses enhances the human experience of viewing multimedia content. Multiple sensorial media (mulsemedia) is multimedia that can engage more than two human senses concurrently [20, 21]. With this in mind, there is presently a focus on mulsemedia to create an immersive environment for multimedia engagement. Much research addresses emotion recognition using different techniques, such as wireless signals, physiological signals, and audiovisual input, but most of it concentrates on EEG among the physiological signals. Hence, we survey all three families of techniques for recognizing emotion. The objective is to describe what existing systems have already accomplished and to offer a glimpse of what could come in the future.

The survey covers work on emotion recognition from 2015 to 2020. The basic emotions are happiness, sadness, surprise, fear, anger, and disgust. Emotions can be specified and evaluated from different points of view that complement each other: (1) the psychological perception of the participant (self-assessment by the subject); (2) the bodily response of the participant (physiological signals), which is objective but can be contaminated by the subject's condition (e.g., influenced by medication or illness); (3) behavioral signals such as facial expressions, voice, certain body movements, or even keypress patterns [22]; and (4) external evaluation by observers, e.g., an adult assessing a child's condition [23].

1.1 Emotion Recognition Based on Wireless Signal


Some research shows that gestures, movements, and poses can be recognized via RF-based device-free activity recognition (RDFA). Radio-frequency (RF) signal variation has been explored actively in recent research [24-26]; until recently, however, specialized wireless hardware and software were mainly required. RF-based device-free recognition is best implemented at the edge of the wireless network, where radio wave propagation suffers from noise, path loss, attenuation, and multipath fading, and is perturbed by movement near a receiver. For edge-based applications of RDFA, it is therefore important to study the performance of the system under realistic rather than optimal conditions. Emotion identification with RDFA systems was first proposed in [27] and was demonstrated in [28] by recovering the pulse and breathing frequency from RF reflections to detect emotional states.

1.2 Emotion Recognition Based on ECG


These models are employed to categorize human emotions using real datasets or body sensors such as EKG monitors, measuring the signals and associating their behavior with joy, anger, and so on. Such measurements are more closely linked to a person's inner feelings, reflecting the interplay between the heartbeat and the autonomic nervous system [29]. However, body sensors are difficult to manage and can affect the activity and emotions of the user, making this method unsuitable for everyday practice. The EKG has been demonstrated to be a dependable source of information for emotion recognition systems [30] covering affective states such as happiness, sadness, and stress, and it is useful in many affective computing applications. However, although the ECG has considerable potential in affective computing, we often lack labeled data for in-depth training of supervised models [31]. Prior studies have suggested feature extraction procedures based on signal processing or medical diagnostic measures (such as heart rate changes), as well as statistical techniques, for ECG-based emotion detection [32]. An alternative acquisition and identification method uses radio-frequency (RF) signals: RF signals reflect off the person's body and are modulated by physiological motion. These RF reflections can be used to determine a person's breathing and average heart rate without any sensor attached to the body.

1.3 Emotion Recognition Based on EEG

Techniques for processing brain signals are opening a window onto new ways of studying emotions and other affective states. Automatic recognition of emotions will also play an indispensable role in artificial intelligence systems aimed at human interaction [33]. The difficulty lies in deciphering this information and assigning it to particular emotions [34]. As a result, emotion detection could be used daily by clinicians for a variety of aims, such as monitoring emotions in healthcare settings, games and entertainment, teaching and learning, and optimizing performance in field applications [35]. EEG-based detection offers high precision, intention estimation, and consistent neural patterns, and has therefore been studied as a reliable technique [19, 36].

1.4 Emotion Recognition Based on Speech

The aim is to analyze the formation of and changes in the speaker's emotional state from the speech signal, in order to make the interaction between humans and computers more intelligent. A computing device assesses the speaker's speech signal and its evolution, infers the internal emotions and intentions, and finally achieves a smarter and more natural human-computer interaction (HCI), which is of great importance for developing newer HCI systems and for advancing artificial intelligence [37, 38]. This is particularly useful for improving the naturalness of speech-based human-machine interaction. Various feature extraction methods have been proposed for speech; properties such as formants, pitch, and short-term energy [39] are useful for recognizing emotions.

1.5 Emotion Recognition Based on Facial Expressions

The goal is to recognize human emotions from the movement of facial muscles, the motion of the eyes, lips, or eyebrows, and the overall facial structure. Facial expressions employed to recognize emotions are well suited to healthcare systems, since this procedure recognizes emotions through a natural user interface, the face, using behavioral signs [22]. However, these channels cannot detect most real-world emotional states because it is easy to hide a facial expression or falsify a tone of voice [40]. In addition, they are not effective for people who cannot outwardly express their feelings [41].

1.6 Emotion Representations


Emotions are interpreted using various general models [42]. The most common are the discrete model and the dimensional model. The discrete model identifies basic, innate, and universal emotions from which all other emotions may be derived; some authors hold that these fundamental emotions are happiness, sadness, anger, surprise, disgust, and fear [43]. Dimensional models, in contrast, can express complex emotions in a continuous two-dimensional space, valence-arousal (VA), or in three dimensions: valence, arousal, and dominance (VAD) [44]. The VA model has valence and arousal as its axes: valence evaluates positive versus negative emotion and ranges from happy to sad, while arousal measures emotion from calm to exciting. Three-dimensional models add a dominance axis that can be used to rate feelings from powerless to empowered. For example, fear and anger have similar valence-arousal coordinates in the VA model; three-dimensional models therefore improve the "emotional resolution" through the dominance dimension, in which fear is a submissive feeling whereas anger implies power [45]. The dominance dimension underlines the distinction between these two emotions (Figs. 1 and 2).

Fig. 1. Illustration of emotional states in the valence-arousal space



Fig. 2. Illustration of emotional states in the valence-arousal-dominance space

2 Techniques for Recognition of Emotions

Figure 3 shows the general structure of the emotion recognition pipeline. The operations of input data acquisition (wireless, physiological, and audiovisual), preprocessing, feature extraction, feature selection, classification, and performance assessment are highlighted and discussed in the following subsections.

Fig. 3. The structure of the emotion recognition pipeline: input acquisition (wireless signals, physiological input, audiovisual input), preprocessing, feature extraction, feature selection, and classification

2.1 Input Acquisition

2.1.1 Wireless Signal


A new emotion recognition technology was designed to get the best of both worlds, that is, to measure the interaction between emotions and physiological signals directly from wireless signals. In the EQ-Radio system, RF signals are used to capture emotions: RF stimuli from movement and the environment are recorded by a Radio-Frequency Identification (RFID) module that interacts directly with the extraction of the radio channel. Possible data sources include channel state information (CSI), Bluetooth, or FM radio. In particular, the RF signal is reflected by the human body and modulated by human movement, so RF reflections can be exploited to determine a person's breathing and average heart rate. Emotion recognition, however, requires extracting the individual heartbeats and measuring small variations in the beat-to-beat intervals with millisecond-scale accuracy. Regrettably, earlier research aimed at segmenting RF reflections into individual beats cannot achieve accuracy sufficient for emotion recognition [46].

2.1.2 Physiological Signal


Many publicly available datasets based on physiological signals have been used for emotion recognition, such as DEAP, IAPS, SEED, DREAMER, and RCLS for EEG signals, and others such as MAHNOB, AMIGOS, SWELL, and AuBT for ECG, GSR, and EMG signals.

2.1.3 Audiovisual Input


Likewise, other datasets are based on audiovisual input (speech, video, and facial expressions), such as AIBO, SUSAS, RAVDESS, JAFFE, the Extended Cohn-Kanade dataset (CK+), and AFEW (Table 1).

2.2 Preprocessing
2.2.1 Wireless Signal
EQ-Radio operates on the RF reflections from the human body. To capture these reflections, it uses a radar technique called Frequency-Modulated Continuous-Wave (FMCW) radar [60]. The reflected RF signal is modulated by breathing and heartbeats. The radio separates the RF reflections from different objects or bodies into buckets based on their reflection time; it then removes reflections from static objects that do not change over time and amplifies the reflections from humans. To recognize emotions, EQ-Radio needs to extract beat-to-beat intervals from the RF phase signal. The main limitation of beat-interval extraction is that the beat morphology in reflected RF signals is not known. This is in contrast to EKG signals, where the heartbeat morphology has a known expected shape and simple spike-detection algorithms can extract beat-to-beat intervals. Knowing the morphology of the heartbeat would help segment the signal; conversely, once a segmentation of the reflected signal is available, it can be used to recover the person's heartbeat morphology. This problem is exacerbated by two additional factors. First, the reflected signal is noisy; second, the chest displacement due to breathing is orders of magnitude larger than the displacement caused by the heartbeat. In other words, the system operates in a low signal-to-interference-plus-noise ratio (SINR) regime, where the "interference" results from the displacement of the chest due to breathing. SINR sets the upper limit on how well an RF action or gesture can be recognized, since any detectable fluctuation must be larger than the noise [61]. The purpose of the preprocessing step is therefore to attenuate the respiratory signal and improve the SINR of the heartbeat signal. Because the chest displacement caused by inhalation and exhalation changes much more slowly than that caused by the pulse, EQ-Radio operates on the acceleration of the phase signal, which dampens the breathing component, and then divides the acceleration signal into individual beats. The remaining problem is that the shape of the heartbeat, which is needed to initiate this segmentation, is unknown. Let x = (x1, x2, ..., xn) be the acceleration signal and let S = {s1, s2, ...} be a partition of x into non-overlapping adjacent subsequences (candidate beats). The purpose of the algorithm is to determine the optimal partition S* that minimizes the variance across segments, which can be formalized as follows [46]:

S* = arg min_S Var(S)    (1)
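To make the arg-min in Eq. (1) concrete, the sketch below is an illustrative simplification, not EQ-Radio's actual algorithm (which scores the consistency of beat shapes): it partitions a 1-D signal into contiguous segments by dynamic programming, scoring a partition by the total within-segment sample variance, with assumed bounds min_len/max_len on the beat length.

```python
import numpy as np

def segment_beats(x, min_len=40, max_len=120):
    """Partition x into contiguous segments (candidate beats) whose summed
    within-segment variance is minimal, via dynamic programming.
    min_len / max_len bound the allowed beat length in samples (assumed values)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of squares

    def seg_var(i, j):                                # variance of x[i:j]
        m = j - i
        mean = (s1[j] - s1[i]) / m
        return (s2[j] - s2[i]) / m - mean ** 2

    INF = float("inf")
    cost = np.full(n + 1, INF)                        # cost[j]: best score for x[:j]
    back = np.zeros(n + 1, dtype=int)                 # back-pointer to previous cut
    cost[0] = 0.0
    for j in range(min_len, n + 1):
        for L in range(min_len, min(max_len, j) + 1):
            c = cost[j - L] + seg_var(j - L, j)
            if c < cost[j]:
                cost[j], back[j] = c, j - L
    if cost[n] == INF:
        raise ValueError("signal length not partitionable with these bounds")
    cuts, j = [], n                                   # recover the optimal partition S*
    while j > 0:
        cuts.append(j)
        j = back[j]
    return sorted(cuts)                               # segment end indices
```

Running such a segmentation on the preprocessed acceleration signal yields candidate beat boundaries from which inter-beat intervals (IBIs) can be read off.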

Table 1. Available datasets

Source | Dataset | Input type | Number of channels/samples | Number of participants | Target emotions
[47] | DEAP | EEG | 32 channels | 32 | Valence, arousal, dominance, liking
[48] | IAPS | EEG | 14 channels | 5 | Valence and arousal
[49] | MAHNOB-HCI | ECG, GSR, EMG, respiration pattern | 32 channels | 24 | Valence and arousal
[50] | SEED | EEG | 62 channels | 15 | Positive, negative, neutral
[51] | AMIGOS | ECG, GSR | 19 channels | 58 | The emotional dimensions of arousal and valence
[52] | DREAMER | EEG | 14 channels | 23 | Rating 1 to 5 for valence, arousal, and dominance
[53] | SWELL | ECG | 13 channels | 25 | Arousal and valence
[53] | RCLS | EEG | 64 channels | 14 | Happy, sad, and neutral
[54] | AuBT | EMG, ECG, skin conductivity, respiration change | 15 channels | 75 | Joy, anger, sadness, and pleasure
[55] | AIBO | Speech command | 40 | 14 | Five emotions (angry, happy, sad, bored, and neutral)
[56] | SUSAS | Speech command | 140 | 48 | Angry, happy, sad, bored, and neutral
[57] | RAVDESS | Speech | 652 clips | 24 | Six emotions: neutral, calm, happy, sad, angry, and fearful
[58] | JAFFE | Image | 2118 labeled images | 213 | Seven primary emotions (anger, disgust, fear, happy, neutral, sad, surprise)
[59] | CK | Images | 593 video sequences | 230 | Seven expression classes: anger, contempt, disgust, fear, happiness, sadness, and surprise
In Ref. [62], the synthesized raw data were preprocessed to eliminate noise, extract detailed edges, and obtain accurate solutions. The discrete wavelet transform (DWT) was used for noise reduction and for a time-frequency representation of the signal, enabling the detailed multi-scale analysis shown in Fig. 4.

Fig. 4. Using discrete wavelet transform (DWT) for de-noising and smoothing signal

2.2.2 Physiological Signal


Many studies have applied various preprocessing techniques to EEG signals. In [48], the EEG signal is preprocessed using the EEGLAB toolbox from the SCCN laboratory, which runs on MATLAB. The EEG data for each subject were decomposed with independent component analysis, which guided the rejection of artifacts such as blinks, eye movements, muscle movements, and defective channels. The original signal of one subject is shown in Fig. 5a; some noise artifacts, marked by red ellipses, appear throughout the recording, and the authors applied their own artifact-suppression method. Figure 5b shows the clean EEG signal after preprocessing, with the corrected region marked by a green ellipse.
Other researchers have used different methods for cleaning EEG signals, such as filtering (low-pass, band-pass, notch, median, drift removal) [63], normalization (to the ±1 or [0, 1] range) and standardization [64], and winsorization (removing outliers and dubious or corrupted fragments and interpolating the removed samples) [65]. Time-series signals (i.e., EEG, GSR, and PPG) are filtered with the Savitzky-Golay (SG) filter [66], which smooths the data without distorting the waveform. The feedback circuit in the Muse EEG headband cancels noise currents in the EEG signal, and artifacts were further reduced by asking participants to avoid unnecessary movement while data were recorded. Preprocessing of the ECG signal consists of detecting the signal peaks: the beat-to-beat (RR) interval is an effective diagnostic quantity for analyzing heart rate variability (HRV) [67], and emotions produce significant changes in these intervals (Fig. 6). The RR interval corresponds to the time between two R peaks determined from the standard QRS complex. The ECG signal is processed with the Pan-Tompkins QRS detection algorithm proposed in [68].

Fig. 5. EEG signal before and after preprocessing

Fig. 6. Determining the RR interval in the ECG signal

In [54], a second-order IIR notch filter was used to remove the narrow-band noise generated by EKG sensors, user motion artifacts, and power lines, and a Butterworth low-pass filter with a 60 Hz cutoff frequency was applied to remove high-frequency noise.
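As a rough illustration of this preprocessing chain (not the exact filters of [54] or the full Pan-Tompkins detector), the sketch below applies a power-line notch filter and a Butterworth low-pass filter, then extracts R peaks and RR intervals with a simple threshold-based peak search; the sampling rate, cutoffs, and peak-search parameters are assumed values.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, find_peaks

def preprocess_and_rr(ecg, fs=250.0, powerline=50.0, lowpass=60.0):
    """Filter a raw ECG trace and return the filtered signal and RR intervals (s)."""
    # notch filter to suppress power-line interference
    b, a = iirnotch(w0=powerline, Q=30.0, fs=fs)
    ecg = filtfilt(b, a, ecg)
    # Butterworth low-pass to remove high-frequency noise
    b, a = butter(N=4, Wn=lowpass / (fs / 2.0), btype="low")
    ecg = filtfilt(b, a, ecg)
    # simple R-peak detection: peaks above an adaptive threshold,
    # at least 0.4 s apart (a stand-in for the Pan-Tompkins algorithm)
    thr = np.mean(ecg) + 1.5 * np.std(ecg)
    peaks, _ = find_peaks(ecg, height=thr, distance=int(0.4 * fs))
    rr_intervals = np.diff(peaks) / fs
    return ecg, rr_intervals
```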

2.2.3 Audiovisual Input


For image preprocessing, the DWT is applied to each dimension independently. In terms of shift invariance and decimation, the stationary wavelet transform (SWT) is better suited than the DWT for pattern detection, feature extraction, and change detection [69]. For speech signals, preprocessing consists of isolating the speech portion of an input utterance with an endpoint detector based on the zero-crossing rate (ZCR) and frame energy. For each frame of the speech signal, the fundamental frequency F0, log energy, three formant frequencies (F1, F2, F3), five mel-band energies, and two MFCCs are estimated [3].
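A minimal sketch of such an endpoint detector is shown below: it splits the waveform into frames, computes short-time energy and zero-crossing rate per frame, and keeps only frames exceeding simple thresholds. The frame length, hop size, and thresholds are illustrative assumptions.

```python
import numpy as np

def detect_speech_frames(x, fs=16000, frame_ms=25, hop_ms=10,
                         energy_ratio=0.1, zcr_max=0.25):
    """Return indices of frames classified as speech, using short-time
    energy and zero-crossing rate (ZCR) thresholds."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame) // hop)
    energy, zcr = np.zeros(n_frames), np.zeros(n_frames)
    for i in range(n_frames):
        w = np.asarray(x[i * hop: i * hop + frame], dtype=float)
        energy[i] = np.sum(w ** 2)                           # short-time energy
        zcr[i] = np.mean(np.abs(np.diff(np.sign(w))) > 0)    # zero-crossing rate
    # speech frames: energetic enough and not dominated by noise-like crossings
    speech = (energy > energy_ratio * energy.max()) & (zcr < zcr_max)
    return np.where(speech)[0]
```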

2.3 Feature Extraction and Selection


2.3.1 Wireless Signal
EQ-Radio extracts features from both the heartbeat sequence and the breathing signal. There is a vast literature on extracting emotion-dependent features from human heartbeats [3], where past methodologies rely on on-body sensors. These features can be grouped into frequency-domain analysis, time-domain analysis, time-frequency analysis, detrended fluctuation analysis, and sample entropy. EQ-Radio extracts 27 features from the inter-beat-interval (IBI) sequence, as shown in Table 2; these particular features were selected based on the results of Refs. [70, 71], which explain them in detail. EQ-Radio also uses breathing characteristics: for breathing, it first recognizes each breath by detecting peaks after low-pass filtering [46]. Physiological characteristics vary from subject to subject under similar emotional situations.

Table 2. EQ-radio features

Domain Name
Time Mean, Median, Standard Deviation of NN Intervals (SDNN), pNN50, Root Mean
Square of Successive Differences (RMSSD), SDNNi, meanRate, sdRate, HRVTi,
TINN
Frequency Welch PSD: LF/HF, peakLF, peakHF. Burg PSD: LF/HF, peakLF, peakHF.
Lomb-Scargle PSD: LF/HF, peakLF, peakHF
Poincaré SD1, SD2, SD2/SD1
Nonlinear SampEn1, SampEn2, DFAall , DFA1 , DFA2
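As an illustration of the time-domain entries in Table 2, the following sketch computes a few standard HRV statistics (SDNN, RMSSD, pNN50, mean rate) from an IBI sequence given in seconds; it uses the textbook definitions, not EQ-Radio's implementation.

```python
import numpy as np

def hrv_time_features(ibi):
    """ibi: 1-D array of inter-beat intervals in seconds."""
    ibi = np.asarray(ibi, dtype=float)
    diff = np.diff(ibi)
    return {
        "meanNN": ibi.mean(),
        "SDNN": ibi.std(ddof=1),                  # std of NN intervals
        "RMSSD": np.sqrt(np.mean(diff ** 2)),     # RMS of successive differences
        "pNN50": np.mean(np.abs(diff) > 0.050),   # fraction of diffs > 50 ms
        "meanRate": 60.0 / ibi.mean(),            # beats per minute
    }
```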

In [62], the frequency range and statistical characteristics (mean, standard deviation, entropy, zero crossings, and mean derivative) were computed to identify activity in the data; the main characteristics are the mean and standard deviation. A non-overlapping window of 100,000 samples at a sampling rate of 1 MHz, i.e., 10 windows per second, was chosen to compute the features [62].

2.3.2 Physiological Signal


After the EEG signal has been denoised, the system must extract the salient attributes fed to the classifier. The attributes can be computed in the (1) frequency, (2) time, and (3) time-frequency domains. Event-related potential (ERP) analysis, principal component analysis (PCA), fractal dimension (FD), Hjorth parameters, higher-order crossings (HOC) [72], and independent component analysis (ICA) are so-called time-domain features. There are also statistical measures, such as variance, skewness, power, mean, standard deviation, and entropy, a measure of signal randomness [73]. Time-domain and frequency-domain features capture different aspects of neural activity. More elaborate spatial methods, such as the surface Laplacian (SL) algorithm, also take into account the spatial information available when describing the characteristics of the EEG signal, thereby greatly reducing the influence of volume conduction.
The Hjorth parameters are statistical measures available in the time and frequency domains. They characterize the attributes of EEG signals and can be used as features to classify emotions [48]; the first and second derivatives of the signal are used to derive them. In the proposed method of [48], 42 feature sets are used in total, and the classifier receives a set of features extracted for each emotion type. These features cover three Hjorth parameters over 14 EEG channels. The feature extraction stage reduces the size of the problem while retaining relevant data; the result is a feature vector that represents the original signal or a part of it. The single frequency range in the proposed method is zero.
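For reference, the three Hjorth parameters (activity, mobility, and complexity) are simple variance ratios of a signal and its first and second derivatives; the sketch below computes them for one EEG channel, so three parameters over 14 channels give the 42 features mentioned above.

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of a 1-D EEG channel."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                   # first derivative
    ddx = np.diff(dx)                                 # second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# e.g. for a 14-channel recording: 14 x 3 = 42 values
# features = np.concatenate([hjorth_parameters(ch) for ch in eeg_channels])
```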

In [74], the feature extraction stage is likewise used to reduce the dimensionality of the problem while retaining relevant information, returning a feature vector that characterizes the original signal or a part of it. Features are generally computed over time or in other domains, and individual features may not be able to discriminate between related classes. The most powerful approach at this stage is principal component analysis (PCA). The methods of this step are divided in Table 3 into the time domain, the frequency domain, the time-frequency (or time-scale) domain, statistical indices, and nonlinear measures.
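As a small, generic illustration of dimensionality reduction at this stage (not tied to any particular study above), the snippet below projects a feature matrix onto its leading principal components with scikit-learn; the number of components is an assumed choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# X: (n_trials, n_features) matrix of extracted physiological features
X = np.random.randn(200, 310)                 # placeholder data for illustration

X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature
pca = PCA(n_components=20)                    # keep the 20 leading components
X_reduced = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_.sum())    # variance retained by the projection
```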

Table 3. Features extraction methods in EEG

Domain Methods
Time Signal morphology (amplitude, extrema, intervals,
etc.), Rate of specific events, RMS
Frequency Frequency spectrum
Time–frequency or time-scale domain The Short-time Fourier transform (STFT), Wavelet
Transform (WT)
Statistical indices Mean, median, SD, skewness, kurtosis, etc
Nonlinear Measures of chaos, complexity, Entropy

In Ref. [75], 29 features were obtained from each emotion sample of the skin potential signal: 15 time-domain features, 13 frequency-domain features, and 1 nonlinear feature, respectively. The time-domain features include the first quartile (q1), standard deviation (SD), variance (var), median, third quartile (q3), mean, and root mean square (RMS) of the raw emotion sample. A fast Fourier transform (FFT) was applied to the samples to extract the single-sided spectrum used for the frequency-domain attributes of the skin potential signals. In Ref. [76], differential entropy (DE) features were extracted based on a short-time Fourier transform with a 4-s Hanning window and no overlap. DE can be written as follows:

h(X) = −∫_X f(x) log(f(x)) dx    (2)

DE features were extracted from the EEG signals in five frequency bands over all channels: delta (1-4 Hz), theta (4-8 Hz), alpha (8-14 Hz), beta (14-31 Hz), and gamma (31-50 Hz). The 62 EEG channels thus give a total of 62 × 5 = 310 feature dimensions.
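Under a Gaussian assumption, the DE of a band-limited signal has the closed form 0.5·log(2πeσ²), which is a common way to implement this feature; the sketch below uses that shortcut (with an assumed sampling rate and filter order) rather than the STFT-window procedure of [76].

```python
import numpy as np
from scipy.signal import butter, filtfilt

# the five EEG bands (Hz) listed above
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (14, 31), "gamma": (31, 50)}

def differential_entropy_features(eeg, fs=200.0):
    """eeg: array of shape (n_channels, n_samples).
    Returns one DE value per (band, channel); with 62 channels this gives
    62 x 5 = 310 features, using DE = 0.5 * log(2 * pi * e * var)."""
    feats = []
    for lo, hi in BANDS.values():
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, eeg, axis=1)             # band-limited signal
        var = band.var(axis=1)                         # per-channel variance
        feats.append(0.5 * np.log(2 * np.pi * np.e * var))
    return np.concatenate(feats)
```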
Entropy-based features indicate how unpredictable the successive values of a signal are; the higher the entropy, the more complex or chaotic the system. Spectral entropy has been successfully used for EEG feature extraction. The Higuchi fractal dimension (HFD) is another nonlinear feature extraction method that has taken an important place in the analysis of biological signals in various clinical situations [4]. In Ref. [51], the features generated from the ECG peaks are shown in Table 4.
Table 4. Features extraction methods in ECG

Domain | Methods
Time | Mean, median, SD, entropy
Frequency | Number of artifacts, HRV
Peak | High frequency (HF), total power ratio, normalized HF
Peak | Low frequency (LF), total power ratio
Nonlinear | Dimension, correlation, entropy, fractal dimension

2.3.3 Audiovisual Input


Wavelet transform is the best method for extracting speech features. It assists in pre-
serving the frequency and delaying the signal in an order of rising resolution. Discrete
wavelet transform (DWT) is implemented either via filter bank procedure or lifting plan.
The filter bank procedure is a sequence of filtration in which the signal is sequentially
passed first through low l [m], then high h [m]. It is then reduced by a component of 2
for the computations of the coefficients. For preprocessing and images, DWT is applied
to every dimension individually. Hence, Stationary Wavelet Transform (SWT) is better
than DWT for pattern diagnosis, feature extraction, and change diagnosis. Typically,
in DWT every single level of transforming the input signal is convoluted with low and
high pass filters, that shown in Fig. 7. And also this method can use in facial features
extraction.
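A brief sketch of these transforms using the PyWavelets library is shown below; the wavelet family, decomposition level, and the log-energy features computed from the sub-bands are illustrative choices, not those of a specific study.

```python
import numpy as np
import pywt  # PyWavelets

# toy "speech" signal: a noisy chirp standing in for a real utterance
fs = 8000
t = np.arange(0, 1.0, 1.0 / fs)
signal = np.sin(2 * np.pi * (100 + 200 * t) * t) + 0.05 * np.random.randn(t.size)

# 3-level DWT: each level convolves with low/high-pass filters and downsamples by 2
coeffs = pywt.wavedec(signal, "db4", level=3)            # [cA3, cD3, cD2, cD1]

# simple wavelet features: log-energy of each sub-band
features = [np.log(np.sum(c ** 2) + 1e-12) for c in coeffs]

# stationary wavelet transform (no downsampling, shift-invariant);
# pywt.swt requires the signal length to be divisible by 2**level
swt_coeffs = pywt.swt(signal, "db4", level=2)
```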

Fig. 7. Discrete wavelet transform (DWT)

2.4 Classification
Many algorithms in classification such as support vector machine (SVM), k-nearest
neighbors (KNN), deep neural nets (DNN), Hidden Markov Model (HMM), Signal-to-
noise ratio (SNR), convolutional neural network (CNN), a deep convolutional neural
network (DCNN), and so on (Table 5).
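As a generic example of this final stage (features in, emotion labels out), the snippet below trains an SVM with scikit-learn on a feature matrix such as the HRV, Hjorth, or DE vectors described above; the data split and kernel settings are assumptions, not the configuration of any cited paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X: (n_trials, n_features) feature matrix, y: emotion label per trial
X = np.random.randn(300, 42)                                  # placeholder features
y = np.random.choice(["happy", "sad", "neutral"], size=300)   # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```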

Table 5. Accuracy classification

Refs | Signal type | Method | Accuracy
[46] | Wireless signal | SVM | 87%
[48] | Physiological signal (EEG) | KNN, SVM | 61.1%, 38.9%
[49] | Physiological signal (ECG, GSR) | SVM | 64.23% (arousal), 68.75% (valence)
[51] | Physiological signal (EOG, EMG, and EEG) | DNN | 76% (arousal), 75% (valence)
[77] | Physiological signal (ECG) | CNN | 85.5% for both arousal and valence
[76] | Physiological signal (EEG) | SVM | 90.5% for both arousal and valence
[78] | Video | SVM | 47%
[3] | Speech | SVM, HMM | 42.3%, 40.8%
[79] | Speech | DCNN | 99.4%
[62] | Wireless signal | SNR | 84%
[58] | Face image | SVM | 57%

2.5 Conclusion

There are many techniques for recognizing emotion, and we have classified them into three groups: wireless, physiological, and audiovisual. The wireless technique achieves good accuracy but is covered by only a few research papers, and we envision it as a significant step in the emerging field of emotion recognition. Moreover, when a beat-extraction process is used to determine beat-to-beat intervals and those intervals are used for emotion detection, the RF signal captures the full human heartbeat, and the beats exhibit a very rich morphology. Audiovisual signals are not fully reliable, since people may deliberately hide or disguise emotions by manipulating their voices and facial expressions; in addition, acquiring audiovisual signals requires the cooperation of the subjects, which makes them difficult to apply in many medical applications. Physiological signals, in turn, require body sensors, which are cumbersome and can interfere with the actions and emotions of the user, making this approach unsuitable for everyday use.

References
1. Salvendy G (1994) HCI international ‘93: 5th international conference on human-computer
interaction. ACM SIGCHI Bull 26(4):76–77
2. Hosseini S, Naghibi-Sistani M (2011) Emotion recognition method using entropy analysis
of EEG signals. Int J Image Graph Signal Process 3(5):30–36. https://doi.org/10.5815/ijigsp.
2011.05.05
3. Kwon O, Chan K, Hao J, Lee T (2003) Emotion recognition by speech signals. Institute for
Neural Computation, University of California, San Diego, USA
4. Alhalaseh R, Alasasfeh S (2020) Machine-learning-based emotion recognition system using
EEG signals. Computers 9(4):95. https://doi.org/10.3390/computers9040095
5. Cowie R et al (2001) Emotion recognition in human-computer interaction. IEEE Signal
Process Mag 18(1):32–80. https://doi.org/10.1109/79.911197

6. Ben M, Lachiri Z (2017) Emotion classification in arousal valence model using MAHNOB-
HCI database. Int J Adv Comput Sci Appl 8(3). https://doi.org/10.14569/ijacsa.2017.080344
7. Hamidi M (2012) Emotion recognition from Persian speech with neural network. Int J Artific
Intell Appl 3(5):107–112. https://doi.org/10.5121/ijaia.2012.3509
8. Crookall D, Sandole DJD, Sandole-Staroste I (eds) (1987) Conflict management and problem
solving: interpersonal to international applications. Frances Pinter, New York: New York
University Press (25 Floral Str, London WC2E 9DS, UK; Washington Square, New York, NY
10003, USA, London. Simulat Games 20(1):107–108, 1989 (Book Reviews Miscellaneous
Reviews). https://doi.org/10.1177/104687818902000150
9. Barrett L (1998) Discrete emotions or dimensions? The role of valence focus and arousal
focus. Cognit Emot 12(4):579–599. https://doi.org/10.1080/026999398379574
10. Koelstra S et al (2012) DEAP: a database for emotion analysis; using physiological signals.
IEEE Trans Affect Comput 3(1):18–31. https://doi.org/10.1109/t-affc.2011.15
11. Yannakakis G, Isbister K, Paiva A, Karpouzis K (2014) Guest editorial: emotion in games.
IEEE Trans Affect Comput 5(1):1–2. https://doi.org/10.1109/taffc.2014.2313816
12. Kim J, Andre E (2008) Emotion recognition based on physiological changes in music listening.
IEEE Trans Pattern Anal Mach Intell 30(12):2067–2083. https://doi.org/10.1109/tpami.200
8.26
13. Jerritta S, Murugappan M, Nagarajan R, Wan K (2011) Physiological signals based human
emotion recognition: a review. In: 2011 IEEE 7th international colloquium on signal
processing and its applications, pp 410–415. https://doi.org/10.1109/CSPA.2011.5759912
14. Kahou S et al (2015) EmoNets: multimodal deep learning approaches for emotion recognition
in video. J Multimod User Interf 10(2):99–111. https://doi.org/10.1007/s12193-015-0195-2
15. Calvo RA, D’Mello S (2010) Affect detection: an interdisciplinary review of models, methods,
and their applications. IEEE Trans Affect Comput 1(1):18–37
16. Human face processing: from recognition to emotion. Psychophysiology 50:S20–S21 (2013).
https://doi.org/10.1111/psyp.12117
17. Quintana DS, Guastella AJ, Outhred T, Hickie IB, Kemp AH (2012) Heart rate variability is
associated with emotion recognition: direct evidence for a relationship between the autonomic
nervous system and social cognition. Int J Psychophysiol 86(2):168–172
18. Duan R-N, Zhu J-Y, Lu B-L (2013) Differential entropy feature for EEG-based emotion
classification. In: 2013 6th international IEEE/EMBS conference on neural engineering (NER)
19. Zheng W-L, Lu B-L (2015) Investigating critical frequency bands and channels for EEG-
based emotion recognition with deep neural networks. IEEE Trans Autonom Mental Develop
7(3):162–175
20. Ghinea G, Timmerer C, Lin W, Gulliver SR (2014) Mulsemedia. ACM Trans Multimed
Comput Commun Appl 11(1s):1–23
21. Covaci A, Zou L, Tal I, Muntean G-M, Ghinea G (2019) Is Multimedia multisensorial?—a
review of mulsemedia systems. ACM Comput Surv 51(5):1–35
22. Kamdar MR, Wu MJ (2015) Prism: a data-driven platform for monitoring mental health. In:
Biocomputing 2016
23. Feng H, Golshan HM, Mahoor MH (2018) A wavelet-based approach to emotion classification
using EDA signals. Expert Syst Appl 112:77–86
24. Abdelnasser H, Youssef M, Harras KA (2015) WiGest: a ubiquitous WiFi-based gesture
recognition system. In: 2015 IEEE conference on computer communications (INFOCOM),
2015
25. Sigg S, Scholz M, Shi S, Ji Y, Beigl M (2014) RF-sensing of activities from non-cooperative
subjects in device-free recognition systems using ambient and local signals. IEEE Trans
Mobile Comput 13(4):907–920

26. Pu Q, Gupta S, Gollakota S, Patel S (2013) Whole-home gesture recognition using wireless
signals. In: Proceedings of the 19th annual international conference on Mobile computing &
networking—MobiCom ‘13
27. Raja M, Sigg S (2016) Applicability of RF-based methods for emotion recognition: a sur-
vey. In: 2016 IEEE international conference on pervasive computing and communication
workshops (PerCom Workshops)
28. Zhao M, Adib F, Katabi D (2016) Emotion recognition using wireless signals. In: Proceedings
of the 22nd annual international conference on mobile computing and networking
29. Kreibig SD (2010) Autonomic nervous system activity in emotion: a review. Biol Psychol
84(3):394–421
30. Nussinovitch U, Elishkevitz KP, Katz K, Nussinovitch M, Segev S, Volovitz B, Nussinovitch
N (2011) Reliability of ultra-short ECG indices for heart rate variability. Ann Noninvasive
Electrocardiol 16(2):117–122
31. Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning. In: Proceedings of
the 24th international conference on machine learning—ICML ‘07
32. Ohkura M, Hamano M, Watanabe H, Aoto T (2011) Measurement of Wakuwaku feeling of
interactive systems using biological signals. Emotion Eng 327–343
33. Goenaga S, Navarro L, Quintero MCG, Pardo M (2020) Imitating human emotions with a
NAO robot as interviewer playing the role of vocational tutor. Electronics 9(6), 971
34. Salzman CD, Fusi S (2010) Emotion, cognition, and mental state representation in amygdala
and prefrontal cortex. Annu Rev Neurosci 33(1):173–202
35. Torres EP, Torres EA, Hernández-Álvarez M, Yoo SG (2020) EEG-based BCI emotion
recognition: a survey. MDPI, 07-Sep-2020 [Online]. https://www.mdpi.com/1424-8220/20/
18/5083/htm. Accessed 30 Apr 2021
36. Zheng W-L, Zhu J-Y, Lu B-L, Identifying stable patterns over time for emotion recognition
from EEG. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2017.2712143
37. Luo Q (2014) Speech emotion recognition in E-learning system by using general regression
neural network. In: Future energy, environment and materials
38. Koolagudi SG, Rao KS (2012) Emotion recognition from speech: a review. Int J Speech
Technol 15(2):99–117
39. Ververidis D, Kotropoulos C (2006) Emotional speech recognition: resources, features, and
methods. Speech Commun 48(9):1162–1181
40. Wioleta S (2013) Using physiological signals for emotion recognition. In: 2013 6th
international conference on human system interactions (HSI)
41. Picard RW, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of
affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191
42. Panoulas KJ, Hadjileontiadis LJ, Panas SM (2020) Brain-computer interface (BCI): types,
processing perspectives and applications. In: Multimedia services in intelligent environments,
pp 299–321
43. Ekman P (1992) Are there basic emotions? Psychol Rev 99(3):550–553
44. Verma GK, Tiwary US (2016) Affect representation and recognition in 3D continuous
valence–arousal–dominance space. Multimed Tools Appl 76(2):2159–2183
45. Bălan O, Moise G, Moldoveanu A, Leordeanu M, Moldoveanu F (2019) Fear level
classification based on emotional dimensions and machine learning techniques. Sensors
19(7):1738
46. Zhao M, Adib F, Katabi D (2016) Emotion recognition using wireless signals. In: The 22nd
annual international conference on mobile computing and networking (Mobicom’16)
47. Hyvärinen A, Oja E (1998) Independent component analysis by general nonlinear Hebbian-
like learning rules. Signal Process 64(3):301–313

48. Mehmood RM, Lee HJ (2015) Emotion classification of EEG brain signal using SVM and
KNN. In: IEEE international conference on multimedia & expo workshops (ICMEW), pp
1–5. https://doi.org/10.1109/ICMEW.2015.7169786
49. Henia WMB, Lachiri Z (2017) Emotion classification in arousal-valence dimension using
discrete affective keywords tagging. In: 2017 international conference on engineering & MIS
(ICEMIS), pp 1–6. https://doi.org/10.1109/ICEMIS.2017.8272991
50. Yadava M, Kumar P, Saini R, Roy PP, Dogra DP (2017) Analysis of EEG signals and its
application to neuromarketing. Multimed Tools Appl 76(18):19087–19111
51. Santamaria-Granados L, Munoz-Organero M, Ramirez-González G, Abdulhay E, Arunkumar
N (2019) Using deep convolutional neural network for emotion detection on a physiological
signals dataset (AMIGOS). IEEE Access 7:57–67. https://doi.org/10.1109/ACCESS.2018.
2883213
52. Katsigiannis S, Ramzan N (2018) DREAMER: a database for emotion recognition through
EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J Biomed Health
Inform 22(1):98–107
53. Li Y, Zheng W, Cui Z, Zong Y, Ge S (2018) EEG emotion recognition based on graph
regularized sparse linear regression. Neural Process Lett 49(2):555–571
54. Tivatansakul S, Ohkura M (2016) Emotion recognition using ECG Signals with local pattern
description methods. Int J Affect Eng 15(2):51–61
55. Pandolfi E, Sacripante R, Cardini F (2016) Food-induced emotional resonance improves
emotion recognition. Plos One 11(12)
56. Zhou G, Hansen JHL, Kaiser JF (2001) Nonlinear feature based classification of speech under
stress. IEEE Trans Speech Audio Process 9(3):201–216
57. Bhavan A, Chauhan P, Shah RR (2019) Bagged support vector machines for emotion
recognition from speech. Knowledge-Based Syst 184:104886
58. De Silva LC, Miyasato T, Nakatsu R (1997) Facial emotion recognition using multi-modal
information. In: Proceedings of ICICS, 1997 international conference on information, com-
munications and signal processing. Theme: trends in information systems engineering and
wireless multimedia communications (Cat., 1997), vol 1, pp 397–401. https://doi.org/10.1109/
ICICS.1997.647126
59. Ko B (2018) A brief review of facial emotion recognition based on visual information. Sensors
18(2):401
60. Katabi D (2014) Tracking people and monitoring their vital signs using body radio reflections.
In: Proceedings of the 2014 workshop on physical analytics—WPA ‘14
61. Kieser R, Reynisson P, Mulligan TJ (2005) Definition of signal-to-noise ratio and its critical
role in split-beam measurements. ICES J Mar Sci 62(1):123–130
62. Raja M, Sigg S (2017) RFexpress!—exploiting the wireless network edge for RF-based
emotion sensing. In: 2017 22nd IEEE international conference on emerging technologies and
factory automation (ETFA)
63. Xu T, Yin R, Shu L, Xu X (2019) Emotion recognition using frontal EEG in VR affective
scenes. In: 2019 IEEE MTT-S international microwave biomedical conference (IMBioC)
64. Nie Y, Wu Y, Yang ZY, Sun G, Yang Y, Hong X (2017) Emotional evaluation based on SVM.
In: Proceedings of the 2017 2nd international conference on automation, mechanical control
and computational engineering (AMCCE 2017)
65. He C, Yao Y, Ye X (2016) An emotion recognition system based on physiological signals
obtained by wearable sensors. In: Wearable sensors and robots, pp 15–25
66. Kaur B, Singh D, Roy PP (2016) A Novel framework of EEG-based user identification by
analyzing music-listening behavior. Multimed Tools Appl 76(24):25581–25602
67. Zhao L, Yang L, Shi H, Xia Y, Li F, Liu C (2017) Evaluation of consistency of HRV indices
change among different emotions. In: 2017 Chinese Automation Congress (CAC)

68. Sznajder M, Lukowska M (2018) Python online and offline ECG QRS detector based on the
pan-Tomkins algorithm. Zenodo, Tech Rep
69. Alva MY, Nachamai M, Paulose J (2015) A comprehensive survey on features and methods
for speech emotion detection. In: 2015 IEEE international conference on electrical, computer
and communication technologies (ICECCT)
70. Kim Y, Lee H, Provost EM (2013) Deep learning for robust feature generation in audiovi-
sual emotion recognition. In: IEEE international conference on acoustics, speech and signal
processing IEEE, 2013, pp 3687–3691
71. Kim J, Andre E (2008) Emotion recognition based on physiological changes in music listening.
IEEE Trans Pattern Anal Mach Intell 30(12):2067–2083
72. Petrantonakis PC, Hadjileontiadis LJ (2010) Emotion recognition from brain signals using
hybrid adaptive filtering and higher order crossings analysis. IEEE Trans Affect Comput
1(2):81–97
73. Torres EP, Torres EA, Hernandez-Alvarez M, Yoo SG (2020) Emotion recognition related
to stock trading using machine learning algorithms with feature selection. IEEE Access
8:199719–199732
74. Emotion recognition using wearables: a systematic literature review—work-in-progress.
IEEE Xplore [Online]. https://ieeexplore.ieee.org/document/9156096. Accessed 30 Apr 2021
75. Chen S, Jiang K, Hu H, Kuang H, Yang J, Luo J, Chen X, Li Y (2021) Emotion recognition
based on skin potential signals with a portable wireless device. Sensors 21(3):1018
76. Lan Y-T, Liu W, Lu B-L (2020) Multimodal emotion recognition using deep generalized
canonical correlation analysis with an attention mechanism. In: 2020 International Joint
Conference on Neural Networks (IJCNN)
77. Sarkar P, Etemad A (2020) Self-supervised learning for ECG-based emotion recognition. In:
ICASSP 2020–2020 IEEE international conference on acoustics, speech and signal processing
(ICASSP), pp 3217–3221. https://doi.org/10.1109/ICASSP40776.2020.9053985
78. Kahou SE, Bouthillier X, Lamblin P, Gulcehre C, Michalski V, Konda K, Jean S, Froumenty
P, Dauphin Y, Boulanger-Lewandowski N, Ferrari RC, Mirza M, Warde-Farley D, Courville
A, Vincent P, Memisevic R, Pal C, Bengio Y (2015) EmoNets: Multimodal deep learning
approaches for emotion recognition in video. J Multimod User Interf 10(2), 99–111
79. Chauhan K, Sharma KK, Varma T (2021) Speech emotion recognition using convolution
neural networks. In: 2021 international conference on artificial intelligence and smart systems
(ICAIS)
