
Proceedings of the Second International Conference on Electronics and Sustainable Communication Systems (ICESC-2021)

IEEE Xplore Part Number: CFP21V66-ART; ISBN: 978-1-6654-2867-5

Review on Automated Depression Detection from Audio-Visual Cues using Sentiment Analysis

Uma Yadav
Research Scholar, Computer Science & Engineering, G H Raisoni University, Amravati, India
Assistant Professor, Computer Science & Engineering, G H Raisoni College of Engineering, Nagpur, India
uma.yadav@raisoni.net, uma.yadav12@gmail.com

Dr. Ashish K. Sharma
Research Supervisor, School of Engineering and Technology, G H Raisoni University, Amravati, India
ashishk.sharma@raisoni.net

Abstract— Depression is a common mood disorder that has a negative impact on a person's life. It is a global pandemic that is claiming the lives of countless people. Depressed individuals find it difficult to concentrate on their careers and to engage with others, and they become introverted. The lack of up-to-date resources has a negative impact on a country's economic development. As a result, it is important to develop new ways to identify and treat mental conditions, and to reach out to individuals, so that people can overcome their daily challenges and become more productive. Depression can be detected using a series of nonverbal signals, such as systematic facial expression patterns and body posture. The study's aim is to review automated depression detection strategies that could aid doctors in diagnosing and treating depression. The field of automatic depression detection based on visual cues is expanding rapidly. The present comprehensive analysis of existing methods focuses on machine learning algorithms and image processing.

Keywords— Depression Detection, Audio Features, Spoken Words, Video Features, Deep Learning, Machine Learning, Video Extraction

I. INTRODUCTION

One of the most prevalent psychiatric disorders is depression, affecting millions of people around the world. According to a report published by the World Health Organization (WHO), approximately 200 million people in India, or one in every five people, may suffer from depression. It is no surprise that, according to another WHO report, India is the world's most depressed nation. Lack of knowledge, literacy, misunderstanding, pressure, and social and cultural values are all possible explanations for why mental health is "not a thing" in India. There are many examples of relatives, friends, and guardians who refuse to listen; for example, if a child approaches his or her parents about a mental health problem, it will either be brushed under the rug or will cause more anxiety.

Clinicians may benefit from automatic depression screening in the form of a decision support system when diagnosing and managing patients with depression. Nonverbal interaction, in terms of both facial gestures and speech, has been well established in the psychiatric literature as communicating symptoms of distress [1]. Depressed people tend to show little variety in their facial expressions, as well as flat speech and long pauses.

The current study is a systematic analysis of emerging approaches for automatically detecting depression. The survey focuses on methods that use visual signs from video to detect depression. The key goal is to see whether video-based depression analysis can help in monitoring and diagnosing the disease, and whether visual cues alone are adequate or need to be combined with input from other modalities. A survey of Google Scholar and Web of Science shows that depression detection studies are increasing rapidly. The dramatic rise in related studies over the last few years, as seen in Fig. 1, demonstrates that automatic depression assessment using visual cues [30] is a growing research area.

Fig 1. Depression Detection Studies per Year

II. LITERATURE REVIEW

A variety of methods for automated depression diagnosis have been proposed and evaluated on the AVEC2013, AVEC2014, and DAIC-WOZ datasets. The feature extraction phase is critical for estimating the depression scale in audiovisual-based depression identification. Visual features have become more popular in recent years, and they have proven to be effective in detecting depression. The following sections describe visual-based methods, as well as machine-learned and deep-learned features, for predicting depression severity.

The authors of [2] developed an end-to-end platform for video-based depression recognition by combining a CNN with an attention mechanism. The proposed DLGA-CNN framework has two branches, LA-CNN and GA-CNN: LA-CNN covers only local patches, while GA-CNN considers global patterns from the entire facial region. Experiments were run on the public AVEC2013 and AVEC2014 datasets.

The authors of [3] proposed a technique to detect depression from video by extracting frames and concentrating only on the facial region. A VGG-16 CNN model was used to perform the experiment on AVEC2014. The VGG16-based model was trained on a large dataset of images derived from the videos of the subjects' task recordings, and a 4-way fully connected (FC) layer is added after the last layers of the VGG16 model to classify the feature vectors passed to it.
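To make this transfer-learning setup concrete, the sketch below freezes a pretrained VGG-16 backbone and swaps its final layer for a 4-way fully connected head, as described for [3]. It is a minimal PyTorch sketch under assumed settings (ImageNet weights, 224x224 face crops, dummy labels), not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained VGG-16 backbone; only the classification head is retrained here.
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for param in vgg16.features.parameters():
    param.requires_grad = False                      # freeze convolutional features

in_features = vgg16.classifier[-1].in_features       # 4096 in standard VGG-16
vgg16.classifier[-1] = nn.Linear(in_features, 4)     # 4-way FC head, as in [3]

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(vgg16.classifier[-1].parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of face crops.
faces = torch.randn(8, 3, 224, 224)                  # stand-in preprocessed face frames
labels = torch.randint(0, 4, (8,))                   # stand-in class labels
optimizer.zero_grad()
loss = criterion(vgg16(faces), labels)
loss.backward()
optimizer.step()
```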
The authors of [4] proposed a methodology to detect depression and anxiety from videos. They used motion history images together with appearance-based feature extraction algorithms (HOG and LBP), and Visual Geometry Group (VGG) features extracted by deep networks via transfer learning, to focus on dynamic descriptors of facial expressions. The performance was consistent not only in predicting BDI-II ratings but also in predicting anxiety levels based on the STAI. These findings back up previous claims that regression models are well suited to the nature of the targeted clinical outcomes (e.g. depression severity).

In [5], the goal was to assess the levels of stress, anxiety, and depression on the Depression Anxiety Stress Scale (DASS) by examining facial features with the Facial Action Coding System (FACS), using a novel noninvasive three-layer design that offered high precision and quick convergence. Active Appearance Models and a series of multiclass Support Vector Machines (SVM) [6] are used in the first layer for Action Unit (AU) classification; the second layer contains a matrix of the AUs' intensity stages; and the third layer contains an optimized feed-forward neural network (FFNN) that analyses the second-layer matrix, performs the pattern recognition task, and predicts the DASS levels.
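As a toy illustration of the third layer of this design, the sketch below maps a matrix of Action-Unit intensities to DASS scores with a small feed-forward network. The AU count, score ranges, and synthetic data are assumptions for illustration only, not the system from [5].

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n_samples, n_aus = 200, 17
au_intensity = rng.uniform(0, 5, size=(n_samples, n_aus))   # assumed AU-intensity matrix
dass_scores = rng.uniform(0, 42, size=(n_samples, 3))        # depression, anxiety, stress (assumed range)

# Feed-forward network predicting the three DASS levels from AU intensities.
ffnn = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
ffnn.fit(au_intensity[:150], dass_scores[:150])

# With random data this is only a smoke test; on real AU features the score is meaningful.
print("held-out R^2:", ffnn.score(au_intensity[150:], dass_scores[150:]))
```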
A multi-level attention-based network [7] was proposed for multimodal depression prediction. The authors combined features from the text, audio, and video modalities while learning intra-modality and inter-modality significance. By selecting the most persuasive features [8] within each modality for decision making, the multi-level attention reinforces overall learning. To understand the effect of each feature and modality, several fusion models with various configurations were built.

Via distribution learning, a deep learning architecture [9] was presented for reliably predicting depression levels. It is based on a new expectation loss function that allows the underlying data distribution over depression levels to be estimated, with the distribution's expected values calibrated to approximate the ground-truth levels. Extensive tests on the AVEC2013 and AVEC2014 datasets have shown that the proposed architecture is a viable solution that outperforms state-of-the-art techniques.
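A simplified reading of such an expectation loss is sketched below: the network outputs a distribution over discrete depression scores, and its expected value is regressed toward the ground-truth label. The score range (BDI-II, 0 to 63), network size, and dummy features are assumptions, not the exact formulation of [9].

```python
import torch
import torch.nn as nn

levels = torch.arange(0, 64, dtype=torch.float32)        # assumed BDI-II scores 0..63

def expectation_loss(logits, targets):
    probs = torch.softmax(logits, dim=1)                  # predicted distribution over levels
    expected = (probs * levels).sum(dim=1)                # expected depression score
    return nn.functional.mse_loss(expected, targets)      # pull the expectation to ground truth

# Illustrative usage with dummy video-level feature vectors.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
features = torch.randn(4, 128)                            # stand-in visual descriptors
targets = torch.tensor([10.0, 25.0, 40.0, 5.0])           # stand-in BDI-II labels
loss = expectation_loss(model(features), targets)
loss.backward()
```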
The authors of [10] proposed a classification model for recognizing depression based on texture features from local binary patterns (LBP). Video footage from the SEMAINE database was used: face images are picked from every frame of the video and uniform-LBP features are extracted, and a video keyframe extraction technique was used to improve frame sampling within each recording.
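The uniform-LBP texture descriptor used in [10] can be sketched as follows: each selected face frame is converted into a histogram of uniform LBP codes, and the per-frame histograms form the feature matrix. The neighbourhood size and the random placeholder frames are assumptions for illustration.

```python
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1                         # 8 neighbours at radius 1 (assumed setting)
n_bins = P + 2                      # P+1 uniform patterns plus one non-uniform bin

def lbp_histogram(face):
    codes = local_binary_pattern(face, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# Placeholder grayscale face crops standing in for selected keyframes.
keyframes = [np.random.randint(0, 256, (128, 128), dtype=np.uint8) for _ in range(5)]
features = np.stack([lbp_histogram(f) for f in keyframes])   # one histogram per keyframe
print(features.shape)               # (5, 10)
```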

Using a combination of a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), a "Combinational Deep Neural Network (CDNN)" was proposed [11] for automated depression detection from text data and facial images. According to simulation results based on real-field network measurements, the proposed model predicts depression with superior performance.

Another method collects frontal-face videos of college students [12], obtains facial features from each frame, and analyses these features to diagnose signs of depression in the students. Frontal-face pictures of happy, angry, and disgusted faces were used to train the system, and the appearance of happiness, anger, and disgust characteristics in the video frames is used to diagnose depression. Students were identified as depressed if happy features were minimally present while contempt or disgust features were present.

Emotional state analysis of facial expressions [13] is a significant research topic in emotion recognition [14]. A person-specific active appearance model extracts the main facial features from the recorded facial images. By using a support vector machine to locate facial characteristics, the authors were able to classify depression based on movement variations in the pupils, eyebrows, and corners of the mouth. The findings suggest that these characteristics are useful for automatically classifying depressed patients.

III. DATASETS AVAILABLE FOR DEPRESSION ASSESSMENT

The AVEC 2013 "audio-visual depression corpus" consists of 150 task-oriented depression videos recorded in a human-computer interaction scenario. It includes recordings of people performing a human-computer interaction task while being recorded by a camera and microphone. Every recording contains only a single person, and the dataset contains 84 subjects, meaning that some subjects appear in multiple recordings. The speakers were recorded one to four times, with a two-week interval between sessions: 18 subjects appear in three recordings, 31 in two, and 34 in one. The recordings range in duration from 20 to 50 minutes (average 25 minutes), and the total duration of all clips is 240 hours. Subjects ranged in age from 18 to 63 years, with a mean of 31.5 years and a standard deviation of 12.3 years. The footage was recorded in a variety of quiet locations.

Only two of the 14 activities contained in the original recordings are included in the AVEC 2014 subset [15], enabling a more focused examination of distress and depression. Both activities are recorded separately, comprising 300 videos (with durations varying from 6 seconds to 4 minutes 8 seconds). The two tasks were chosen for their optimal compliance (i.e., the vast majority of participants completed them). With the addition of 5 pairs of previously unpublished recordings to replace a limited number of videos deemed unsuitable for the challenge, the source videos are exactly the same as those used for AVEC 2013. The two chosen tasks are:

Northwind - Participants read an excerpt from the German-language fable "Die Sonne und der Wind" (The Sun and the North Wind).
Freeform - Participants answer one of several questions in German, such as "What is your favourite dish?", "What was your greatest gift, and why?", or "Discuss a sad childhood memory."

DAIC-WOZ: The Distress Analysis Interview Corpus (DAIC) [16] includes clinical interviews intended to support the evaluation of psychological distress conditions such as depression, anxiety, and PTSD (post-traumatic stress disorder). DAIC-WOZ is part of this larger corpus. The data is gathered by a virtual agent that communicates with individuals and recognizes verbal and nonverbal signs of mental illness. The package includes 189 sessions of audio and video recordings as well as detailed questionnaire responses, with the Wizard-of-Oz interviews making up a portion of the corpus. The recordings have been transcribed and annotated for a combination of verbal and non-verbal characteristics.

The RAVDESS database of emotional speech and song [17] is a validated multimodal database. It features 24 trained actors who vocalize lexically matched sentences in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprised, and disgusted expressions, while song includes calm, happy, sad, angry, and fearful emotions. Each expression has two levels of emotional intensity (normal and strong), as well as a neutral expression. Face-and-voice, face-only, and voice-only versions are available for all conditions. The emotional validity, intensity, and genuineness of each of the 7356 recordings was rated ten times; 247 raters from North America, representative of untrained research participants, provided the ratings, and a further 72 people participated in the test-retest study. All three modalities are available: audio-video, video-only (no sound), and audio-only. Actor 18 does not have any song files.

The SEMAINE corpus [18] has been used to diagnose distress through facial analysis. The database captures dialogues between people and a simulated agent. The corpus was created with the aim of understanding the natural social cues that occur during conversations with a robot or an artificial person, and it can be accessed at http://semaine-db.eu for testing, interpretation, and case studies. It involves a person engaging with emotionally stereotyped characters, following the Sensitive Artificial Listener (SAL) technique. In total, 95 session files are available, with video plus audio, video without audio, and separate audio files.

IV. EXISTING METHODOLOGY USED FOR DEPRESSION ASSESSMENT

The survey shows that many researchers used only visual features to evaluate depression, some used both audio and visual features, some used audio and spoken-word features, and hardly two or three used visual features, audio features, and spoken words together. This is represented in Fig. 2.

Fig 2. Flowchart for Depression Detection

Depression detection is divided into stages such as preprocessing, feature extraction, feature fusion, and classification; depending on the fusion technique, the classification produces different depression outputs. Because the collected data is often of poor quality, pre-processing is needed to optimize it for further processing. The video input is preprocessed into images, audio, and spoken words: faces are extracted as images, acoustic signals are extracted to form an audio database, and spoken words are extracted from the audio.
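A minimal sketch of this face-extraction step is shown below: frames are sampled from a recording and the face region is cropped with an OpenCV Haar-cascade detector. The file name, sampling rate, and detector choice are illustrative assumptions, not details of any surveyed system.

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("interview.mp4")        # placeholder path to a recording
faces, frame_idx = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 30 == 0:                    # keep roughly one frame per second
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in boxes[:1]:         # assume a single subject per recording
            faces.append(cv2.resize(gray[y:y + h, x:x + w], (224, 224)))
    frame_idx += 1
cap.release()
print(f"extracted {len(faces)} face crops")
```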
Feature extraction algorithms were then applied to the visual signal to extract facial landmarks such as the face, eyes, head posture, and mouth; feature extraction was applied to the audio to extract acoustic signals; and spoken words were extracted from the audio. Fusion methods were applied to obtain better results: many techniques used a selection of features originating from various modalities (e.g. text, audio, and visual), as well as from within the same modality (e.g. visual features from different body parts). To merge the different feature sets, combination techniques were often used, and fusion most often occurs right after feature extraction [19]-[26], where the derived feature vectors of the various modalities are concatenated. Finally, a classification algorithm generates the final result. The classifiers used by the various researchers include SVM, NN, GMM, Bi-LSTM, and SVM with an RBF kernel. By far the most common approach for categorical depression evaluation is the Support Vector Machine (SVM), which gives good results according to the survey; CNNs have also reported good results. Additionally, a variety of neural networks have been used by researchers to assess depression.
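As an illustration of the feature-level fusion and SVM classification described above, the sketch below concatenates per-recording visual, acoustic, and text feature vectors and feeds them to an RBF-kernel SVM. The feature dimensions and random data are placeholders, not values from any of the surveyed datasets.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 40
visual = rng.normal(size=(n, 136))        # e.g. flattened 68-point facial landmarks
acoustic = rng.normal(size=(n, 88))       # e.g. eGeMAPS-style audio descriptors
text = rng.normal(size=(n, 300))          # e.g. averaged word embeddings
labels = rng.integers(0, 2, size=n)       # 0 = non-depressed, 1 = depressed

fused = np.hstack([visual, acoustic, text])   # early (feature-level) fusion

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(fused[:30], labels[:30])
print("held-out accuracy:", clf.score(fused[30:], labels[30:]))
```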
V. COMPARATIVE ANALYSIS OF EXISTING METHODS

Sentiment analysis on video is a relatively new area of study in which the speaker's mood and sentiment are derived by analysing the video's frames, audio, and text. Emotional state analysis of facial expressions is a crucial aspect of emotion recognition. Table I compares the various approaches and characteristics used by researchers to diagnose depression.

TABLE I. Comparison of the characteristics and approaches used to assess depression

| Authors | Features | Datasets Used | Method Used | Result | Demerits |
|---|---|---|---|---|---|
| He et al., 2021 [2] | Hand-crafted features along with facial feature maps | AVEC2013, AVEC2014 | DLGA-CNN | RMSE 8.30 to 8.39 | Only considered local and global patches of the entire facial region |
| Lin et al., 2020 [27] | Audio and text features | DAIC-WOZ | 1D-CNN, BiLSTM | F1 score 0.85 | Only focused on audio and interview sessions |
| Shah et al., 2020 [3] | Visual signals on facial gestures | AVEC2014 | VGG16 | Accuracy 68.75% | Only used facial features |
| Neha et al., 2020 [28] | Face circumference | Facial expression dataset from Kaggle | CNN | Accuracy 64% | Only used facial features; no head posture or eye movement considered |
| Pampouchidou et al., 2020 [4] | Facial landmarks, head posture, facial action units, and eye gaze | Own dataset with 322 recordings of 65 participants | LBP and HOG | Accuracy 70% | Dataset involved a very small patient group |
| Rustagi et al., 2020 [11] | Facial expression and text data obtained from users | Fer2013 | CNN and RNN | Accuracy 85% | Only text and facial features were considered; 300 facial images were taken for experimentation |
| Gavrilescu & Vizireanu, 2019 [5] | Facial features | Own database with frontal face recordings | SVM, CNN | Accuracy: depression 87.2%, anxiety 77.9%, stress 90.2% | Only used facial features; a questionnaire-based method could be introduced to increase accuracy |
| Ray et al., 2019 [7] | Features extracted from text, audio and video | E-DAIC database | Bi-LSTM | RMSE 4.28 | Accuracy is lower as the highest weight is given to text compared to video and audio |
| Melo et al., 2019 [9] | Handcrafted features with facial region features | AVEC2013 and AVEC2014 | Deep distribution learning, LBP | RMSE 8.25 | Only considered facial features |
| Dadiz & Ruiz, 2019 [10] | Full face analysis | SEMAINE database | SVM with RBF kernel | Accuracy 98% | Some background noise was captured during the segmentation process, which degrades accuracy |
| Venkataraman, 2018 [12] | Facial features | JAFFE database | Motion History Histogram, SVM | Not reported | Only considered facial features |
| Wang et al., 2018 [14] | Pupils, eyebrows, and corners of the mouth | Own dataset | FCFS, SVM with RBF | Accuracy 78.88% | Number of samples in the dataset is small |
| Morales et al., 2018 [29] | Facial features, acoustic features | MOUD dataset | FCFS, CNN | Accuracy 76.79% | Experiment performed on a small dataset |
VI. DISCUSSION

The concluding remarks are organized by the algorithms and datasets used in the studies examined.

A. Algorithms and features considered

Due to the strongly transient nature of depressive symptoms, many of the facial signs seen in the studies examined are dynamic, and even the presence of static signs is usually interpreted over time. As a result, video recordings are necessary rather than static images. Furthermore, the majority of high-performing approaches combine various features across, or sometimes within, a single modality, for example visual signals from the face and body. According to the survey, the majority of researchers considered only facial features when evaluating depression, while some considered both facial and audio features. Decision-level fusion appears to be the most prominent. Only a few recent publications have used deep learning-based methods, while several researchers have used classical machine learning; deep learning appears to be successful and warrants further investigation. It is important to note that multimodal methods are once again proving to be the most effective.

Efforts to establish strategies capable of distinguishing mood disorder types, as well as depression from other psychological illnesses such as anxiety disorders, have been minimal. It is worth noting that the evidence available for long-term depression diagnosis shows poor prediction accuracy, casting doubt on the value of facial signs as prodromal markers of depression. Combining the different characteristics necessitates the use of the most appropriate method for assessing depression. To be fully validated and recognized as an evaluation tool, a method must be tested on larger samples with a greater range of demographic and clinical characteristics.

B. Data-related issues

Despite the fact that a variety of techniques have been published in the literature, automated depression evaluation is still a long way from being well developed, and existing strategies have a lot of room for improvement. Because of the nature of deep learning and its remarkably high performance in many similar fields, a similar effect in automated depression assessment is anticipated in the foreseeable future if sufficient datasets become available. Finally, although the reported detection rates can be quite high, demonstrating the field's clinical potential, the sample datasets are often too limited to allow these findings to be generalized. Continuous methods tend to be more consistent with clinical experience. To improve the comparability and reliability of reported performance, it is essential to facilitate the sharing of relevant data and to streamline data collection procedures across domains.

VII. CONCLUSION AND FUTURE WORK

This review of the depression detection methods proposed by various researchers reveals several gaps, such as the absence of studies on body-posture activity. Combining characteristics such as speech, posture, body movement, and other facial expression changes would therefore increase the accuracy of depression diagnosis. Despite being statistically relevant, body manifestations and pupil-related characteristics have not been sufficiently used for automatic evaluation. Finally, although investigating facial signals in the form of dyadic behaviour is a promising research direction, only one approach has taken it into account, and context has not been considered in any analysis.

Facial video footage, audio features, and textual emotion analysis of the words spoken by the subject can be combined to improve detection accuracy, and the implementation may be carried out to model the target distribution with greater precision. Finally, the extensive analysis of the available work reveals that the use of video-based tools for assessing and tracking the course of depression holds a lot of promise. It was also found that visual cues must be combined with input from other modalities in order to produce clinically beneficial outcomes.

REFERENCES

[1] M. Cox, "Book Review: Nonverbal Communication in Depression," J. R. Soc. Med., vol. 83, no. 6, 1990, doi: 10.1177/014107689008300644.
[2] L. He, J. C. W. Chan, and Z. Wang, "Automatic depression recognition using CNN with attention mechanism from videos," Neurocomputing, vol. 422, pp. 165-175, 2021, doi: 10.1016/j.neucom.2020.10.015.
[3] A. Shah, S. Mota, and A. Panchal, "Depression Detection Using Visual Cues," vol. IX, no. V, pp. 1-8, 2020.
[4] A. Pampouchidou et al., "Automated facial video-based recognition of depression and anxiety symptom severity: cross-corpus validation," Mach. Vis. Appl., 2020, doi: 10.1007/s00138-020-01080-7.
[5] M. Gavrilescu and N. Vizireanu, "Predicting depression, anxiety, and stress levels from videos using the facial action coding system," vol. 19, no. 17, 2019.
[6] S. V. Khedikar and U. Yadav, "Detection of disease from radiology," in Proceedings of 2017 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS 2017), 2018, pp. 1-4, doi: 10.1109/ICIIECS.2017.8276174.
[7] A. Ray, S. Kumar, R. Reddy, P. Mukherjee, and R. Garg, "Multi-level attention network using text, audio and video for depression prediction," arXiv, pp. 81-88, 2019.
[8] U. D. Yadav and P. S. Mohod, "Adding persuasive features in graphical password to increase the capacity of KBAM," 2013, doi: 10.1109/ICE-CCN.2013.6528553.
[9] W. C. de Melo, E. Granger, and A. Hadid, "Depression Detection Based on Deep Distribution Learning," in 2019 IEEE International Conference on Image Processing (ICIP), 2019, pp. 4544-4548, doi: 10.1109/ICIP.2019.8803467.
[10] B. G. Dadiz and C. R. Ruiz, "Detecting depression in videos using uniformed local binary pattern on facial features," 2019, doi: 10.1007/978-981-13-2622-6_40.
[11] A. Rustagi, C. Manchanda, N. Sharma, and I. Kaushik, "Depression Anatomy Using Combinational Deep Neural Network," in International Conference on Innovative Computing and Communications, pp. 19-33, 2021.
[12] D. Venkataraman, "Extraction of Facial Features for Depression Detection among Students," Int. J. Pure Appl. Math., vol. 118, no. 7, pp. 455-463, 2018.
[13] Y. An, Z. Qu, N. Xu, and Z. Nima, "Automatic depression estimation using facial appearance," J. Image Graph., vol. 25, no. 11, 2020, doi: 10.11834/jig.200322.
[14] Q. Wang, H. Yang, and Y. Yu, "Facial expression video analysis for depression detection in Chinese patients," J. Vis. Commun. Image Represent., vol. 57, pp. 228-233, 2018, doi: 10.1016/j.jvcir.2018.11.003.
[15] M. Valstar et al., "AVEC 2014," in Proceedings of the 4th International

Workshop on Audio/Visual Emotion Challenge - AVEC '14, 2014, pp. 3-10, doi: 10.1145/2661806.2661807.
[16] J. Gratch et al., "The distress analysis interview corpus of human and computer interviews," 2014.
[17] S. R. Livingstone and F. A. Russo, The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), 2018.
[18] G. McKeown, M. Valstar, R. Cowie, M. Pantic, and M. Schröder, "The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent," IEEE Trans. Affect. Comput., vol. 3, no. 1, 2012, doi: 10.1109/T-AFFC.2011.20.
[19] S. Ghosh, M. Chatterjee, and L. P. Morency, "A multimodal context-based approach for distress assessment," 2014, doi: 10.1145/2663204.2663274.
[20] J. Joshi, "An automated framework for depression analysis," 2013, doi: 10.1109/ACII.2013.110.
[21] J. Joshi et al., "Multimodal assistive technologies for depression diagnosis and monitoring," J. Multimodal User Interfaces, vol. 7, no. 3, 2013, doi: 10.1007/s12193-013-0123-2.
[22] A. Jan, H. Meng, Y. F. A. Gaus, F. Zhang, and S. Turabzadeh, "Automatic depression scale prediction using facial expression dynamics and regression," 2014, doi: 10.1145/2661806.2661812.
[23] M. Sidorov and W. Minker, "Emotion recognition and depression diagnosis by acoustic and visual features: A multimodal approach," 2014, doi: 10.1145/2661806.2661816.
[24] V. Jain, J. L. Crowley, A. K. Dey, and A. Lux, "Depression estimation using audiovisual features and fisher vector encoding," 2014, doi: 10.1145/2661806.2661817.
[25] H. Kaya, F. Çilli, and A. A. Salah, "Ensemble CCA for continuous emotion prediction," 2014, doi: 10.1145/2661806.2661814.
[26] J. Q. Liu et al., "Dynamic facial features in positive-emotional speech for identification of depressive tendencies," Smart Innov. Syst. Technol., vol. 192, pp. 127-134, 2020, doi: 10.1007/978-981-15-5852-8_12.
[27] L. Lin, X. Chen, Y. Shen, and L. Zhang, "Towards automatic depression detection: A BiLSTM/1D CNN-based model," Appl. Sci., vol. 10, no. 23, pp. 1-20, 2020, doi: 10.3390/app10238701.
[28] S. Neha, P. H. C. Shekar, K. S. Kumar, and A. Vg, "Emotion Recognition and Depression Detection using Deep Learning," pp. 3031-3036, 2020.
[29] M. R. Morales, "Multimodal Depression Detection: An Investigation of Features and Fusion Techniques for Automated Systems," CUNY Academic Works, 2018. https://academicworks.cuny.edu/gc_etds/2560.
[30] A. Pampouchidou et al., "Automatic Assessment of Depression Based on Visual Cues: A Systematic Review," IEEE Trans. Affect. Comput., vol. 10, no. 4, pp. 445-470, Oct.-Dec. 2019, doi: 10.1109/TAFFC.2017.2724035.
