
SPEECH BASED EMOTION RECOGNITION
MAJOR PROJECT REVIEW
BY

G. HANNAH SANJANA 17P71A1209
MANASA MITTIPALLY 17P71A1218
VAMSHIDHAR SINGH 17P71A1248
UNDER THE GUIDANCE OF

MRS. M. SUPRIYA, ASSOCIATE PROFESSOR


DEPARTMENT OF INFORMATION TECHNOLOGY

SWAMI VIVEKANANDA INSTITUTE OF TECHNOLOGY


Mahbub College Campus, R.P Road, Secunderabad-03
(Affiliated to JNTUH)
2017-2021
CONTENTS:
• Abstract
• Introduction
• Existing System
• Disadvantages of Existing System
• Proposed System
• Advantages of Proposed System
• System Specifications
• UML Diagrams
• Output Screens
• Conclusion
• Future Enhancements
ABSTRACT

• Speech emotion recognition is an active research topic whose main aim is to improve human-machine interaction. At present, most work in this area extracts discriminatory features from speech in order to classify emotions into various categories.
• Most existing work relies on the utterance of words, which is used for lexical analysis to recognize emotions. In our project, a technique is used to classify emotions into the 'Angry', 'Calm', 'Fearful', 'Happy', and 'Sad' categories.

• In previous works, the maximum cross-correlation between audio files was computed to label the speech data into one of only a few (three) emotion categories. A related work, developed in MATLAB, identified the emotion of any audio file passed as an argument.
• A variety of classifiers were used through the MATLAB Classification Learner toolbox, but only for a few emotion categories. The proposed technique paves the way for a real-time speech emotion recognition prototype built with open-source tools.
INTRODUCTION

• Speech emotion recognition is a technology that extracts emotion features from speech signals, compares and analyzes the feature parameters, and infers the emotional state they convey. Recognizing emotions from audio signals requires feature extraction and classifier training.
• The feature vector is composed of audio signal elements that characterize the speaker (such as pitch, volume, and energy) and is essential for training the classifier model to accurately recognize specific emotions.
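As a minimal sketch of the idea above (not the project's actual code), two simple per-frame cues related to volume and pitch, short-time energy and zero-crossing rate, can be computed in plain Python; the frame length and hop size here are illustrative values for 16 kHz audio:

```python
def frame_features(samples, frame_len=400, hop=160):
    """Short-time energy and zero-crossing rate per frame -- two simple
    cues related to the volume and pitch of the speaker."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len          # mean power
        zcr = sum((a < 0) != (b < 0)                            # sign changes
                  for a, b in zip(frame, frame[1:])) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats
```

A full feature vector would stack many such cues (and the MFCCs used later in these slides) before they are fed to the classifier.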
EXISTING SYSTEM
• A survey of the existing work in this area reveals that most of it relies on lexical analysis for emotion recognition, classifying emotions into three categories: Angry, Happy, and Neutral. The maximum cross-correlation between the discrete-time sequences of the audio signals is computed, and the highest degree of correlation between the testing audio file and a training audio file serves as the key parameter for identifying a particular emotion type.
• A second technique extracts discriminatory features and feeds them to a Cubic SVM classifier, again recognizing only Angry, Happy, and Neutral segments.
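The correlation-based labeling described above can be sketched as follows; the normalization and the `best_match` helper are illustrative assumptions, not the original MATLAB implementation:

```python
import numpy as np

def best_match(test, references):
    """Label `test` with the emotion of the reference audio whose peak
    normalized cross-correlation with it is highest."""
    def score(a, b):
        a = (a - a.mean()) / (a.std() + 1e-10)   # zero-mean, unit-variance
        b = (b - b.mean()) / (b.std() + 1e-10)
        return np.correlate(a, b, mode='full').max() / min(len(a), len(b))
    return max(references, key=lambda label: score(test, references[label]))
```

This also illustrates the disadvantage named in the next slide: every test file must be correlated against the whole reference set, which scales poorly.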
DISADVANTAGES OF EXISTING SYSTEM:
• The system is static in nature and performs poorly in real-time settings.
• The system is slow, since it compares the correlations of the complete dataset against a single audio file.
• Variable-length audio files cannot be handled.
• Long pre-processing steps are required for the model to understand the audio signal.
• Expensive and not upgradable.
PROPOSED SYSTEM

• In this project, MFCCs (Mel-Frequency Cepstral Coefficients) are used as the features for classifying the speech data into various emotion categories using artificial neural networks. Using neural networks gives us the advantage of classifying many different types of emotions, in variable-length audio signals, in a real-time environment.
• This technique strikes a good balance between computational cost and the accuracy of real-time processing.
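In practice MFCC extraction is usually done with a library such as librosa; the following is a simplified from-scratch sketch of the computation itself (framing, power spectrum, mel filterbank, log, DCT), with illustrative parameter values rather than the project's actual settings:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_mfcc=13):
    # 1. Split the signal into overlapping Hamming-windowed frames.
    frames = np.array([signal[s:s + frame_len] * np.hamming(frame_len)
                       for s in range(0, len(signal) - frame_len + 1, hop)])
    # 2. Power spectrum of each frame.
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Triangular mel filterbank (mel scale compresses high frequencies).
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = np.floor((n_fft + 1) *
                   hz(np.linspace(0, mel(sr / 2), n_mels + 2)) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = pts[i], pts[i + 1], pts[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log filterbank energies, then DCT-II to decorrelate them.
    logfb = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return logfb @ dct.T          # shape: (n_frames, n_mfcc)
```

The resulting (frames × coefficients) matrix is what the neural network consumes, which is why variable-length audio is no longer a problem: the time axis simply grows with the recording.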
ADVANTAGES OF PROPOSED SYSTEM:
• Can be implemented on any hardware supporting the Python language.
• Very fast in processing the audio and easy to use.
• Variable-length audio files are understood by the system.
SYSTEM SPECIFICATIONS:
• HARDWARE:
Processor : Core i3
Hard disk : 250 GB
RAM : 8 GB
• SOFTWARE:
Operating system : Windows 10
Programming language : Python 3.8.6


SYSTEM ARCHITECTURE:
CNN MODEL FOR SPEECH RECOGNITION:
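The CNN architecture itself is shown in the slide image; as an illustrative sketch of what a single Conv1D layer does to the sequence of MFCC frames (the shapes and names here are assumptions, not the project's actual model):

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """One 'valid' 1-D convolution layer over time, followed by ReLU.
    x: (n_frames, n_mfcc); kernels: (n_filters, width, n_mfcc)."""
    n_filters, width, _ = kernels.shape
    out = np.zeros((x.shape[0] - width + 1, n_filters))
    for t in range(out.shape[0]):
        window = x[t:t + width]                    # (width, n_mfcc)
        for f in range(n_filters):
            out[t, f] = np.sum(window * kernels[f]) + bias[f]
    return np.maximum(out, 0.0)                    # ReLU activation
```

Stacking such layers (with pooling and a final dense softmax over the five emotion classes) yields the kind of model the slide depicts; a framework such as Keras would normally handle this loop internally.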
UML DIAGRAMS
USE CASE DIAGRAM:
SEQUENCE DIAGRAM:
CLASS DIAGRAM:
ACTIVITY DIAGRAM:
OUTPUT SCREENS
GUI FOR EMOTION RECOGNISER:
OUTPUT FOR HAPPY EMOTION:
OUTPUT FOR DIFFERENT EMOTIONS:
CONCLUSION

• A CNN model was trained and, based on it, we were able to identify a person's emotions from speech.
• The trained model gives an F1 score of 91.04.
• 'Happy', 'Sad', 'Fearful', 'Calm', and 'Angry' are the five different emotions recognized by this project.
• This speech-based emotion recognition can be used to understand the opinions/sentiments people express regarding a product, a political opinion, etc., by giving the audio as input to the model.
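For reference, the F1 score reported above is the harmonic mean of precision and recall (in a multi-class setting it is usually averaged across the classes); a minimal per-class computation, with made-up counts for illustration:

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correct 'Happy' predictions, 10 wrong, 10 missed -> F1 = 0.9
```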
FUTURE ENHANCEMENTS

• Making the system more accurate.
• Various other emotions, such as disgusted and surprised, can be added.
• Integrating the system with different platforms.
THANK YOU!
