
CHAPTER ONE

1 INTRODUCTION

1.1 PROBLEM STATEMENT

This study develops a system with a knowledge base of emotions attached to several sounds, which recommends sounds suitable for consumption by users who suffer from high blood pressure. Listening to audio that conveys fear, anger, sadness, or anxiety can trigger an attack in such users. Even with medical options for treating high blood pressure (HBP), this research assists in containing it among audio listeners.

1.2  AIM AND OBJECTIVES

The aim of this study is to develop an intelligent machine learning model that can detect and classify emotions from sounds for the management of high blood pressure patients.

The objectives of the study are:

 To implement a machine learning framework that classifies emotions in audio-based datasets.

 To build a machine learning model that recommends to high blood pressure patients audio inputs appropriate for their consumption.

 To implement the model.

1.3 SCOPE
CHAPTER TWO

2 LITERATURE REVIEW

HISTORY OF AUDIO-BASED EMOTION RECOGNITION SYSTEMS

Audio data is used in a variety of modern computer and multimedia programs, and audio and multimedia applications deal with many kinds of audio recordings. The capacity to classify and retrieve audio files based on their sound qualities or content is critical to the success of their deployment. The rapid growth of audio data necessitates a computerized approach that allows for efficient, automatic, content-based classification and retrieval in audio databases. Commercial audio retrieval products, such as Muscle Fish (http://www.musclefish.com) and Comparisonics (http://www.comparisonics.com), are emerging as a result of these factors. While speech recognition, a closely related field, has a long history, content-based classification and retrieval of audio sounds is a relatively young field.

Wold et al. [23] published a significant paper in this area, represented by their "Muscle Fish" system. The content-based capability of this work sets it apart from earlier audio retrieval work [6]–[8]. In the Muscle Fish system, a sound is represented by several perceptual properties such as loudness, brightness, pitch, and timbre. The query sound is classified into one of the sound classes in the database using a normalized Euclidean (Mahalanobis) distance and the nearest neighbor (NN) criterion. Similar features plus subband energy ratios are used in another study by Liu et al. [11], in which the separability of different classes is evaluated in terms of intra- and interclass scatter in order to identify highly correlated features.
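As a concrete illustration of this classification step, the sketch below assigns a query sound to the class of its nearest neighbor in a labeled feature database, using a per-feature normalized Euclidean distance (a Mahalanobis distance with diagonal covariance). It is a minimal sketch under our own naming, assuming NumPy, not the actual Muscle Fish implementation.

```python
import numpy as np

def nearest_neighbor_label(query, db_features, db_labels):
    """Classify a query sound by the nearest-neighbor criterion under a
    normalized Euclidean distance: each feature dimension is scaled by its
    standard deviation across the database (diagonal Mahalanobis)."""
    scale = db_features.std(axis=0) + 1e-12            # avoid divide-by-zero
    dists = np.linalg.norm((db_features - query) / scale, axis=1)
    return db_labels[int(np.argmin(dists))]            # label of closest sound
```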

RELATED WORK

Emotion recognition from human speech has become increasingly important in recent years as a way to improve the naturalness and efficiency of human-machine interaction. Because of the difficulty of separating acted from natural emotions, recognizing human emotions is a difficult undertaking in and of itself. A number of studies have been carried out to extract the spectral and prosodic features that would allow for accurate emotion assessment. The classification of emotion from human speech utterances based on calculated bytes was explained by Nwe, T. L., et al. [13]. Using pitch computed from the human voice, Chiu Ying Lay, et al. [6] explained how to classify gender. Chang-Hyun Park, et al. [4] discussed the extraction of acoustic characteristics from speech in order to characterize emotions. Nobuo Sato et al. [11] discussed the MFCC technique in detail; their main goal was to apply MFCC to human speech and classify emotions with an accuracy above 67 percent. In an attempt to improve accuracy, Yixiong Pan et al. [15] applied Support Vector Machines (SVM) to the problem of emotion classification. Keshi Dai et al. [8] explained how to recognize emotions using support vector machines in a neural network, with an accuracy of about 60%. Aastha Joshi [1] discusses Hidden Markov Model and Support Vector Machine features for speech emotion recognition. Algorithms that allow a robot to express its emotions by altering its speech intonation were presented by Sony CSL Paris [12]. Björn Schuller, et al. [3] discussed approaches to recognizing the emotional state of a user by analyzing spoken utterances on both the semantic and the signal level. Mohammed E. Hoque, et al. [10] presented robust recognition of selected emotions from salient spoken words.
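Several of the works above build on MFCC features. As an illustration of that step, the sketch below extracts a fixed-length MFCC vector from one utterance; it assumes the librosa library and a hypothetical file path, and is not the exact pipeline of any cited paper.

```python
import librosa

def mean_mfcc(path, n_mfcc=13):
    """Load an utterance and return its mean MFCC vector, a common
    fixed-length input for emotion classifiers such as SVMs."""
    signal, sr = librosa.load(path, sr=None)   # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                   # average over time frames

features = mean_mfcc("utterance_happy.wav")    # hypothetical example file
```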

Turgut Özseven [8] developed a novel feature selection method for speech emotion recognition. He proposed an alternative approach of statistically selecting features from speech, which differs from existing approaches in that the number of selected features is reduced while accuracy is significantly improved. The study noted that speech emotion analysis requires a number of features, some of which are not always useful or important for the application. Furthermore, different emotions can affect different features, lowering the system's accuracy.
Anjali Bhavan, Pankaj Chauhan, Hitkul, and Rajiv Ratn Shah published a paper in 2019 [10] titled "Bagged support vector machines for emotion recognition from speech." The research was conducted on three datasets, with the features retrieved from these databases reduced and processed further. They employed a bagged ensemble, which outperformed single estimators, as the foundation for ensemble learning, with a support vector machine with a Gaussian kernel as the main approach. Ensemble learning creates a new and more efficient model by combining beneficial learning approaches from multiple models.
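A minimal sketch of that idea, assuming a recent scikit-learn (>= 1.2) and synthetic stand-in data: each SVM with a Gaussian (RBF) kernel is trained on a bootstrap sample, and the ensemble votes on the final label. This illustrates the technique, not the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))            # stand-in feature vectors
y = rng.integers(0, 4, size=300)          # stand-in emotion labels

# Each base SVM sees a bootstrap sample; majority voting across the bag
# typically outperforms any single estimator.
model = BaggingClassifier(estimator=SVC(kernel="rbf"), n_estimators=10)
model.fit(X, y)
```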

SUMMARY OF RELATED WORKS

Emotions can be classified as natural or artificial, and further divided into an emotion set, i.e., anger, sadness, neutral, happiness, joy, and fear. Different machine learning techniques have been applied to create recognition agents, including k-nearest neighbour, support vector machines, radial basis functions, and back-propagation neural networks. A Support Vector Machine (SVM) is a sophisticated pattern recognition tool that employs a discriminative technique. For data classification, SVMs use linear and nonlinear separating hyperplanes. SVMs are used not only for speech recognition but also for image processing and bioinformatics. According to this review, support vector machine classification achieves a high degree of accuracy. SVM can also be used in conjunction with other methods in speech recognition, such as the Hidden Markov Model (HMM). Much work has been done on classifying emotions in audio, voice, or speech, but it is an entirely new trend to censor negative words and sounds that trigger negative emotions that may harm high blood pressure patients.
CHAPTER THREE

DESCRIPTION OF THE EXISTING SYSTEM

The existing system only classifies emotions generated from audio, voice, and speech.

WORK DONE

a. Speech signal acquisition: The speech signals were captured with microphones from 300 people, both male and female, in the age group of 20-30 years. All the voice samples for emotion detection were recorded under everyday conditions, which means the samples may contain noise such as a fan or other common background sounds. For emotion detection, people were asked to speak in four different emotions, i.e., 'happy', 'normal', 'sad' and 'angry'. Sampling was done while recording the voice samples; sampling is the formation of a discrete signal from a continuous signal. The captured speech signals were sampled at 44 kHz. Praat was used to record the audio signals.

Fig 2.1 Recording the Audio Signals
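The study recorded with Praat; purely as an illustration of the same acquisition step in code, the sketch below records one utterance at the stated 44 kHz sampling rate using the sounddevice library (our assumption, not a tool used in the study) and saves it as a WAV file.

```python
import sounddevice as sd
from scipy.io import wavfile

FS = 44_000          # sampling rate stated in the study (44 kHz)
SECONDS = 3          # length of one recorded utterance

# Record from the default microphone, block until done, then save to disk.
audio = sd.rec(int(SECONDS * FS), samplerate=FS, channels=1, dtype="int16")
sd.wait()
wavfile.write("utterance_happy.wav", FS, audio)   # hypothetical file name
```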


b. Pre-processing: Pre-processing is done to remove the noise present in the collected samples and includes pre-emphasis and windowing. Pre-emphasis is applied to the captured signal, the spectral subtraction method is used to remove noise, and a Hamming window is used for windowing.
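A minimal sketch of the pre-emphasis and Hamming-windowing steps, assuming NumPy; spectral subtraction is omitted for brevity, and the frame sizes are illustrative choices, not values from the study.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def hamming_frames(signal, frame_len=1024, hop=512):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])
```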

c. Feature extraction: Speaking rate, pitch, and energy are extracted for emotion detection. Features were extracted from the voice signals using the Praat tool. Pitch is one of the most essential components of emotion recognition from audio signals; it reflects the rate of vibration of the speaker's vocal cords. Sub-features such as fundamental frequency, pitch, energy, and speaking rate are used.
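The study extracted these features with Praat; the sketch below shows comparable pitch and energy estimates using librosa (an assumption) as an illustration. Speaking rate would additionally require syllable or word segmentation, which is omitted here.

```python
import librosa

signal, sr = librosa.load("utterance_happy.wav", sr=None)

# Pitch track (fundamental frequency) via the YIN estimator,
# bounded to a plausible range for adult speech.
f0 = librosa.yin(signal, fmin=75, fmax=500, sr=sr)

# Short-time energy per frame via root-mean-square amplitude.
energy = librosa.feature.rms(y=signal)[0]
print(f0.mean(), energy.mean())
```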

d. Classification: A feature vector is constructed from the above-mentioned features and fed to ML algorithms for recognition. AdaBoost with C4.5 is used for emotion detection. Classification is done using Weka, which was developed at the University of Waikato in New Zealand. Weka is an open-source tool with a Java-based implementation of ML algorithms and is used extensively by researchers.
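The study performs this step in Weka, whose J48 classifier implements C4.5. As a rough Python equivalent, the sketch below boosts a CART decision tree (scikit-learn has no C4.5) with AdaBoost on synthetic stand-in features; it assumes scikit-learn >= 1.2 and is not the study's Weka configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))     # stand-in [energy, pitch, speaking rate]
y = rng.integers(0, 4, size=300)  # stand-in labels: happy/normal/sad/angry

# Shallow CART trees stand in for C4.5 as the boosted weak learner.
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                           n_estimators=50)
model.fit(X, y)
```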

Fig 2.2: Block diagram for the existing system (speech signal acquisition → pre-processing → feature extraction: energy, pitch, speaking rate → AdaBoost with C4.5, training and testing → result)
