CHAPTER ONE
1 INTRODUCTION
1.1 PROBLEM STATEMENT
This study develops a system with a knowledge base that maps emotions to sounds and recommends sounds suitable for users who suffer from high blood pressure (HBP). Listening to audio that conveys fear, anger, sadness, or anxiety can trigger attacks in such users. Even with medical options for managing HBP, this research assists in containing it among audio listeners.
1.2 AIM AND OBJECTIVES
The aim of this study is to develop an intelligent machine learning model that can detect and classify emotions from sounds for the management of high blood pressure patients.
The objective is to build a machine learning model that recommends suitable audio to high blood pressure patients.
1.3 SCOPE
CHAPTER TWO
2 LITERATURE REVIEW
Audio data is used in a variety of modern computer and multimedia programs, and audio and multimedia applications deal with many kinds of audio recordings. The capacity to classify and retrieve audio files by their sound qualities or content is critical to the success of their deployment. The rapid growth of audio data necessitates a computerized approach that enables efficient, automatic, content-based classification and retrieval of audio databases. Audio recognition, a closely related field, has a long history. In content-based classification and retrieval, Wold et al. [23] published a significant paper, represented by their "Muscle Fish" system. The content-based capabilities of this study set it apart from past audio retrieval work [6]–[8]. To represent a sound, the Muscle Fish system uses several perceptual properties such as loudness, brightness, pitch, and timbre. The query sound is classified into one of the sound classes in the database using a normalized Euclidean (Mahalanobis) distance and the nearest-neighbor (NN) criterion. Similar features plus subband energy ratios are used in another study by Liu et al. [11], in which the separability of different classes is evaluated in terms of intra- and inter-class scatter.
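The Muscle Fish-style matching described above can be sketched as a nearest-neighbor lookup under a normalized Euclidean distance over perceptual features. The following is a minimal illustration with made-up feature values, not the actual Muscle Fish system or data:

```python
import numpy as np

# Toy "database" of sounds: rows are feature vectors of
# (loudness, brightness, pitch) -- illustrative values only.
db_features = np.array([
    [0.8, 0.3, 220.0],   # class 0, e.g. "speech"
    [0.4, 0.9, 880.0],   # class 1, e.g. "bell"
    [0.9, 0.2, 110.0],   # class 0
    [0.3, 0.8, 760.0],   # class 1
])
db_labels = np.array([0, 1, 0, 1])

def classify_nn(query, features, labels):
    """Normalized Euclidean (diagonal Mahalanobis) nearest neighbor."""
    std = features.std(axis=0) + 1e-12           # per-feature scale
    d = np.sqrt((((features - query) / std) ** 2).sum(axis=1))
    return labels[np.argmin(d)]

print(classify_nn(np.array([0.85, 0.25, 200.0]), db_features, db_labels))  # -> 0
```

Normalizing each feature by its standard deviation keeps large-valued features such as pitch (in Hz) from dominating small-valued ones such as loudness.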
RELATED WORK
Emotion recognition from human speech has become increasingly important in recent years, although it remains a difficult undertaking in and of itself. A number of studies have been carried out to extract the spectral and prosodic elements that allow for accurate emotion assessment. Nwe, T. L., et al. [13] explained emotion classification from human speech utterances based on calculated bytes. Chiu Ying Lay, et al. [6] explained how to classify gender using pitch computed from the human voice. Chang-Hyun Park, et al. [4] discussed the extraction of acoustic characteristics from speech in order to characterize emotions. Nobuo Sato et al. [11] discussed the MFCC technique in detail; their main goal was to apply MFCC to human speech and classify emotions with an accuracy above 67 percent. In an attempt to improve accuracy, Yixiong Pan et al. [15] applied Support Vector Machines (SVM) to the problem of emotion classification. Keshi Dai et al. [8] explained how to recognize emotions using support vector machines in a neural network, with an accuracy of about 60%. For speech emotion recognition, Aastha Joshi [1] discusses Hidden Markov Model and Support Vector Machine features. Algorithms that allow a robot to express its emotions by altering its speech intonation were presented by Sony CSL Paris [12]. Björn Schuller, et al. [3] discussed approaches to recognizing the emotional user state by analyzing spoken utterances on both the semantic and the signal level. Mohammed E. Hoque, et al. [10] presented further work in this area.
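The MFCC-plus-SVM pipeline that several of the cited works use can be sketched with scikit-learn. The features below are random stand-ins for 13-dimensional MFCC vectors (which in practice would be extracted with a tool such as Praat or librosa); the classes and accuracy are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for 13-dim MFCC feature vectors of two emotion classes.
X_angry = rng.normal(loc=1.0, scale=0.5, size=(40, 13))
X_sad   = rng.normal(loc=-1.0, scale=0.5, size=(40, 13))
X = np.vstack([X_angry, X_sad])
y = np.array([1] * 40 + [0] * 40)     # 1 = angry, 0 = sad

# RBF-kernel SVM, as in the SVM-based studies above.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                # training accuracy on easy toy data
```

On real speech the two classes overlap far more than this synthetic data does, which is why the reported accuracies sit in the 60-70% range rather than near 100%.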
Turgut Özseven [8] developed a novel feature selection method for voice emotion recognition. They suggested an alternative approach of statistically selecting features from speech that differs from the current one in that the number of features selected is reduced while accuracy is improved significantly. The study suggested that speech emotion analysis requires a number of features, some of which are not always useful or important for the application. Furthermore, different emotions can affect different features, lowering the system's accuracy.
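Statistical feature selection of the kind described above can be illustrated with scikit-learn's ANOVA-based `SelectKBest`. This is a generic sketch on synthetic data, not Özseven's actual method: three informative columns are planted among seventeen irrelevant ones, and the selector recovers them:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
n = 100
# Columns 0-2 shift with the class label; columns 3-19 are pure noise.
informative = rng.normal(size=(n, 3)) + np.repeat([0, 2], 50)[:, None]
noise = rng.normal(size=(n, 17))
X = np.hstack([informative, noise])
y = np.repeat([0, 1], 50)

# Keep only the 3 features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print(sorted(selector.get_support(indices=True).tolist()))  # -> [0, 1, 2]
```

Dropping the uninformative columns shrinks the feature vector, which is exactly the trade-off the study reports: fewer features, equal or better accuracy.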
Anjali Bhavan, Pankaj Chauhan, Hitkul, and Rajiv Ratn Shah published a paper in 2019 [10] titled "Bagged support vector machines for emotion recognition from voice." The research was conducted on three datasets, with the features retrieved from these datasets being reduced. They employed a bagged ensemble as the foundation for ensemble learning, which outperformed single estimators, with a support vector machine with a Gaussian kernel as the main approach. Ensemble learning creates a new and more efficient model by combining the strengths of several base models.
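A bagged ensemble of Gaussian-kernel SVMs, in the spirit of Bhavan et al., can be written in a few lines of scikit-learn. This sketch uses synthetic two-class data rather than their three speech datasets:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Two synthetic classes standing in for emotion feature vectors.
X = np.vstack([rng.normal(0, 1, (60, 8)), rng.normal(2, 1, (60, 8))])
y = np.repeat([0, 1], 60)

# Bagging: each of the 10 SVMs is trained on a bootstrap resample,
# and predictions are combined by voting.
bag = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10,
                        random_state=0).fit(X, y)
print(bag.score(X, y))
```

Bootstrap resampling decorrelates the base SVMs, which is what lets the voted ensemble outperform a single estimator on noisier, real-world features.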
Emotions can be classified as natural and artificial, and can further be divided into an emotion set, i.e. anger, sadness, neutral, happy, joy, and fear. Different machine learning techniques have been applied to create recognition agents, including k-nearest neighbour, support vector machines, radial basis function networks, and back-propagation neural networks. A Support Vector Machine can be used for more than speech recognition; it can also be used for image processing and bioinformatics. According to this review, support vector machine classification has a high degree of accuracy. SVM can be used in conjunction with other methods in speech recognition, such as the Hidden Markov Model (HMM). Much work has been done on classifying emotions in audio, voice, or speech, but it is an entirely new trend to censor negative words and sounds that trigger negative emotions and may cause harm to high blood pressure patients.
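The comparison of recognition agents mentioned above (k-NN, SVM, and others) can be sketched by fitting two of those classifiers on the same held-out split. The data here is synthetic and the scores are illustrative, not results from this study:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# Two well-separated synthetic classes standing in for emotion features.
X = np.vstack([rng.normal(0, 1, (80, 5)), rng.normal(2.5, 1, (80, 5))])
y = np.repeat([0, 1], 80)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf"))
scores = {type(m).__name__: m.fit(Xtr, ytr).score(Xte, yte) for m in models}
print(scores)
```

Evaluating every candidate on the same train/test split is what makes the accuracy comparison in the reviewed papers meaningful.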
CHAPTER THREE
3 WORK DONE
The existing system only classifies emotions generated from audio, voice, and speech.
a. Speech signal acquisition: The speech signals were captured from 300 people, both male and female, in the age group of 20-30 years, with the help of microphones. All the voice samples for emotion detection were recorded under ordinary circumstances, which means the samples may contain noise such as a fan or other common background noise. For emotion detection, people were asked to speak in four different emotions, i.e., 'happy', 'normal', 'sad', and 'angry'. Sampling was done while recording the voice samples: sampling is the formation of a discrete signal from a continuous signal, and the captured speech signals were sampled at 44 kHz. Praat is used for recording.
b. Pre-processing: The spectral subtraction method has been used to remove the noise from the captured signal, and a Hamming window is used for windowing.
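The pre-processing step above (Hamming windowing plus spectral subtraction) can be sketched in NumPy. The signal here is synthetic, and the noise spectrum is estimated from a noise-only segment, which is an assumption of this sketch rather than a detail stated in the text:

```python
import numpy as np

fs = 44100                                    # 44 kHz, as in the recordings
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 220 * t)           # synthetic "voice"
noise = 0.3 * np.random.default_rng(0).normal(size=fs)
noisy = clean + noise

# One Hamming-windowed analysis frame.
frame = noisy[:1024] * np.hamming(1024)
spec = np.fft.rfft(frame)

# Spectral subtraction: subtract the estimated noise magnitude
# spectrum, floor at zero, and keep the noisy phase.
noise_mag = np.abs(np.fft.rfft(noise[:1024] * np.hamming(1024)))
clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
denoised = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)))
print(denoised.shape)                         # one de-noised frame
```

In a full system this would run frame by frame with overlap-add; the single frame here is enough to show the subtraction itself.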
c. Feature extraction: Speaking rate, pitch, and energy are extracted for emotion detection. Features were extracted from the voice signals using the Praat tool. Pitch is one of the most essential components of emotion recognition from an audio signal; it defines the rate of vibration of the speaker's vocal cords, and different sub-features such as the fundamental frequency are also considered.
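While the study extracts these features with Praat, the two simplest of them, short-time energy and pitch, can be computed directly in NumPy as a rough illustration. The autocorrelation pitch estimator below is a generic textbook method, not Praat's algorithm:

```python
import numpy as np

fs = 44100
t = np.arange(2048) / fs
frame = np.sin(2 * np.pi * 220 * t)      # synthetic voiced frame at 220 Hz

# Short-time energy of the frame.
energy = np.sum(frame ** 2) / len(frame)

# Autocorrelation-based estimate of the fundamental frequency:
# the lag of the strongest autocorrelation peak is one pitch period.
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
lo, hi = fs // 500, fs // 50             # search the 50-500 Hz range
lag = lo + np.argmax(ac[lo:hi])
pitch = fs / lag
print(energy, pitch)                     # pitch close to 220 Hz
```

Restricting the lag search to the 50-500 Hz band keeps the estimator from locking onto harmonics or slow amplitude drift.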
d. Classification: A feature vector is constructed using the above-mentioned features and fed to ML algorithms for recognition. AdaBoost with C4.5 is used for emotion detection. Classification is done using Weka, which was developed at the University of Waikato, New Zealand. Weka is open-source software with a Java-based interface.
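The AdaBoost-with-C4.5 setup used in Weka can be approximated in scikit-learn, which has no exact C4.5 implementation; an entropy-criterion decision tree stands in for C4.5 here, and the four-class data is synthetic rather than the recorded samples:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
# Stand-in (pitch, energy, speaking-rate) vectors for four emotions.
X = np.vstack([rng.normal(m, 0.6, (50, 3)) for m in (0.0, 1.5, 3.0, 4.5)])
y = np.repeat([0, 1, 2, 3], 50)          # happy / normal / sad / angry

# Entropy-based tree as a C4.5 stand-in, boosted with AdaBoost.
base = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf = AdaBoostClassifier(base, n_estimators=50, random_state=0).fit(X, y)
print(clf.score(X, y))
```

Boosting reweights misclassified samples between rounds, so the ensemble of shallow trees can fit class boundaries that any single depth-3 tree would miss.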