
CHAPTER ONE

1 INTRODUCTION

1.1 PROBLEM STATEMENT

This study develops a system with a knowledge base of emotions attached to several sounds, which recommends sounds suitable for consumption by users who suffer from high blood pressure. Listening to audio that conveys fear, anger, sadness, or anxiety can trigger an attack in such users. Even with medical options for treating high blood pressure (HBP), this research assists in containing it among audio listeners.

1.2  AIM AND OBJECTIVES

The aim of this study is to develop an intelligent machine learning model that can detect and classify emotions from sounds for the management of high blood pressure patients.

The objectives of the study are:

 To implement a machine learning framework that classifies emotions in audio-based datasets.

 To build a machine learning model that recommends to high blood pressure patients audio inputs appropriate for their consumption.

 To implement the model.

1.3 SCOPE
CHAPTER TWO

2 LITERATURE REVIEW

HISTORY OF AUDIO-BASED EMOTION RECOGNITION SYSTEMS

Audio data is used in a variety of modern computer and multimedia programs, and audio and multimedia applications deal with many kinds of audio recordings. The capacity to classify and retrieve audio files based on their sound qualities or content is critical to the success of their deployment. The rapid growth of audio data necessitates a computerized approach that allows for efficient, automatic, content-based classification and retrieval in audio databases. Commercial audio retrieval products, such as Muscle Fish (http://www.musclefish.com) and Comparisonics (http://www.comparisonics.com), are emerging as a result of these factors. While speech recognition, a closely related field, has a long history, content-based classification and retrieval of audio sounds is a relatively young field.

Wold et al. [23] published a significant paper in this area, represented by their "Muscle Fish" system. The content-based capability of this work sets it apart from earlier audio retrieval work [6]–[8]. In the Muscle Fish system, a sound is represented by several perceptual properties such as loudness, brightness, pitch, and timbre. The query sound is classified into one of the sound classes in the database using a normalized Euclidean (Mahalanobis) distance and the nearest neighbor (NN) criterion. Similar features plus subband energy ratios are used in another study by Liu et al. [11], in which the separability of different classes is evaluated in terms of intra- and interclass scatter in order to identify highly correlated features.
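As a concrete illustration of this classification step, the sketch below assigns a query sound to the class of its nearest neighbor in a labeled feature database, using a per-feature normalized Euclidean distance (a Mahalanobis distance with diagonal covariance). It is a minimal sketch under our own naming, assuming NumPy, not the actual Muscle Fish implementation.

```python
import numpy as np

def nearest_neighbor_label(query, db_features, db_labels):
    """Classify a query sound by the nearest-neighbor criterion under a
    normalized Euclidean distance: each feature dimension is scaled by its
    standard deviation across the database (diagonal Mahalanobis)."""
    scale = db_features.std(axis=0) + 1e-12            # avoid divide-by-zero
    dists = np.linalg.norm((db_features - query) / scale, axis=1)
    return db_labels[int(np.argmin(dists))]            # label of closest sound
```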

RELATED WORK

Emotion recognition from human speech has become increasingly important in recent years as a way to improve the naturalness and efficiency of human-machine interaction. Because of the difficulty of separating acted from natural emotions, recognizing human emotions is a difficult undertaking in and of itself. A number of studies have been carried out to extract the spectral and prosodic features that would allow for accurate emotion assessment. The classification of emotion from human speech utterances based on calculated bytes was explained by Nwe, T. L., et al. [13]. Using pitch computed from the human voice, Chiu Ying Lay, et al. [6] explained how to classify gender. Chang-Hyun Park, et al. [4] discussed the extraction of acoustic characteristics from speech in order to characterize emotions. Nobuo Sato et al. [11] discussed the MFCC technique in detail; their main goal was to apply MFCC to human speech and classify emotions with an accuracy above 67 percent. In an attempt to improve accuracy, Yixiong Pan et al. [15] applied Support Vector Machines (SVM) to the problem of emotion classification. Keshi Dai et al. [8] explained how to recognize emotions using support vector machines in a neural network, with an accuracy of about 60%. Aastha Joshi [1] discusses Hidden Markov Model and Support Vector Machine features for speech emotion recognition. Algorithms that allow a robot to express its emotions by altering its speech intonation were presented by Sony CSL Paris [12]. Björn Schuller, et al. [3] discussed approaches to recognizing the emotional state of a user by analyzing spoken utterances on both the semantic and the signal level. Mohammed E. Hoque, et al. [10] presented robust recognition of selected emotions from salient spoken words.
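Several of the works above build on MFCC features. As an illustration of that step, the sketch below extracts a fixed-length MFCC vector from one utterance; it assumes the librosa library and a hypothetical file path, and is not the exact pipeline of any cited paper.

```python
import librosa

def mean_mfcc(path, n_mfcc=13):
    """Load an utterance and return its mean MFCC vector, a common
    fixed-length input for emotion classifiers such as SVMs."""
    signal, sr = librosa.load(path, sr=None)   # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                   # average over time frames

features = mean_mfcc("utterance_happy.wav")    # hypothetical example file
```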

Turgut Özseven [8] developed a novel feature selection method for speech emotion recognition. He proposed an alternative approach of statistically selecting features from speech, which differs from existing approaches in that the number of selected features is reduced while accuracy is significantly improved. The study noted that speech emotion analysis requires a number of features, some of which are not always useful or important for the application. Furthermore, different emotions can affect different features, lowering the system's accuracy.
Anjali Bhavan, Pankaj Chauhan, Hitkul, and Rajiv Ratn Shah published a paper in 2019 [10] titled "Bagged support vector machines for emotion recognition from speech." The research was conducted on three datasets, with the features retrieved from these databases reduced and processed further. They employed a bagged ensemble, which outperformed single estimators, as the foundation for ensemble learning, with a support vector machine with a Gaussian kernel as the main approach. Ensemble learning creates a new and more efficient model by combining beneficial learning approaches from multiple models.
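A minimal sketch of that idea, assuming a recent scikit-learn (>= 1.2) and synthetic stand-in data: each SVM with a Gaussian (RBF) kernel is trained on a bootstrap sample, and the ensemble votes on the final label. This illustrates the technique, not the authors' exact configuration.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))            # stand-in feature vectors
y = rng.integers(0, 4, size=300)          # stand-in emotion labels

# Each base SVM sees a bootstrap sample; majority voting across the bag
# typically outperforms any single estimator.
model = BaggingClassifier(estimator=SVC(kernel="rbf"), n_estimators=10)
model.fit(X, y)
```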

SUMMARY OF RELATED WORKS

Emotions can be classified as natural or artificial, and further divided into an emotion set, i.e., anger, sadness, neutral, happiness, joy, and fear. Different machine learning techniques have been applied to create recognition agents, including k-nearest neighbour, support vector machines, radial basis functions, and back-propagation neural networks. A Support Vector Machine (SVM) is a sophisticated pattern recognition tool that employs a discriminative technique. For data classification, SVMs use linear and nonlinear separating hyperplanes. SVMs are used not only for speech recognition but also for image processing and bioinformatics. According to this review, support vector machine classification achieves a high degree of accuracy. SVM can also be used in conjunction with other methods in speech recognition, such as the Hidden Markov Model (HMM). Much work has been done on classifying emotions in audio, voice, or speech, but it is an entirely new trend to censor negative words and sounds that trigger negative emotions that may harm high blood pressure patients.
CHAPTER THREE

DESCRIPTION OF THE EXISTING SYSTEM

The existing system only classifies emotions generated from audio, voice, and speech.

WORK DONE

a. Speech signal acquisition: The speech signals were captured with microphones from 300 people, both male and female, in the age group of 20-30 years. All the voice samples for emotion detection were recorded under everyday conditions, which means the samples may contain noise such as a fan or other common background sounds. For emotion detection, people were asked to speak in four different emotions, i.e., 'happy', 'normal', 'sad' and 'angry'. Sampling was done while recording the voice samples; sampling is the formation of a discrete signal from a continuous signal. The captured speech signals were sampled at 44 kHz. Praat was used to record the audio signals.

Fig 2.1 Recording the Audio Signals
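The study recorded with Praat; purely as an illustration of the same acquisition step in code, the sketch below records one utterance at the stated 44 kHz sampling rate using the sounddevice library (our assumption, not a tool used in the study) and saves it as a WAV file.

```python
import sounddevice as sd
from scipy.io import wavfile

FS = 44_000          # sampling rate stated in the study (44 kHz)
SECONDS = 3          # length of one recorded utterance

# Record from the default microphone, block until done, then save to disk.
audio = sd.rec(int(SECONDS * FS), samplerate=FS, channels=1, dtype="int16")
sd.wait()
wavfile.write("utterance_happy.wav", FS, audio)   # hypothetical file name
```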


b. Pre-processing: Pre-processing is done to remove the noise present in the collected samples and includes pre-emphasis and windowing. Pre-emphasis is applied to the captured signal, the spectral subtraction method is used to remove noise, and a Hamming window is used for windowing.
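A minimal sketch of the pre-emphasis and Hamming-windowing steps, assuming NumPy; spectral subtraction is omitted for brevity, and the frame sizes are illustrative choices, not values from the study.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def hamming_frames(signal, frame_len=1024, hop=512):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] * window
                     for i in range(n_frames)])
```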

c. Feature extraction: Speaking rate, pitch, and energy are extracted for emotion detection. Features were extracted from the voice signals using the Praat tool. Pitch is one of the most essential components of emotion recognition from audio signals; it reflects the rate of vibration of the speaker's vocal cords. Sub-features such as fundamental frequency, pitch, energy, and speaking rate are used.
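The study extracted these features with Praat; the sketch below shows comparable pitch and energy estimates using librosa (an assumption) as an illustration. Speaking rate would additionally require syllable or word segmentation, which is omitted here.

```python
import librosa

signal, sr = librosa.load("utterance_happy.wav", sr=None)

# Pitch track (fundamental frequency) via the YIN estimator,
# bounded to a plausible range for adult speech.
f0 = librosa.yin(signal, fmin=75, fmax=500, sr=sr)

# Short-time energy per frame via root-mean-square amplitude.
energy = librosa.feature.rms(y=signal)[0]
print(f0.mean(), energy.mean())
```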

d. Classification: A feature vector is constructed from the above-mentioned features and fed to ML algorithms for recognition. AdaBoost with C4.5 is used for emotion detection. Classification is done using Weka, which was developed at the University of Waikato in New Zealand. Weka is an open-source tool with a Java-based implementation of ML algorithms and is used extensively by researchers.
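The study performs this step in Weka, whose J48 classifier implements C4.5. As a rough Python equivalent, the sketch below boosts a CART decision tree (scikit-learn has no C4.5) with AdaBoost on synthetic stand-in features; it assumes scikit-learn >= 1.2 and is not the study's Weka configuration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))     # stand-in [energy, pitch, speaking rate]
y = rng.integers(0, 4, size=300)  # stand-in labels: happy/normal/sad/angry

# Shallow CART trees stand in for C4.5 as the boosted weak learner.
model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                           n_estimators=50)
model.fit(X, y)
```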

Fig 2.2: Block diagram for the existing system (speech signal acquisition → pre-processing → feature extraction: energy, pitch, speaking rate → AdaBoost with C4.5, training and testing → result)
