Manoj D5
B.Tech (CSE-IoT
VNRVJIET
Hyderabad, India
20071a6920@vnrvjiet.in
Abstract— Communication for children, particularly infants, is often expressed through crying. A baby's cry is characterized by its natural rhythmic tone and fluctuations in vocal pitch. Interpreting these cries precisely can give parents significant details about their baby's well-being, allowing them to pay closer attention to the baby's needs. This ability is particularly important in ensuring the comfort of the baby. Being able to identify the type of a baby's cry efficiently helps caretakers provide attentive care. To meet this need, Baby Cry Classification, a technology-driven solution, has been developed. This tool recognizes different infant cries in real time by leveraging audio analysis and processing methods. Baby Cry Classification has the potential to improve childcare. Quickly identifying the unique cues inside a baby's cry enables parents to respond immediately and precisely, meeting their baby's particular needs. It lessens the burden on caregivers, giving them a tool to safeguard the well-being of the baby even when they are not present in person, and promotes better and more adept childcare practices.

Keywords—Communication, Baby Cry Classification.

I. INTRODUCTION

Babies use crying as a fundamental method of communication, as it enables them to convey their needs and gain attention from their caregivers. The capability to understand these cries is necessary for caregivers to provide proper care and ensure the well-being of the infant. For parents, understanding different baby cries is a very challenging task. Priscilla Dunstan's baby cry categorization has given valuable insights into this complex language of cries and laid a foundation for further research in this field.

Apart from communication, a baby's cries can reveal significant information about their growth. Based on distinct features of the baby's cry, nurses and neonatologists have been able to recognize issues such as SIDS and drug usage during the pregnancy period. These issues underline the importance of correctly deciphering and classifying baby cries. Infants initially exhibit five types of cries. As they develop, their cries become increasingly sophisticated, even including behaviors like "fake crying." The ability to distinguish between these primal cries has significant implications for understanding the infant's needs and emotional and intellectual development.

This study explores the field of baby cry classification, which is a subset of affect recognition that is similar to speech-based emotion recognition. Mel Frequency Cepstral Coefficients (MFCCs), formants, jitter, shimmer, pitch, breathiness, and spectral statistics are among the features that are essential for emotion identification. However, the differences in vocal tracts between infants and adults have raised questions about the accuracy of certain features when applied to baby cries. Previous studies have primarily focused on a limited set of individual features, necessitating a broader analysis of the diverse features present in infant cries.

On the classifier side, prior research has usually relied on a limited selection of classifiers, restricting the potential of this field. Therefore, there is a need to investigate a more varied range of classifiers for better performance in baby cry categorization.
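
To make the MFCC features named above concrete, the following is a minimal sketch of how MFCCs could be computed for a single audio frame using only the Python standard library. It is an illustration under stated assumptions, not the pipeline of any surveyed study: the function names (`hz_to_mel`, `mel_filterbank`, `mfcc_frame`), the 20 mel filters, the 13 coefficients, the Hamming window, the unnormalized DCT-II, and the 8 kHz sample rate in the usage lines are all hypothetical choices made for the example.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window a frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    # Unnormalized DCT-II decorrelates the log mel energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Usage: a synthetic 1 kHz tone stands in for one 32 ms frame of a cry recording.
tone = [math.sin(2.0 * math.pi * 1000.0 * t / 8000.0) for t in range(256)]
coefficients = mfcc_frame(tone, sample_rate=8000)
print(len(coefficients))
```

A real system would typically rely on an optimized audio library and process many overlapping frames per recording; the naive DFT here is only meant to keep each step of the feature computation visible.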
The complexity and variety of baby cries, in addition to the issues in current methodologies, stress the need for new methodologies in this field. Automated systems are required to categorize and decipher baby cries, which in turn improves the overall well-being of both parents and babies. Advancements in computing technology have made it possible for baby monitors to provide vital information about baby needs, sleep schedules, and overall well-being. The research in this paper aims to address the challenge of deciphering baby cries, paving the way for Ambient Intelligence (AmI) systems that can considerably improve the quality of life for both babies and parents.

This research is significant because it can help us better comprehend what babies try to convey and cater to their needs more precisely. By using machine learning, this study provides a feasible approach for enhancing the well-being and growth of babies, reducing the challenges faced by parents and caregivers in comprehending and meeting their baby's needs.

II. RELATED WORK

Yizhar Lavner proposed two machine learning algorithms, namely logistic regression and CNN, for the automatic detection of baby cries in audio recordings. The dataset is an annotated database containing recordings of babies aged zero to six months in domestic environments. They extracted features from the recordings, including pitch, formants, and Mel-frequency cepstral coefficients, to train the logistic regression classifier. The second technique operates on the log Mel-filter bank representation of the recordings using a specialized convolutional neural network (CNN). The experimental results demonstrated that the CNN has an advantage over the logistic regression classifier because CNNs are well suited to large datasets [1]. The authors of [2] considered sounds that precede the cry and classified the cry based on Dunstan Baby Language. Audio data from Romanian babies was classified using a CNN architecture that was trained on recordings of babies from Australia. The experiment aimed to investigate the outcome when the participants come from a different cultural background. They compared the results of the CNN model with results provided by Dunstan experts and concluded that Dunstan is a universal language [2].

An approach using ANN and a Genetic Algorithm was proposed by Azadeh Bashiri and Roghaye Hosseinkhani. An efficient genetic algorithm chooses appropriate features. The use of GA in the study, which produced an accuracy of 99.9%, is a benefit [3]. Sita Purnama Dewi and Budhi Irawan analyzed the efficiency of KNN, Vector Quantization, and Simple Neural Networks for cry classification. For feature extraction, the authors used Mel Frequency Cepstral Coefficients and Linear Frequency Cepstral Coefficients in an effort to increase classifier accuracy. LFCC for feature extraction with KNN for classification provides high accuracy compared to other combinations [4]. An automated method of baby cry classification using Dunstan Baby Language and CNN is presented in [5]. Instead of MFCC or LFCC, PRAAT is used to transform the audio signal into a spectrogram, as the length of the audio file is below 1 second [5].

A low-complexity model for detecting newborn cries was jointly proposed by Tanmay Khandelwal, Rohan Kumar Das, and Eng Siong Chng. Depth-wise-separable convolutions, which need 3% of the parameters of convolutional recurrent neural networks, were used by the authors in place of 2D convolutions. As a result, training iteration times were shortened, and classification performance was 15.13% better than that of CRNN [6].

Other methods proposed:

Gaussian mixture model clustering is used in [7] to build a baby cry classification system. This model effectively clustered baby voices, and it subsequently produced an accuracy of 81.27% by labeling data using a unique method based on sampling distributions and the central limit theorem [7]. Another work [8] uses a CNN-RNN model for feature extraction and classification. This model achieved an accuracy of 94.97% using binary cross entropy. The dataset used for this research is obtained from Dunstan Baby Language, introduced by Priscilla Dunstan in 2006 [8]. Also related to baby cry classification, [9] uses openSMILE to extract the audio features. The feature sets fall into two categories, general and emotion-based characteristics, and they are compared across seven different classifier categories.

In [10], the dropout technique is used to avoid the over-fitting problem of CNN. The main objective of this technique is to drop the neuron connections that cause complex co-adaptations on the training data. The network achieved an accuracy of 78.5%. Paper [11] proposed a baby cry sound detection (BCSD) method under indoor background sound environments. According to [11], a 1D-CNN with a 500 ms frame duration outperforms feed-forward neural networks and multi-class support vector machines.

According to [12], a feed-forward neural network architecture yields the highest correct classification rates for Mel-Cepstrum and Mel Filter-Band Energy inputs. Time-delay neural networks are also found to be ineffective in classifying newborn cries. The paper [13] proposed a model in which the extracted features include MFCC, LPC, and BFCC. The classification techniques employed are Nearest Neighbour and artificial neural networks. According to the simulation's findings, the classification rate is nearly 70%. In [14], temporal, prosodic, spectral, and cepstral features are used to generate a dataset. The dataset was then subjected to classification techniques such as SVM, Bagging, Boosted Trees, and Decision Trees to determine the best classifier. The
authors in the paper [15] used MFCC for feature extraction and KNN for classification. The model achieved an accuracy of 76.16% using the a-cry-corpus dataset. In paper [16], the authors proposed transfer learning with ResNet50 and SVM, and also performed ensemble classification by combining the predictions of both the SVM and the transfer-learned ResNet50. Model accuracy ranges from 90.1% to 91.1% [16].

A modified KNN is used in [17] to classify newborn cries, and a maximum accuracy of 80.39% was attained. The accuracy of the i-vector-based approach and the GMM-UBM approach was 47% and 70%, respectively. The dataset consists of 83 labeled infant cries [17]. The work [18] presents a feasibility investigation of a continuous Hidden Markov Model approach for classification. The extracted features are linear prediction cepstral coefficients (LPCC) and Mel frequency cepstral coefficients (MFCC), with MFCC achieving a greater accuracy than LPCC. The dataset under consideration comprises 700 baby cries, with an accuracy ranging from 71.8 to 91 [18]. The authors of [19] analyzed the efficiency of feed-forward neural networks, recurrent neural networks, and time-delay neural networks. The accuracy values for feed-forward neural networks, recurrent neural networks, and time-delay neural networks are 69.1, 64.64, and 61.0, respectively. As a result, feed-forward neural networks achieved the highest accuracy of the three [19]. The authors of [20] conducted studies on a database of forty healthy babies who were crying for any one of five physiological needs: weariness, hunger, eructation, flatulence, and discomfort. The database was constructed and tagged by the research team using the Dunstan Baby Language (DBL), and it includes 128 infant cries. While the i-vectors strategy yields a 58.0% accuracy rate, the GMM-UBM method yields a 50.6% accuracy rate for the Splann database. In comparison to the Dunstan database, the Splann database yields a lower result [20].

Self-supervised learning (SSL) was investigated by the authors of [21] for the purpose of evaluating a novel cry recording database that included clinical indicators for over a thousand babies, with a specific focus on the detection of neurological injury. K-Nearest Neighbour (K-NN) classification and LFCC feature extraction are used in [22] to determine whether the baby is crying or not. In this research, the author considers two distinct features: LFCC and MFCC. An accuracy of 93% is obtained when using Euclidean distances for classification and LFCC for feature extraction. The authors of [23] suggested a fully automatic method that attempts to distinguish between various cry types. Gaussian Mixture Models and i-vectors serve as the basis for the baby cry categorization method. The i-vector has an accuracy of 58.0 and the Gaussian mixture model has an accuracy of 50.6.

According to [24], GTCC (Gammatone frequency cepstral coefficients) outperforms MFCC (Mel-frequency cepstral coefficients) in most of the training models. K-Nearest Neighbors and Random Forest algorithms perform best when it comes to individual feature performance. In [25], features from audio signals and spectrograms are fed to SVM classifiers. The most accurate result (71.68%) is achieved by using LBP texture descriptors taken from spectrogram images. The main use case of [26] is to pinpoint situations where newborns could be physically in danger, including when their parents leave them in cars. A voice activity detector was used to reduce power consumption when there was no sound activity [26].

In [27], a volume-based thresholding technique is used to detect background noise, and CNN is used to identify baby cries. By working with the audio signals' log linear-scale filterbank energies, these CNN algorithms derive cry detection characteristics. Two convolutional paths, two fully connected layers, and one output layer make up the CNN architecture used in the second phase. While the lower path concentrates on features in the time domain, the top path aims to extract more features from the frequency domain. To categorize baby cries, the study [28] investigates the use of transfer learning in a real-time embedded system. The study compares CNN, CNN+LSTM, and transfer learning-based models using the Dunstan Baby Language dataset.

The goal of [29] is to understand infant cries through the use of signal processing techniques and features such as LPC, LPCC, BFCC, and MFCC. It uses a modified Kaczmarz algorithm to improve performance in noisy environments and compressed sensing to distinguish between normal and abnormal cry signals. An ANN and BFCC features together produced a 76.47% classification rate for abnormal cries [29]. A novel method for early asphyxia identification in infants using deep learning is presented in [30]. For accurate classification of asphyxiated infant cries, the approach creates a merged feature matrix by combining weighted prosodic and acoustic characteristics. The Baby Chillanto Database is used in the study, which reports a testing accuracy of 96.74%, outperforming single-feature models.

III. COMPARISON TABLE

The performance of proposed approaches from earlier studies is shown in the table below.
Table 1. Comparison

The first study is by Azadeh in 2020. They used a dataset named Baby Chillanto and derived features using MFCC and LPC. They then classified the cries using a CNN and attained an accuracy of 99%. The second study, by Sita Purnama in 2019, used real-time data and derived features using MFCC and LFCC. They used a combination of NN and KNN for classification and attained 75% accuracy. The next study is by Eduard Franti in 2018. They used a database of 315 baby cries and derived features using MFCC. They used CNN for classification and attained 89% accuracy. The study by Tusty Nadia in 2019 used a DBL dataset and CNN for classification to attain an accuracy of 94.97%. The next row depicts a study by Karinki in 2020. They used real-time data and derived features using MFCC. They used a combination of 1D-CNN, FFNN, and multiclass SVM for classification and obtained an accuracy of 88.6%. The next study, by Lichuan Liu in 2018, used a real-time database and derived features using LPC, LPCC, BFCC, and MFCC. They obtained an accuracy of 76.47% by combining Nearest Neighbour and NN classifiers. The final row outlines a study by Ptthaya Rani in 2022. They used MFCC for feature extraction from the a-cry-corpus dataset. They achieved an accuracy of 76.16% via the combination of KNN, Naive Bayes, and SVM classifiers.

Fig. 1 shows the accuracies of various classifiers using the Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method on the Baby Chillanto Database. The classifiers CNN, Multi-layer Perceptron ANN, ELM, and SVM are shown on the x-axis. The accuracy, expressed as a percentage, is presented on the y-axis. CNN emerges as the most accurate classifier, achieving 99% accuracy. SVM obtained the lowest accuracy, 85%.

IV. CONCLUSION

The capability to categorize baby cries holds the key to unlocking a hidden realm of communication. Recent progress in cry classification, using various classifiers and advanced feature extraction techniques, has yielded promising results. This technology provides a new approach to monitoring infant well-being. Imagine a day in the future when cries indicative of health concerns are precisely recognized, easing early intervention and improving clinical outcomes.

The impact goes beyond medical applications. By deciphering the reason behind each cry, baby cry classification provides caregivers with a clearer understanding of their baby's needs. This fosters stronger bonds, enabling parents and caregivers to respond with greater empathy and effectiveness. It bridges the gap between infants and parents, enhancing the overall well-being of both parents and babies.

This research paves the way for a future where technology acts as a compassionate translator, enhancing understanding and communication. As the accuracy and sophistication of these techniques continue to evolve, the potential to revolutionize childcare and positively impact countless lives becomes increasingly tangible. With continued development, we stand on the threshold of a future where the cryptic language of baby cries is demystified, empowering caregivers and fostering a new era of responsive care.