
International Journal of Computer Science Trends and Technology (IJCST) – Volume 10 Issue 3, May-Jun 2022

RESEARCH ARTICLE OPEN ACCESS

A Novel Approach to Predict the Reason for Baby Cry using Machine Learning

Sahithi Vesangi [1], Saketh Reddy Regatte [2], Baby V [3], Chalumuru Suresh [4]
[1][2][3][4] Computer Science Engineering, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India

ABSTRACT
Crying is the only medium through which infants can communicate their pain to the world. They express various emotions and feelings through crying; sadness, grief, pain, and loneliness are a few of them. Parents often make an effort to understand their baby's distress and attempt to make them cheerful again. Young parents who are new to parenting have little experience in understanding their infant's cry, and their hectic schedules leave them little time to spend with their children. As a result, these parents often feel irritated or frustrated by long periods of crying. Not knowing the reason leaves them confused and helpless to stop the crying and bring the baby back to a normal condition. Therefore, the main aim of this paper is a solution that can predict why a baby is crying using modern machine learning techniques. We intend to build a system that captures the baby's cry in the form of audio. After audio pre-processing and feature extraction using the Fast Fourier Transform (FFT), a CNN model is trained, which obtained an accuracy of 86.4%. A Random Forest Classifier is then used to classify the baby's cry and predict the reason for it. The model generates 7 types of output, namely: hunger, sleep, scared, temperature, burp, lonely, and discomfort. Finally, to make the system interactive, we built a visual interface that helps parents record their baby's cry in real time and get the reason for it through the intended mobile application.
Keywords:- Baby Cry, Speech Recognition, Fast Fourier Transform (FFT), Convolutional Neural Networks (CNN), Random Forest Classifier.

I. INTRODUCTION
A study suggests that more than 130 million babies are born in a year, which amounts to about 250 babies every minute on average [1]. Crying is the first gesture of a baby entering this world. In the course of time, crying is the only medium through which infants can communicate their sadness, grief, and pain to other people. Though it is normal for babies to cry 2-3 hours a day in the first two weeks, parents are sometimes perplexed about why their child is crying and are in quest of the reason behind it. There are two types of baby cries: normal cries, where the child is experiencing pain or sadness, and pathological cries, which occur when a disease is bothering the baby.
Visualize a scenario where a mother is taking care of an infant. If the baby suddenly starts to cry with no apparent reason, it is very confusing for the mother to take an appropriate decision to stop the crying. If the crying continues for longer than expected, it causes irritation and frustration for both parents and neighbours, yet the child is still experiencing grief or pain from a particular problem. Traditional responses include feeding the baby, offering a pacifier, putting the baby to sleep, taking a walk in the stroller, etc. All of these approaches are attempts to stop the baby's cry, but the exact reason for the cry at that time may still be unknown. Typically, baby cries are produced in rhythmic patterns between inhalation and exhalation.
Research on baby cry started in the 1960s, when Wasz-Höckert identified types of baby cries with the help of trained medical practitioners [2]. Later, baby cry detection systems were invented that provided a solution for parents to detect their baby's cry when they are busy or when the cry becomes inaudible due to external noise. Several datasets provide baby cry audio files, such as Baby Chillanto, ChatterBaby, and iCOPE, with predefined sets of samples classified into different reasons for crying. Basic cry detection systems involved capturing the baby's acoustic signals and performing audio processing on them. Many machine learning techniques like CNN (Convolutional Neural Networks), RBF (Radial Basis Function) networks, etc. are used to train models with the spectrogram as the input. Other methods like SVM (Support Vector Machine) and SVM-kernel based techniques are also popular for detecting baby cries in practice.
Though the results have proven efficient at detecting the cry, identifying the reason behind it remains future scope for the existing work. Successful cry detection methods failed to explain or emphasize the reason why the baby was crying. To address this problem, we propose a solution that captures the baby's cry in audio form and converts it into a spectrogram by performing the Fast Fourier Transform (FFT). This input is used to train a CNN model that outputs the reason why the baby is crying. 7 types of cry reasons are predicted, namely: sleep, hunger, scared, lonely, temperature, burp, and discomfort. After training the model, we provide an efficient user interface, built as a mobile application, to help parents understand the reason for their child's cry effectively.


II. LITERATURE SURVEY
Chunyan Ji et al. [3] reviewed various research on baby cry analysis and classification. A data acquisition process was followed throughout the literature to understand the workflow of the methodologies. Typical differences between the audio signals of adults and infants were illustrated through images, which led to the feature extraction stage. Various frequency-domain features like MFCC, LPCC, etc. were taken into consideration. Their study indicated that infant cries are rhythmic and cyclic due to natural interruptions in breathing. Their current work focuses on creating a baby cry dataset of 30,000 audio files with about 50 hours of total length, to eliminate the limitations of data inconsistency and false predictions.
Yizhar Lavner et al. [4] proposed two machine learning algorithms to automatically detect baby cries. First, MFCCs and pitch were extracted as features to train a low-complexity logistic regression classifier. The second is a CNN trained on audio recordings. A database of audio from babies aged 0-6 months was used for performance evaluation. Among the two, the CNN classifier yields better results, with a 12.0% false positive rate at a 90% detection rate. Lina Abou-Abbas et al. [5] proposed a framework for a cry-based diagnostic system that can automatically segment baby cry sounds. The algorithm was used to extract two cry components, audible expiration and inspiration, from the audio signals. Their paper focuses on providing an approach that can determine the start and end points of the expiratory (EXP) and inspiratory (INSV) components of a cry signal.
Ashwini K. et al. [6] combined machine learning and deep learning techniques to obtain better classification results. Baby cries are classified as pain, hunger, and sleepiness. The data is collected in the form of audio signals and transformed into spectrograms using the Short-Time Fourier Transform (STFT). Using the spectrogram as input, a deep neural network is trained and the obtained features are passed to a support vector classifier. Overall, the system achieved an accuracy of 89.89%. Misba Anjum et al. [7] proposed an Intelligent Cry Monitoring and Cry Detection System (ICMCDS) that notifies parents who mostly leave their infants in day care or baby care centres. Notification is done via SMS messages to the parents whenever the machine detects a cry acoustic signal from the infant. The model uses a Wi-Fi system to enable the audio sensors to detect the baby's cry and inform the parents. The advantage of this system is its low maintenance cost.
Rami Cohen et al. [8] proposed an algorithm that can automatically detect an infant's cry during physical dangers, such as when parents leave their children in vehicles. The model is divided into two stages: first, feature extraction is done using MFCC and short-time energy parameters; second, classification is performed using the KNN algorithm. The performance of the algorithm was evaluated by testing on various noises like car horns, engine noise, etc. A voice activity detector was enabled which could automatically stop the algorithm whenever no external noise is detected.
Chaithra Lakshmi C. et al. [9] designed a system that could detect a baby's cry and show possible reasons for it. A dataset with 100 audio files from the Speech Technology Course at KTH (Royal Institute of Technology, Sweden) was considered. Feature extraction was performed using MFCC and the algorithm was trained using KNN. The accuracy of this algorithm was 71.42%. R. Cohen et al. [10] evaluated various convolutional network models in comparison with traditional machine learning methods like logistic regression and support vector machines. They similarly analysed feed-forward networks in comparison to recurrent neural networks. The system showed the highest performance for a CNN architecture specialized with non-symmetric kernels.

III. PROPOSED SYSTEM
In this paper, we propose a system that is capable of predicting the reason why a baby or infant is crying. The architecture diagram below focuses on the components responsible for predicting the accurate reason for the baby's cry. This system can help parents automatically understand the reason and take the necessary actions. Unlike traditional approaches that present cry detection alone, we intend to monitor various baby cry patterns and present a visual interface to the parents through a mobile application that can notify them from time to time by predicting the reasons for the baby's cry.

Fig. 1 System Architecture

A. Audio Pre-Processing
Whenever a baby cries, the audio signal is captured as input using the interface of the mobile application. The audio is stored as an uncompressed .wav file for easy algorithmic analysis. Background noise such as the opening and closing of doors in a domestic environment, coughing, choking, talking, and sounds made by pets can cause loss of accuracy while predicting the reasons. Hence, any background noise present in the audio is cleaned and audio splitting is performed using Python libraries. Later, a spectrogram is generated using LibROSA (a Python library for audio analysis). Since the fundamental frequency is higher for babies than for adults, the generated spectrograms show high-frequency variations for the captured audio files.
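As a minimal sketch of this pre-processing step (assuming LibROSA is available, that "cry.wav" is a placeholder path for the recording, and that silence-based splitting stands in for the noise-cleaning stage), the audio can be loaded, split, and converted into a mel spectrogram as follows:

import librosa
import numpy as np

# Load the recorded cry as a mono waveform at a fixed sampling rate.
signal, sr = librosa.load("cry.wav", sr=22050, mono=True)

# Rough audio splitting: keep only intervals that rise at least 20 dB
# above the quietest parts of the recording.
intervals = librosa.effects.split(signal, top_db=20)
voiced = np.concatenate([signal[start:end] for start, end in intervals])

# Mel spectrogram of the cleaned signal, expressed in decibels.
mel = librosa.feature.melspectrogram(y=voiced, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)
print(mel_db.shape)  # (n_mels, number of frames)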


The Fast Fourier Transform (FFT) is the method used to pre-process the signal for audio analysis. With the evolution of computational algorithms, the FFT has gained importance because of its ability to represent an audio signal in terms of sinusoidal components. To apply the algorithm efficiently, only the single-sided spectrum is considered, and the absolute value of the complex output is taken so that no imaginary parts remain.
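A minimal sketch of this step (assuming the voiced signal and sampling rate sr from the previous snippet) computes the single-sided magnitude spectrum with NumPy:

import numpy as np

def single_sided_spectrum(signal, sr):
    """Return frequencies and magnitudes of the single-sided FFT spectrum."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)          # FFT of a real signal (positive half only)
    magnitude = np.abs(spectrum) / n        # absolute value removes the imaginary parts
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)  # frequency axis in Hz
    return freqs, magnitude

# Example usage with the cleaned cry signal from the pre-processing step:
# freqs, magnitude = single_sided_spectrum(voiced, sr)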

B. Model Training
Before training the model, certain inputs have to be prepared. After the mel spectrogram is generated, it is converted into MFCCs (Mel Frequency Cepstral Coefficients). The obtained outputs are converted to a natural logarithmic scale to project the mel representation of the audio. With machine learning techniques, automatic speech recognition has become easier using the LibROSA Python library, which can generate MFCCs suited to the current problem. These MFCCs are then fed as input to the CNN model.
Using the MFCCs as input, the model is trained with a Convolutional Neural Network. A sequential CNN model with one-dimensional convolutions is chosen, and the ReLU activation function is used for the 2 convolutional layers. The trained model shows an accuracy of 86.4%. This pre-trained model is used to predict the reason for the baby's cry.
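The paper does not list the exact layer configuration, so the following is only a hedged sketch of how such a pipeline could look: MFCCs extracted with LibROSA are averaged over time and fed to a small sequential Keras model with two 1-D convolutional layers and ReLU activations. The filter counts, kernel sizes, and optimizer are illustrative assumptions, not the authors' settings.

import librosa
import numpy as np
from tensorflow.keras import layers, models

NUM_CLASSES = 7  # hunger, sleep, scared, temperature, burp, lonely, discomfort

def extract_mfcc(path, n_mfcc=40):
    """Load a cry recording and return a fixed-length MFCC feature vector."""
    signal, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)  # average over time -> shape (n_mfcc,)

def build_model(n_mfcc=40):
    """Sequential 1-D CNN with two convolutional layers and ReLU activations."""
    model = models.Sequential([
        layers.Input(shape=(n_mfcc, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Assuming X (shape: samples x n_mfcc x 1) and integer labels y are prepared:
# model = build_model()
# history = model.fit(X, y, epochs=50, validation_split=0.2)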

C. Reason Prediction
After the model is trained, detection of the baby's cry is confirmed. Predicting the various reasons why the baby is crying is the main aim of this paper. A Random Forest Classifier is used to understand the various vocal patterns of the cry, and different outputs are generated according to the pre-trained model. The classifier generates 7 types of output, i.e., the model predicts 7 reasons for the baby's cry, namely: hunger, sleep, scared, temperature, burp, lonely, and discomfort.
To make it easy for the user to understand how the model works, a mobile application is built using the Flutter framework that takes the baby's cry as audio input and presents any one of the aforementioned reasons as output to the parents.
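A hedged sketch of this classification stage with scikit-learn follows, assuming the same averaged MFCC vectors serve as features and string labels name the seven reasons; the tree count and train/test split are illustrative choices, not the authors' settings.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

REASONS = ["hunger", "sleep", "scared", "temperature", "burp", "lonely", "discomfort"]

def train_reason_classifier(features, labels):
    """Fit a Random Forest on MFCC feature vectors labelled with one of the 7 reasons."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    forest = RandomForestClassifier(n_estimators=200, random_state=42)
    forest.fit(X_train, y_train)
    predictions = forest.predict(X_test)
    print("Held-out accuracy:", accuracy_score(y_test, predictions))
    return forest

# features: array of MFCC vectors (e.g. from extract_mfcc above);
# labels: list of strings drawn from REASONS.
# classifier = train_reason_classifier(features, labels)
# reason = classifier.predict([extract_mfcc("new_cry.wav")])[0]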
IV. EXPERIMENT AND RESULT
In this paper, we built a system that is capable of capturing a baby's cry and predicting the reason for it. The graph below is the plot of accuracy per epoch obtained during training.

Fig. 2 Graph representing accuracy per epoch

Similarly, the loss value is also plotted to obtain the relationship between the epochs and the loss function.

Fig. 3 Graph representing loss value per epoch
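A minimal sketch of how such per-epoch curves could be produced, assuming the Keras history object returned by model.fit in the training sketch above:

import matplotlib.pyplot as plt

def plot_training_curves(history):
    """Plot accuracy and loss per epoch from a Keras History object."""
    epochs = range(1, len(history.history["accuracy"]) + 1)

    plt.figure()
    plt.plot(epochs, history.history["accuracy"], label="training accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()

    plt.figure()
    plt.plot(epochs, history.history["loss"], label="training loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend()

    plt.show()

# plot_training_curves(history)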
In order to maintain a log of the model's performance and to observe false predictions of the reason for a baby's cry, we tabulated the results obtained by the model in the predicted reason column.

TABLE I
ACTUAL VS PREDICTED RESULTS

Actual Reason     Predicted Reason
Scared            Scared
Lonely            Lonely
Temperature       Discomfort
Hunger            Hunger
Burp              Burp
Sleep             Discomfort
Temperature       Sleep
Discomfort        Discomfort
Scared            Scared
Lonely            Hungry
Sleep             Sleep
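Such an actual-versus-predicted comparison can also be summarised programmatically; a hedged sketch using scikit-learn (assuming lists of actual and predicted reason labels collected while evaluating the Random Forest sketch above):

from sklearn.metrics import classification_report, confusion_matrix

REASONS = ["hunger", "sleep", "scared", "temperature", "burp", "lonely", "discomfort"]

def summarise_predictions(actual, predicted):
    """Print the confusion matrix and per-class precision/recall for the 7 reasons."""
    print(confusion_matrix(actual, predicted, labels=REASONS))
    print(classification_report(actual, predicted, labels=REASONS))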


To make the system easily accessible to users, we built a mobile application using the Flutter framework. The figure below shows the screen that allows the user to interact with the system.

Fig. 4 Screenshot of Mobile Application

After tapping the button, the baby's cry is recorded and the system predicts the reason why the baby is crying.

V. CONCLUSIONS AND FUTURE SCOPE
With the advent of technology, infant cry detection systems have developed considerably. Though these systems are capable of accurately detecting when a baby is crying, for example when parents are busy, when external noise is present, or when a child is left alone in a vehicle, they fail to provide the reason why the baby is crying. This problem often leads to confusion and irritation for parents who are unaware of the reason for their baby's cry. In this paper, our system presents a novel approach to predicting the reason why the baby is crying. To achieve this, we record the baby's audio, perform audio splitting, and convert it into MFCCs using Python libraries. This input is fed into the CNN model, which acquired an accuracy of 86.4% with a loss value of 41%. Finally, we used a Random Forest Classifier to predict the various reasons for the baby's cry. The classifier provides 7 different outputs, namely: hunger, sleep, scared, temperature, burp, lonely, and discomfort. Our system is capable of presenting a visual interface that helps users record their infant's cry in real time and get the reason for the cry. Currently, the system can only alert the parent with a probable reason; unless a recording is made, the system cannot automatically detect a baby's cry in a domestic environment. Providing a solution to this problem by incorporating a message/SMS alert to the parent while at work is a future scope of this paper.

REFERENCES
[1] "With 250 babies born each minute, how many people can the Earth sustain?", The Guardian, 23 Apr 2018. https://www.theguardian.com/global-development/2018/apr/23/population-how-many-people-can-the-earth-sustain-lucy-lamble#:~:text=There%20are%20on%20average%20about,11%20billion%20people%20by%202100.
[2] O. Wasz-Höckert, T. J. Partanen, V. Vuorenkoski, K. Michelsson, and E. Valanne, "The identification of some specific meanings in infant vocalization," Experientia, vol. 20, no. 3, p. 154, 1964.
[3] C. Ji, T. B. Mudiyanselage, Y. Gao, et al., "A review of infant cry analysis and classification," EURASIP Journal on Audio, Speech, and Music Processing, vol. 2021, article 8, 2021. https://doi.org/10.1186/s13636-021-00197-5
[4] Y. Lavner, R. Cohen, D. Ruinskiy, and H. IJzerman, "Baby cry detection in domestic environment using deep learning," 2016. doi: 10.1109/ICSEE.2016.7806117.
[5] L. Abou-Abbas, C. Tadj, and H. A. Fersaie, "A fully automated approach for baby cry signal segmentation and boundary detection of expiratory and inspiratory episodes," The Journal of the Acoustical Society of America, vol. 142, 2017. https://doi.org/10.1121/1.5001491
[6] K. Ashwini, P. M. Durai Raj Vincent, K. Srinivasan, and C.-Y. Chang, "Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models," Frontiers in Public Health, vol. 9, 2021. doi: 10.3389/fpubh.2021.670352
[7] Misba Anjum and Priyanka M. J., "Baby Motion and Cry Detection," International Journal of Engineering Research & Technology (IJERT), NCETESFT 2020, vol. 8, issue 14, 2020.
[8] R. Cohen and Y. Lavner, "Infant cry analysis and detection," 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, 2012, pp. 1-5. doi: 10.1109/EEEI.2012.6376996.


[9] Chaithra Lakshmi C., Aravinda B., Deeksha, Deeksha, and Sadhana, "Predicting the Reason for the Baby Cry Using Machine Learning," Journal of Artificial Intelligence, Machine Learning and Soft Computing, vol. 4, issue 1. DOI: http://doi.org/10.5281/zenodo.2656353
[10] R. Cohen, D. Ruinskiy, J. Zickfeld, H. IJzerman, and Y. Lavner, "Baby Cry Detection: Deep Learning and Classical Approaches," in W. Pedrycz and S.-M. Chen (eds.), Development and Analysis of Deep Learning Architectures, Studies in Computational Intelligence, vol. 867, Springer, Cham, 2020. https://doi.org/10.1007/978-3-030-31764-5_7
[11] Y. Lavner, R. Cohen, D. Ruinskiy, and H. IJzerman, "Baby cry detection in domestic environment using deep learning," 2016 IEEE International Conference on the Science of Electrical Engineering (ICSEE), 2016, pp. 1-5. doi: 10.1109/ICSEE.2016.7806117.
[12] I. A. Dewi, A. Zulkarnain, and A. A. Lestari, "Identifikasi Suara Tangisan Bayi menggunakan Metode LPC dan Euclidean Distance," Elkomika, vol. 6, no. 1, pp. 153-164, 2018.
[13] M. A. Anusuya and S. K. Katti, "Mel Frequency Discrete Wavelet Coefficients for Kannada Speech Recognition using PCA," ACEEE, June 2014.
[14] R. D. Putra, A. L. Prasasti, and T. Waluyo, "Analysis of Retinex Algorithm on Digital Image from CCTV Camera for Face Recognition," JATIT, vol. 96, no. 23, pp. 7942-7962, 2018.
[15] M. D. Renanti, A. Buone, and W. A. Kusuma, "Infant Cries Identification by Using Codebook as Feature Matching and MFCC as Feature Extraction," JATIT, vol. 56, no. 2, 2013.
[16] W. S. Limantoro, C. Fatichah, and L. Yuhana, "Rancang Bangun Aplikasi Pendeteksi Suara Tangisan Bayi," Institut Teknologi Sepuluh November, 2016.
[17] I. S. Permana and Y. I. Nurhasanah, "Implementasi Metode MFCC Dan DWT Untuk Pengenalan Jenis Suara Pria Dan Wanita," MIND, vol. 3, no. 1, pp. 49-63, 2018.
[18] Wang Wei and Ning Xinbao, "Audio and Video Processing Technology in the Baby's Supervision System," Journal of Nanjing University (Natural Sciences), vol. 39, no. 3, pp. 440-445, 2003.
[19] Cao Shihua and Zhao Fang, "Research and Realization on the Model of Neonates Surveillance System Based on RFID," Micro Computer Application, vol. 29, no. 9, pp. 89-94, 2008.
[20] Lionel M. Ni, Zhang Qian, Tan Haoyu, et al., "Smart Healthcare: From IoT to Cloud Computing," Science China, vol. 4, no. 43, pp. 515-528, 2013.
[21] Zhao Wenbo, Wang Tingting, Zhang Sheng, and Sun Guo-qiang, "VQ-Based Recognition Algorithm for Babies Cries," Microcomputer Information, vol. 27, no. 4, pp. 224-225, 2011.
[22] He Lingsong and Yang Shuzi, "Envelop Estimation by a Pair of Digital Filters," Journal of Vibration Engineering, vol. 10, no. 3, pp. 362-367, 1997.
[23] Kang Huaguang and Zou Shoubin, Electronic Technology Foundation (Part Number), Beijing: Higher Education Press, 2000; Chen Yongfu, New 555 Integrated Circuit Application, Beijing: Electronic Industry Press, 2000.
[24] I. A. Bnic, H. Cucu, A. Buzo, D. Burileanu, and C. Burileanu, "Baby cry recognition in real-world conditions," in 39th International Conference on Telecommunications and Signal Processing (TSP), June 2016, pp. 315-31.
[25] D. Battaglino, L. Lepauloux, and N. Evans, "The open-set problem in acoustic scene classification," in IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), Sept 2016, pp. 1-5.
[26] A. Rabaoui, M. Davy, S. Rossignol, Z. Lachiri, and N. Ellouze, "Improved one-class SVM classifier for sounds classification," in IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS), 2007, pp. 117-122.
[27] D. M. J. Tax and R. P. W. Duin, "Data domain description using support vectors," in European Symposium on Artificial Neural Networks, 1999, pp. 251-256.
[28] L. Deng and D. Yu, "Deep learning: Methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3-4, pp. 197-387, 2014.
[29] K. J. Piczak, "Environmental sound classification with convolutional neural networks," in IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Sept 2015, pp. 1-6.
[30] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv:1207.0580, 2012.
