
Effect of Noise on Generic Cough Models

Sayanton V. Dibbo1*, Yugyeong Kim2*, Sudip Vhaduri3

1 Dartmouth College, NH, USA, 2 Fordham University, NY, USA, and 3 Purdue University, IN, USA
1 f0048vh@dartmouth.edu, 2 ykim180@fordham.edu, 3 svhaduri@purdue.edu

Abstract—Respiratory diseases, such as chronic obstructive pulmonary disease (COPD) and asthma, are two major causes of death across the globe. In addition to these common inflammatory respiratory diseases, some human-transmissible respiratory diseases, such as those caused by coronaviruses, can cause a global pandemic. One major symptom of these inflammatory respiratory diseases is coughing. Identifying coughing from smartphone-microphone recordings is easily doable in a remote setup and can help physicians and researchers make an early assessment of the situation of an individual or a community. However, smartphone-microphone recordings can be affected by environmental noises, which can degrade the performance of models developed to detect coughing from microphone recordings. Therefore, in this work, we present a detailed analysis of the impact of noise on cough detection models. We develop models using voluntary coughs and other background sounds obtained from three public datasets and test the performance of those models while detecting various types of coughs, including COPD and COVID-19 coughs obtained from three separate datasets, in the presence of background noises.

Index Terms—audio analytics; cough; noise; smartphone

* Both authors contributed equally to this research.

I. INTRODUCTION

A. Motivation

Coughing and its patterns are associated with different types of inflammatory respiratory disease, such as COPD [1], asthma [2], tuberculosis [3], and coronavirus-caused diseases, including COVID-19 (2019-nCoV), MERS-CoV, and SARS-CoV-2 [4], among several others [5]. According to the World Health Organization, about 67.7% of COVID-19 patients have a dry cough [6]. Thereby, physicians rely on coughing and its patterns as one of the major symptoms while diagnosing various respiratory diseases and their stages. However, the most common approaches to assessing coughing patterns are based on various patient-reported surveys [7], [8], which inherently suffer from the limitations of self-reported surveys, including human errors and recall bias [9]–[11]. Furthermore, these self-reported cough symptoms do not correlate well with objective cough recordings. Thereby, objective reporting of cough symptoms in different environments, using the audio sensing and computing capabilities of smartphones, can be extremely helpful not only to better assess various respiratory disease symptoms but also to foster wide-scale coverage at a low cost in normal times and during human-transmissible pandemics.

B. Related Work

With the advancement of mobile networks due to the emergence of the internet of things (IoT) [12] and improvements in the sensing and computing capabilities of smartphones, we are able to accomplish various tasks, including sleep monitoring [13], [14], mental and physical health monitoring [15]–[19], user authentication [20]–[26], and place discovery [27]–[31], among many others, using various smartphone services. Motivated by this, researchers have started relying on smartphone-microphone data, i.e., audio recordings, to detect various types of non-speech human sounds, including coughs [8], due to their low cost and wide-scale applicability. While some researchers have developed machine learning models to detect coughs [32], others have started using deep learning to develop cough detection models [33]. While most of these works rely on server-based implementations to achieve good performance, they bring additional challenges, including offloading privacy-sensitive audio recordings from the users' end to the server. Additionally, most works are targeted at developing models to detect disease-specific coughs using a limited dataset, with limited to no consideration of environmental effects, e.g., the presence of different types of noises at varying levels [34]. Additionally, models developed from one disease-specific cough may underperform when applied to a different disease-specific cough. Sometimes it is even more challenging to develop cough models for an unknown or new disease before its outbreak, such as COVID-19. Therefore, it is important to develop cough models that are applicable to users with different types of respiratory diseases in the presence of different background noises.

C. Contribution

In this work, we first present three modeling approaches (Section II-A). Next, models developed from three publicly available datasets are extensively tested on three separate datasets, including two disease-specific cough (COVID-19 and COPD) datasets (Section II-B), using 15 types of background noises at varying "signal-to-noise ratio" values to better assess the generalizability and broad applicability of regular cough-driven models (Section III). Findings from this work will provide guidelines for future research in this direction.

II. METHODS

In this section, we introduce our modeling schemes. We also introduce our datasets, pre-processing steps, feature generation and selection, and hyper-parameter optimization.

A. Modeling Approaches

In this work, we present three modeling schemes. In the unguided modeling approach, we develop unary (one-class)
models using only the target sound, e.g., cough. On the other hand, in the guided modeling approach, we develop three separate binary models considering three categories (i.e., animal, non-cough human, and non-living being) of background noises. In these three models, class-1 consists of the same cough sounds, but class-0 consists of one particular noise category, which comprises five types of noises (Section II-B). Finally, in the semi-guided modeling approach, we develop binary models for semi-guided environments using coughs (class-1) and 15 types of sounds obtained from the three categories of sounds (class-0).

Fig. 1: Bar graphs with error bars of model performances when testing on different types of coughs (no noise augmentation)

B. Audio Data Collection

In this work, we collect various cough and non-cough sounds from six separate audio datasets, including two respiratory disease datasets. The Environmental Sound Classification (ESC-50) dataset [35] consists of five categories of sound clips recorded at 44.1 kHz frequency. The FreeSound dataset [36] is a collaborative repository with 400k+ sounds and effects, covering a wide range of recordings from field to synthesized sounds, recorded at 44.6 ± 4.2 kHz frequency. The Urban Sound 8K (US-8K) dataset [37] consists of 8k+ sound clips recorded at 44.1 kHz from 10 types of urban sounds. In addition to cough clips (obtained from ESC-50), from these three datasets we create three common categories of background sounds, which we use as class-0 (i.e., the non-cough class) while developing models and as background noises to modify test coughs while testing noise effects.

The animal category consists of cricket, crow, dog, frog, and rooster sounds obtained from the ESC-50 dataset. The non-cough human category consists of breathing, laughing, snoring, and sneezing sounds obtained from the ESC-50 dataset and throat-clearing sounds obtained from the FreeSound dataset. The non-living being category consists of door knock, washing machine, vacuum cleaner, and engine sounds obtained from the ESC-50 dataset, and air conditioner sounds obtained from the US-8K dataset.

The SoundSnap (SNP) dataset [38] consists of 250k+ professional sound effects. We obtain test-cough sounds recorded at 46.65 ± 11.10 kHz. The Coswara COVID-19 dataset [39] consists of cough and breathing sounds of COVID-19-positive patients and healthy participants. We obtain the COVID-19-positive test-cough sounds recorded at 47.82 ± 0.83 kHz. The chronic obstructive pulmonary disease (COPD) dataset consists of cough sounds obtained from 12 patients (avg. age 56.2 ± 0.9 years). We use the RecForge II Android app to collect these test coughs at 44.1 kHz frequency. While we primarily use the first three datasets for model development, the last three datasets, i.e., the SNP, COVID-19, and COPD datasets, are used to test the models.

C. Pre-processing

We first change the sampling frequency of all clips to 44.1 kHz. Next, we segment the clips to collect ground truth labels, augment the data, and split them into train-test sets.

1) Data Segmentation and Labeling: In this work, we use the Audacity [40] desktop audio-processing application to segment and label cough clips into two- or three-phase cough events [7]. In summary, we obtain 106 (ESC-50), 106 (SNP), 170 (COVID-19), and 282 (COPD) cough events, and 40 non-cough events from one of the 15 types of background sounds.

2) Data Augmentation: Often, audio recordings, e.g., a person's coughing patterns, can be altered due to background changes or a user's physical state or mood (tiredness, excitement, exercising, and numerous other states). To simulate these effects, we consider three types of augmentations, i.e., 14 pitch shifts (±0.5, ±1, ±1.5, ±2, ±2.5, ±3, and ±3.5), three time stretches (0.5, 0.25, and 0.75), and four "signal-to-noise ratio" (SNR) values (0.5, 0.1, 2, and 10) for noise superposition. While pitch shift and time stretch are used to augment the cough and non-cough sounds obtained from the ESC-50, FreeSound, and US-8K datasets (during model development), noise augmentations are used to modify test coughs obtained from the ESC-50, SNP, COVID-19, and COPD datasets.

3) Train-Test Splits: As discussed before, we primarily use the 106 ESC-50 coughs and their 17 pitch-shift and time-stretch augmentations, i.e., a total of 1908 cough events (106 × (1+17)), to form class-1 while developing models. We first split the 106 coughs randomly 10 times using a 9:1 train-test ratio. Then, we combine the augmented versions while maintaining mutual exclusion between the train and test sets. For class balancing, we also select the same number of train-test samples uniformly from the five or 15 types of background sounds (class-0 non-cough samples) to form class-0 while developing binary guided or semi-guided models, respectively.

D. Feature Engineering

We compute 40 Mel-frequency cepstral coefficients (MFCCs) as well as their 40 first and 40 second temporal derivatives. In summary, we compute a set of 120 candidate features. Using the "Select K Best" and variance-based approaches, we find that the 120 and 70 most influential features are a good compromise for binary and unary classifiers, respectively.

E. Parameter Optimization and Classifier Selection

While modeling each split, we separately perform hyper-parameter optimization using grid search with various ranges of values. Finally, we select the combination of parameter values that achieves the highest model performance across all 10 splits as the optimal combination.
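The SNR-based noise superposition described in Section II-C can be sketched as follows. This is a minimal NumPy sketch under our reading of the paper's SNR values (0.5, 0.1, 2, 10) as linear power ratios rather than decibels; the function and variable names are ours, not from the paper:

```python
import numpy as np

def mix_at_snr(cough: np.ndarray, noise: np.ndarray, snr: float) -> np.ndarray:
    """Superimpose `noise` on `cough` so the cough-to-noise power ratio is `snr`.

    Assumes both signals share one sampling rate (e.g., 44.1 kHz) and that
    `snr` is a linear power ratio (assumption), not a decibel value.
    """
    # Tile or trim the noise to match the cough's length.
    if len(noise) < len(cough):
        noise = np.tile(noise, int(np.ceil(len(cough) / len(noise))))
    noise = noise[: len(cough)]

    cough_power = np.mean(cough ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that cough_power / (scale**2 * noise_power) == snr.
    scale = np.sqrt(cough_power / (snr * noise_power))
    return cough + scale * noise

# Example: superimpose white noise on a synthetic tone at SNR = 2.
rng = np.random.default_rng(0)
cough = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
noisy = mix_at_snr(cough, rng.standard_normal(22050), snr=2.0)
```

Lower SNR values in this formulation mean proportionally louder noise, which matches the paper's observation that accuracy drops as SNR decreases.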
Fig. 2: Bar graphs with error bars of model performances when testing on noise-augmented coughs

From our experimentation with a wide range of classifiers, we find that random forest (RF) with 100 estimators works the best for the guided models trained with the non-cough human and non-living being background sounds as class-0. Similarly, gradient boosting (GB) with 100 estimators works the best for guided models with animal background sounds as class-0. For semi-guided models, random forest (RF) with 100 estimators works the best. The SVM classifier with a polynomial kernel (degree = 2 and regularization parameter = 1) works the best for unary models, i.e., unguided models. All our analysis presented in this manuscript (Section III) is based on these optimal models and their optimal parameter values.

Fig. 3: (a) Generic model performance (performance degradation), (b) cough distribution (t-SNE plot)
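The classifier choices above can be sketched with scikit-learn as follows. This is an illustrative sketch on synthetic feature vectors, not the authors' code; in particular, we read the unary model as a one-class SVM, whose scikit-learn regularization parameter is `nu` rather than `C`, and that mapping is our assumption:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import OneClassSVM

# Synthetic stand-ins for the 120-dimensional MFCC-based feature vectors.
rng = np.random.default_rng(42)
X_cough = rng.normal(loc=1.0, size=(100, 120))   # class-1: cough
X_noise = rng.normal(loc=-1.0, size=(100, 120))  # class-0: background sounds

# Guided / semi-guided (binary) models: RF or GB with 100 estimators.
X = np.vstack([X_cough, X_noise])
y = np.array([1] * 100 + [0] * 100)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

# Unguided (unary) model: trained on coughs only, polynomial kernel of
# degree 2. `nu=0.5` is a placeholder; the paper's "regularization
# parameter = 1" does not directly name a scikit-learn argument.
unary = OneClassSVM(kernel="poly", degree=2, nu=0.5).fit(X_cough)
```

At prediction time, the binary models emit class-0/class-1 labels, while the one-class model emits +1 (cough-like) or -1 (outlier), which would need to be mapped back to cough/non-cough labels before computing accuracy.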
III. ANALYSIS

In this section, we compare the performance of the three modeling approaches (discussed in Section II-A) while detecting coughs obtained from the three test datasets, including two respiratory disease-cough datasets, in the presence or absence of background noises. We primarily use accuracy (ACC) to compare models while testing on different types of coughs.

We use "Binary GB 5 (Animal)", "Binary RF 5 (Non-cough)", and "Binary RF 5 (Non-living)" to refer to the best guided models developed for the three types of environments (i.e., environments with the three categories of background sounds: animal, non-cough human, and non-living being) using binary classifiers with five types of sound from one of the three categories of environments. Similarly, we use "Binary RF 15" and "Unary" to indicate the best semi-guided and unguided models, developed using a binary classifier with 15 types of background sounds and a unary classifier with no background sounds, i.e., only cough sounds, respectively.

We first present our analysis using cough events without noise augmentation. In Figure 1, we present a detailed analysis of different models (trained from ESC-50 coughs) when testing on different types of coughs. In general, we observe that the binary guided models trained with non-living being background sounds as class-0 (i.e., the "Binary RF 5 (Non-living)" models) achieve an average accuracy of more than 0.9, which is higher than the accuracy of the other two guided models trained with animal or non-cough human sounds as class-0. These two guided models achieve accuracies lower than 0.75 when testing on COVID-19 and COPD coughs.

In general, we observe that unguided models outperform the semi-guided models, which is probably because the "unary" unguided models are biased toward class-1, i.e., cough sounds, which is the only class used to train them. Similarly, in general, the best semi-guided model (i.e., "Binary RF 15") performs worse than the best guided model, but better than the unguided one (with the exception of COPD coughs).

Now, we present our analysis using noise-augmented cough events to determine the robustness of our models in the presence of noises. In Figure 2, we present the performance of different models while testing on three categories of background noises at four separate SNR values. In the figure, we observe a performance drop with the addition of noise and with increased noise levels, i.e., lowered SNR values. Considering all four SNR values, we find an average test accuracy of more than 0.8 for all noisy coughs, except COPD, for which the average test accuracy is 0.72. As before, we observe that guided models trained with the non-living being and non-cough human sounds perform the best and worst, respectively. At SNR value 0.1, the best model outperforms the worst model by 24%.

To better understand the performance drop of different cough models while testing on various types of coughs in the absence or presence of noises at different SNR values, we present the median test accuracy drop (%) in Figure 3a in an aggregated form. In this analysis, we consider the median
accuracy values obtained while testing models on ESC-50 coughs using 9:1 train-test splits as our baseline performance. In the figure, we observe that COPD coughs suffer from a higher drop in accuracy compared to other coughs. With the addition of noises, we observe a higher drop in accuracy across all datasets, including the ESC-50 dataset. Additionally, the performance drop increases with the increase in noise levels.

Next, in Figure 3b, we use a t-SNE plot to investigate the data distribution of coughs to determine the performance variation across different types of coughs. In the figure, we observe that cough samples obtained from SNP (red dots) are completely mixed with the ESC-50 cough samples (yellow dots). Thereby, models trained from ESC-50 coughs can well identify similar test cough samples obtained from the SNP dataset. However, cough samples obtained from the COVID-19 (blue dots) and COPD (green dots) datasets form two visible clusters away from the ESC-50 cough samples. Therefore, the ESC-50 cough-data-driven models struggle to identify the cough samples from those two clusters. Since the COPD cluster is larger (i.e., has more samples) than the COVID-19 cluster, ESC-50 cough-driven models perform worse when testing on COPD.

IV. DISCUSSION AND FUTURE WORK

To the best of our knowledge, this is the first work that attempts to test three types of generic cough models when detecting different types of coughs, including COVID-19 and COPD, in the presence or absence of 15 types of background noises, from smartphone-microphone audio recordings. While models developed from the public ESC-50 dataset can achieve reasonable performance in detecting different regular and respiratory coughs, such as COVID-19, disease-specific models can improve the detection performance. However, this will also require disease-specific data collection, which is relatively easy for known diseases, such as COPD, but challenging for cases like the sudden outbreak of a pandemic like COVID-19. Thereby, the generic models can still be utilized for an initial screening. Though this work is based on a limited number of cough and non-cough events/samples, we apply three types of augmentations to increase the data volume and create real-world effects. Additionally, we perform 10 random 9:1 train-test splits while developing and validating our models to avoid overfitting and data sparsity. With the availability of large datasets, more advanced modeling techniques, such as deep learning models, can be developed to improve performance.

REFERENCES

[1] M. G. Crooks, A. Den Brinker et al., "Continuous cough monitoring using ambient sound recording during convalescence from a COPD exacerbation," Lung, vol. 195, no. 3, pp. 289–294, 2017.
[2] C. Thorpe et al., "Towards a quantitative description of asthmatic cough sounds," European Respiratory Journal, vol. 5, no. 6, pp. 685–692, 1992.
[3] A. Proaño, M. A. Bravard et al., "Dynamics of cough frequency in adults undergoing treatment for pulmonary tuberculosis," Clinical Infectious Diseases, vol. 64, no. 9, pp. 1174–1181, 2017.
[4] "Coronavirus," Available: https://bit.ly/3g4zd4l, Accessed: March 2021.
[5] C. B. Simpson et al., "Chronic cough: state-of-the-art review," Otolaryngology—Head and Neck Surgery, vol. 21, pp. 693–700, 2006.
[6] Springwise, "A platform for detecting a COVID-19 cough," Available: http://bit.ly/3cA8hYw, Accessed: April 2021.
[7] S. Vhaduri, "Nocturnal cough and snore detection using smartphones in presence of multiple background-noises," in ACM COMPASS, 2020.
[8] S. Vhaduri, T. Van et al., "Nocturnal cough and snore detection in noisy environments using smartphone-microphones," in IEEE ICHI, 2019.
[9] S. Vhaduri and C. Poellabauer, "Design and implementation of a remotely configurable and manageable well-being study," in EAI SWIT-Health, 2015.
[10] S. Vhaduri et al., "Human factors in the design of longitudinal smartphone-based wellness surveys," in IEEE ICHI, 2016.
[11] S. Vhaduri and C. Poellabauer, "Design factors of longitudinal smartphone-based health surveys," Journal of Healthcare Informatics Research, vol. 1, no. 1, pp. 52–91, 2017.
[12] M. T. Al Amin, S. Barua, S. Vhaduri, and A. Rahman, "Load aware broadcast in mobile ad hoc networks," in IEEE ICC, 2009.
[13] C.-Y. Chen et al., "Estimating sleep duration from temporal factors, daily activities, and smartphone use," in IEEE COMPSAC, 2020.
[14] S. Vhaduri and C. Poellabauer, "Impact of different pre-sleep phone use patterns on sleep quality," in IEEE BSN, 2018.
[15] S. Vhaduri, S. V. Dibbo, and Y. Kim, "Deriving college students' phone call patterns to improve student life," IEEE Access, DOI: 10.1109/ACCESS.2021.3093493, 2021.
[16] S. V. Dibbo, Y. Kim et al., "Visualizing college students' geo-temporal context-varying significant phone call patterns," in IEEE ICHI, 2021.
[17] S. Vhaduri, S. V. Dibbo, C.-Y. Chen, and C. Poellabauer, "Predicting next call duration: A future direction to promote mental health in the age of lockdown," in IEEE COMPSAC, 2021.
[18] S. Vhaduri, A. Munch, and C. Poellabauer, "Assessing health trends of college students using smartphones," in IEEE HI-POCT, 2016.
[19] Y. Kim, S. Vhaduri et al., "Understanding college students' phone call behaviors towards a sustainable mobile health and wellbeing solution," in International Conference on Systems Engineering, 2020.
[20] W. Cheung et al., "Continuous authentication of wearable device users from heart rate, gait, and breathing data," in IEEE BioRob, 2020.
[21] S. V. Dibbo, W. Cheung, and S. Vhaduri, "On-phone CNN model-based implicit authentication to secure IoT wearables," in EAI SaSeIoT, 2021.
[22] S. Vhaduri and C. Poellabauer, "Wearable device user authentication using physiological and behavioral metrics," in IEEE PIMRC, 2017.
[23] W. Cheung and S. Vhaduri, "Context-dependent implicit authentication for wearable device users," in IEEE PIMRC, 2020.
[24] S. Vhaduri and C. Poellabauer, "Biometric-based wearable user authentication during sedentary and non-sedentary periods," in International Workshop on Security and Privacy for the Internet-of-Things, 2018.
[25] A. Muratyan et al., "Opportunistic multi-modal user authentication for health-tracking IoT wearables," in EAI SaSeIoT, 2021.
[26] S. Vhaduri and C. Poellabauer, "Multi-modal biometric-based implicit authentication of wearable device users," IEEE Transactions on Information Forensics and Security, vol. 14, no. 12, pp. 3116–3125, 2019.
[27] ——, "Opportunistic discovery of personal places using multi-source sensor data," IEEE Transactions on Big Data, vol. 7, no. 2, pp. 383–396, 2021.
[28] ——, "Hierarchical cooperative discovery of personal places from location traces," IEEE Transactions on Mobile Computing, vol. 17, no. 8, pp. 1865–1878, 2018.
[29] S. Vhaduri, A. Striegel et al., "Discovering places of interest using sensor data from smartphones and wearables," in IEEE UIC, 2017.
[30] S. Vhaduri and C. Poellabauer, "Opportunistic discovery of personal places using smartphone and fitness tracker data," in IEEE ICHI, 2018.
[31] ——, "Cooperative discovery of personal places from location traces," in ICCCN, 2016.
[32] L. Kvapilova, V. Boza et al., "Continuous sound collection using smartphones and machine learning to measure cough," Digital Biomarkers, vol. 3, no. 3, pp. 166–175, 2019.
[33] M. Nasr, R. Shokri, and A. Houmansadr, "Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning," in IEEE SP, 2019.
[34] J. Monge-Álvarez, C. Hoyos-Barceló et al., "Robust detection of audio-cough events using local Hu moments," IEEE Journal of Biomedical and Health Informatics, vol. 23, no. 1, pp. 184–196, 2019.
[35] "ESC-50," Available: https://bit.ly/32bLeNc, Accessed: March 2021.
[36] "FreeSound," Available: https://freesound.org/, Accessed: March 2021.
[37] "US-8K," Available: https://bit.ly/2uHhhYh, Accessed: March 2021.
[38] "SoundSnap," Available: https://bit.ly/3ddHUY1, Accessed: March 2021.
[39] "Coswara," Available: https://bit.ly/3s9Ab1C, Accessed: March 2021.
[40] "Audacity," Available: https://bit.ly/3dctIP4, Accessed: March 2021.
