
ANALYSIS OF FATIGUE THROUGH SPEECH
Siddharth Shekhar; Ritik Gautam; Deepjyoti Ray; Bheemraj Chhatria

Abstract– This report describes a general framework for detecting fatigue from articulation-related
and speech-quality-related speech characteristics. The advantages of this real-time measurement
approach are that obtaining speech data is non-intrusive, requires no sensors to be attached, and
does not interfere with critical activities during work. We collected samples from each subject in
6 sessions (one rest state and 5 fatigue states). To induce fatigue we used physical exercise: the
subject went up and down stairs for 3 continuous minutes, and immediately afterwards we measured
heart rate and recorded the subject’s voice and Stroop test data. We then analyzed various
components of the recorded speech to infer conclusions about the critical fatigue level of the
subject.

Index Terms– fatigue, sleepiness, physical activity, exercise, speech analysis

I. INTRODUCTION
People frequently experience fatigue, which has an impact on their everyday lives and can be quite
detrimental to their health and productivity. The interest in creating techniques for accurately
assessing weariness, notably through voice analysis, has grown in recent years.
Physical and mental health as well as productivity are all impacted by fatigue. It raises the possibility
of mishaps, mistakes, and injuries, which can result in despair, anxiety, and other medical issues. In
high-risk professions like pilots and healthcare workers, it's critical to have effective methods for
identifying and addressing fatigue.
Self-report assessments, performance-based measurements, biological indicators, and speech analysis
can all be used to detect fatigue. While performance-based measures evaluate mental or physical
performance, self-report measures use questionnaires and rating scales. Speech analysis focuses on
changes in speech qualities, while biological measures look at physiological signs of weariness. An
assessment of fatigue levels can be made more precise by combining various techniques.
II. LITERATURE REVIEW
Various ongoing studies analyze fatigue through speech, looking for a non-intrusive way to check the
fatigue level. Several of them propose that acoustic analysis of speech might be a practicable method
that is non-invasive and does not interfere with the work task.

One study aimed to verify whether there were variations in the speech and phonatory aspects of a
pilot who complained of fatigue and sleepiness prior to an accident. Speech samples taken a day
before and just before the accident were analyzed with PRAAT version 5.3.85 (University of
Amsterdam, Netherlands), a free package widely used by speech scientists. The results showed
significant variation between the speech produced in baseline conditions and just before the
accident [1].

Warfighters and civilian pilots are particularly vulnerable to fatigue due to the peculiarities of the
military and aviation environment. Another study described an approach to developing a voice-based
fatigue prediction system. It found that no single voice characteristic demonstrates a consistent and
reliable change as speakers become fatigued. Rather than studying any one specific voice parameter,
the authors used a more holistic representation of the speech signal. Samples from subjects who
underwent a night of sleep deprivation were compared with samples recorded under normal
conditions, and the variations between the two were used for fatigue analysis [2].

More than half of traffic accidents are reported to occur because of fatigued driving, so fatigue
testing is of great significance for reducing accidents. One study presents feature-based parameters
and a probabilistic neural network (PNN) speech recognition model to detect fatigue. In this model,
a person simply speaks into a microphone connected to a computer, and a value is calculated from the
changes in the sound wave. If the value exceeds a threshold, rest is needed to recuperate [3].

Another study examined how vocal fatigue symptoms relate to acoustic factors that reflect the style
of voice production and the impact of vocal loading. A prolonged phonation was recorded at habitual
speaking pitch and loudness before and after a working day. Average fundamental frequency (F0),
sound pressure level (SPL), and the alpha ratio, which reflects phonation type, were calculated from
the data. After a long working day these values increased, jitter and shimmer values decreased, and
throat fatigue increased. The average levels of the acoustic characteristics did not correspond to
the symptoms. A higher mean F0 was connected with throat fatigue [4].

The paper “Estimating Changes in Speech Metrics Indicative of Fatigue Levels” examines the
association between speech patterns and fatigue levels. Using machine learning, changes in speech
metrics were analyzed, revealing that variations in metrics such as pauses, errors, and speech
tempo indicate increasing levels of fatigue [5].

Speech analysis has shown promising results for estimating fatigue levels. By analyzing speech
features such as speech rate and pause duration, researchers identified significant correlations
with subjective ratings of fatigue. However, larger studies in diverse populations and practical
applications would be needed to validate this method for monitoring fatigue levels in workers [6].
III. MOTIVATION
By referring to the aforementioned papers, we have come to realize that:

• The current methods like EEG are dependable but the setup is complex.

• It is important to find measures of fatigue which are:
   • Non-intrusive
   • Non-invasive

• There is a need to develop a method that can analyse this data in real time.

IV. OBJECTIVES
● To examine the changes in acoustic features of speech with respect to an increase in
fatigue.

● To check the correspondence of these changes with the subjective data recorded with the
help of the NASA TLX Questionnaire.

● To use speech as a non-intrusive, non-invasive method of fatigue analysis.

● To develop a predictive model for fatigue analysis in real time.

V. SPEECH PRODUCTION AND ANALYSIS


The whole objective of this project depends on the method of speech analysis that we plan to
implement. Before that, we need to understand how speech is produced.

A. PRODUCTION OF SPEECH
The vocal tract is the main path through which speech is produced. It basically acts as a filter:
● Initially we have the glottal pulse, a noisy high-pitched signal generated by the vocal cords.
● This signal is passed through the vocal tract, which acts as a filter that creates the speech
signal.
Speech production

The frequency response of the vocal tract carries information about the timbre of the sound, which
determines the actual phonemes that we produce.

B. MFCC ANALYSIS

Mel Frequency Cepstral Coefficient (MFCC) analysis is a method that represents the different
characteristics/parameters of speech with a small set of discrete coefficients.

$C(x(t)) = F^{-1}[\log\{F[x(t)]\}]$
The mathematical expression of cepstrum

● $F[x(t)]$ is the spectrum
● $\log\{F[x(t)]\}$ is the log spectrum
● $F^{-1}[\log\{F[x(t)]\}]$ is the cepstrum
● $x(t)$ is the time domain signal
● To get the spectrum, we take the Fourier transform of the time domain signal
● We take the log of this value and the result is the log of the spectrum
● Finally we take the inverse Fourier transform of the previous value, which gives us the
cepstrum.
Hence we also say that Cepstrum is a spectrum of a spectrum.
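
The three steps above (Fourier transform, log, inverse Fourier transform) can be sketched in a few lines of Python. This is only an illustration and not the code used in the project: it uses a naive DFT from the standard library so that it runs without extra packages, it takes the log of the spectrum magnitude (the "real cepstrum" variant, which is an assumption here), and the small floor `eps` is an assumed safeguard against taking the log of zero.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum of x.

    eps is an assumed small floor that keeps log() finite when a
    spectral bin is exactly zero.
    """
    spectrum = dft(x)                                          # F[x(t)]
    log_spectrum = [math.log(abs(v) + eps) for v in spectrum]  # log{F[x(t)]}
    return [c.real for c in idft(log_spectrum)]                # F^-1[log{...}]
```

In practice a fast FFT routine would replace the naive transforms, and the function would be applied to short windowed frames of the recording rather than to the whole signal at once.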

Speech can be formalized by the given expression:

$\log\{X(t)\} = \log\{E(t)\} + \log\{H(t)\}$

$\log\{X(t)\}$ is the log spectrum of the speech signal
$\log\{E(t)\}$ is the log spectrum of the glottal pulse
$\log\{H(t)\}$ is the log of the vocal tract frequency response

Here we can see that the spectrum of the speech signal is the product of the glottal pulse
spectrum and the vocal tract frequency response; taking the log turns this product into a sum.
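
The step from a product to a sum can be written out explicitly. The block below is a sketch in assumed notation that does not appear in the expression above: lowercase $x$, $e$, $h$ for the time domain signals, $*$ for convolution, and a frequency variable $f$ for the spectra.

```latex
\begin{align}
x(t) &= e(t) * h(t)
  && \text{glottal pulse filtered by the vocal tract} \\
X(f) &= E(f)\, H(f)
  && \text{convolution in time becomes a product of spectra} \\
\log |X(f)| &= \log |E(f)| + \log |H(f)|
  && \text{the log turns the product into a sum}
\end{align}
```

Because the two components are added in the log spectrum, they can be separated by a linear operation in the cepstral domain.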

We are not that interested in the glottal pulse signal. Hence our goal here is to filter it out. Major
information about the speech signal, like formants and phonemes, is carried by the vocal tract
frequency response.

E(t) is eliminated with the help of a liftering process.
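
As an illustration of liftering, the sketch below keeps only the low-quefrency part of a cepstrum, where the slowly varying vocal tract envelope is concentrated, and zeroes the rest, where the glottal excitation shows up. The function name and the `cutoff` parameter are hypothetical; the report does not state which lifter or cutoff was used.

```python
def low_time_lifter(cepstrum, cutoff):
    """Keep only the low-quefrency part of a real cepstrum.

    Coefficients with index below `cutoff` are kept, together with their
    mirror images at the end of the sequence (a real cepstrum is
    symmetric); every other coefficient is set to zero.
    """
    n = len(cepstrum)
    if not 0 < cutoff <= n // 2:
        raise ValueError("cutoff must be between 1 and len(cepstrum) // 2")
    return [
        c if (i < cutoff or i > n - cutoff) else 0.0
        for i, c in enumerate(cepstrum)
    ]
```

Transforming the liftered cepstrum back to the frequency domain would give a smoothed log spectrum that approximates the vocal tract response.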

To calculate the MFCCs, we use Mel scaling and the discrete cosine transform (DCT). We don't use
the Fourier transform here because it gives complex results; the DCT gives real values, which are
sufficient for the purpose.
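
The two ingredients named above can be sketched as follows. This is a minimal illustration, not the project's implementation: the Hz-to-mel formula is one common variant (an assumption, since the report does not say which was used), and the helper names and the filter count in the usage note are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mel (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_filters):
    """Edge frequencies (Hz) of n_filters bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II; returns the first n_coeffs (real) coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]
```

For example, `mel_band_edges(0.0, 8000.0, 26)` gives the edges of 26 bands that are narrow at low frequencies and wide at high frequencies; applying `dct_ii` to the log energies of those bands and keeping the first 13 outputs would yield one frame of MFCCs.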

Process to calculate the MFCCs

We generally take 12-13 coefficients; the initial coefficients carry the most information (e.g.
formants, spectral envelope). We use the delta and delta-delta values as well.
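
Delta values approximate how fast each coefficient changes from frame to frame, and delta-delta values are the deltas of the deltas. The sketch below uses the widely used regression formula over a few neighbouring frames; the window `width` of 2 and the edge handling (repeating the first and last frame) are assumptions, not settings taken from the project.

```python
def delta(frames, width=2):
    """Regression-based delta features.

    frames: list of per-frame coefficient lists, all of the same length.
    width:  number of frames used on each side of the current frame.
    Edges are handled by repeating the first and last frame.
    """
    n_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for c in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][c]
                behind = frames[max(t - n, 0)][c]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def delta_delta(frames, width=2):
    """Delta-delta (acceleration) features: the delta of the delta features."""
    return delta(delta(frames, width), width)
```

Stacking the 13 MFCCs with their 13 delta and 13 delta-delta values gives a 39-dimensional vector per frame, a common layout for speech features.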
VI. METHODOLOGY
This project involves the detection and tracking of fatigue through speech analysis. This is a
subject-based procedure, which means fatigue needs to be induced in the subjects. Here we have used
physical exercise as the method. The procedure is described as follows:

EXPERIMENT DESIGN

The experiment setup was structured so that no time was lost between readings.
● To analyze fatigue we took speech samples from 10 subjects, 4 female and 6 male, aged
between 21 and 28.
● Subjects were asked not to take tea or coffee for 4-5 hours before the test.
● Subjects sat in one place for all tests, followed by an exercise session.
● Tripods were used for thermal imaging to capture the same frame for each subject.
● The speech recorder and Stroop test were prepared before the subject returned.
● Experiments were conducted during off-hours to avoid disturbances.
● Subjects rested while wearing a smartwatch to monitor heart rate in real time.
● The first set of readings (speech, thermal imaging, and Stroop test) was recorded after
10-20 minutes of rest, once heart rate fell below 80 beats per minute.
● For exercise, subjects ran up the stairs, starting with 2 rounds and increasing gradually.
● Heart rate was observed immediately after each exercise session.
● Thermal images were captured during the speech and Stroop tests.
● Sessions lasted around 10 minutes, and recordings were imported after each subject's
experiment.
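
The readings collected in each session could be kept together in a small record like the one sketched below. All field names, the identifier, and the file name are hypothetical; only the resting threshold of 80 and the split into one rest session and five fatigue sessions come from the report.

```python
from dataclasses import dataclass

# Resting threshold from the protocol above (unit assumed to be beats per minute).
REST_HEART_RATE_LIMIT = 80


@dataclass
class SessionRecord:
    subject_id: str       # hypothetical identifier, e.g. "S01"
    session: int          # 0 = rest state, 1..5 = fatigue states
    heart_rate: int       # reading taken at rest or right after exercise
    speech_file: str      # hypothetical path to the recorded speech
    stroop_time_s: float  # time taken to finish the Stroop test, in seconds


def ready_for_baseline(heart_rate):
    """True once the resting heart rate has dropped below the threshold."""
    return heart_rate < REST_HEART_RATE_LIMIT
```

Keeping one record per session makes it straightforward to line up the speech features with the heart rate and Stroop results of the same session later on.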

The image shows thermal images being captured during speech recording.
VII. READINGS AND OBSERVATIONS
We have included the speech analysis data of a sample participant. The evaluation and correlation are
also done for the same participant.
A. SPEECH ANALYSIS

We have used Python 3.0 in a Jupyter Notebook environment to visualize the MFCC values along
with the delta MFCCs and delta-delta MFCCs.

Code used for MFCC visualization

Code used for delta and delta delta MFCC visualization

Visualised MFCCs

delta MFCCs and delta-delta MFCCs

B. SPEECH EVALUATION AND CORRELATION

We have used Python 3.0 in a Jupyter Notebook environment to evaluate MFCC values and
correlate these with the subjective data from the TLX Questionnaire using a grouped bar graph.
The data we are using here is of only one subject, as that is enough to present our idea
in a clear way.
Code for MFCC evaluation

Evaluated MFCCs

Code for formulation of grouped bar graphs

We have used the grouped bar graph method to visualize the correlation. As we can see, as the MFCC
parameter decreases, the performance and frustration values increase.
Correspondence data for Subject 1, Subject 2, and Subject 3
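
Besides the visual comparison in the grouped bar graphs, the relationship between an MFCC parameter and a TLX score across sessions could be summarized with a correlation coefficient. The sketch below computes Pearson's r with the standard library. It is not the analysis performed in the project, and the two lists are made-up placeholder values, NOT measurements from this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only (one value per session).
mean_mfcc_per_session = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
tlx_frustration_per_session = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]

r = pearson(mean_mfcc_per_session, tlx_frustration_per_session)
```

Because the placeholder lists are exactly linear and move in opposite directions, r comes out at -1 here; this says nothing about the real data. With only six sessions per subject, any such coefficient should be read with caution.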

VIII. CONCLUSION
In this project, we experimented with different subjects, collecting speech and subjective
assessment data to determine the level of fatigue induced in the body. After analysis of the
experiment results, we can relate the speech phonemes to the level of fatigue induced in the body.

● In our experiment, we kept track of heart rate and breathing rate and observed that variation
of these parameters was accompanied by a change in certain phonemes of our speech and in
the way the subjects speak.
● Heart rate rose consistently after each fatigue-inducing session.
● This in turn affected the Stroop test, and the subject took a much longer time to finish the
test after each stage.
● A change in speech was observed, as the subject fumbled after each level of induced fatigue.
● We were also able to observe the change in MFCC values with respect to the variation in
NASA TLX questionnaire data.
● A correlation was observed between various parameters, and it was visualized using a
grouped bar graph.

Thus it is concluded that the level of fatigue induced affects our cognition and our speech. Beyond
a certain point of induced fatigue, proper analysis can determine the maximum fatigue a body can
take. We use these values to develop a metric.
IX. FUTURE SCOPE
The future scope of analyzing fatigue through speech includes developing real-time monitoring
systems, integration into existing safety protocols, and potential use in healthcare settings. Continued
research in this area could lead to innovative solutions for preventing fatigue-related accidents and
improving overall well-being.

X. REFERENCES
1. Carla Aparecida de Vasconcelos, Maurílio Nunes Vieira, Göran Kecklund, and Hani Camille
Yehia, "Speech Analysis for Fatigue and Sleepiness Detection of a Pilot."

2. H. P. Greeley, E. Friets, J. P. Wilson, S. Raghavan, J. Picone, and J. Berg, "Detecting
Fatigue From Voice Using Speech Recognition."

3. Xiao-Jun Zhang, Ji-Hua Gu, and Zhi Tao, "Research of detecting fatigue from a speech by
PNN," 2010 International Conference on Information, Networking and Automation.

4. Anne-Maria Laukkanen, Irma Ilomäki, Kirsti Leppänen, and Erkki Vilkman, "Acoustic
Measures and Self-reports of Vocal Fatigue by Female Teachers." Department of Speech
Communication and Voice Research, University of Tampere, Tampere, Finland; Department of
Otolaryngology and Phoniatrics, Helsinki University Hospital, Helsinki, Finland. Accepted on
3 October 2006.

5. Parham Shahidi, Reza A. Soltan, Steve C. Southward, and Mehdi Ahmadian, "Estimating
Changes in Speech Metrics Indicative of Fatigue Levels." Center for Vehicle Systems &
Safety, Department of Mechanical Engineering, Virginia Tech, MC-0901 Blacksburg, VA 24061,
USA.

6. J. Krajewski, U. Trutschel, M. Golz, D. Sommer, and D. Edwards, "Estimating Fatigue from
Predetermined Speech Samples Transmitted by Operator Communication Systems," Driving
Assessment Conference, 5 (2009).
