
University of Sheffield Department of Computer Science

Voice Stress Analysis: Detection of Deception

Xianfeng Liu
MSc in Advanced Computer Science August 2005 Supervisor: Prof. Roger K. Moore



Plagiarism Declaration
All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s). Any illustrations which are not the work of the author of this dissertation have been used with the explicit permission of the originator and are specifically acknowledged. I understand that failure to do these amounts to plagiarism and will be considered grounds for failure in this dissertation and the degree examination as a whole.

Name: Xianfeng LIU Signature: Date: 31/Aug/ 2004



Abstract
Voice Stress Analysis (VSA) technology has been applied to lie detection for several decades. It originated from the concept of micro muscle tremors (MMT), which were considered to be a source for detecting deception. However, the effectiveness of VSA devices for deception detection is still questionable. In this project, jitter (micro tremor) and pitch were used as two features for detecting deception among several subjects. A deception-encouraging card game was designed to build up a deceptive/non-deceptive database. In order to achieve open-set performance, the entire data set was first divided into 22 equal-size groups containing both deceptive and non-deceptive speech, with each group containing the same number of non-deceptive and deceptive samples. The results show that pitch can detect deception, but only in a speaker-dependent manner; detection probabilities for jitter, in both the speaker-dependent and speaker-independent conditions, are not significantly higher than chance level.



Acknowledgements
The completion of this dissertation was made possible through the support and cooperation given by my supervisor, Prof. Roger Moore. I would like to express my sincere gratitude to him for his enlightening guidance and encouragement throughout the academic year. I would also like to thank Dr Jon Barker, who gave me a lot of help with the voice data collection. Lastly, I want to thank my friends for their constant support through the year.



Table of Contents
Plagiarism Declaration
Abstract
Acknowledgements
Chapter 1  Introduction
Chapter 2  Literature Survey
    2.1  Overview of Stress
    2.2  Voice Stress Analysis (VSA)
    2.3  Deception Detection (DD)
    2.4  Bayesian Hypothesis Testing
Chapter 3  Requirements and Analysis
    3.1  Introduction
    3.2  Data Collection
    3.3  Analysis Software
Chapter 4  Design and Implementation
    4.1  Data Collection
    4.2  Pitch Extraction
    4.3  Jitter Extraction
    4.4  Classification and Detection
Chapter 5  Results and Discussion
    5.1  Probability Density Functions
    5.2  Final Detection Probability
Chapter 6  Conclusion and Future Expectation
References
Appendices I: Deception Encouraged Card Game Instruction (Object of the game, Equipment, Definitions, Play, The Winner, Additional Rules)
Appendices II: Extracted Pitch Values for Non-deceptive Voices
Appendices III: Extracted Pitch Values for Deceptive Voices
Appendices IV: Extracted Jitter Values for Non-deceptive Voices
Appendices V: Extracted Jitter Values for Deceptive Voices


List of Figures
Fig 2.1  Waveform used to measure stress based on energy in the waveform (little or no stress)
Fig 2.2  Waveform used to measure stress based on energy in the waveform (medium stress)
Fig 2.3  Waveform used to measure stress based on energy in the waveform (hard stress)
Fig 2.4  Truthful/Deceptive options with inconclusive decisions excluded
Fig 2.5  A real deception detection case solved by CVSA
Fig 3.1  Voice Data Collection
Fig 3.2  Distribution Fitting Tool
Fig 3.3  Labelling and segmentation by Praat
Fig 3.4  TextGrid file format
Fig 3.5  Use Transcriber to segment and label
Fig 4.1  Knock-out game table
Fig 4.2  Card game snap
Fig 4.3  Experiment setup and workflow
Fig 4.4  PitchTier
Fig 4.5  Pitch extraction script for Praat
Fig 4.6  Pulses (jitter)
Fig 4.7  Jitter extraction script for Praat
Fig 5.1  Speaker-independent pitch PDFs
Fig 5.2  Speaker-independent jitter PDFs
Fig 5.3  PDF of pitch
Fig 5.4  PDF of jitter
Fig 5.5  Final detection probability of pitch and jitter



List of Tables
Tab 2.1  Cost comparisons between VSA and Polygraph
Tab 4.1  Mean and variance of pitch by speaker
Tab 4.2  Mean and variance of jitter by speaker
Tab 5.1  Final detection probabilities



Chapter 1
Introduction
Voice Stress Analysis (VSA) has been researched and developed over the past few decades by private individuals and the U.S. Army, and during that time it has been introduced to the lie detection field. More recently, this approach to lie detection has become increasingly popular. In the UK, for instance, voice stress analysis has been used in commercial products and has received much positive feedback; police forces, some insurance companies and even telecom companies are using the technology on incoming telephone calls to detect fraud [1] (News 2003). A number of shops also sell 'love detectors' based on some form of voice analysis [2], and claim that voice stress analysis technology is effective in detecting deception and distinguishing different human emotions.

Although some companies have started to employ the technique commercially, the correctness of the method is still doubted by many. Researchers have pointed out that high levels of stress do not necessarily correlate with deception, and the relationship between the two still needs to be established. In addition, even if such a correlation could be established, an innocent person will also become stressed in a tough situation.

One theory behind voice stress analysis is that there are inaudible vibrations, known as "micro tremors", in the voice, and it is claimed that these micro tremors change when a person is telling a lie [3]. Audible features of the human voice are also considered. In this project, the fundamental frequency (pitch) and the micro tremor (jitter) were examined in several subjects' voice recordings. The purpose is to study the possibility of detecting deception from a person's voice, especially stressed voice. An experiment was designed to analyse the probability of correct detection using these two features separately: 8 subjects were invited to play a deception-encouraging card game and their performance was recorded. In the analysis stage, a Bayesian hypothesis test [4] was employed for deception classification.

The main objective of the project is to find out whether voice-based analysis methods are able to detect deception. Fundamental frequency and micro tremor were separately extracted from the database and the same statistical analysis algorithm was applied to both. To account for personal variation, experiments were carried out on both a speaker-dependent and a speaker-independent basis. The final detection probabilities show the effectiveness of the VSA-based method and also the effect of speaker variation. In order to achieve open-set performance, the entire data set was first divided into 22 equal-size groups containing both deceptive and non-deceptive speech, with each group containing the same number of non-deceptive and deceptive samples. The results show that pitch can detect deception, but only in a speaker-dependent manner; detection probabilities for jitter, in both the speaker-dependent and speaker-independent conditions, are not significantly higher than chance.

This report is organised as follows. Chapter 2 presents a review of different approaches to deception detection and an overview of related work by other researchers. Chapter 3 discusses the requirements of the project and some of the background of the techniques used. Chapter 4 gives an overview of the work that has been done. Chapter 5 presents the results and their discussion, and Chapter 6 gives the conclusions and the future work that can be drawn from them.



Chapter 2
Literature survey
2.1 Overview of stress
There have been a number of studies carried out to determine the relationship between what voice stress analysers detect and emotional stress. It is therefore useful to identify vocal stress and define its nature. The typical common definition of stress comes from mechanics: when a stress is applied to a body, a corresponding strain is produced [5]. However, this definition is not very useful because, even though the stressor itself is defined, the subject's emotional state is unknown. Another definition discussed in [5] is: stress is observable variability in certain speech features due to a combination of unconscious response to stressors and/or conscious control. This definition not only emphasises the nature of stress as a variability but also refers to an unstressed state. The problem is that it is unclear how to define a meaningful unstressed reference condition, and the definition does not explain which features, or combinations of feature variations, correlate with different stressors [5]. Another common definition of stress is non-specific: stress is a general arousal or change in physiology, directly related to individual security and emotion. This notion of stress indicates a variation of arousal, which is useful, but it does not easily predict a specific outcome [5]. In the ESCA-NATO Workshop on Speech Under Stress, one definition was discussed which proposed to separate the cause of stress from the effect of stress: stress is an effect on the production of speech (manifested along a range of dimensions), caused by exposure to a stressor. According to this definition, voice stress can be characterised both by its cause and by its effect; in other words, stress is an effect on a person caused by a stressor [6].



It is not easy to find a single definition of stress that satisfies every perspective across different research domains; the concept of stress is broad and depends on the specific study. There are many other definitions of stress made by other researchers, each with its own particular emphasis, and none of them can be considered wrong. For this reason, no attempt is made here to impose a unified definition of voice stress.

2.2 Voice Stress Analysis (VSA)


Voice stress analysis originated from observations of what happens when a person is under stress. In a moment of stress, especially if the speaker is in jeopardy, the body prepares to fight by increasing the readiness of its muscles to spring into action. Changes in the acoustic speech signal due to stress are mainly caused by these stress reactions, which also affect the organs of speech, for instance through respiration and muscle tension. Hence, it should be possible to establish whether a person is stressed just by analysing his or her voice [7]. The muscle vibration involved is termed micro-muscle tremor (MMT) or micro tremor. The micro tremors occur in the muscles that make up the vocal tract and are transmitted through the speech. They are described as a slight oscillation of several cycles per second, which is claimed to be the sole source for detecting whether an individual is lying [8]. Much work on stress analysis in real-life situations concentrates on communication under dangerous (jeopardy) conditions. In many of these studies [1] an increase of the fundamental frequency (F0) of the voice in situations of increasing danger is reported. Williams and Stevens [1] reported an increase in F0 range and abrupt fluctuations of the F0 contour with increasing stress. In a Russian study [15] the voices of astronauts were examined and changes in spectral energy distribution (the spectral centroid moving to higher frequencies) were reported. Based on these theories, lie detection devices have been invented and deployed. These devices offer several potential advantages over the standard polygraph (see Tab 2.1 below). Firstly, the training time is less than polygraph training, and there are no academic prerequisites to receive that training. Secondly, a VSA examination takes little time, averaging about 45 minutes per session. Most conveniently, compared with the traditional polygraph method, there are no sensors placed on the body, only a small microphone clipped to the examinee's clothing. Since only the input voice is used, the examinee need not even be present during the examination; recordings from a remote location can be processed with the equipment.



Tab 2.1 Cost comparisons between VSA and Polygraph (Taken from [8])

VSA systems can be broken into two separate categories: energy-based systems and frequency-based systems. The majority of the systems evaluated are based on the detection of the MMT [9]. The author of [9] carried out an experiment based on this theory and claimed that when voice data are processed through a bank of filters, a series of waveforms can be obtained which may represent the stress in the voice. In the case of a non-stress response, the shape of the waveform looks like a Christmas tree (see Fig 2.1 [9]). As stress increases, the shape becomes flatter (see Fig 2.2 and Fig 2.3 [9]). A waveform which shows signs of significant stress can be labelled as deceptive. This type of technology is known as energy-based VSA.

Fig 2.1 Waveform used to measure stress based on energy in the waveform (Little or no stress) [Taken from [9]]



Fig 2.2 Waveform used to measure stress based on energy in the waveform (Medium Stress) [Taken from [9]]

Fig 2.3 Waveform used to measure stress based on energy in the waveform (Hard Stress) [Taken from [9]]

The other kind of system is the frequency-based VSA system, which is claimed to be able to identify changes within frequency bands and the distribution of the frequencies within those bands [9]. In these systems, a continuum of stress can be identified, and the position of the measured stress on this continuum is used to determine whether the answers are deceptive or non-deceptive. The aim of VSA is to analyse the levels of vibration, which are taken as measures of stress. However, stress can also be brought on by a simple thought or memory, such as suddenly remembering something dangerous ahead. Because environments and situations differ, different people may have different stress levels, and these levels change from day to day along with people's mood. It is well accepted that it is possible to decode emotions from speech; the reason is that changes of emotion may indicate the "perceived jeopardy" or "danger" of statements [10].



2.3 Deception Detection (DD)


In the last 25 years, voice stress analysis technologies have been introduced in the lie detection area and are claimed to be more convenient and accurate than the traditional polygraph. Police agencies and insurance companies are now trying to use them to detect fraud. For instance, in the UK, a car insurer (Highway Insurance) which introduced phone lie detectors says a quarter of all vehicle theft claims have been withdrawn since the initiative began [1]. Deception detection aims to determine whether information is deceptive or non-deceptive. There are mainly two types of approaches to detecting deception: manual and automatic. Compared with the manual approach, the automatic approach is more efficient and easier to use. However, due to the high stakes of some deception situations, manual DD is still very common among domain experts in the field. DD approaches usually depend on a set of cues that are indicative of deception. Traditional deception research has identified a rich set of cues to deception that have been tested in laboratory or field environments. Recently, attempts have been made to discover automated cues to differentiate deception from truth; this saves a large amount of time because it avoids encoding deception data manually. Humans' ability to identify deception is no better than chance [11]. The automatic discovery of cues to deception has improved detection performance to some degree [12]. There was a study based on VSA by the NITV [13], which compared the traditional polygraph method with two VSA products. Fig 2.4 illustrates that both the polygraph and the VSA examiners were able to accurately discriminate between truthful and deceptive cases; both achieved rates higher than chance. The polygraph examiner achieved an overall accuracy rate of 74%, while the VSA users' overall accuracy in this case was 84%. VSA is based on the principle of measuring the amount of change in the parameters associated with the involuntary dissipation of the FM component of the voice. These changes, related to psychological stress induced by fear, anxiety, guilt or conflict, can be useful in the detection of deception [13].



Fig 2.4 Truthful/Deceptive options with inconclusive decisions excluded [Taken from [13]]

There is another example of the practical application of VSA: a real child abuse case solved with the Computer Voice Stress Analyzer (CVSA) was released by the National Institute for Truth Verification (NITV) [13]. The victim was a two-and-a-half-year-old child; the suspects were her mother and her mother's roommate. Both were given CVSA exams and both charts were scored DI (deception indicated). Fig 2.5 shows copies of both charts. By contrast with Figs 2.1-2.3, it can be concluded that the two suspects became much more stressed when answering the specific questions (see the ones with red circles). It was reported that voice stress analysis should be considered an investigative aid, preferably used after investigators have thoroughly collected all relevant evidence of a criminal offence; in other words, VSA should not be used as a device for the purpose of coercion. The truth verification examination should be a part of the investigation, not the investigation itself. This requires not only high accuracy of the voice stress analysis device but also a high level of professional interrogation skill.

Fig 2.5 A real deception detection case solved by CVSA [Taken from [13]]


On the other side, controversy over the use of voice stress analysis has accompanied the technology from the start. The accuracy of VSA in detecting deception was evaluated by [14]. One hundred and nine people participated in that experiment, half of whom were asked to commit a realistic and engaging mock crime; the other half had little knowledge about the mock theft. The CVSA examiners scored the exams in accordance with NITV procedures, and the charts were then blind-scored by three other evaluators. The blind CVSA evaluators made correct decisions in 49.8% of cases, while the testing examiners achieved 48.6% accuracy; neither rate exceeded chance accuracy (50%). [14] therefore concluded that VSA is not reliable for lie detection, because its sensitivity in detecting lies was very low. Another study discussed and analysed the major empirical evidence claimed by voice stress analysis proponents, specifically the statement that voice stress devices are effective in deception detection. In this study, the sub-audible micro tremor signal used by voice stress devices was extracted from the vocal spectrum. The results suggested that the promise of voice stress analysis in the lie detection field was not, and may never be, a reality: while the micro tremor signal itself does exist, it showed no correlation with stress, so there was no reliable evidence that the VSA devices were useful in detecting deception [15]. A number of people therefore believe that voice stress analysers are not reliable for the detection of deception, and many experiments have been conducted to support this argument. However, most of this laboratory-based evidence was gathered in game-playing situations with a low level of jeopardy, so the conclusion that VSA is unreliable is itself open to question. The U.S. Supreme Court, after listening to all of the 'studies' presented on the accuracy of the polygraph (over 70 years of development) and VSA, declared that "There is no consensus in the scientific community that polygraph evidence is reliable" [16]. Although progress has been made, most attempts have not been successful, and applying these techniques to real-world situations for making critical decisions is still a long way off.

2.4 Bayesian Hypothesis Testing


Bayesian hypothesis testing is less formal than non-Bayesian varieties; Bayesian researchers typically summarise the posterior distribution without applying a rigid decision process. Since social scientists rarely make critical decisions directly from their findings, posterior summaries are usually adequate. If one wanted to apply a formal process, Bayesian decision theory is the way to go, because a probability distribution over the parameter space is available and expected utility calculations can be made based on the costs and benefits of different outcomes. In this project, pairwise (deceptive/non-deceptive) classification is considered, and the classifier takes the form of a Bayesian hypothesis test.


In the Bayesian hypothesis method there are two hypotheses, termed $H_0$ and $H_1$. In this project, under $H_0$ the speech is non-deceptive, and under $H_1$ the speech is deceptive. Putting the measurements into a feature vector $x = (x_1, \ldots, x_M)$, where $M$ is the size of the vector, the two conditional probability density functions (PDFs) $p(x|H_0)$ and $p(x|H_1)$ are estimated [15]. With these PDFs, the likelihood ratio is defined as:

$$\Lambda = \frac{p(x \mid H_1)}{p(x \mid H_0)} \qquad \text{(Eq 2.1)}$$

Whether the input speech is deceptive or non-deceptive is decided by comparing the likelihood ratio $\Lambda$ with a pre-defined threshold $\theta$: if $\Lambda$ is greater than $\theta$, the input speech is considered deceptive; otherwise it is classified as non-deceptive. The value of $\theta$ depends on the detection criterion used. In a classification system, the criterion should be selected so that the two important error probabilities, the false acceptance rate (FAR) and the false rejection rate (FRR), are both as low as possible. For a stress classification system, the value of $\theta$ is chosen at the equal error rate (EER), where FAR = FRR.

In order to form the likelihood ratio in Eq 2.1, the PDFs of deceptive and non-deceptive speech must first be estimated. If it can be assumed that all components $x_1, x_2, \ldots, x_M$ of the feature vector $x$ are independent and identically distributed Gaussian random variables with mean $\mu_n$ and variance $\sigma_n^2$ under non-deceptive conditions, but with a different mean $\mu_s$ and variance $\sigma_s^2$ under deceptive conditions, then the individual feature component PDFs conditioned on neutral ($H_0$) or stressed ($H_1$) speech are as follows:

$$f(x_i \mid H_0) = \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\!\left(-\frac{(x_i-\mu_n)^2}{2\sigma_n^2}\right) \qquad \text{(Eq 2.2)}$$

$$f(x_i \mid H_1) = \frac{1}{\sqrt{2\pi\sigma_s^2}} \exp\!\left(-\frac{(x_i-\mu_s)^2}{2\sigma_s^2}\right) \qquad \text{(Eq 2.3)}$$

With these PDFs and assuming statistical independence, the overall conditional probabilities $p(x|H_0)$ and $p(x|H_1)$ can be computed as,


$$p(x \mid H_0) = \left(2\pi\sigma_n^2\right)^{-M/2} \exp\!\left(-\frac{1}{2\sigma_n^2}\sum_{i=1}^{M}(x_i-\mu_n)^2\right) \qquad \text{(Eq 2.4)}$$

$$p(x \mid H_1) = \left(2\pi\sigma_s^2\right)^{-M/2} \exp\!\left(-\frac{1}{2\sigma_s^2}\sum_{i=1}^{M}(x_i-\mu_s)^2\right) \qquad \text{(Eq 2.5)}$$

Substituting Eq 2.4 and Eq 2.5 into Eq 2.1, the likelihood ratio can be computed as,

$$\Lambda = \frac{p(x \mid H_1)}{p(x \mid H_0)} = \left(\frac{\sigma_n}{\sigma_s}\right)^{M} \exp\!\left(\frac{1}{2\sigma_n^2}\sum_{i=1}^{M}(x_i-\mu_n)^2 - \frac{1}{2\sigma_s^2}\sum_{i=1}^{M}(x_i-\mu_s)^2\right) \qquad \text{(Eq 2.6)}$$

Taking the logarithm of each side, the log likelihood ratio is obtained as follows,

$$\ln\Lambda = M\ln\frac{\sigma_n}{\sigma_s} + \frac{M}{2\sigma_n^2}\left[\hat{\sigma}^2 + (\hat{\mu}-\mu_n)^2\right] - \frac{M}{2\sigma_s^2}\left[\hat{\sigma}^2 + (\hat{\mu}-\mu_s)^2\right] \qquad \text{(Eq 2.7)}$$

where $\hat{\mu}$ and $\hat{\sigma}^2$ are the estimated mean and variance of the input sample feature vector $x$, defined as,

$$\hat{\mu} = \frac{1}{M}\sum_{i=1}^{M} x_i \qquad \text{(Eq 2.8)}$$

$$\hat{\sigma}^2 = \frac{1}{M}\sum_{i=1}^{M}\left(x_i-\hat{\mu}\right)^2 \qquad \text{(Eq 2.9)}$$

The decision of whether the input speech is deceptive or non-deceptive is made by comparing the likelihood ratio (Eq 2.1, for Gaussian-distributed features) or the log likelihood ratio (Eq 2.7) with the pre-defined threshold $\theta$. For a deception classification system, however, we are only interested in the overall accuracy and have no preference for either FAR or FRR; therefore the value of $\theta$ corresponding to the equal error rate (EER, where FAR = FRR) is selected. In the experiments performed here, FAR was calculated as the ratio of the number of falsely accepted samples to the total number of non-deceptive samples, and FRR as the ratio of the number of falsely rejected samples to the total number of deceptive samples. By varying the threshold, the value of $\theta$ corresponding to the EER can be found.
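To make the decision rule above concrete, the following MATLAB sketch estimates the class means and variances from training data, evaluates the log likelihood ratio of Eq 2.7 for a test feature vector, and applies a threshold. It is illustrative only: the variable names and the synthetic pitch-like values are assumptions, not the project's actual data or code.

    % Minimal sketch of the Gaussian log-likelihood-ratio test of Eq 2.7 (MATLAB).
    % All values below are hypothetical; in the project the features were
    % per-utterance pitch or jitter measurements.
    train_n = 140 + 6*randn(1, 100);              % training values under H0 (non-deceptive)
    train_s = 155 + 9*randn(1, 100);              % training values under H1 (deceptive)

    mu_n = mean(train_n);  var_n = var(train_n, 1);   % H0 mean and variance
    mu_s = mean(train_s);  var_s = var(train_s, 1);   % H1 mean and variance

    x = 150 + 8*randn(1, 20);                     % feature vector of a test utterance
    M = numel(x);
    mu_hat  = mean(x);                            % Eq 2.8
    var_hat = mean((x - mu_hat).^2);              % Eq 2.9

    lnLambda = M*log(sqrt(var_n/var_s)) ...       % Eq 2.7, log likelihood ratio
             + M/(2*var_n)*(var_hat + (mu_hat - mu_n)^2) ...
             - M/(2*var_s)*(var_hat + (mu_hat - mu_s)^2);

    theta = 0;                                    % decision threshold (set at the EER point in practice)
    if lnLambda > theta
        disp('classified as deceptive (H1)');
    else
        disp('classified as non-deceptive (H0)');
    end

In the project the threshold was not fixed at zero but selected on development data at the equal-error-rate point, as described above.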


Chapter 3
Requirements and Analysis
3.1 Introduction
This project aimed at developing a system for the classification of deceptive and non-deceptive voice data. A review of the relevant techniques has already been presented in the previous chapter. In addition to those techniques, certain features of the voice data have to be extracted from a sufficiently large database. There was no ready-to-use database available for the project; hence, a data collection procedure was designed and set up to create the speech database. Around 450 pieces of voice data (both deceptive and non-deceptive) were collected. However, segmentation of these data could only be done manually. There were two stages of segmentation in this project: a segmentation tool called Transcriber was employed for the first segmentation and labelling, and all the data were then segmented and labelled again using Praat. Praat can automatically extract the segments and save each one as an individual file, and it can also extract pitch and jitter from a selected voice file (the script code can be found in the Program Codes section). In order to obtain the detection probability of the classifier, one approach is first to classify the signals manually and then classify them using the system and compare the outcomes. The classifier is based on Bayesian hypothesis testing theory, which has already been described in the section above. In this phase, the Statistics Toolbox of MATLAB was employed for the statistical analysis of the features.

3.2 Data Collection


The objective of this research is to find out whether voice stress analysis (VSA) technologies can detect deception, and obtaining deceptive/non-deceptive voice data is an essential part of this. In previous studies, such data were obtained by asking subjects to act as liars. For instance, in Horvath's and Heisse's studies [17] [18], and in many other works, there was a made-up scenario in which the participating subjects acted as liars and judges (criminals and judges), and all their audible performances were recorded as the voice data used in the experiments. The most important disadvantage of this method, however, is that the subjects knew they were acting and might not have felt any jeopardy; hence, their voice data may not show distinct differences between deceptive and non-deceptive speech. In this project, to avoid this problem, a card guessing game was designed for voice data collection (see Appendices I). Subjects were encouraged to lie in order to win the game rather than to act as liars: only through lying could a player beat his or her opponents and win the final prize (see Fig 3.1).

Fig 3.1 Voice Data Collection

3.3 Analysis Software


Three software packages were used in this project: MATLAB, Praat and Transcriber. All were chosen for features that would aid the completion of the project.

3.3.1 MATLAB
This project used MATLAB to implement the feature extraction system and the classifiers. The name MATLAB stands for matrix laboratory. It is a high-performance language for technical computing which integrates computation, visualisation and programming in an easy-to-use environment where problems and their solutions are expressed in familiar mathematical notation. It is typically used for mathematics and computation, algorithm development, modelling, simulation and prototyping, scientific and engineering graphics, and application development, including graphical user interface building [19]. The main feature of MATLAB is its ease of use when dealing with complex mathematical manipulations and visualisation. Unlike other high-level programming languages, MATLAB provides an easy language for dealing with vector and matrix manipulations; the advantage is that each discrete-time signal can be represented as a matrix. In this project, each speaker's non-deceptive and deceptive voice data were loaded as separate vectors. MATLAB also provides add-ins in the form of toolboxes that extend its functionality. The Statistics Toolbox was used here for modelling the classifier based on Bayesian hypothesis testing theory. It includes graphical user interfaces (GUIs) and command-line tools that make it easy to examine probability distributions, fit them to data, or generate random samples from them. The Distribution Fitting Tool is a GUI (see Fig 3.2) that enables users to learn about a variety of probability distributions; for example, a probability density function or cumulative distribution function can be graphed to investigate how a distribution's parameters affect its position and shape. The Distribution Fitting Tool fits data using 16 predefined probability distributions, a nonparametric (kernel smoothing) estimator, or a custom distribution [MATLAB 7.04 help].

Fig 3.2 Distribution Fitting Tool

A second GUI provides a random number generator to simulate behaviour associated with particular distributions. Random data can be used to test hypotheses or models under different conditions.
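As a brief illustration of how the toolbox functions relate to this task (the sample values and variable names below are hypothetical, not the project's data), the following MATLAB fragment fits Gaussian densities to two sets of pitch values with normfit and plots them, which is essentially what the Distribution Fitting Tool does interactively:

    % Hypothetical example: fit Gaussian PDFs to two sets of pitch values and plot them.
    pitch_nondec = 140 + 6*randn(200, 1);         % assumed non-deceptive pitch values (Hz)
    pitch_dec    = 155 + 9*randn(200, 1);         % assumed deceptive pitch values (Hz)

    [mu_n, sigma_n] = normfit(pitch_nondec);      % maximum-likelihood Gaussian fit
    [mu_s, sigma_s] = normfit(pitch_dec);

    f = 80:0.5:250;                               % frequency axis (Hz)
    plot(f, normpdf(f, mu_n, sigma_n), 'r', f, normpdf(f, mu_s, sigma_s), 'b');
    legend('non-deceptive', 'deceptive');
    xlabel('pitch (Hz)'); ylabel('probability density');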

3.3.2 Praat
Another piece of software used extensively in this project is Praat. Praat is a program designed for phonetics research, used to analyse, synthesise and manipulate speech; it can also generate various pictures to support speech analysis. Praat was developed at the Institute of Phonetic Sciences, University of Amsterdam. Apart from the features mentioned above, it supports signal labelling, segmentation and speech manipulation, and provides a high-level scripting language. All the features are available through interactive menus. In this project, Praat was employed for precise speech segmentation and for feature extraction. The non-deceptive and deceptive speech signals were segmented into many small individual wave files, and the pitch and jitter of each voice file could then be extracted automatically. The following figure shows how Praat deals with speech waveform labelling and segmentation.

Fig 3.3 Labelling and segmentation by Praat

Two annotation tiers were used, and the segmentation of deceptive and non-deceptive speech was saved in each tier. A corpus can then be extracted from each tier. The format of the text file in which the annotation information is saved is shown below (see Fig 3.4).



Fig 3.4 TextGrid file format

3.3.3 Transcriber
Transcriber is open-source software. It is a tool for assisting the manual annotation of speech signals, providing a graphical user interface (GUI) for segmenting speech recordings, transcribing them and labelling speech turns, and it is widely used in speech research. In this project, Transcriber was used to segment and label the recordings of the subjects after the card game. Sentences of each subject were labelled T for truth, L for lie and R for reply (accept/do not accept). The software is free and has a very good GUI (see Fig 3.5). A big advantage of this software, as with some other computer-based transcription systems, is that the transcript is synchronised with the audio file: when using Transcriber, any portion of the transcript can immediately be played with the corresponding audio. It is also very easy to search for all sections marked [inaudible] and listen to the corresponding audio.



Fig 3.5 Use Transcriber to segment and label



Chapter 4
Design and Implementation
4.1 Data Collection
Data collection in this project was divided into two major phases. First, a trial collection was made: 4 subjects were invited to play the game in the departmental recording booth, and two laptops with recording software installed were employed for real-time data storage. There were two objectives for the trial experiment: one was to test the hardware and software and make sure they worked properly; the other was to estimate how many non-deceptive/deceptive sentences could be obtained from one game. For the differences between non-deceptive and deceptive data to be detected with statistical significance, at least about 300 experimental recordings were needed. Thus, after the trial, the number of cards was increased to 22 per player to ensure that sufficient data could be collected in the subsequent formal collection experiment. Second, the formal collection experiment was executed. In this stage, 8 subjects were involved, all from the Department of Computer Science; two of them are research staff of the department and the rest are master's students. It is very hard to devise a test in which the subject is sufficiently involved to produce highly stressed speech, so all of them were asked to play the card game as a knock-out competition in which the final winner received a prize (see Fig 4.1).

Fig 4.1 Knock-out game table



Each player was equipped with a headset microphone, and the game was also filmed with a digital camera. During the game, both players had to show their cards in front of the camera before playing them. The video recording captured all the cards played by the players and also their facial expressions, which might be used for classification in later facial experiments. Both voice and video data of the game were recorded, so that the deceptive utterances could be picked out afterwards (see Fig 4.2).

Fig 4.2 Card game snap

According to the instructions of the game (Appendices I), telling the truth is always safe for a player, but three consecutive truthful statements called out by the opponent cause the player to lose that round. To some extent, deception is therefore encouraged, or even forced upon, the players. This additional rule balanced the number of deceptive and non-deceptive performances in the game: for the differences to be statistically significant, not only should the total amount of data be sufficient, it should also not be heavily polarised. The final database contains 336 pieces of speech recording, comprising 181 non-deceptive and 155 deceptive voices (the corresponding data values can be found in the appendices). Moreover, a £50 prize was funded by the Department of Computer Science, which may have stimulated the subjects to perceive jeopardy when telling a lie. The experiment focused on analysing features of these data to identify any difference between the two kinds of data (deceptive/non-deceptive). If significant differences do exist, this indicates that deception can be detected using a VSA-based method. The following figure (Fig 4.3) shows the experiment setup and workflow.

Fig 4.3 Experiment Setup and workflow

According to the figure above (Fig 4.3), the following paragraphs describe the steps of data collection in detail.

Step 1: Collection method design. A card game (Appendices I) was designed, inspired by a German dice game. The main idea of the game is to make people lie in order to win.

Step 2: Hardware setup. The quality of the recording is vitally important in this project. A noise-reduced recording booth (Department of Computer Science, Sheffield University) was used for recording. Two headset microphones were plugged into two laptops with recording software pre-installed (WaveCN v1.08). A Canon digital video camera on a tripod was also set up in the booth. Each player's 22 cards were randomly selected from two new packs of 'Squeezers' playing cards, so a player could not guess the opponent's cards by memorising which cards had been played.

Step 3: Trial data collection. The trial experiment was essential. First, all the devices were checked ahead of the formal experiment. Second, the voice data from the trial could show whether adjustments were needed. Third, the average number of deceptive and non-deceptive utterances was calculated, which showed whether more cards needed to be dealt to each player to obtain enough voice data for statistical analysis. Four subjects took part in the trial; they played with 11 cards each initially, then 15 cards, and finally 22 cards per player, which was the number used in the formal experiment.

Step 4: Method adjustment and formal collection design. Given the number of available subjects and the amount of voice data needed, a knock-out competition was adopted. In total, 8 subjects participated in 7 games (see Fig 4.1). The final winner received a £30 prize (£50 was funded by the Department of Computer Science, of which £20 was used in the trial stage).

Step 5: Implementation. A recording session was scheduled for each subject and the experiment was executed.

Step 6: Data transfer/conversion, labelling and segmentation. All the wave files were transferred to one computer and given standardised names. Transcriber and Praat were then employed for voice labelling and segmentation. The video tapes were also converted to Windows Media Video (WMV) format using Windows Movie Maker.

Step 7: Feature extraction. In this project only two features were used: pitch and jitter. These feature values were extracted automatically using Praat. Both pitch and jitter values were then divided into several open sets, on a speaker-dependent and a speaker-independent basis, for the subsequent classification.

Step 8: Classification and comparison. Bayesian hypothesis testing was used for deception classification and for obtaining the final detection probability.

4.2 Pitch Extraction


Pitch has been reported to be able to indicate stress [20]. More than one feature is used here because it has been shown that combinations of features can detect stress better than a single one; therefore jitter is used as a second feature alongside pitch in the data analysis. In this project, Praat was employed for pitch extraction. Praat uses the autocorrelation method for pitch analysis: the algorithm performs acoustic periodicity detection on the basis of an accurate autocorrelation method [21]. [22] reported that this method is more accurate and noise-resistant than the cepstrum-based method and the original autocorrelation ones. The estimate used is:

$$r_x(\tau) \approx \frac{r_{xw}(\tau)}{r_w(\tau)} \qquad \text{(Eq 4.1)}$$

To estimate the autocorrelation $r_x(\tau)$ of the original signal segment, the autocorrelation $r_{xw}(\tau)$ of the windowed signal is divided by the autocorrelation $r_w(\tau)$ of the window. This estimate is exact for the constant signal $x(t) = 1$ (without subtracting the mean, of course); for periodic signals, it brings the autocorrelation peaks very close to 1. The first consideration for stress evaluation involves characteristics of the fundamental frequency F0, including contours, mean, variability and distribution. Differences in the mean, variance and distribution of pitch (F0) were therefore considered. The statistical tests performed assume the sample variables to be Gaussian distributed, so a comparison of F0 distribution contours was performed.
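The following MATLAB sketch illustrates the normalised-autocorrelation estimate of Eq 4.1 on a synthetic voiced frame. It is a simplified illustration only: it assumes a Hanning window and the Signal Processing Toolbox functions hanning and xcorr, and Praat's actual implementation differs in details such as interpolation and candidate selection.

    % Sketch of the normalised autocorrelation of Eq 4.1 on one synthetic frame.
    fs = 16000;                                   % sampling rate (Hz)
    t  = (0:639)'/fs;                             % one 40 ms analysis frame
    x  = sin(2*pi*150*t) + 0.1*randn(size(t));    % hypothetical voiced segment, f0 = 150 Hz

    w   = hanning(length(x));                     % analysis window (assumption: Hanning)
    rxw = xcorr(x .* w, 'coeff');                 % autocorrelation of the windowed signal
    rw  = xcorr(w, 'coeff');                      % autocorrelation of the window itself
    rx  = rxw ./ rw;                              % estimate of r_x(tau), Eq 4.1

    lags = (-(length(x)-1):(length(x)-1))'/fs;    % lag axis in seconds
    keep = lags > 1/300 & lags < 1/100;           % search pitch candidates in 100-300 Hz
    cand = lags(keep);
    [~, k] = max(rx(keep));
    f0 = 1/cand(k);                               % estimated fundamental frequency (Hz)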

All the fundamental frequency information was stored and represented as a PitchTier. PitchTier is one of the object types in Praat; the object represents a time-stamped pitch contour, i.e. it contains a number of (time, pitch) points without voiced/unvoiced information. For instance, if a PitchTier contains two points, namely 150 Hz at a time of 0.5 seconds and 200 Hz at a time of 1.5 seconds, this is to be interpreted as a pitch contour that is constant at 150 Hz for all times before 0.5 seconds, constant at 200 Hz for all times after 1.5 seconds, and linearly interpolated for all times between 0.5 and 1.5 seconds (i.e. 160 Hz at 0.7 seconds, 180 Hz at 1.1 seconds, and so on). In this project the interval was set to 0.01 seconds, which means a pitch value was extracted every 0.01 seconds. A sample is shown in Fig 4.4.

Fig 4.4 PitchTier


Fig 4.5 Pitch extraction script for Praat

A Praat script was written to do this job automatically and to save each wave file's pitch information into an individual .PitchTier file. Each file's average pitch value can then be calculated, and another text file was used to save all the average pitch values (see Fig 4.5).

4.3 Jitter Extraction


Jitter has also been extracted from the voice data. Jitter is the perturbation in the vibration of the vocal cords, known as period-to-period fluctuation in F0; it causes the fundamental frequency to vary from cycle to cycle. Perturbation means a deviation from steadiness [7]. Let $\alpha_i$ be any cyclic parameter (amplitude, pitch period, etc.) in the $i$-th cycle of the waveform. Then the steady value of this parameter over a span of $N$ cycles can be estimated from its arithmetic mean:

$$\bar{\alpha} = \frac{1}{N}\sum_{i=1}^{N}\alpha_i \qquad \text{(Eq 4.2)}$$

The zeroth-order perturbation function is the arithmetic difference:

$$P_i^0 = \alpha_i - \bar{\alpha}, \qquad i = 1,\ldots,N \qquad \text{(Eq 4.3)}$$

where the superscript gives the order of the perturbation function. Higher-order perturbation functions can be obtained by alternately taking backward and forward differences of lower-order functions. Here the first-order perturbation function is considered:

$$P_i^1 = P_i^0 - P_{i-1}^0 = \alpha_i - \alpha_{i-1}, \qquad i = 2,\ldots,N \qquad \text{(Eq 4.4)}$$

The first-order perturbation function can be used to determine the fundamental frequency perturbation if $\alpha_i$ in Eq 4.4 is taken to be the fundamental frequency. The fundamental frequency is computed only for the voiced parts of speech. The fundamental frequency perturbation is defined as the average of the absolute values of all these differences, normalised to a percentage:

$$\text{jitter} = \frac{100}{(N-1)\,\bar{\alpha}}\sum_{i=2}^{N}\left|\alpha_i - \alpha_{i-1}\right| \qquad \text{(Eq 4.5)}$$

F0 perturbations have been found to differ between emotional modes such as anxiety, fear and anger [23], [24]. In this project, deceptive (stressed) and non-deceptive (neutral) segments of speech are analysed and presented using average perturbation contour diagrams. If the contours of the non-deceptive and deceptive speech segments do not overlap, it can be concluded that jitter analysis can be used for detection. Like pitch, jitter can be extracted automatically by Praat. Praat reports five types of jitter: local, local absolute, rap, ppq5 and ddp. The first one, jitter (local), was used here; it is the average absolute difference between consecutive periods divided by the average period, and 1.040% is given as a threshold for pathology. According to this threshold, a normal speaker's jitter value should never exceed it and is usually much smaller. As shown in the appendices, the extracted jitter values are all of the order of 0.01. All the speakers are normal, with no abnormality of the vocal organs, so the extracted jitter values can be considered usable. As with pitch, the extracted jitter values were stored in a single .txt file (see Fig 4.7).
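For clarity, the following MATLAB fragment computes jitter (local) as defined above, i.e. the mean absolute difference between consecutive pitch periods divided by the mean period; the period values are made up and merely illustrate the calculation:

    % Hypothetical example of jitter (local): mean absolute difference between
    % consecutive pitch periods divided by the mean period.
    T = [7.10 7.05 7.18 7.02 7.11 7.08] * 1e-3;   % pitch periods in seconds (illustrative)
    jitter_local = mean(abs(diff(T))) / mean(T);  % dimensionless fraction
    fprintf('jitter (local) = %.4f (%.2f%%)\n', jitter_local, 100*jitter_local);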



Fig 4.6 Pulses (Jitter)



Fig 4.7 Jitter extraction script for Praat

4.4 Classification and Detection


We now turn to the related problem of classification of deceptive and non-deceptive speech. The task is to formulate an algorithm for the detection of speech spoken under a particular deceptive style versus non-deceptive speech. Here the two terms classification and detection are used interchangeably, since only pairwise classification is considered. Two processing stages are required for deception detection. In the first stage, acoustic features are extracted from the input speech waveform, as described above. The second stage focuses on detecting deceptive speech as opposed to non-deceptive speech using one or more of the available methods. A variety of methods exist for stress detection, including, but not limited to, detection-theory-based methods, methods based on distance measures, neural network classifiers, and statistical-modelling-based techniques [8]. In this section, a Bayesian hypothesis-testing framework was employed to detect deceptive (stressed) versus non-deceptive (neutral) speech.

The Bayesian hypothesis method is a stress detection technique that determines whether given audio data is neutral speech or stressed speech; it can also classify different types of emotional stress [25]. In the Bayesian hypothesis method there are two hypotheses, termed $H_0$ and $H_1$: under $H_0$ the speech is neutral, and under $H_1$ the speech is stressed. Putting the measurements into a feature vector $x = (x_1, \ldots, x_M)$, where $M$ is the size of the vector, the two conditional probability densities (PDFs) $p(x|H_0)$ and $p(x|H_1)$ are estimated [16]. With these PDFs, the likelihood ratio is then defined as:

$$\Lambda = \frac{p(x \mid H_1)}{p(x \mid H_0)} \qquad \text{(Eq 4.6)}$$

If $\Lambda$ is larger than a pre-defined threshold $\theta$, the input speech is considered stressed; otherwise it is classified as neutral. In order to achieve open-set performance, the entire data set was first divided into 22 equal-size groups containing both deceptive and non-deceptive speech, with each group containing the same number of non-deceptive and deceptive samples. The subsets were then grouped again by speaker: for instance, subsets 1 to 5 belong to one speaker and 6 to 10 to another, so they can be treated as two different subgroups. It should be noted that, after calculating each speaker's mean and variance of both pitch and jitter, large individual differences can be found (see Tab 4.1 and Tab 4.2).
Non-deception            A.T        J.E        L.W        L.G        Y.D        Y.P        Y.T
Mean (mu)            147.352   155.9583   153.1072   116.3107   166.6665    134.452   135.8364
Variance (sigma^2)   29.9724    97.3767       1489    53.5314   549.6449    31.3271     5.8552

Deception                A.T        J.E        L.W        L.G        Y.D        Y.P        Y.T
Mean (mu)           155.0814   166.6267   168.5697   119.3156   195.4759    144.428   144.1616
Variance (sigma^2)   70.5265     1073.1     2034.5    71.3751     1879.9    82.0586    49.7894

Tab 4.1 Mean and variance of pitch by speaker (pitch in Hz)


Non-deception            A.T        J.E        L.W        L.G        Y.D        Y.P        Y.T
Mean (mu)             0.0112     0.0126     0.0152     0.0167     0.0119     0.0143     0.0122
Variance (sigma^2)  9.16E-06   2.14E-05   1.92E-05   2.40E-05   6.63E-06   2.21E-05   1.61E-05

Deception                A.T        J.E        L.W        L.G        Y.D        Y.P        Y.T
Mean (mu)             0.0119     0.0149     0.0155     0.0158     0.0119     0.0138     0.0137
Variance (sigma^2)  1.23E-05   9.47E-06   1.50E-05   3.27E-05   8.11E-06   8.67E-06   1.54E-05

Tab 4.2 Mean and variance of jitter by speaker (jitter given as a fraction)



The above tables indicate that there may be a difference between performing the classification speaker-independently and speaker-dependently. Based on this observation, the speaker-dependent and speaker-independent experiments were considered separately. In the speaker-dependent experiment, each speaker's speech data were divided into three parts: a quarter was used as training data, half was used for development, during which the detection threshold could be calculated, and the remaining quarter was used as the test set, giving the final detection probability for that speaker. The speaker-independent experiment, on the other hand, ignores differences between speakers; it used the 22 subsets mentioned above and set aside the training and final test corpora one speaker at a time. The final detection probabilities can be found in the following chapter.

Each group belonging to a single speaker was set aside in turn to train the system, and the remaining groups were used to test it, giving the overall error rate and threshold for the Bayesian hypothesis-testing method, and the mean and variance for the distance-measure approach. The final error rate was obtained by accumulating the error rates from the 22 open-set tests; the training groups were then changed to another speaker and the corresponding remainder was used as test data. In each test set (as described above), the values were stored as vectors to be used in MATLAB. Two lengths of vector were considered as different categories of test set: the full length of each vector is the sum of the number of deceptive and non-deceptive samples, and half of those data were then selected to form new, shorter vectors. All these vectors were passed to the Bayesian method to calculate the final correctness rate.
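The threshold selection on the development data can be illustrated with the following MATLAB sketch, which sweeps a threshold over hypothetical log-likelihood-ratio scores and picks the point where FAR and FRR are closest (the equal error rate). The score values are synthetic and for illustration only; the project's own scores came from the pitch and jitter features described above.

    % Sketch of the equal-error-rate threshold search on development-set scores.
    % The scores below are synthetic log likelihood ratios.
    scores_n = randn(1, 50) - 0.5;                % scores of non-deceptive utterances (H0)
    scores_s = randn(1, 50) + 0.5;                % scores of deceptive utterances (H1)

    thetas = sort([scores_n scores_s]);           % candidate thresholds
    far = arrayfun(@(t) mean(scores_n >  t), thetas);   % false acceptance rate
    frr = arrayfun(@(t) mean(scores_s <= t), thetas);   % false rejection rate

    [~, k] = min(abs(far - frr));                 % point where FAR and FRR cross
    theta  = thetas(k);
    eer    = (far(k) + frr(k)) / 2;
    fprintf('EER threshold = %.3f, EER = %.1f%%\n', theta, 100*eer);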



Chapter 5
Results and Discussion
5.1 Probability Density Functions
As mentioned above, both speaker-dependent and speaker-independent classification were considered in this project, because it was assumed that large differences could be found between the two approaches simply by comparing the pitch and jitter values. First, the probability densities for both the speaker-dependent and the speaker-independent cases were plotted, as described in the following paragraphs; the figures show the probability density function of pitch for each subject (see Fig 5.1-4).

The probability density function (PDF) has a different meaning depending on whether the distribution is discrete or continuous. For a discrete distribution, the PDF gives the probability of observing a particular outcome. For a continuous distribution, by contrast, the PDF evaluated at a value is not the probability of observing that value: the probability of observing any single value is zero, and probabilities are obtained by integrating the PDF over an interval of interest. A PDF has two theoretical properties: it is zero or positive for every possible outcome, and its integral over the entire range of values is one.
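As an illustration, the following MATLAB sketch shows how Gaussian PDFs such as those in Fig 5.1 could be estimated from the extracted pitch values and plotted. The variable names and the placeholder data are assumptions for this example only and should be replaced with the values listed in the appendices.

    % Minimal sketch: fit and plot Gaussian PDFs for deceptive vs non-deceptive pitch.
    nonDecPitch = 150 + 12*randn(180,1);   % placeholder for the non-deceptive pitch values (Hz)
    decPitch    = 160 + 18*randn(140,1);   % placeholder for the deceptive pitch values (Hz)

    gauss = @(x, mu, sigma) exp(-0.5*((x - mu)./sigma).^2) ./ (sigma*sqrt(2*pi));
    x = linspace(80, 320, 500);            % pitch axis in Hz

    plot(x, gauss(x, mean(nonDecPitch), std(nonDecPitch)), 'r', ...
         x, gauss(x, mean(decPitch),    std(decPitch)),    'b');
    xlabel('Pitch (Hz)'); ylabel('Probability density');
    legend('Non-deceptive', 'Deceptive');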

Fig 5.1 Speaker-independent pitch PDFs


The above graph shows the pitch probability densities of all speakers' deceptive and non-deceptive voices. The blue line represents the probability density of the deceptive data and the red line that of the non-deceptive data. The graph shows that the pitch distribution of the deceptive voices differs little from that of the non-deceptive ones. From the graph alone it would appear that the two classes have much in common and that distinguishing between them will be difficult; however, this is only an assumption made before classification, and the final experimental results in the following section will confirm or reject this hypothesis. Turning to jitter, the difference visible in the following graph (Fig 5.2) is clearer than in the pitch PDFs. Nevertheless, the difference is still not distinct enough to separate the two kinds of data from their distributions alone, and a large part of the two curves still overlaps.

Fig 5.2 Speaker-independent jitter PDFs

The speaker-independent pitch and jitter distributions overlap considerably, which suggests they are not well suited to deception classification. When it comes to the speaker-dependent case, however, the picture is quite different. The following figures show each speaker's pitch and jitter distributions individually (Fig 5.3).



Fig 5.3 Speaker-dependent pitch PDFs

The above figures show the probability densities of both deceptive and non-deceptive voice data for each speaker. From the six graphs it can be clearly seen that, except for the two graphs in the middle (Subjects 2 and 3), the difference in probability density between the deceptive and non-deceptive data is prominent. Take the first figure for instance: the mean is about 145 Hz for the non-deceptive voice and about 155 Hz for the deceptive data, and their density shapes also differ markedly. The figure for Subject 2 also shows a difference between deceptive and non-deceptive data; indeed, even in the two middle graphs a difference between the curves can still be observed, since their variances look similar but their mean pitch values differ by about 10 Hz. Compared with the speaker-independent figures above, it can therefore be expected that the classification results for the speaker-dependent and speaker-independent cases will be quite different, with the speaker-dependent approach appearing more feasible for deception detection. The results for jitter, however, are different. The following figures (Fig 5.4) show the speaker-dependent jitter distributions, which look very different from those of pitch.



Fig 5.4 Speaker-dependent jitter PDFs

In these combined graphs the probability distributions of the two curves largely overlap, so it is hard to believe that speaker-dependent jitter classification can produce a significant detection probability for each speaker. In general, each speaker has a different fundamental frequency, and even the same speaker may perform differently within one game or across games. For example, subjects may feel increasingly nervous as they progress towards the final round, which was expected to be the most intense because its winner would take the 30-pound prize. It is therefore hard to detect deception without taking such differences between speakers into account. Moreover, jitter, which the literature review above suggested could be used for deception detection, does not look promising here: the probability density graphs indicate that it may not be a feasible feature for detecting deception.



5.2 Final Detection probability


As described above, after comparing the probability densities of the deceptive and non-deceptive data, speaker-dependent classification appears better suited to this task. Nevertheless, both speaker-dependent and speaker-independent classification were carried out; the resulting values are shown in the following table (Tab 5.1):

Speaker    Speaker Independent        Speaker Dependent
           Pitch      Jitter          Pitch      Jitter
A.T        60.0%      52.7%           87.0%      51.7%
J.E        68.8%      28.3%           76.6%      53.9%
L.W        70.0%      54.4%           64.4%      31.3%
L.G        40.0%      54.0%           75.0%      53.7%
Y.D        25.0%      64.2%           48.1%      23.1%
Average    53.2%      50.7%           70.2%      40.7%

Tab 5.1 Final detection probabilities

As the table shows, there are two categories of values: the speaker-independent and the speaker-dependent detection probabilities. It should be noted that in the speaker-independent category the speaker's name indicates which speaker the training and test corpora were taken from; the development subsets then consisted of the rest of that speaker's corpus plus all the other speakers' corpora. In the speaker-dependent category, on the other hand, the probabilities were calculated using only that speaker's own data. These values should be read as detection probabilities to be compared against chance level rather than as exact detection ratios: in statistics, "significant" means probably true (not due to chance), and in this project the percentages indicate how much deception can probably be detected by each method.

5.2.1 Speaker Independent

The table shows a large difference between the speaker-independent and speaker-dependent results. Taking pitch first, in the speaker-independent experiment the highest detection probability is 70%, two values are around 60% and two are below 50%. The average detection probability is 53.2%, only slightly above chance; a detection probability at this level does not convincingly show that deception can be detected. Although some values exceed 50% and even reach 70%, the large fluctuation between them also suggests that speaker-independent detection is not very effective. Together with the probability density function graph above (Fig 5.1), it can be concluded that speaker-independent detection based on pitch is not feasible.
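For reference, one standard way to judge whether an observed detection probability exceeds chance is a one-sided test based on the normal approximation to the binomial distribution. The number of test utterances per condition is not restated here, so the expression below is a generic sketch rather than a reproduction of the project's exact calculation:

    z = \frac{\hat{p} - 0.5}{\sqrt{0.5\,(1 - 0.5)/N}}

where \hat{p} is the observed detection probability over N test utterances; the result lies significantly above chance at the 5% level (one-sided) roughly when z > 1.645.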



The speaker-independent results for jitter differ from those for pitch. The fluctuation between speakers is smaller, which suggests that speaker-independent detection on jitter is less affected by speaker variety; however, the detection probability itself is very low and not significantly higher than 50% (chance level). In summary, the experimental results reflect what the probability density function figures (Fig 5.1 and Fig 5.2) suggested: speaker-independent detection does not appear feasible for deception detection. With pitch, some results reach fairly high levels but others fall well below chance, and this large fluctuation in the final detection probabilities does not suit the speaker-independent approach. Detection on jitter, on the other hand, appears little affected by speaker variety and produces steadier values, but all of them are too low to be statistically meaningful. It can therefore be concluded that speaker-independent deception detection on pitch and jitter may not be effective.

5.2.2 Speaker Dependent

When it comes to speaker-dependent detection, the results are quite different. As the values in Tab 5.1 show, the detection probability for pitch is much higher than chance level, while the results for jitter are still not significantly higher than 50%. The pitch results vary between speakers: most detection probabilities are above 60%, one speaker's is nearly 90%, two are over 70% and one is below 50%. The average is 70.2%, which is significantly higher than chance from a statistical perspective, so this method can be considered effective for detecting deception. The jitter results, by contrast, are not high enough to suggest effective detection: three values are around chance level, two are much lower than 50%, and the average is only about 40%. Statistically, this result cannot show that speaker-dependent deception detection on jitter is feasible. Compared with speaker-independent detection, the speaker-dependent approach therefore seems more appropriate for deception detection. In this project, pitch is the feature that can be used for detection, whereas jitter, although claimed to be able to detect deception, was not the right feature in either the speaker-dependent or the speaker-independent experiment. The following graph shows the final detection probabilities (Fig 5.5).



Fig 5.5 Final detection probabilities of pitch and jitter (bar chart of speaker-independent pitch, speaker-independent jitter, speaker-dependent pitch and speaker-dependent jitter for each speaker and the average)



Chapter 6
Conclusion and Future Expectation
The issues of stress classification and deception detection are becoming increasingly important for law enforcement and the military in the field. Previous methods for VSA deception detection have focused on micro tremors in the muscles used for voice production. There is evidence suggesting that muscle control within the speech production system can be influenced by the stress experienced by the speaker; nevertheless, there is still uncertainty about to what degree, and how, this change in speech muscle control manifests itself as micro tremors.

In this report, previous studies on speech under stress and on deception detection were reviewed, and our own results were obtained from evaluations and experiments using pitch and jitter as features. The findings suggest that when a speaker is telling a lie, he or she may come under stress and the voice characteristics change. Changes in pitch are influenced in different ways by individual speakers; in other words, pitch-based detection is strongly speaker dependent. Changes in jitter, on the other hand, are small and are not influenced by the individual speaker. However, as with speaker control of pitch, a variety of factors could influence the presence or absence of the micro tremor. It has been claimed that humans can control their muscles during speech production [10], so it seems unlikely that the micro tremor (jitter) measure on which the CVSA is based could be successful in deception detection. Nevertheless, it is not impossible that under extreme levels of stress the speaker's muscle control will be affected, and such speech might then be detected as deceptive or as carrying some other emotion. In this project, both the speaker-dependent and the speaker-independent results show that jitter is not the appropriate choice.

To conclude the project: a deceptive/non-deceptive speech database was built by asking subjects to participate in a card game. To support statistical analysis, more than 400 utterances were recorded, of which around 300 were finally selected for the experiments. Two features, pitch and jitter, were extracted from the database, and Bayesian hypothesis testing was then employed to classify deceptive and non-deceptive utterances statistically. The final results indicate that pitch is a feature that can be used in deception detection, although only speaker-dependent detection is effective; jitter is not statistically significant in either the speaker-dependent or the speaker-independent experiment. Although these results suggest that fundamental frequency can detect deception, this still needs further verification.

It is clear that the deceptive stress data play a vitally important role in the detection and classification of deceptive stressed voices. However, a prize of only 50 pounds may not have made the subjects feel much jeopardy; from the video tapes it can be seen that some subjects in the earlier rounds of the game looked relaxed, which may directly affect the final detection results. If a new method can be employed to make subjects feel more jeopardy when telling lies, the results will be more exact and convincing. Moreover, deceptive stress versus emotional or physical stress was not tested in this project, and this is also a very important issue in deception detection: many liars are under an extreme amount of stress when being interrogated. Whether VSA systems actually differentiate between those types of stress is a future extension of the project that remains to be proven.



References
[1] BBC News (2003). Lie detectors cut car claims. Available: http://news.bbc.co.uk/1/hi/uk/3227849.stm
[2] Love Detector. Nemesysco Ltd. Available: http://love-detector.com/
[3] Darren Haddad, Sharon Walter, Roy Ratley, Megan Smith (2002). Investigation and Evaluation of Voice Stress Analysis Technology. The U.S. Department of Justice report (98-LB-VX-A013).
[4] F. Botti, A. Alexander, A. Drygajlo (2004). On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition. Forensic Science International, 146S, S101-S106.
[5] Iain R. Murray, Chris Baber, Allan South (1996). Towards a definition and working model of stress and its effects on speech. Elsevier Science Publishers B.V., Volume 20, Issue 1-2, pp. 3-12.
[6] Harry Hollien, Laura Geison, James W. Hicks, Jr. (1987). Voice stress evaluators and lie detection. Journal of Forensic Science, JFSCA, Vol. 32, No. 2, pp. 405-418.
[7] M. H. Beers and Robert Berkow (1999). The Merck Manual of Diagnosis and Therapy, 17th Edition. John Wiley & Sons.
[8] D. Haddad, et al. (2002). Investigation and Evaluation of Voice Stress Analysis Technology. Final Report for National Institute of Justice, Interagency Agreement 98-LB-R-013. Washington, DC. NCJRS, NCJ 193832.
[9] Clifford S. Hopkins, Daniel S. Benincasa, Roy J. Ratley, John J. Grieco (2005). Evaluation of voice stress analysis technology. Hawaii International Conference on System Sciences (IEEE).
[10] Scherer, K. R., Oshinsky, J. S. (1977). Cue Utilization in Emotion Attribution from Auditory Stimuli. Motivation and Emotion, Vol. 1, pp. 331-346.
[11] Lina Zhou, Azene Zenebe (2005). Modeling and Handling Uncertainty in Deception Detection. Proceedings of the 38th Hawaii International Conference on System Sciences, IEEE.
[12] B. M. DePaulo, J. T. Stone, and G. D. Lassiter (1985). Deceiving and detecting deceit. In The Self and Social Life, B. R. Schlenker, Ed. New York: McGraw-Hill, pp. 323-370.
[13] A real case solved by CVSA (2000/2001). National Institute for Truth Verification, Journal of Continuing Education.
[14] Cestaro, V. L. (1996). A comparison between decision accuracy rates obtained using the polygraph instrument and the Computer Voice Stress Analyzer (CVSA) in the absence of jeopardy. Polygraph, 25(2), pp. 117-127.
[15] Horvath, Frank (2002). Detecting deception: The promise and the reality of voice stress analysis. National Criminal Justice Institute, NCJ Number 196936. Polygraph Journal, Volume 31, Issue 2.
[16] The CVSA vs Polygraph war - Who's winning? (2000/2001). National Institute for Truth Verification, Journal of Continuing Education, p. 12.


[17] L. Zhou and D. Twitchell (2002). An Exploratory Study into Deception Detection in Text-based Computer-Mediated Communication. Proceedings of the Thirty-sixth Hawaii International Conference on System Sciences (HICSS'03), Hilton Waikoloa Village, Island of Hawaii (Big Island).
[18] B. Walsh (2004). Introduction to Bayesian Analysis. Lecture Notes for EEB 581.
[19] Horvath, F. (1980). Detecting Deception: The Promise and Reality of Voice Stress Analysis. Journal of Forensic Science, Vol. 27, No. 1, pp. 340-351.
[20] Heisse, J. W. (1976). Audio Stress Analysis: A Validation and Reliability Study of the Psychological Stress Evaluator (PSE). In Proceedings of the Carnahan Conference on Crime Countermeasures, Lexington, KY, pp. 5-18.
[21] Duffy, Dean G. (2003). Advanced Engineering Mathematics with MATLAB, 2nd ed. Boca Raton, Fla.: Chapman & Hall/CRC.
[22] Darren Haddad, Sharon Walter, Roy Ratley, Megan Smith (2002). Investigation and Evaluation of Voice Stress Analysis Technology. The U.S. Department of Justice report (98-LB-VX-A013).
[23] Paul Boersma (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings 17, pp. 97-110.
[24] P. Lieberman and S. B. Michaels (1962). Some aspects of fundamental frequency and envelope amplitude as related to the emotional content of speech. J Acoust Soc Am, 32(7), pp. 922-927.
[25] C. E. Williams and K. N. Stevens (1972). Emotions and speech: some acoustical correlates. J Acoust Soc Am, 52(4), pp. 1238-1250.



Appendices
Appendices I:
Deception Encouraged Card Game Instruction
Players:
For only two players

Object of the game:


Try to deceive the opponent in order to get rid of all your cards; the first player to run out of cards is the winner.

Equipment:
A standard deck of 52 playing cards.

Definitions:
1. PAIRS, TREBLES or FULLS: A pair is two cards with the same number showing on their face; a treble is three cards with the same number; a full is four cards with the same number. Example: Player A has a pair of 3s, i.e. two "3"s. The cards can be in any combination of suits (hearts, diamonds, clubs and spades). (Fig 1)

Fig 1

2. RUNS: A run is four or more cards numbered in sequence. Example: Player B can play a run of four cards, such as "3", "4", "5" and "6". The cards can be in any combination of suits. (Fig 2)


Fig 2


3. ALL ONE SUIT: The cards are all of the same suit. Example: Player A may play three cards of one suit, e.g. three hearts or three spades. (Fig 3)



Fig 3

4. DEAD CARDS: If a player believes what the opponent said about the played cards, the cards are pushed aside and kept face down until the end of the game. These cards are called dead cards.

Play:
One player is chosen to be the dealer at the beginning of the game; the dealer shuffles the deck and deals out all the cards one by one to each player. All dealt cards are kept face down in front of each player, so after the deal each player holds 26 cards (half of the deck). The game then starts with the player who did not deal that turn. The starter can play any card and any combination (pairs, one suit, runs) of cards at one time. When he/she plays, the card(s) are placed face down in the middle of the two players so that the other player cannot see them. The player who played the cards then has two options for telling the opponent about the played cards, which is the main point of the game:
1. Tell a lie: the player can decide to lie about what he/she has played and wait for the opponent's reply. Example: Player A has played a run (clubs 3, diamonds 4, hearts 5 and spades 6) but says: "two pairs, two 3s and two 6s". Player A is telling a lie because he actually played a run. (Fig 4)


Fig 4

2. Tell the truth: the player can also decide to tell the truth about what he/she has played and wait for the opponent's reply. Example: Player A has played a run (clubs 3, diamonds 4, hearts 5 and spades 6) and says: "a run: clubs 3, diamonds 4, hearts 5 and spades 6". Player A is telling the truth. (Fig 5)


Fig 5

Now it is the other player's turn; this player must reply to what the previous player said. At this point this player also has two options:
1. Believe what the other player said: since the cards are face down, he/she cannot be sure whether the previous player told the truth.



He/she can nevertheless accept what the other player said. If he/she believes the other player's words, the played cards are pushed aside (without being turned over) and become dead cards. The game continues and the roles swap in the next turn. Example: Player A plays four cards face down and says: "four 8s". Player B believes it and says: "I believe". These cards then become dead cards. (Fig 6)

Fig 6

2. Do not believe what the other player said: the player does not believe what the other player said. He/she should first say: "I do not believe it", then turn over the cards that were just played. If he/she is right (the previous player told a lie), all these cards are returned to the previous player who played them. The game continues and the roles swap in the next turn. Example: Player A played a run (clubs 3, diamonds 4, hearts 5 and spades 6) and said: "two pairs, two 7s and two Jacks". Player B says: "I do not believe it", turns over the cards and finds that he/she is right (Player A told a lie). All four cards are returned to Player A. (Fig 7)

Fig 7

On the other hand, if he/she is wrong (the previous player told the truth), he/she must collect all the cards as a punishment. The game continues and the roles swap in the next turn. Example: Player A played a run (clubs 3, diamonds 4, hearts 5 and spades 6) and said: "a run: clubs 3, diamonds 4, hearts 5 and spades 6". Player B says: "I do not believe it", turns over the cards and finds that he/she is wrong (Player A told the truth). All four cards now belong to Player B. (Fig 8)

Fig 8



The Winner:
After several rounds of play, the player who first runs out of cards is declared the winner.

Additional Rules:
1. Rule One: There is no restriction on how many truthful or deceptive statements a player may make. However, since telling the truth is always safe, if a player is caught by the opponent having made three truthful statements in a row, that player is declared the loser.
2. Rule Two: A player may play at most four cards at one time (e.g. a four-card run, two pairs, four cards of one suit, a full, or one treble plus a single card).



Appendices II Extracted Pitch Value for Non-deceptive Voices


128.09827993605720 117.13998195132154 150.04536144545861 142.86173347436414 144.50009412585621 110.24915254577955 133.64154616480025 129.65026539027838 215.86441616436517 168.93623659826687 150.92715760272989 159.52205920227138 165.17514287451897 166.33624948709746 132.17670667424915 144.22565907554747 138.39900176351705 235.70341111617813 134.59344263040094 159.82757729270855 161.66550443364807 234.15691046288850 144.01341366004061 153.07557529906299 154.64171066098103 143.82168475796416 155.95329992143817 169.66969782315215 174.05653148285978 172.85521914126579 162.12925603249772 253.12954270971699 157.36048582514746 156.23242175736991 145.89537404126824 148.87491500561043 118.82025793920715 105.66292535725498 255.31140725308424 146.53924893328642 158.98384365354167 129.18784741144296 276.36196791510992 136.59200922969990 171.39538458657662 137.99929399146325 165.07465687451304 158.62420998673900 173.77921079189528 152.66365653548090 143.62085955132949 246.93732652512671 147.56551596852430 145.19943479975652 133.41619177397226 177.38543620053443 176.20352613953662 158.82834200924168 147.97811161492586 143.26950342593602 142.30217360153537 149.05317469273871 194.45540385772293 169.22049871195520 127.18017143941134 159.41779667422267 156.58683004013005 288.40518496241191 142.78829363210079 145.97223515020866 152.61728590075884 147.99073384668085 118.36345753410458 115.68389095397200 156.78170011673640 154.10036184950832 169.97785652395112 147.33263100060381 111.57168564820746 172.15426918317317 180.71345179660156 131.29455828762588 166.39340054878008 188.65636427267464 160.71899642964965 146.57943541732669 124.68025369322488 143.30992712717969 127.93780613631078 232.64263947070015 169.72618359199282 154.36054099419312 154.44021937912285 147.84238346937889 150.20780094618650 152.49401810875685 152.66188879746358 154.49670157616043 166.98404220833342 169.81134241654007 165.89361525079906 144.19842785221761 146.77687026774703 163.27280664670809 142.58841876734593 149.45978941244695 149.00051552350504 152.40524751151818 126.04486108321123 109.33741730417672 169.35161309890498 145.76864351762782 147.56718099257364 141.35354627864024 137.86079830816632 171.79901117138439 139.90380839668063 136.49161314639349 155.23337094912267 129.15277953595611 149.66810766682880 131.69853295578480 154.95748178591492 111.03997338537835 135.53235283480180 192.82350594402186 135.06641221888634 134.06368796956923 214.03150258081772 203.26907917657274 168.05900440332883 160.68975550390096 142.29638492580909 147.38263675721618 168.06814297669183 172.49729472532968 168.43870852101111 152.08838197240283 168.25898161978225 175.35373201953286 158.01683996266459 147.41748428539216 139.18524708565900 118.04664667543052 117.06480864262316 105.15556372785949 148.25932742057225 149.37900155994689 129.49389137693618 206.23229278092586 132.57624522583771 232.24379352890190 124.52821135963218 159.18804011290021 147.76767983726722 156.13994085507198 140.42903127078421 230.19451372455995 110.93523056849458 192.63990110493469 136.83658019038202 163.49940460050223 136.37963682771502 187.44804330812937 210.49848788422057 203.86242940932391 142.44696166958710 149.51834463462168 276.16774547122981 155.90874982926587 176.73745374670730 169.56145039213197 164.83404983138502 163.89624869782784 172.57136022174495 274.07471223119251 152.74023433740319 147.39786684603484 141.61260098865574



Appendices III
Extracted Pitch Value for Deceptive Voices
124.51886994534780 150.99806283176522 120.71132481070872 117.86311554169824 157.44946484640840 142.03154223032970 171.31695196260807 130.55120050686321 137.23107483383160 156.76639276358588 193.72614620202114 148.03893752699372 123.99455145292255 142.71562289534097 134.38373989446734 148.88599288109222 291.54108439040419 129.33270345605891 140.00837424935975 239.48501033579544 137.10214728613400 256.24022656184877 155.45648872052678 157.32013197359470 163.11755391142449 173.98374801918419 160.94977303197444 178.43526636221955 136.94983603806369 146.24090465736722 157.06170664309914 121.42176146497435 243.90242788673416 144.76555332211527 138.38796825488447 134.31764016520520 130.64904100050001 136.18832682317137 137.93235282244638 154.54009110081000 162.47236008998126 154.95219686243817 140.59999959168309 145.56350503981994 121.20501646675467 137.52103851350714 136.56326949104275 129.47776803724292 157.33301397792431 144.91610273233999 186.73240054806286 154.86385462221804 147.15599540238091 129.37808720319148 169.04147033351498 154.60282098175844 156.82221242970448 127.57352531133353 157.81670104588369 157.66544784172950 110.97021358944944 122.52324246557809 135.62892522726466 161.21138482070810 141.31366150787162 127.22010802561051 211.54877017320086 134.76505446380841 207.82745725488041 233.43448607574300 149.97669313049664 147.04056224300527 168.30647690379360 127.43184218402409 139.65262217676133 131.23725289211379 140.53997292194691 148.01768800166340 120.65512730497741 173.21878648943715 159.30148505853032 163.90038159769594 172.98931583295743 273.30286501317926 153.22047488192914 134.29123823832950 140.49179540302163 154.04687346328154 150.04937337681778 116.96530277454313 151.70634898133719 142.11053676946011 134.42226550991509 140.00139060449610 136.40754863771801 127.78076647741396 192.79432083263626 178.23832458039163 154.65694486894259 160.97713541399395 137.45871276777660 157.56355366813693 229.32819366711826 148.29164967200430 133.05885868787840 146.59949269582839 151.79600065085273 188.58573050528221 154.68997592080723 317.98335147130496 162.71116430235551 133.89767210312334 169.26730074366370 175.66924128454127 131.08867000560116 172.36876031702894 144.41076543052017 119.07202517693973 138.10288558900726 149.11839957962710 138.11588600288869 275.05592995383421 276.07973660942366 265.25610095205246 124.10787442109483 150.64656058708204 154.49786707029176 170.00925476167853 120.61702748519949 127.58719122046762 167.53902374301319 139.38056242542069 139.02912475240800 153.34185125364732 179.22235511254866 145.74208591184248 144.80844376060335 154.77884650063848 167.36147935396204 174.90859670556921 253.24967509388125 262.70927786573270 156.08179737113340



Appendices IV Extracted Jitter Value for Non-deceptive Voices


0.01239462825660 0.01216485576507 0.01854170865751 0.01099453683372 0.00946559084699 0.00847673641362 0.01453598572587 0.00894960363954 0.00976638593004 0.01163734519193 0.01643569970913 0.01506883359076 0.01194542638103 0.01349456992223 0.01237936813375 0.01656314291732 0.01979025642212 0.00318277917993 0.01381102608594 0.01418427440875 0.01344476247792 0.01663963619570 0.01500900151985 0.01531448639875 0.01347253976443 0.02167114142947 0.02164266901926 0.02790210351452 0.01489219874102 0.00960550576217 0.01244777693575 0.01209047720719 0.01133127881525 0.01480829006322 0.01223393659934 0.01867361089426 0.01357300176124 0.01000702722166 0.00862246377159 0.01077473187173 0.01146491653180 0.00649010653825 0.01236163337366 0.01658757460967 0.00904925744897 0.00447661067584 0.01048881878832 0.01428364215000 0.00672267826889 0.00373619849872 0.01744700882721 0.01564580079921 0.00966301834003 0.02127332533199 0.00498592455215 0.01320726653873 0.01143075227049 0.02132094393134 0.00982515301287 0.01625351847804 0.01147669689308 0.01750291424368 0.01818827920535 0.01079042780861 0.01749659029427 0.01426523492847 0.00875029965932 0.01370578499724 0.01382408694175 0.01203291025330 0.01565647769769 0.01301762551055 0.01611153016019 0.00761712110582 0.01266375966048 0.01115835097402 0.01116209080919 0.01098675666646 0.00972001190024 0.01356306930215 0.00787552980439 0.00661962555991 0.00760291267184 0.01156198067217 0.01070750094086 0.00935224810751 0.00591843525006 0.02000336958128 0.01421575802084 0.01815214661982 0.01702133649171 0.01200422723932 0.02256883161386 0.00971964976656 0.01091202607353 0.01246802148900 0.02243002225291 0.00821293311644 0.01113541088445 0.01587154872317 0.01435833396333 0.01461653507471 0.01885546071331 0.01007219363494 0.01386760295615 0.01017342288595 0.00904987541104 0.01829048286172 0.01162886717988 0.00907169019984 0.01242236790967 0.01334736925665 0.00926547963643 0.00805527653438 0.01207414039318 0.01426666640101 0.01340673901064 0.01400378328405 0.01148100178208 0.01547749091230 0.00696421863224 0.01558903349567 0.01171864812113 0.01663062732701 0.01094204116342 0.01137662314152 0.01006691622868 0.01249043674660 0.00334260373099 0.01834798520344 0.02531380311633 0.02180197710025 0.00717772682266 0.01236120300799 0.02277641199108 0.01204087490573 0.01625255407196 0.01206218822395 0.01091510350809 0.00873523815840 0.00879606962362 0.01034275341499 0.00809040295530 0.01183661082119 0.01562692686423 0.01317842507893 0.01827292361060 0.00933599913791 0.00779339198290 0.01234321087564 0.01307082511835 0.01192982444861 0.01836401277029 0.00968561846436 0.01707194653935 0.01186491832382 0.00888190653264 0.01353430082863 0.01794358655044 0.01545544822086 0.01319543019752 0.01094861864912 0.00716387136147 0.00948686942559 0.01233634504138 0.01971377779441 0.02051119281917 0.01543568267123 0.01754875600208 0.01791612533761 0.01452907736980 0.01163758841771 0.01397748475953 0.01647694010607 0.01188074381608 0.01518714065254 0.00664892529305 0.01134471254695 0.01206361469831 0.01366229162381 0.01327476241290 0.00937787898377 0.02054140079680 0.00859657411659



Appendices V
Extracted Jitter Value for Deceptive Voices
0.01492581246668 0.00984389283145 0.01175373319154 0.01170249719809 0.01679892557882 0.01510127659945 0.00834784454056 0.01379144929662 0.01305397703912 0.01903410377829 0.01223102171688 0.01472635751643 0.00751458841628 0.01444496380038 0.01158455965225 0.01764682331536 0.01498901446446 0.01388282012566 0.02452619860987 0.01363078062008 0.01476277032853 0.01697177362866 0.02284619564532 0.01541582705262 0.01560761282559 0.01561153247806 0.01391139866312 0.01046439056603 0.01554442924674 0.01262286447068 0.01369850325308 0.01482756999096 0.01189765451018 0.00777683089468 0.00828882182444 0.01577657075003 0.01279489064647 0.01435992377204 0.01793129476423 0.01192524802549 0.01464900923126 0.01311443338811 0.00935302198365 0.02238431387682 0.01471534368684 0.02053933554115 0.01297329536826 0.01155871149902 0.01615266002204 0.01802621379240 0.01231212634696 0.01359245043438 0.01608652609875 0.01672278531928 0.01756047727338 0.01489339101058 0.01619849360436 0.01450886162700 0.01874067855117 0.01261734141761 0.01216468357403 0.01159251459575 0.01190577057462 0.00663099325906 0.01092470096475 0.01366326954624 0.01072889450297 0.01259658789575 0.00725580041376 0.01634978975942 0.00960973369465 0.01536675588171 0.01640408767927 0.01940597137752 0.01242754133274 0.01468964356737 0.01052989364111 0.01622072641366 0.02067115197647 0.00814677651027 0.02424875030457 0.01605916963648 0.01387941217318 0.01558871622650 0.01046141747288 0.01939510199233 0.00796269328134 0.01537220717245 0.00845970793719 0.00868064751174 0.02269994823543 0.01592793546229 0.01513987449389 0.00934116614735 0.01168495800325 0.00655978366228 0.01154625107952 0.00648324094082 0.01269614783643 0.01146371132469 0.01946558264594 0.01394289034357 0.00692366545307 0.01344964774929 0.01018880190150 0.01773334877610 0.01257043830390 0.01321431092560 0.01796597318077 0.01034309957330 0.02114266980824 0.01711050431144 0.01544089128958 0.01044023304028 0.01194054681956 0.01118101775695 0.00831335092537 0.00708471889028 0.01158084701383 0.00691971108044 0.01463832753507 0.01328239729091 0.01352830070041 0.01469473756940 0.01076186884375 0.01689405000316 0.01098368353274 0.01251060668573 0.01631779013586 0.01872830083158 0.01663961661292 0.01821179746597 0.01896503856515 0.01594558491363 0.00677534255393 0.01776134054226 0.00840377774634 0.01504583358920 0.01251155799993 0.01582167540874 0.02089663240153 0.01806792522159 0.01362025187616

