Robert Allen Fox

Forensic Phonetics
273 Whitaker Avenue, South
Powell, OH 43065

25 January 2011

Report: Acoustic processing/analysis of recorded utterance for the Columbia, MO Police


In the early part of July, 2010, I was contacted by Sergeant Chris Kelley in the Patrol
Division of the Columbia, MO Police Department regarding analysis of a covert digital recording
made by an undercover police officer just before a subject was arrested. The recording was
relatively noisy so Sergeant Kelley was particularly interested in whether I could reduce the
noise in a particular section of the recording and determine just what the suspect had said. This
report provides a description of my analytic procedures and, on the accompanying CD, a copy of
the unprocessed recording as well as the processed recordings. In this report I also provide my
expert opinion on what the suspect actually said.
On or around 7/8/2010, Sergeant Kelley had a CD which contained a copy of the original
recording sent to me via FedEx. I have copied this original digital recording to a CD
accompanying this report and named it Complete_Original_Recording.wav (the original name of
the recording on the CD sent to me via FedEx was WS400014.wav). Subsequently, I created a
shorter wavefile which contained the first 1:15 minutes of the original recording. This I have
included as a wavefile named initial_v1.wav; it serves as the source of the utterance in question.
Unless otherwise noted, the waveform manipulations (and editing) were done using Adobe
Audition (1.0 and 3.0). Noise reduction processing was done using procedures available both in
the Adobe Audition and Adobe Soundbooth CS5 programs. The waveform
contained in this wavefile is shown in Figure 1.
Figure 1. Graphic display (time by amplitude) of the waveform of the portion of the recording
named initial_v1.wav.

Next, I copied a 2.057 sec portion of the recording which had the utterance in question into a
separate soundfile. This file I have copied and named short_utterance_v1.wav. If you listen
carefully, you can hear the suspect’s speech in the background, seeming to say (in the stretch
from time position 0.341 to 1.621 sec) “give me (your) fucking wallet.” However, the noise
in the recording makes it somewhat difficult to hear. Figure 2 shows a graphic display of the
waveform in short_utterance_v1.wav. The highlighted portion contains the utterance of interest.
What is evident in Figure 2 is the “noisiness” of the recording, although the utterance can be
heard in the noise, albeit at a low amplitude level, when this wavefile is played. The goal of
my analysis was to reduce the noise level in this signal while preserving the speech signal.
Figure 2. Graphic display (time by amplitude) of the waveform of the portion of the recording
named short_utterance_v1.wav. The portion highlighted contains the utterance of interest.
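
The time positions quoted above map onto sample positions in the wavefile. The following is a minimal Python sketch of that conversion and of cutting out the highlighted stretch; it is an illustration only, not the procedure used in Adobe Audition. The 44100 Hz sample rate and the function names are assumptions (the report does not state the recording's sample rate), and a simple ramp stands in for the real audio data.

```python
def seconds_to_index(t_seconds, sample_rate):
    """Convert a time position in seconds to the nearest sample index."""
    return int(round(t_seconds * sample_rate))


def extract_segment(samples, sample_rate, start_s, end_s):
    """Return the samples between start_s and end_s (end exclusive)."""
    if not 0 <= start_s < end_s:
        raise ValueError("need 0 <= start_s < end_s")
    start = seconds_to_index(start_s, sample_rate)
    end = min(seconds_to_index(end_s, sample_rate), len(samples))
    return samples[start:end]


# Assumed sample rate; the report does not state the actual value.
SAMPLE_RATE = 44100

# Stand-in for the 2.057 sec file: a ramp, so each sample equals its own index.
short_file = list(range(seconds_to_index(2.057, SAMPLE_RATE)))

# The utterance of interest lies between 0.341 and 1.621 sec.
utterance = extract_segment(short_file, SAMPLE_RATE, 0.341, 1.621)
duration = len(utterance) / SAMPLE_RATE  # length of the extracted stretch in seconds
```

With real data, the samples would first be read from the wavefile (for example with Python's standard wave module) instead of being generated.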

The first step in the noise reduction process is to eliminate persistent background noise that is
found throughout the recording. This is done using a three-step process. First, the
phonetician/acoustician searches the wavefile for a stretch of the recording during
which no one is talking and there is no other identifiable noise source (such as a
barking dog, or banging on a table). Going back to the initial_v1.wav file, one can find such a
stretch of recording from 26.689 to 28.220 sec (a length of 1.53 seconds). Using the FFT (Fast
Fourier Transform) analysis in the Adobe Audition noise reduction option, one can calculate a
frequency profile of the sound in this section of the waveform (which represents background
noise). This was done, and the frequency profile saved (this information is provided in the file
named Profile_26.689.fft). The profile was then loaded into the Adobe Audition program and
this background noise was removed from short_utterance_v1.wav. The modified waveform
(saved in a file named short_utterance_v2.wav) is shown in Figure 3 following this noise
reduction step. Although there is some audible improvement in the signal, there is
no dramatic change in the waveform itself.
Figure 3. Waveform display of short_utterance_v2.wav.
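
The general idea behind profile-based noise reduction can be sketched with a generic technique called spectral subtraction. The Python sketch below is an assumption-laden illustration of that idea and is not Adobe Audition's algorithm, whose internals are not described in this report. It averages the magnitude spectrum of a noise-only stretch and subtracts that average from each frame of the noisy signal while keeping the original phase. The frame size, the function names, and the synthetic "hum" and "tone" signals are all hypothetical, and a naive transform is used in place of a true FFT so that the example needs no external libraries.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short sketch."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def frames(samples, size):
    """Split samples into consecutive non-overlapping frames, dropping any remainder."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def noise_profile(noise_samples, size):
    """Average magnitude spectrum over all frames of a noise-only stretch."""
    spectra = [dft(f) for f in frames(noise_samples, size)]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(size)]


def subtract_noise(samples, profile, size):
    """Subtract the noise magnitude from each frame, keeping the original phase."""
    cleaned = []
    for frame in frames(samples, size):
        out = []
        for value, noise_mag in zip(dft(frame), profile):
            mag = max(abs(value) - noise_mag, 0.0)  # magnitudes cannot go below zero
            out.append(cmath.rect(mag, cmath.phase(value)))
        cleaned.extend(idft(out))
    return cleaned


SIZE = 32  # hypothetical frame size
# Synthetic stand-ins: a steady hum (the "background noise") and a tone (the "speech").
hum = [0.5 * math.sin(2 * math.pi * 4 * t / SIZE) for t in range(4 * SIZE)]
tone = [math.sin(2 * math.pi * 2 * t / SIZE) for t in range(4 * SIZE)]
noisy = [a + b for a, b in zip(tone, hum)]

profile = noise_profile(hum, SIZE)
cleaned = subtract_noise(noisy, profile, SIZE)
# Largest difference between the cleaned signal and the hum-free tone.
residual = max(abs(c - t) for c, t in zip(cleaned, tone))
```

Real background noise is not a single steady tone, so practical tools typically work on overlapping, windowed frames and leave some residual noise; this toy case removes the hum almost exactly only because the hum is perfectly regular.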

An analysis of the amplitudes of various frequencies in short_utterance_v2.wav (shown in
Figure 4) reveals much energy in the lower frequencies (0-400 Hz) that is likely not part of the
speech pattern, as well as energy at higher frequencies (3500 Hz and above). Therefore, the next
step was to digitally filter the speech using an FFT bandpass filter (400 to 3500 Hz) to eliminate
most of the energy in the 0-400 Hz and 3500+ Hz frequency ranges.

Figure 4. Frequency analysis of short_utterance_v2.wav.

With this goal in mind, an FFT bandpass filter was constructed as shown in Figure 5, and it
was used to filter the signal in short_utterance_v2.wav to produce the waveform saved in
short_utterance_v3.wav (graphically displayed in Figure 6).

Figure 5. Schematic of FFT filter used.

Figure 6. Graphic display of short_utterance_v3.wav (following bandpass filtering).
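
The principle of a frequency-domain bandpass filter with a 400 to 3500 Hz passband can be illustrated with the short Python sketch below. It is a simplified "brick-wall" version that zeroes every frequency bin outside the passband; the exact shape of the filter in Figure 5 may differ, and this is not the Adobe Audition implementation. The 16000 Hz sample rate, the function name, and the three synthetic test tones are assumptions made for the example, and a naive transform stands in for a true FFT.

```python
import cmath
import math


def fft_bandpass(samples, sample_rate, low_hz, high_hz):
    """Zero every frequency bin outside [low_hz, high_hz] and transform back."""
    n = len(samples)
    # Forward transform (naive O(n^2) DFT; a real tool would use an FFT).
    spectrum = [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        if not low_hz <= freq <= high_hz:
            spectrum[k] = 0
    # Inverse transform; the imaginary parts cancel for a symmetric spectrum.
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


SAMPLE_RATE = 16000  # assumed for the example
N = 160              # gives a bin spacing of 100 Hz


def tone(freq_hz):
    """Synthetic sine wave of N samples at the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


rumble, speech_band, hiss = tone(100), tone(1000), tone(5000)
mixed = [a + b + c for a, b, c in zip(rumble, speech_band, hiss)]

filtered = fft_bandpass(mixed, SAMPLE_RATE, 400, 3500)
# Largest difference between the filtered signal and the 1000 Hz component alone.
worst_error = max(abs(f - s) for f, s in zip(filtered, speech_band))
```

In this toy case the 100 Hz and 5000 Hz components are removed and the 1000 Hz component passes through unchanged, which mirrors the intent of removing the 0-400 Hz and 3500+ Hz energy while keeping the speech band.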

This signal was then boosted (amplified) by 18 dB to produce the waveform (saved as
short_utterance_v4.wav) found in Figure 7 (that part of the signal which contains the utterance
of interest is highlighted).

Figure 7. Graphical display of short_utterance_v4.wav.
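
For reference, a boost of 18 dB corresponds to multiplying every sample amplitude by 10^(18/20), or roughly 7.94. The Python sketch below shows that conversion. The clipping step and the assumption that samples are floating-point values between -1 and 1 are additions for the example and are not taken from the report.

```python
def db_to_gain(db):
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def amplify(samples, db, limit=1.0):
    """Scale samples by the given dB change and clip to +/-limit (an assumption)."""
    gain = db_to_gain(db)
    return [max(-limit, min(limit, s * gain)) for s in samples]


gain_18 = db_to_gain(18)                    # roughly 7.94x in amplitude
boosted = amplify([0.01, -0.05, 0.2], 18)   # the last value exceeds the limit and is clipped
```

Because the utterance sits at a low amplitude level, a large gain of this kind makes it audible, but any remaining loud noise is boosted by the same factor.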

Next, the section of interest was extracted and saved to a new soundfile, and the amplitudes of
the obvious noisy “clicks” (seen as sharp vertical lines in the waveform) were reduced by 8 dB,
yielding the waveform shown in Figure 8. What is left after noise reduction, bandpass filtering,
and amplification is, to the trained (and likely to the untrained) ear, a production of the
utterance “Give me (your) fucking wallet.”
Figure 8. Graphical display of short_utterance_v5.wav.
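
Reducing a click by 8 dB means scaling the samples within it by 10^(-8/20), or about 0.40. The Python sketch below illustrates this on a short synthetic signal. The report does not describe how the clicks were located, so the threshold-based detector, the threshold value, and the function names here are hypothetical and serve only to make the example complete.

```python
def attenuate_regions(samples, regions, db):
    """Reduce the samples inside each (start, end) index range by db decibels."""
    factor = 10 ** (-db / 20)
    out = list(samples)
    for start, end in regions:
        for i in range(start, min(end, len(out))):
            out[i] *= factor
    return out


def find_clicks(samples, threshold):
    """Hypothetical detector: flag single samples whose magnitude exceeds threshold."""
    return [(i, i + 1) for i, s in enumerate(samples) if abs(s) > threshold]


# Synthetic signal with two sharp spikes standing in for clicks.
signal = [0.1, -0.1, 0.9, 0.1, -0.8, 0.1]
clicks = find_clicks(signal, 0.5)
repaired = attenuate_regions(signal, clicks, 8)
```

Only the flagged samples are changed; everything outside the click regions keeps its original amplitude.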

While completing the analysis of this signal, I created several other noise reduced/speech
enhanced versions of this same section of the recording using Adobe’s Soundbooth program.
The end result was very similar, but short_utterance_v5.wav is the clearest, in my opinion.
In summary, after the noise reduction process described above was completed, it is my opinion
(as a phonetician and speech scientist well-versed in acoustic analysis of speech) that the
utterance produced by the suspect and recorded was “Give me (your) fucking wallet.” The “your” is in
parentheses as this word was likely reduced in the speaker’s production to a transitional voicoid
(a vowel-like sound) between the “me” and “fucking” that is not readily discernible (the same
way that the “a” in “One small step for (a) man” produced by Neil Armstrong when he first
stepped on the moon is not discernible).