Aculab Answering Machine / Live Speaker Detection

What’s this all about?

The aim of this work is to allow us to train our Artificial Intelligence (AI) systems to
differentiate between live human responses and recorded or synthetically generated
messages. This is termed Answering Machine Detection (AMD), or sometimes Live
Speaker Detection (LSD).

To use AI for this task, we need a very large number of telephone calls, to exemplify
the different ways that people and machines answer calls. What’s more, we also
need to know when the important “acoustic events” occur in these recordings.

Essentially that means we need labels to identify when any speech starts and stops,
and what type of speech it is (machine or human). Because many of these
recordings also include tones (such as the ring-tone before the call is answered, or
tones produced if someone has pressed the keys of the phone) it’s also helpful to
label when these tones occur.

We’ve reduced the list of acoustic events of interest to those listed below.

a) The time when the phone stopped ringing (if it ever rang at all)

b) The start of a live speaker’s response when answering the call (i.e. not
general background speech)

c) The end of the live speaker’s response

d) The start of an answering machine or voicemail message

e) The end of the machine message

f) The start of any short “beep”, such as one denoting the end of an answering
machine message, or a telephone key-press

g) The end of the “beep” or key-press
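
As a rough illustration only, the seven events above could be represented in code as shown below. The names "Event" and "Label", and the example times, are made up for this sketch and are not part of the annotation software.

```python
from dataclasses import dataclass
from enum import Enum


class Event(Enum):
    """The seven acoustic events (a) to (g); member names are illustrative."""
    RINGING_END = "a"
    LIVE_START = "b"
    LIVE_END = "c"
    MACHINE_START = "d"
    MACHINE_END = "e"
    BEEP_START = "f"
    BEEP_END = "g"


@dataclass(frozen=True)
class Label:
    event: Event
    time_s: float  # seconds from the start of the recording


# A made-up call: ringing stops, a machine message plays, then a beep.
labels = [
    Label(Event.RINGING_END, 4.2),
    Label(Event.MACHINE_START, 5.0),
    Label(Event.MACHINE_END, 12.5),
    Label(Event.BEEP_START, 12.9),
    Label(Event.BEEP_END, 13.3),
]

# Labels are expected to be in time order.
times = [label.time_s for label in labels]
assert times == sorted(times)
```

Each "start" event is paired with an "end" event, which is why the list has separate entries for the beginning and end of speech and beeps.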

Getting ready to go…

The first thing you’ll need to do is make sure you have Microsoft OneDrive
installed on your computer. If not, you can install it via one of the links below:

Microsoft Windows:
https://www.microsoft.com/en-gb/microsoft-365/onedrive/download

Apple MacOS:
https://apps.apple.com/us/app/onedrive/id823766827

Then you need to set up your OneDrive according to Microsoft’s instructions and sign
up or sign in with a personal email address (not a corporate one, i.e. not
“...@aculab.com”).

Open “OneDrive” on your computer, and right-click on the background of the file
explorer window that’s displayed. A list appears including a section with a blue cloud
icon next to “View online” – click “Settings” just below that, then “Settings” again,
and un-check the “Files on demand” box.

Before closing this window, click on “Backup” and un-check “Photos and videos”
and “Screenshots”, then go to the “Office” tab and un-check “Use office
applications...”

Close the “Settings” dialogue, and again right-click on the background of the
OneDrive folder. This time, select “New” then “Folder” to create a new folder. Name
it as your Aculab ID (usually 2 letters and 2 digits). This is where all your audio data,
annotation software, and the results of your annotation will be stored.

Right-click on the new folder, and select “View online”. This will take you to the cloud
storage for your OneDrive, where you can create a link that will allow us to write files
into the folder. Once you’ve signed into the website, you should see a screen
showing the contents of your (currently empty) folder. Click on the blue “Share”
button, and make sure it says “Anyone with the link can edit”. Click on “Copy link” to
get the link that you need to send us.

Paste the link into an email but delete the letters “https:” from the start of the link,
otherwise our company email scanner will treat the link as a virus and bounce the
email back to you. Address the email to steve.beet@aculab.com and send it.
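
If it helps to see exactly what should be left after editing the link, here is a small sketch. The function name "strip_https" and the address are placeholders invented for this example, not a real share link.

```python
def strip_https(link: str) -> str:
    """Remove a leading "https:" so that only the rest of the link is sent."""
    prefix = "https:"
    return link[len(prefix):] if link.startswith(prefix) else link


# The address below is a made-up placeholder.
result = strip_https("https://example.com/your-share-link")
print(result)  # prints: //example.com/your-share-link
```

Note that the two slashes stay in place; only the six characters "https:" are removed.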

Now wait a few minutes to make sure it doesn’t get bounced by our email scanner,
and then post a message on the Discord “Aculab / Digital analyst” channel to let us
know you’ve sent it. This is important, because emails from non-business addresses
are often treated as “suspected spam” and put on hold by the email scanner, so we
might not see yours until some time later.

We will then copy the files you will need into your online OneDrive, and they should
appear in the OneDrive folder on your computer soon afterwards. We will notify you
via Discord once we’ve uploaded the files, and you should let us know if they don’t
appear on your computer within the next few minutes.

Note: Our software relies on the files and folders within the shared folder, so don’t
delete, move, or rename anything in there unless we ask you to. Also note that
anything you create within the shared folder will be uploaded to the cloud, and
shared with us at Aculab – only add things to the folder if they’re not confidential, and
not too big!

If your computer is running Apple MacOS you will also need to perform a couple of
extra steps:

1. Open a terminal and navigate to your OneDrive folder. You might want to enable
the “New terminal at folder” service to simplify this process – you can search the
internet to find how to do this if you’re unsure. Once you’re in the folder
containing “Annotate.sh”, type the following into the terminal window:

chmod +x *.sh

This will allow you to execute the shell scripts (the “.sh” files).

2. Install the Praat audio annotation software, which will be in the “bin” sub-folder of
your OneDrive – called “praat6148_mac.dmg” – or you can download it yourself
from https://uvafon.hum.uva.nl/praat/download_mac.html

3. You can then test that everything is installed correctly by typing the following
command into the terminal window (still in the same folder as “Annotate.sh”):

./Annotate.sh

Make sure you type this exactly. The dot, slash, and capitalisation of the “A” are
important!

This will start the annotation program, Praat, and then you can carry on to the
next section.

These steps are not needed under Microsoft Windows – files are executable by
default and Praat does not need to be installed because it’s already present in the
“bin” sub-folder of your OneDrive.

Running the Praat script

The annotation process involves running a Praat script. This is stored in a “.praat”
file, but is completely different from an Apple MacOS shell script or a Microsoft
Windows batch file. The method for running the Praat script depends on your
computer’s operating system:

• If you’re using Apple MacOS, open a terminal window in the folder containing
“Annotate.sh” and run the script by typing:

./Annotate.sh

You should see two new windows – the “Praat Objects” and “Praat Picture”
windows.

Alternatively, you can simplify this process by creating a shortcut – and again,
search the internet if you’re unsure how to do this. Later on you will also be
executing the other shell script, “Cleanup.sh”, so you might want to make a
shortcut for that too.

• On the other hand, if you’re using Microsoft Windows, you can start by going to
the new folder you created in your computer’s OneDrive via the “OneDrive”
shortcut from the Windows Start Menu. There you should see a batch file,
“Annotate.bat”. Run this file by double-clicking on it, and it should open two new
windows – the “Praat Objects” and “Praat Picture” windows.

Starting to annotate

Whichever operating system you’re using, you should now select the “New” menu
from the “Praat Objects” window, then select “Annotate...” near the bottom.

That will display a window with three fields:

1. The full path of the Praat script that you’re running

2. The folder that contains all the audio files you’re about to transcribe

3. The folder that will be used to store the labels that you create (the “TextGrid”
files)

At some point you may have to change the text in these fields, but for now you
should just leave them as they are and click on “OK”. Two windows should pop up –
one titled “Pause...” and one called “TextGrid*”.

From this point on, you will be annotating one file after another until they’ve all been
labelled:

a) Find the “TextGrid*” window, showing the first 30 seconds of the file. There are
two bands near the top of the screen – the first is a black-on-white line-graph of the
audio waveform, and the second is a “spectrogram” that shows how the signal
energy changes during the recording.

b) If the upper part of the spectrogram is dark, it means there is a lot of energy in the
higher frequencies. The only speech sounds that have this characteristic are
“fricatives” (“shhh”, “ffff”, “thhh”) and “unvoiced plosives” (“p”, “t”, “k”). Vowels
have more energy at lower frequencies, so they have spectrograms with dark
regions near the bottom of the display.

c) The other common types of signals that you will be dealing with will be noise,
crackles and steady tones. Noise and crackles mostly consist of high frequencies,
like fricatives and plosives, while tones show up in spectrograms as perfectly
horizontal, thin, dark lines.

d) To listen to part of the audio, select it by clicking-and-dragging across the
waveform or the spectrogram, from the start to the end of the section you want to
hear. The section will also be shown on the first grey band at the bottom of the
screen, with a number that represents its duration (in seconds). Clicking on the
selected region of that band should cause the audio to be played out.

e) To see or listen to the later parts of long recordings, use the scroll-bar at the
bottom of the window. Sometimes speech will not be visible until you scroll
across, so try to remember to at least take a quick look through the whole
recording.

f) Between the waveform at the top and the grey bands at the bottom of the
window, there are a number of “tiers”, with numbers on the left, and names on the
right. Each one is used to label a different feature. The bottom tier can be used to
record general comments about the audio (e.g. “background chatter”, “car noise”,
“music”, etc.) but can be left blank.

g) To put a marker on a tier, click on the waveform or spectrogram at the time you
want to mark, and click on the small circle at the top of the appropriate tier.

h) If you make a mistake, you can delete a label (and the description, if there is
one) by clicking on the respective line in the tier and pressing “alt” and
“backspace” together.

i) If there’s something unusual about the waveform you can select the “Comment”
tier and type a short description of the issue. This tier is slightly different from the
others, in that it’s divided into intervals rather than points in time, and the markers
are boundaries between adjacent intervals. Therefore you should add a marker at
the start and end of any unusual feature, before clicking between the boundaries
of the interval you created and typing a label to describe it (e.g. “music” or “fax”).

j) Once you’ve finished labelling the recording, or if you want to abandon it, close
the “TextGrid*” window, and find the “Pause*” window. If you’re happy with
the labels you just created, press “Save”. This will save the labels as a “TextGrid”
file and open a new “TextGrid*” window for the next file. If you want to take a
break at this point, just click on “Quit” and close the “Praat Objects” window.
Otherwise go back to step (a).

k) Alternatively if you want to discard the labels and have another go, press “Retry”,
or if you’re not sure how to label the file, and want to come back to it later, press
“Skip”.

l) The “Stop” button does the same as “Quit”, but doesn’t “tidy up” after itself – if
you press it you will need to dismiss a warning message before closing the “Praat
Objects” window. It’s best not to use the “Stop” button!
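
Step (c) says that steady tones appear as perfectly horizontal lines in the spectrogram. The sketch below illustrates why, using a simple frequency analysis written from scratch. It is not how Praat computes its display, and the sampling rate, frame size and tone frequency are assumptions chosen for the example.

```python
import cmath
import math


def dominant_bin(frame):
    """Return the index of the strongest frequency bin in one frame (naive DFT)."""
    n = len(frame)
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2):  # skip the DC bin, stop below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin


SAMPLE_RATE = 8000  # Hz; an assumed sampling rate for telephone audio
FRAME = 64          # samples per analysis frame
TONE_HZ = 1000      # an arbitrary steady tone

signal = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE)
          for i in range(FRAME * 5)]
frames = [signal[i:i + FRAME] for i in range(0, len(signal), FRAME)]
bins = [dominant_bin(frame) for frame in frames]

# Each bin is SAMPLE_RATE / FRAME = 125 Hz wide, so 1000 Hz falls in bin 8.
print(bins)  # prints: [8, 8, 8, 8, 8]
```

The strongest frequency is the same in every frame, which is what draws a thin horizontal line across the spectrogram. Speech, by contrast, changes from frame to frame.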

Next time the Praat script is run, it will scan through all the files until it finds one
which doesn’t already have a “TextGrid”, and it will carry on from there.
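
The resume behaviour described above can be pictured with the following sketch. It is not the Praat script itself. The function name is hypothetical, and it assumes that audio files end in ".wav" and that each label file has the same name with a ".TextGrid" ending.

```python
from pathlib import Path
import tempfile


def next_unlabelled(wav_dir: Path, textgrid_dir: Path):
    """Return the first audio file with no matching TextGrid, or None."""
    for wav in sorted(wav_dir.glob("*.wav")):
        if not (textgrid_dir / f"{wav.stem}.TextGrid").exists():
            return wav
    return None


# Demonstration in a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    wav_dir, tg_dir = Path(tmp, "wav"), Path(tmp, "textgrid")
    wav_dir.mkdir()
    tg_dir.mkdir()
    for name in ("call1", "call2", "call3"):
        (wav_dir / f"{name}.wav").touch()
    (tg_dir / "call1.TextGrid").touch()  # call1 is already labelled

    found = next_unlabelled(wav_dir, tg_dir)
    print(found.name)  # prints: call2.wav
```

This is also why a skipped file comes round again later: it still has no “TextGrid”, so it is picked up on the next run.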

Coping with the unexpected…

It won’t be long before you find a recording that doesn’t fit the normal pattern. Here
are a few rules-of-thumb that you can use if you get stuck:

1) You only need to label the audio for 20 seconds following the first response (i.e.
20 seconds after the replayed message starts, or the called party speaks to
answer the call).

2) If there’s no speech visible initially, scroll through the waveform to find when (or
if) it starts, and label the following 20 seconds (if present).

3) If two speech segments are separated by less than 2 seconds of silence, label
them as a single segment.

4) Treat groups of tones that occur in quick succession (e.g. phone key-presses or
auto-dialled numbers) as a single beep.

5) Mark music-on-hold as a machine response, and add a label “music” to the
comment tier.

6) Some recordings have an answering machine message with an embedded
recording of the person saying their name. In other cases, people or machines
may use different languages (announcing themselves first in Spanish, then in
English, for example). We’re not concerned with language or speaker identity, so
just mark the speech as a single message, unless there’s a long (2 seconds or
more) gap between words.

7) For a few files, you may see an error message saying the file
was too short. This is usually because the file has been corrupted somehow. You
should carry on and label the file, but add “truncated” in the “Comment” tier so we
know there’s a problem.

8) If Praat doesn’t allow you to label a file for any reason, close the program and
delete the file from the “wav” folder of your OneDrive. Please let me know (either
by email or Discord) which file you deleted so I can check up on the cause of the
problem.

9) If you can’t decide what to do, look in the pinned message on the Discord
“digital-analyst” channel to see if there’s something relevant there.
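
Rules 1 to 3 amount to some simple arithmetic on start and end times, sketched below. The function names and example times are invented for illustration, and the sketch assumes all segments are of the same type (all human or all machine).

```python
MAX_GAP_S = 2.0   # rule 3: gaps shorter than this are bridged
WINDOW_S = 20.0   # rules 1 and 2: label 20 seconds from the first response


def merge_segments(segments, max_gap=MAX_GAP_S):
    """Join (start, end) segments separated by less than max_gap seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def clip_to_window(segments, window=WINDOW_S):
    """Keep only what falls within `window` seconds of the first response."""
    if not segments:
        return []
    limit = min(start for start, _ in segments) + window
    return [(s, min(e, limit)) for s, e in segments if s < limit]


speech = [(3.0, 5.5), (6.5, 9.0), (12.0, 30.0)]
merged = merge_segments(speech)
clipped = clip_to_window(merged)
print(merged)   # prints: [(3.0, 9.0), (12.0, 30.0)]
print(clipped)  # prints: [(3.0, 9.0), (12.0, 23.0)]
```

In this example the first two segments are only 1 second apart, so they become one segment. The third starts 3 seconds later, so it stays separate, and nothing needs labelling after 23 seconds (20 seconds after the first response at 3 seconds).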

Keeping things neat & tidy…

As well as “Annotate.bat” (or “Annotate.sh” on Apple MacOS), there are two other
batch / shell script files you should be aware of: “Cleanup.bat” (or “Cleanup.sh” on
Apple MacOS) and “Synchronise.bat” (only needed on Microsoft Windows).

At the end of an annotation session, when you’re confident that all the labels you’ve
created are correct, you should run “Cleanup.bat” or “Cleanup.sh”. This will delete
all the audio files that have already been labelled, leaving behind only those that
have not yet been labelled. It will also transfer the labels from the “textgrid” folder
to one called “completed”. We will then periodically retrieve the files from the
completed folder, and delete them from your OneDrive.
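
To make the effect of the clean-up concrete, here is a sketch of the behaviour described above. It is not the contents of “Cleanup.bat” or “Cleanup.sh”. The folder names “wav”, “textgrid” and “completed” come from this guide, while the file endings and the matching of files by name are assumptions.

```python
from pathlib import Path
import shutil
import tempfile


def cleanup(root: Path, dry_run: bool = True):
    """List (or perform) the deletions and moves described above."""
    actions = []
    completed = root / "completed"
    for grid in sorted((root / "textgrid").glob("*.TextGrid")):
        wav = root / "wav" / f"{grid.stem}.wav"
        if wav.exists():
            actions.append(f"delete {wav.name}")
            if not dry_run:
                wav.unlink()
        actions.append(f"move {grid.name} to completed")
        if not dry_run:
            completed.mkdir(exist_ok=True)
            shutil.move(str(grid), str(completed / grid.name))
    return actions


# Demonstration in a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "wav").mkdir()
    (root / "textgrid").mkdir()
    (root / "wav" / "call1.wav").touch()
    (root / "wav" / "call2.wav").touch()
    (root / "textgrid" / "call1.TextGrid").touch()

    actions = cleanup(root, dry_run=False)
    remaining = sorted(p.name for p in (root / "wav").iterdir())
    moved = sorted(p.name for p in (root / "completed").iterdir())

print(actions)    # prints: ['delete call1.wav', 'move call1.TextGrid to completed']
print(remaining)  # prints: ['call2.wav']
print(moved)      # prints: ['call1.TextGrid']
```

Because the labelled audio is deleted, it is worth being sure of your labels before running the real clean-up script.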

Under Microsoft Windows, if you suspect that your computer’s OneDrive has not
been updated fully, you can force it to re-synchronise by running “Synchronise.bat”
or by logging out and back in again on your computer. This should not normally be
necessary but can help if your OneDrive gets stuck.
