
PERK Corp.

Detailed Experiments Lab Report

Pranav Pannala, Evan Colenbrander, Ryan Miller, Kaleb Ryan

Abstract

Our first two experiments focused on the audio portion of our product. We
tested different accents and different distances to determine how effective our
product would be under each condition, and we tested a male and a female voice
for each accent. We ran the experiments with two computers: one played the
audio and the other ran the product. The same two computers were used
throughout, and the playback volume was held constant, so that no additional
variables were introduced. For the distance test we played each audio file three
times at each distance and calculated the average accuracy. After each set of
trials, we moved the computer back 1 additional foot until we reached the 16
foot mark. For the accent test we placed the computers 2 feet apart. We then
played the male and female versions of four different accents three times each
and calculated the average accuracy. The first finding was that the female
voices worked much better than the male voices. Across both tests the female
voices always had higher average accuracies than their male counterparts. We
attribute this to the male voices being lower pitched, which appears to make
them blend in more with the background noise and harder for the microphone to
pick up than the higher pitched female voices. The second finding was that the
American accent works best, followed by the Indian and Australian accents. All
three had relatively high average accuracies during our tests. The only accent
that did not work as well was the English accent. We do not expect this to be a
significant issue because our product will be sold in the US, where the British
population is small. The final finding was that our product works best in the 1
to 8 foot range. Outside of this range the results are more inconsistent, but we
do not expect this to be a significant issue because users will not be
communicating with the product from very far away.

Table of Contents

1. Abstract

2. Table of Contents

3. Introduction

4. Test Plan and Set-Up

5. Data

6. Analysis / Conclusions

7. References

Introduction

Purpose:

This report details the findings from our first two tests: the distance test and the
accent test. The goal of these tests is to determine whether the speech-to-text
capabilities of the RoboBuddy are accurate enough for the product to be successful. We
analyze the effects of speaking to the RoboBuddy from increasing distances and of
speaking to it in different accents. In the future, we plan to conduct one more test on
the physical properties of the product, in which we will stress test ABS plastic to
determine how strong the product will be.

Distance Test Background:

Research has determined that the average standard-size bedroom in the United
States is 11 feet by 12 feet [1]. For our distance test, the maximum distance we test
is the longest distance that fits in an 11 by 12 foot rectangle. This is the diagonal
of the 11 by 12 foot space, which is approximately 16 feet.
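
The 16 foot figure can be checked with the Pythagorean theorem. The short sketch below is only an illustration of that arithmetic; the constant names are ours and are not part of the RoboBuddy code.

```python
import math

# Room dimensions in feet, taken from the average bedroom size cited above.
ROOM_WIDTH_FT = 11
ROOM_LENGTH_FT = 12

# The longest straight-line distance inside a rectangle is its diagonal.
diagonal_ft = math.hypot(ROOM_WIDTH_FT, ROOM_LENGTH_FT)

print(f"Diagonal: {diagonal_ft:.2f} ft")  # Diagonal: 16.28 ft
```

Rounding down to whole feet gives the 16 foot maximum used in the distance test.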

1 https://www.doorwaysmagazine.com/average-bedroom-size/#:~:text=We%20determined%20that%20the%20average,14%20feet%20by%2016%20feet

Voice Recordings Background:

For the accent test we will use 8 different recordings. These recordings have
been selected to represent accents from North America, Europe, Asia, and Australia.
For each accent we will test both a male and a female voice.

The different voices we have selected are listed below:


● American Accent
○ English (United States): Zoe
○ English (United States): Evan
● English Accent
○ English (Great Britain): Malcolm
○ English (Great Britain): Kate
● Indian Accent
○ English (India): Rishi
○ English (India): Veena
● Australian Accent
○ English (Australia): Karen
○ English (Australia): Lee

Text to Speech Background:

For each accent, we used a website with a text-to-speech field to generate the
audio recordings [2]. We used the website's default message because it was a
reasonable length and had varied sentence structures.

Default Message:

“Hi, how can I help you? Great, no problem. Your transaction is complete. Have a great
day. Feel free to type anything else, and even choose another voice.”

2 https://www.nuance.com/omni-channel-customer-engagement/voice-and-ivr/text-to-speech.html#!

Test Plan

Test 1: Accent Test


Purpose:
The United States is a very diverse country so our product needs to be compatible with all
different types of accents. By testing each accent, we can determine if our product will be able
to be used by all different types of consumers.

Experimental Setup:
We will test different audio recordings of regional accents to determine if the RoboBuddy is able
to pick up all different accents. We will test male and female versions of the
American, English, Indian, and Australian accents.

Setup Sketch:

Materials:
● 8 Audio Files: 4 different accents male/female
● Phone
● Computer
● Recording device: spreadsheet on computer
● Web Speech API
● Visual Studio Code

Independent Variables:
● Accent of voice
● Male or female

Dependent Variables:
● Transcription accuracy (percentage of words correctly recognized)

Controlled Variables:
● Distance from the microphone
● Speaker volume
● Speaker and microphone type

Procedure:
1. Set up the experiment by loading the accent audio files onto a phone. Load the
RoboBuddy code on a computer and place the two devices exactly two feet apart.
2. Play the first audio recording three times and record the accuracy for each trial.
3. Determine the accuracy by dividing the number of correct words by the total number of
words said. Look at the speech-to-text field on the computer to see which words the
RoboBuddy recorded incorrectly.
4. Repeat steps 2 and 3 for each of the other seven audio files.
5. Determine the average for each audio file by adding up the accuracies of its three
trials and dividing by three.
6. Analyze the results to determine if the RoboBuddy is compatible with different accents.
7. Determine if a better speaker or speech-to-text system will be needed for the final
prototype.
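
The accuracy calculation in step 3 can be sketched in code. The report does not state how correct words were matched, so this sketch assumes that a word counts as correct when it appears in the transcript in the same order as in the default message. The function names `words` and `word_accuracy` are hypothetical and are not part of the RoboBuddy code.

```python
import re
from difflib import SequenceMatcher

# The default message that every voice recording speaks.
REFERENCE = ("Hi, how can I help you? Great, no problem. Your transaction is "
             "complete. Have a great day. Feel free to type anything else, "
             "and even choose another voice.")

def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def word_accuracy(reference, transcript):
    """Percent of reference words found, in order, in the transcript.

    Assumption: in-order matching stands in for the manual word count.
    """
    ref, hyp = words(reference), words(transcript)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return 100 * correct / len(ref)

# Transcript of the American male voice, first trial of the accent test.
transcript = ("how can i help you great no problem have a great day feel free "
              "to type anything else and even choose another voice")

print(round(word_accuracy(REFERENCE, transcript)))  # 82
```

The default message has 28 words and this transcript matches 23 of them, which gives the 82% recorded for that trial.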

Test 2: Distance Test


Purpose:
We must determine if the RoboBuddy can effectively work at longer distances. We also need to
determine the effective range of the microphone and speaker system. Our distance test will be
able to give us results to form conclusions on both of these questions.

Experimental Setup:
This test will be used to determine how effective the RoboBuddy will be in larger rooms. In order
to eliminate as many changing variables as possible, only two audio files will be used: a male
and a female American accent.

Setup Sketch:

Materials:
● Tape measure
● 2 Audio Files: 1 male and 1 female
● Phone
● Computer
● Recording device: spreadsheet on computer

Independent Variables:
● Distance from the speaker
● Male / female audio file versions

Dependent Variables:
● Transcription accuracy (percentage of words correctly recognized)

Controlled Variables:
● Speaker volume
● Audio files used
● Speaker and microphone type

Procedure:
1. Set up the experiment by loading the two audio files onto a phone. Load the RoboBuddy
code on a computer and place the two devices exactly one foot apart.
2. Play the first audio recording three times and record the accuracy for each trial.
3. Determine the accuracy by dividing the number of correct words by the total number of
words said. Look at the speech-to-text field on the computer to see which words the
RoboBuddy recorded incorrectly.
4. Move the phone one foot farther from the microphone. Repeat steps 2-3 using the same
two audio files. Continue to move the phone back until the accuracy falls below 50%.
5. Determine the average at each distance by adding up the accuracies of the three trials
and dividing by three.
6. Analyze results to determine the effectiveness of the RoboBuddy and determine if any
improvements need to be made for the microphone and speaker system.
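
Steps 4 and 5 can be sketched as a simple loop: average the three trials at each distance and stop once the average falls below 50%. The function name `run_distance_sweep` and the accuracy values below are hypothetical and are used only to show the logic; they are not measurements from our tests.

```python
from statistics import mean

STOP_THRESHOLD = 50  # percent; step 4 stops once the average drops below this

def run_distance_sweep(trials_by_distance, threshold=STOP_THRESHOLD):
    """Average the trials at each distance, in increasing order of distance.

    Stops after the first distance whose average is below the threshold.
    """
    averages = {}
    for distance in sorted(trials_by_distance):
        averages[distance] = mean(trials_by_distance[distance])
        if averages[distance] < threshold:
            break
    return averages

# Hypothetical per-trial accuracies (percent), keyed by distance in feet.
example_trials = {
    1: [100, 96, 100],
    2: [90, 80, 85],
    3: [50, 45, 40],
    4: [30, 20, 10],
}

result = run_distance_sweep(example_trials)
print(result)
```

In this example the sweep stops at 3 feet, where the average is 45%, so the 4 foot trials are never averaged.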

Data

Test 1: Accent Test


| Voice | Test 1 | Test 2 | Test 3 | Average |
|---|---|---|---|---|
| American Accent (Male) | “how can i help you great no problem have a great day feel free to type anything else and even choose another voice” 82% | “how can i help you great no problem your transaction is complete have a great day feel free to type everything else and even choose another voice” 96% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 93% accuracy |
| American Accent (Female) | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 100% accuracy |
| English Accent (Male) | “how can i help you no problem your transaction is complete have a great day” 54% | “how can i help you” 18% | “feel free to type anything else” 21% | 31% accuracy |
| English Accent (Female) | “great no problem have a great day” 25% | “hi how can i help you have a great day” 36% | “how can i help you great no problem you will transaction is complete have a great day see you free to type anything else and even choose another voice” 86% | 49% accuracy |
| Indian Accent (Male) | “no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 75% | “how can i help you great no problem your transaction is complete have a great day feel free to write anything else and even choose another voice” 96% | “no problem your transaction is complete have a great day” 36% | 69% accuracy |
| Indian Accent (Female) | “play how can i help you great no problem is complete have a great day feel free to type anything else and even choose another voice” 89% | “play how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | 94% accuracy |
| Australian Accent (Male) | “no problem your transaction is complete. have a great day feel free to type anything else i need to choose another voice” 75% | “how can i help you right no problem. have a great day feel free to type anything else and even choose another voice” 79% | “how can i help you right no problem. have a great night feel free to type anything else and even choose another voice” 75% | 76% accuracy |
| Australian Accent (Female) | “hi how can i help you say no problem you'll transaction is complete have a great day feel free to talk anything else and even choose another voice” 89% | “how can i help you no problem is complete have a great day feel free to talk anything else and even choose another voice” 82% | “how can i help you say no problem in is complete have a great day feel free to type anything else and even choose another voice” 86% | 86% accuracy |

Test 1 Rankings (based on average percentage):

1. American Female
2. Indian Female
3. American Male
4. Australian Female
5. Australian Male
6. Indian Male
7. English Female
8. English Male
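
The ranking above can be reproduced by sorting the voices by their average accuracy. This is a minimal sketch that uses the averages from the Test 1 table; the variable names are ours.

```python
# Average accuracy (percent) for each voice, from the Test 1 table.
averages = {
    "American Male": 93, "American Female": 100,
    "English Male": 31, "English Female": 49,
    "Indian Male": 69, "Indian Female": 94,
    "Australian Male": 76, "Australian Female": 86,
}

# Sort the voice names from the highest average to the lowest.
ranking = sorted(averages, key=averages.get, reverse=True)

for place, voice in enumerate(ranking, start=1):
    print(f"{place}. {voice} ({averages[voice]}%)")
```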

Setup:

The setup consisted of two computers placed exactly 2 feet apart. One device was used to play
the audio recordings and the other device had the RoboBuddy interface loaded.

Test 2: Distance Test

Female Voice: Zoe


| Distance (feet) | Trial 1 | Trial 2 | Trial 3 | Average |
|---|---|---|---|---|
| 1’ | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 99% accuracy |
| 2’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 100% accuracy |
| 3’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 100% accuracy |
| 4’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 100% accuracy |
| 5’ | “hi how can i help you great no problem transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “hi how can i help you great no problem transaction is complete have a great day and even choose another voice” 75% | 90% accuracy |
| 6’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | 100% accuracy |
| 7’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another boy” 96% | “i how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “how can i help you no problem have a great day feel free to type anything else and even choose another voice” 82% | 92% accuracy |
| 8’ | “hi how can i help you no problem your transaction is complete have a great day feel free to time anything else and even choose another voice” 96% | “how can i help you great no problem you will transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “hi how can i help you no problem” 29% | 73% accuracy |
| 9’ | “how can i help you no problem have a good day” 36% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “how can i help you no problem complete have a nice day feel free to type anything else and even choose another voice” 79% | 72% accuracy |
| 10’ | “hi how can i help you great no problem you would transaction is complete have a great day for me to type anything else and even choose another voice” 89% | “hi how can i help you no problem have a great day too for you to type anything else and even choose another voice” 75% | “hi how can i help you great no problem is complete have a great day and time anything else and even choose another voice” 79% | 81% accuracy |
| 11’ | “hi how can i help you great no problem transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “hi how can i help you great no problem transaction is complete have a great day for you to type anything else and even choose another voice” 93% | “hi how can i help you great no problem you were transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | 96% accuracy |
| 12’ | “hi how can i help you great no problem transaction is complete have a great day” 57% | “hi how can i help you great no problem have a great day” 46% | “how can i help you great no problem have a great day” 43% | 49% accuracy |
| 13’ | “hi how can i help you have a great day feel free to type anything else and even choose another voice” 75% | “hi how can i help you no problem have a great day feel free to type anything else and even choose another voice” 82% | “how can i help you great no problem have a great day feel free to type anything else and even choose another voice” 82% | 80% accuracy |
| 14’ | “hi how can i help you great no problem have a great day” 46% | “hi how can i help you no problem have a great day feel free to type anything else other voice” 71% | “hi how can i help you great no problem transaction is complete have a great day feel free to type anything else and leave a choose another voice” 96% | 71% accuracy |
| 15’ | “hi how can i help you no problem have a great day too for you to type anything else and even choose another voice” 89% | “hi how can i help you no problem complete have a great day feel free to type anything else” 68% | “hi how can i help you no problem have a great day” 43% | 79% accuracy |
| 16’ | “hi how can i help you no problem have a great day” 43% | “hi how can i help you no problem choose another voice” 39% | “how can i help you no problem” 25% | 36% accuracy |

*There were varying amounts of background noise during this experiment, and the voice
recognition was much more effective when tested while there was little to no background noise.

Male Voice: Evan


| Distance (feet) | Trial 1 | Trial 2 | Trial 3 | Average |
|---|---|---|---|---|
| 1’ | “hi how can i help you great no problem your transaction is complete have a great day feel free to type everything else and even choose another voice” 96% | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | 96% accuracy |
| 6’ | “how can i help you no problem your transaction is complete have a great day feel free to type everything else and even choose another voice” 89% | “hi how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 100% | “how can i help you great no problem your transaction is complete have a great day and even choose another voice” 75% | 76% accuracy |
| 11’ | “no problem your transaction is complete have a great day” 36% | “your transaction is complete have a great day” 29% | “how can i help you great no problem your transaction is complete have a great day feel free to type anything else and even choose another voice” 96% | 54% accuracy |
| 16’ | “how can i help you” 14% | “have a great day” 14% | “” 0% | 10% |

Setup:

The setup for our second test used the same two devices from the first test. One device was
placed on a chair which was used to move the position of the device after each trial.

Conclusions

Test 1:

Our results show that the product works best when used with an American accent.
The second best option is an Indian accent, followed by an Australian accent, and finally
an English accent. In the graphs, the female voices follow this order, while among the male
voices the Australian accent (76%) scored higher than the Indian accent (69%). The American,
Indian, and Australian accents all had relatively similar accuracies, as shown by the grouping
in the graph below, so we do not expect any of these three accents to have issues. The only
accent that raises concerns is the English accent. Because we will be producing our product in
the United States, and only 0.6% of the US population consists of British Americans [3], we do
not believe this will lead to any significant issues.

Our results also show that the female voices did much better than their male
counterparts. Every female voice had a higher accuracy than the male version, as seen in the
data tables and the graphs below. The male voices tended to be much deeper than the higher
pitched female voices. We believe this made the male audio harder to pick up because it
blended in more with the background noise. Luckily, our product is targeted at small children,
who will not have deep voices. Children's voices align much more closely with an adult female
voice than with an adult male voice. Our findings from the first test raise no noticeable
issues that must be addressed, and the tests confirm that our RoboBuddy speech recognition
system works as intended.
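
Both Test 1 comparisons can be checked directly from the average column of the data table. The sketch below is only an illustration with variable names of our own choosing. It averages the male and female results for each accent and computes the female minus male gap in percentage points.

```python
from statistics import mean

# Average accuracy (percent) per voice, from the Test 1 data table.
averages = {
    "American": {"male": 93, "female": 100},
    "English": {"male": 31, "female": 49},
    "Indian": {"male": 69, "female": 94},
    "Australian": {"male": 76, "female": 86},
}

# Mean of the male and female averages for each accent.
accent_mean = {accent: mean(list(v.values())) for accent, v in averages.items()}
accent_order = sorted(accent_mean, key=accent_mean.get, reverse=True)

# Female average minus male average, in percentage points.
gender_gap = {accent: v["female"] - v["male"] for accent, v in averages.items()}

print(accent_order)
print(accent_mean)
print(gender_gap)
```

With these averages the accent means are 96.5% (American), 81.5% (Indian), 81% (Australian), and 40% (English), and the female voice leads by 7, 25, 10, and 18 percentage points respectively.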

3 https://en.wikipedia.org/wiki/British_Americans#:~:text=0.6%25%20of%20the%20total%20U.S.%20population.

Test 2:

When looking at the results for the second test, the average accuracy for the female voice
stays above 70% across the 1 - 8 foot range, and at 90% or higher from 1 to 7 feet. Beyond
that point the accuracy becomes more inconsistent and moves up and down. It still stays above
75% in some places, like at 11 feet and 15 feet, but it also drops below 75% at others, like
12 feet and 16 feet. Most users would not be standing on the farthest possible side of a room
when using the product, so this data tells us that the product will be functional at standard
distances. When the user moves farther away, into the 8 - 16 foot range, the results can be
more inconsistent, but we do not believe this will be a significant issue because most
conversations will happen at a closer range.
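
The difference between the near and far ranges can be summarized from the average column of the female voice table. This is a rough sketch using the averages as reported; the split at 8 feet follows the range described above and the variable names are ours.

```python
from statistics import mean

# Reported average accuracy (percent) for the female voice, keyed by distance in feet.
female_avg = {
    1: 99, 2: 100, 3: 100, 4: 100, 5: 90, 6: 100, 7: 92, 8: 73,
    9: 72, 10: 81, 11: 96, 12: 49, 13: 80, 14: 71, 15: 79, 16: 36,
}

# Split the distances into the 1 to 8 foot range and the 9 to 16 foot range.
near = [acc for feet, acc in female_avg.items() if feet <= 8]
far = [acc for feet, acc in female_avg.items() if feet > 8]

print(mean(near), min(near), max(near))  # 94.25 73 100
print(mean(far), min(far), max(far))     # 70.5 36 96
```

The mean drops from 94.25% in the near range to 70.5% in the far range, and the spread between the lowest and highest averages widens from 27 to 60 percentage points.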

When looking at the male vs female data we can see that the male voice did significantly
worse than the female voice. This finding aligns with the results from our first test, where
the male voices did worse than the female voices. Again, we attribute this to the male voice
being lower pitched than the female voice. Because the RoboBuddy will be used by children, we
expect their accuracies to align more closely with the female voice than with the deep adult
male voice.
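
The male and female voices can be compared at the four distances where both were tested. The sketch below uses the reported averages from the two distance tables; the variable names are ours.

```python
# Reported average accuracy (percent) at the distances tested with both voices.
female = {1: 99, 6: 100, 11: 96, 16: 36}
male = {1: 96, 6: 76, 11: 54, 16: 10}

# Female average minus male average at each distance, in percentage points.
gap = {feet: female[feet] - male[feet] for feet in male}

for feet, points in gap.items():
    print(f"{feet} ft: female leads by {points} points")
```

The female voice leads at every shared distance, by 3 points at 1 foot and by 24 to 42 points at 6, 11, and 16 feet.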

Overall Conclusion:

The overall takeaway from our tests is that the product works best at a one to eight foot
range with female voices, especially American accents.

References

1. https://www.doorwaysmagazine.com/average-bedroom-size/#:~:text=We%20determined%20that%20the%20average,14%20feet%20by%2016%20feet

2. https://www.nuance.com/omni-channel-customer-engagement/voice-and-ivr/text-to-speech.html#!

3. https://en.wikipedia.org/wiki/British_Americans#:~:text=0.6%25%20of%20the%20total%20U.S.%20population.
