A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS
Outline of Presentation
Project54 Overview
– GPS vehicle tracking
– Fingerprint checks
– Computer aided dispatch
– Central data resources: motor vehicle, criminal, fingerprints
– Voice command
– Digital radio
– Voice response
– Remote access to vehicle resources
– Video
– Central database access and forms entry
Introduction
– Limit distraction
– Limit frustration
– Standard Process
Introduction
Accuracy
Efficiency
– false recognitions
– weighted
SR ENGINE OPTIONS
Speed of Speech
– Discrete
– Continuous
Type of Application
– Command-and-control
– Dictation
User-Dependency
– Speaker dependent
– Speaker independent
Field of Application
– PC
– Telephone
– Noise robust
Grammar File
Comparing SR Engines
Field test
Simulated tests
– Speaker source
– Background noise
– Number of speakers
Accuracy Ratings
Not consistent
– Different conditions
Hyde’s Law
Component Requirements
Microphone
– Must be far-field
– Mountable on dashboard
– Cancel noise
• Array
• Directional
[Diagram: Applications A, B, D, E, F and H]
LOOP ENGINES
LOOP BACKGROUND
LOOP COMMANDS
Laptop w/ SoundBlaster
Earthworks M30BX
Background recorded on patrol
Speech commands in lab
– Microsoft Audio Collection Tool
– 5 Speakers (4 male, 1 female)
– 40 phrases
Matlab script
Background Noises
– WAV filename
– Desired SNR
– Signal strength
– Description of file
Voice Commands
– WAV filename
– Number of loops
– Signal strength
– Phrase
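The Matlab script itself is not shown on the slides. As a rough illustration of how a "Desired SNR" entry could translate into a playback level, the Python sketch below computes the gain to apply to a background recording so that the command-to-background power ratio hits a target in dB. The function names and the use of mean sample power as "signal strength" are assumptions made for this example, not details taken from the original script.

```python
import math


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to `noise` so the speech-to-noise power ratio is `snr_db`.

    Hypothetical helper: assumes SNR is defined on mean sample power.
    """
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / mean_power(noise))


def snr_db_after_gain(speech, noise, gain):
    """SNR in dB once the noise has been scaled by `gain` (sanity check)."""
    return 10 * math.log10(mean_power(speech) / (gain ** 2 * mean_power(noise)))


# Toy signals standing in for a command WAV and a background-noise WAV.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.1, 0.1, -0.1]

gain = noise_gain_for_snr(speech, noise, snr_db=10)
check = snr_db_after_gain(speech, noise, gain)
```

In a real setup the samples would be read from the WAV files named in the script's tables, and the gain would be applied before playback.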
PRODUCTS TESTED
Four microphones
– A, B, C and D.
Four SR engines
– 1, 2, 3, and 4.
16 unique combinations
– A1 through D4
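The 16 labels follow from pairing each microphone letter with each engine number. A minimal Python sketch of that naming scheme (the variable and function names are illustrative only):

```python
from itertools import product

MICROPHONES = ["A", "B", "C", "D"]
ENGINES = ["1", "2", "3", "4"]


def configurations(mics=MICROPHONES, engines=ENGINES):
    """Return every microphone/engine label, e.g. 'A1', in mic-major order."""
    return [mic + eng for mic, eng in product(mics, engines)]


configs = configurations()
print(len(configs), configs[0], configs[-1])  # prints: 16 A1 D4
```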
SR ENGINES
SR Engine 1
– Microsoft SR Engine 4.0
SR Engine 2
– Microsoft SR Engine 4.0 (telephone mode)
SR Engine 3
– Dragon NaturallySpeaking 4.0
SR Engine 4
– IBM ViaVoice 8.01
PREPARATION
TEST SCENARIO
Identical conditions
42 phrase grammar
10 speech commands
5 speakers
6 background noises
3 SNR levels
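The slides do not say whether these factors were fully crossed. Under the assumption that every command was played for every speaker over every background noise at every SNR level, the size of the test matrix can be estimated as below. The counts are read off the list above, and the resulting totals are an upper-bound illustration, not figures reported in the presentation.

```python
from math import prod  # available in Python 3.8+

# Factor counts as listed on the slide.
CONDITIONS = {
    "speech_commands": 10,
    "speakers": 5,
    "background_noises": 6,
    "snr_levels": 3,
}
NUM_CONFIGURATIONS = 16  # A1 through D4


def utterances_per_configuration(conditions=CONDITIONS):
    """Number of test utterances if all factors are fully crossed (assumption)."""
    return prod(conditions.values())


per_config = utterances_per_configuration()
total = per_config * NUM_CONFIGURATIONS
print(per_config, total)  # prints: 900 14400
```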
ACCURACY BY ENGINE
[Bar chart: accuracy (%), axis 0 to 80, grouped by SR engine (ENG 1 to ENG 4), one bar per microphone (MIC A to MIC D)]
ACCURACY BY MIC
[Bar chart: accuracy (%), axis 0 to 80, grouped by microphone (MIC A to MIC D), one bar per SR engine (ENG 1 to ENG 4)]
RANKED ACCURACY
[Bar chart: accuracy (%), axis 0 to 80, per configuration]
Configurations in the order listed on the chart: C2, A2, D2, A1, C1, B2, D1, B1, D4, C4, B4, B3, A3, C3, D3, A4
Efficiency Score
Specific to Project54
False recognitions
Efficiency Score
SAID      HEARD
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LOSS = 0
Efficiency Score
SAID      HEARD
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    UNRECOGNIZED
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LOSS = 1
Efficiency Score
SAID      HEARD
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    SIREN ON
SIREN OFF SIREN OFF
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LIGHTS    LIGHTS
LOSS = 1.5
Efficiency Score
Scoring system
– Correctly recognized = 1.5
– Unrecognized = 0.5
– Falsely recognized = 0
Extreme scores
– All correct => Eff. = 100
– All unrecognized => Eff. = 33
– All falsely recognized => Eff. = 0
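The slides give the per-utterance scores and the three extreme cases but not the formula itself. The Python sketch below assumes the efficiency is the sum of the scores divided by the best possible sum, scaled to 100, which reproduces the stated extremes (100, 33, 0). The `loss` helper likewise assumes that LOSS is the shortfall from a perfect score, which matches the LOSS = 1 and LOSS = 1.5 examples. All names are illustrative.

```python
# Per-utterance scores from the slide.
SCORES = {"correct": 1.5, "unrecognized": 0.5, "false": 0.0}


def classify(said, heard):
    """Label one (said, heard) pair as correct, unrecognized, or false."""
    if heard == said:
        return "correct"
    if heard == "UNRECOGNIZED":
        return "unrecognized"
    return "false"


def total_score(pairs):
    """Sum of the per-utterance scores for a list of (said, heard) pairs."""
    return sum(SCORES[classify(said, heard)] for said, heard in pairs)


def efficiency(pairs):
    """Efficiency on a 0 to 100 scale (assumed normalization, see lead-in)."""
    return 100.0 * total_score(pairs) / (SCORES["correct"] * len(pairs))


def loss(pairs):
    """Shortfall from a perfect score (assumed meaning of LOSS on the slides)."""
    return SCORES["correct"] * len(pairs) - total_score(pairs)


# The false-recognition example: "LIGHTS" heard as "SIREN ON", then corrected.
example = [("LIGHTS", "LIGHTS")] * 3 + [
    ("LIGHTS", "SIREN ON"),
    ("SIREN OFF", "SIREN OFF"),
] + [("LIGHTS", "LIGHTS")] * 5

print(loss(example))  # prints: 1.5
```

Weighting a false recognition more heavily than a non-recognition reflects the extra command the officer has to issue to undo the wrong action.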
RANKED EFFICIENCY
[Bar chart: efficiency (max 100), axis 0 to 80, per configuration]
Configurations in the order listed on the chart: C2, A2, A1, C1, D2, D1, D4, B2, C4, B4, B1, B3, A3, C3, D3, A4
WINNER
Accuracy
– Configuration C2 accuracy = 70.3 %
Efficiency
– Configuration C2 efficiency = 72.4
Logical choices
– Microphone C
– SR Engine 2
Speakers' SR experience
Limited training
Training Environment
Default settings
Microphone and speaker placement
SNR
CONCLUSION
– Limit distraction
– Limit frustration
CONCLUSION
Configuration C2
– Most accurate
– Most efficient
SR ENGINE 2
Microsoft SR Engine 4.0
Telephone mode
CURRENT STATUS
9 vehicles on road
300 in production
MORE INFORMATION
www.project54.unh.edu
andrew.kun@unh.edu
brettv@unh.edu