E. H. Wrench, Jr.
ITT Defense Communications Division, 9999 Business Park Avenue, San Diego, Ca. 92131
INTRODUCTION
recognition features. In this study, both steady state vowel regions and voiced speech were tried (4)
Figure 5: Performance of Markel's Technique With Noisy Speech, 20 Second Models (15 dB Signal to Noise Ratio)

Markel's algorithm was chosen for the laboratory realtime demonstration system based on the results obtained from the algorithm selection studies. The speaker recognition system is implemented using a PDP-11/60 as the system controller and an ITT developed, high speed signal processor, the Quintrell PAM, for the computationally intensive realtime processing. A block diagram of the system is shown in Figure 6. All operator interaction with the system is done through the PDP-11. The control program in the PDP-11 directs the Quintrell to perform the appropriate tasks. In addition, the PDP-11 is used for such functions as display formatting, plotting, and hard copy. The Quintrell is used for the processing that must be done in realtime, such as the LPC-10 analysis and the continuous averaging of coefficients.

One of the goals for the realtime system was to integrate it with an operator interface to provide a flexible, easy to use system. This was accomplished by prompting the operator at every phase of the operation with the options that are available at that time. The operator simply selects the desired operating mode (model generation, recognition, etc.) from a displayed menu and indicates the action to be taken via simple commands.

The system is capable of either model generation or speaker recognition for up to 30 talkers in realtime. During recognition, the system continuously displays the similarity between the unknown speaker and the stored models. A sample display is shown in Figure 7. The input speech from the unknown talker is analyzed frame by frame, and the LPC-10 reflection coefficients are extracted. A voicing detection algorithm is used to discard all but the voiced frames. The voiced frames are then continuously averaged. A
Figure 6: Block diagram of the system (labels recoverable from the original figure: Speech, Model Generation, Operator Command, Control, commands to Quintrell, model data to/from Quintrell)
background process, which runs any time the LPC, voicing, or averaging routines are idle, calculates the Mahalanobis distance between the current average coefficient vector and all of the models. With 30 models, the similarity between the unknown and every model is updated about once per second.

Figure 7: Sample Output From the Realtime Speaker Recognition System

Limited testing of the realtime speaker recognition system was conducted. A thirty talker data base was generated by recording speakers from commercial television. Speaker models were generated with 10 and 20 seconds of the recorded speech for each talker. Different 10 and 20 second segments from each talker were then used as unknowns.

CONCLUSIONS

Markel's technique was shown to be well-suited for text independent speaker recognition with limited amounts of speech for both model generation and recognition. In addition, the study showed that Markel's technique performs better than the Pfeifer technique under these conditions. The implementation of Markel's algorithm used in this study achieved recognition rates in excess of 90% for all speakers when used with limited amounts of speech for both the reference models and the unknowns. In addition, this algorithm has been implemented in a realtime speaker recognition demonstration system and achieves similar high recognition scores. The realtime demonstration system has proven to be easy to operate with little or no instruction. An operator can generate a model using "live" speech, document the model with pertinent speaker data, and use the model for realtime speaker recognition, all within less than a minute.
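The recognition loop described for the realtime system (discard unvoiced frames, continuously average the reflection coefficients of the voiced frames, then compute a Mahalanobis distance from the running average to every stored model) can be illustrated with a minimal Python sketch. This is not the Quintrell implementation. The names `RunningAverage`, `SpeakerModel`, `mahalanobis_sq`, and `score_models` are hypothetical, and the sketch assumes that each model stores a mean coefficient vector together with a precomputed inverse covariance matrix, a detail the text does not specify. A two-coefficient vector is used in the example only to keep the arithmetic easy to follow; the system itself works with LPC-10 reflection coefficients.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


class RunningAverage:
    """Continuously averages the coefficient vectors of voiced frames."""

    def __init__(self, order: int = 10) -> None:
        self.count = 0
        self.mean: Vector = [0.0] * order

    def update(self, coeffs: Sequence[float], voiced: bool) -> None:
        if not voiced:  # unvoiced frames are discarded
            return
        self.count += 1
        # Incremental mean: no need to keep past frames in memory.
        for i, c in enumerate(coeffs):
            self.mean[i] += (c - self.mean[i]) / self.count


@dataclass
class SpeakerModel:
    """Hypothetical model layout: mean vector plus inverse covariance."""
    mean: Vector
    inv_cov: Matrix  # assumed to be computed once, at model generation


def mahalanobis_sq(x: Sequence[float], mean: Sequence[float],
                   inv_cov: Matrix) -> float:
    """Squared Mahalanobis distance (x - mean)^T inv_cov (x - mean)."""
    d = [a - b for a, b in zip(x, mean)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j]
               for i in range(n) for j in range(n))


def score_models(avg: Sequence[float],
                 models: Dict[str, SpeakerModel]) -> Dict[str, float]:
    """Distance from the current average to every stored model."""
    return {name: mahalanobis_sq(avg, m.mean, m.inv_cov)
            for name, m in models.items()}


# Usage with made-up numbers (identity inverse covariance for simplicity).
identity: Matrix = [[1.0, 0.0], [0.0, 1.0]]
models = {
    "talker_a": SpeakerModel(mean=[0.375, -0.5], inv_cov=identity),
    "talker_b": SpeakerModel(mean=[1.375, -0.5], inv_cov=identity),
}

avg = RunningAverage(order=2)
avg.update([0.5, -0.25], voiced=True)
avg.update([9.0, 9.0], voiced=False)   # ignored: not a voiced frame
avg.update([0.25, -0.75], voiced=True)

scores = score_models(avg.mean, models)
best = min(scores, key=scores.get)     # smallest distance = most similar
print(best, scores)
```

Because the average is updated incrementally, the scoring step can run at any time against the current mean, which matches the description of a background process that refreshes the similarity display whenever the other routines are idle.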