Professional Documents
Culture Documents
Cepstral Analysis Technique For Automatic Speaker Verification Furui
Cepstral Analysis Technique For Automatic Speaker Verification Furui
net/publication/3176892
CITATIONS READS
1,265 1,890
1 author:
Sadaoki Furui
Tokyo Institute of Technology
405 PUBLICATIONS 9,956 CITATIONS
SEE PROFILE
All content following this page was uploaded by Sadaoki Furui on 05 August 2014.
w
effectiveness of each coefficient.
A crucial property of the system is automatic time registra- EMPHASIS
tion of the time functions of the sample utterance to the time
functions retrieved as the reference template of the claimed
identity. An overall distancebetween the sampleutterance
and the reference template is obtained as the result of time
registration using a dynamic programming technique. The dis- I CORRELATION
COEFFICIENTS 1 (ORDER lo)
I I
tance of each element is weighted by intraspeaker variability I
andsummed t o produce the overall distance. Finally, the LINEARPREDICTIVE
COEFFICIENTS
overall distance is compared withJathreshold distance value to 1
determine whether the identity claim should be accepted or
CEPSTRUM
rejected. COEFFICIENTS
Details concerningthe analysis procedures,referencecon-
Fig. 2. Block diagram for cepstmm extraction.
struction,time registration, anddistancecalculation willbe
presented in the following sections.
dictor coefficients are extracted from each frame by the auto-
A. Normalized Cepstrum Extraction correlation method. The linear predictor coefficients are
The speech wave is bandlimited from 100 Hz to 3.0 kHz and transformed into cepstrum coefficients, using thefollowing
sampled at a 6.67 kHz rate, or bandlimited from 100 Hz to recursive relationshps [9] :
2.6kHz and sampled at 6 kHz. The digitized speech is then
scanned forward from the beginning of the recording interval
and backward from the end to determine the beginning and
end of the actual sample utterance. The endpoint detection is
accomplished by means of anenergy calculation. Ahigh
emphasis filter (1 - 0.95Z-l) is applied to thedelimited where ci and ai are the ith-order cepstrum coefficient and
speech, and a 30 ms Hamming window is applied to the em- linear predictor coefficient, respectively. Fig. 2showsthe
phasized speech every 10 ms. First to tenth-order linear pre- block diagram of these processes.
256 IEEE
TRANSACTIONS
ON
ACOUSTICS,
SPEECH, AND SIGNAL
PROCESSING, VOL. ASSP-29, NO. 2, APRIL 1981
1 -1
I I
can derive from them a set of parameters which are invariant
to any fixed frequency-response distortion introduced by the POLYNOMIAL FUNCTION l*l
recording apparatus or the transmission system. The new pa-
rameters are obtained simply by subtracting from the cepstrum
coefficients a set ofvalues representing their time averages
over the duration of the entireutterance. This process can FEATURE
SELECTION
normalize the gross spectral distribution of the utterance, and
it is similar to the inverse filtering process which has been used . REFERENCE DYNAMIC
TEMPLATE
in a spoken word recognition system at Bell Laboratories [ 6 ] .
The normalization technique introduced by Atal is used in the 1
OVERALL DISTANCE
COMPUTATION
speaker verification system studied in this paper. I
In previous studies by the author [7], [SI it was shown that
this normalization process is also effective in reducing long-
termintraspeaker spectral variability for maintaining high
speaker verification andidentification accuracy over a long
period. Fig. 3. Block diagram indicating the principal operations of the system
(modification of Fig. 1).
B. Polynomial Coefficients
Time functions of the normalized cepstrum coefficients are of feature extraction is modified as shown inFig. 3, since
expanded byan orthogonal polynomial representation over cepstrum normalization does not affect the first- and second-
90 ms intervals every 10 ms. The 90 ms interval length seemed order polynomial coefficients.
adequate for preserving transitional information between pho- Accordingly, the utterance is represented by time functions
nemes. The first three orthogonal polynomials are used. They of the cepstrum coefficients x,(i), and the first- and second-
are [IO] order polynomial coefficients, b,(i) and c,(i), where t is the
Poi = 1 frame number and i is the index of the cepstrum coefficient
(1 < i < p). Since p is set to ten in this system, the result is a
P .=j- 5 representation by a time function of a 30-dimensional vector.
1J
dijklm
( Z f m , if j = k )
CONVERSION
d-
TABLE I
SETSUSEDIN EXPERIMENTS
UTTERANCE
Number of
No. Impostors
Female
Recording
FEATURE
PARAMETER (1) Male 10 + 40 Telephone
ANALYSIS
t
I
I
SPEAKER
VERIFICATION I
Fig. 6. Block diagram indicating the procedure to make the utterance ““
sets. (4)
-
CODER ADAPTIVE
OUANTIZER
Microphone
I
Telephone
I ’b A‘
I
0 1 1 1 1 1 I l l I 1J
( 2 3 4 5 6 7 8 9 1 0 1 2 3 4 5 6 7 8 9 1 0
CEPSTRUM COEFF INMX CEPSTRUM COEFF INDEX
Fig. 9. Interspeaker to intraspeaker distance ratio for the first ten Fig. 10. Interspeaker to intraspeaker distance ratio for the middle ten
utterances by f i i e speakers each extracted from utterance set (1). utterances by five speakers each extracted from utterance set (1).
TABLE 111
AVERAGE SET:No. (6). FR FALSE
ERRORRATES. UTTERANCE REJECTION
(FALSEALARM).FA: FALSE
ACCEPTANCE (MISSRATE).
I I I I I
A Posteriori 0.06%
of the most important problems in speaker verification [7], Fig. 11. Two reference updating procedures used for utterance set (5).
[8]. In order to check the effect of the time interval between In Method 1, a reference template is updated every seventh access by
the customer. In Method 2, a reference templateis updated each time
training and test utterances on speaker verification accuracy the system is accessed by the customer. In both methods, the latest
this interval was varied from six days to six weeks comparing five utterances are used to update thereference template.
test utterances with reference templates constructed at times
corresponding to the specified intervals. In this experiment 14
utterances were used as test inputs by ten customers each, and C Results for Utterance Set (5)
five utterances were used toconstruct each reference tem- In the third experiment, utterance set ( 5 ) which comprises
plate. The experimental results indicated that verification ac- 26 utterances by 21 male customers each and a single utter-
curacy is not affected by time intervals between training and ance by 5 5 impostors recorded over a high-quality microphone
input utterances of at least six weeks. was used to test the speaker verification system. In this case
the 18 selected parameters include the first nine cepstrum co-
B. Results for Utterance Set ( 6 ) efficients and the first nine first-order polynomial coefficients.
The second experiment was performed using utterance set Fig. 11(a) shows the time relation between training and test
(6) which comprises the utterances by ten female customers utterances in speaker verification experiments for the condi-
and 40 female impostors recorded over a conventional tele- tion that five utterances were used to construct a reference
phone connection. templateforeachcustomer andthatit was updated every
The ratio of interspeaker distance to intraspeaker distance seventh access by the customer. Table IV shows error rates
for each parameterwas calculated using the first ten utterances averaged over 21 customers. Results of the first, middle, and
by five customers each. The result was similar to the result for last seven input utterances are averaged separately. Falsere-
male speakers shown in Fig. 9. The original cepstrum coeffi- jection error is very large for the first seven input utterances.
cients are most effective and the second-order polynomial co- Initial variability like this was also shown in the previous ex-
efficients are less efficient than the first-order ones. This re- periment by Rosenberg [4].
sult was used for the selection of 18 parameters, which include In order to improve the results forthe first seven input
all ten cepstrum coefficients and all the first-order polynomial utterances, the second method for reference updating was in-
coefficients except coefficients index numbers 4and 9. troduced. The reference template was updated at each time of
Table I11 presents the speaker verification results under the the customer's access using the latest five utterances as'shown
same conditions observed for the male speaker set described in in Fig. 1 l(b). Table V shows the results of the verification ex-
the previous section;a reference file was constructed using periment using this method. In this case, the error rate for the
five utterances and updated every seventh access by each cus- first seven input utterances is not significantly larger than
tomer. Although the error rates for the female utterance set those for the middle and last seven utterances. Compared with
are slightly larger than those for the male utterance set, they Table IV, it can be also concluded that frequent updating of
are still very small. the reference template is quite efficient for several initial input
Results for a speaker verification experimentin which a utterances but it is not necessary to do it for the remaining
reference template was constructed using five' utterances and utterances. If we apply the second reference updating method
the time interval between training andtest utterances was to the first seven input utterances and the first reference up-
varied up to six weeks were quite similar to that of the male dating method to the remaining utterances, we canachieve
speaker set. There was no significant increase inerrorrate verification errorrate ofless than onepercent using the
when the interval is extended to six weeks. a priori estimated threshold.
2 62 IEEE
TRANSACTIONS ON ACOUSTICS,
SPEECH, AND SIGNAL PROCESSING,
VOL. ASSP-29, NO. 2 , APRIL 1981
TABLE IV TABLE VI
ERRORRATES.UTTERANCE
AVERAGE SET:No. (5). REFERENCE
UPDATING: ERRORRATES.UTTERANCE
AVERAGE SET:No. (3). FR: FALSEREJECTION
METHOD1 . FR: FALSE (FALSEALARM).
REJECTION FA: FALSE (FALSE
ALARM). FA: FALSE
ACCEPTANCE (MISSRATE).
(MISSRATE).
ACCEPTANCE II I I I
Threshold
Estimated
A
Threshold
1 1 I 1FR
FR
0.29%
FA
FA
0.83%
FR+FA
2
0.56%
Priori
Estimated
A Posteriori 0.04%
TABLE VI1
ERRORRATES BY ESTIMATED
AVERAGE a priori THRESHOLD.
Priori
I Utterance
set No. ( I ) No. (6) No.(5) No. (3)
Fixed
I Customers IO Male 10 Female 21 Male IO Male
TABLE V
(False Alarm) 0.29% 0.29% 0.68% 0.29%
AVERAGE
ERROR SET:No. (5). REFERENCE
RATES.UTTERANCE UPDATING:
1 1 La;1
METHOD2. FR: FALSE
REJECTION
(FALSE A L A R M ) . FA: FALSE
False Acceptance
(MISS RATE).
ACCEPTANCE
(Miss Rate) 0.08% 0.43% 0.86% 0.83%
-.--I Utterances
I Average 0.19% 0.36% 0.77% 0.56%
First Middle
I Number of Trials 5,500 1 5,500 I 12,726 1 5,500
I
I I 068%
0.68%
1 Estimated
FR
0.86%
0.60% 0.60%
seventh access. Although the errorrates are slightlylarger
than thoseobtained for clean speechpresentedin Table 11,
both false rejection and false acceptance are still less than one
percent even whenthe decision threshold is set a priori by
ir
(12). Speaker verification results showing the effectofex-
A ! I tending the interval between training and test utterances up to
prior^ FR 1.36% 1 0.68% I 0.68% I six weeks indicated thatthe errorrate was slightly greater
Fixed FA 0.56% 1
0.74% 0.74% I 0 57% 1 than that obtainedfor clean speech.
Table VI1 shows the summary of the results of speaker veri-
0.96% 1 0.71% 1 0.63% 1 fication experiments for utterance sets (l), ( 3 ) , ( 9 , and (6),
0.94% 1 0.17% 1 0.348 i using the a priori threshold specified by (12). For utterance
set ( 9 , reference templates were updated following each access
for the first seven test utterances using the latest five utter-
ances. After the seventh utterance, updating was carried out
D. Results for Utterance Set (3) every seventh access. For all other utterance sets updating was
In the fourth experiment, utterance set (3) which comprises carried out only after each seventh access. For all utterance
50 utterances by ten male customers each and a single utter- sets except (5) there were 35 customer test utterances and 515
ance by 40 impostors, recorded over a conventional telephone impostor test utterances per customer for a total of 350 cus-
connection and transformed by a 24 kbit/s ADPCM system, tomerand 5150 impostortrials, respectively. Forutterance
was used to evaluate the speaker verification system. In this set (5) there were 21 customer test utterances and 585 im-
case, the 18 selected parameters include all ten cepstrum co- postor test utterancesper customer for a total of 441 customer
efficients and all the first-orderpolynomial coefficients ex- and 12 285imposter trials, respectively.
cept coefficients withindex numbers 1 and 2. Although this table indicates a higher error rate for micro-
Table VIshows the results of speaker verification experi- phone speech than for telephone speech, the difference is sta-
mentswhen an initial reference template for each customer tistically insignificant since the number of utterances which
was constructed usingfive utterances andupdated every caused the verification error is very small.
FURUI: AUTOMATIC SPEAKER VERIFICATION 263
TABLE X
ERRORRATES WITHNo CEPSTRUM
AVERAGE NORMALIZATION.
TRAINING ADPCM. TESTUTTERANCES:
UTTERANCES: LPC.
c-v
T 1'" Threshold Error Rate Error
1
False Rejection
c-A 1: A Priori
(False Alarm)
False Acceptance
1.14%
1.22%
v-v Average
1lo
A Posteriori !?qual 0.52%
v-v u: tionerrorrate
information.
In thenext
by removing thelong-term speaker-related
TABLE XI
AVERAGE ERROR RATES. TRAINING UTTERANCES:
PROCESSED
BY
PREEMPHASIS.
TESTUTTERANCES:
UNPROCESSEDBY PREEMPHASIS.
F R FALSEREJECTION
(FALSE ALARM).
F A FALSEACCEPTANCE
(MIS RATE).
Cepstrum
Normalization Threshold FR FA
2
YES A Priori
A Posierion 0.14%
A Priori
I Fixed
A Posterior
17.14% 16.82%
1644%
9.66%
I 2 3 4 5 6 7
Loo AREA RATIO INDEX
8 9 1 0
VI. DISCUSSION Fig. 13. Interspeaker to intraspeaker distance ratio for each time func-
tion of log wea ratios and polynomialcoefficients derived from
A. Comparison Between Cepstrum and Log AreaRatio them. Ten utterances by five male speakers each were used for the
analysis.
In order to check the advantage of the cepstrumCoefficients
derived through LPC analysis &PC-cepstrum),which have
been used in this paper, log area ratio parameters, which are TABLE XI1
arctanhtransformatioriof PARCOR coefficients, were ex- AVERAGEERROR RATES. UTTERANCE SET: SUBSETOF THE UTTERANCE
SET(1). THRESHOLD:A posteriori EQUALERRORTHRESHOLD.
tracted from the utterance set (1) and studied. Log area ratios TIMEREGISTRATION: CONSTRAINED ENDPOINT DYNAMIC TIME
were found to be very good parameters for speaker verification WARPING. NUMBER OF TRAINING UTTERANCES: 3.
in previous experiments by the author [ 191 . Fig. 13 shows
distance ratios for each time function of log area ratios and
polynomial coefficients derived fromthem. The results for Cepstrum 0.80%
IO
,ENVELOPE BY LOG
RATIO
AREA 1
ENVELOPE BY LPC-CEPSTRUM
2-
L SHORT
TIME SPECTRUM
ENVELOPE BY FFT-CEPSTRUM
- FREPUENCY
(a)
0
0
I I I
I
I I
2
FREOUENCY ( k H z )
I 1 I I
3
(4
10
ENVELOPE BY LOG AREARATIO
I
ENVELOPE BY LPC -CEPSTRUM
ENVELOPE dY FFT-CEPSTRUM I
0
0 1 2 3
p FREOUENCY FREOUENCY (kHZ)
(b) (b)
Fig. 14. Spectralenvelopesderived fromten cepstrumcoefficients Fig. 15. Comparison of four kinds of spectra; short time speech spec-
(a) or ten log area ratios (b) for a spoken sentence, “We were away a trum, spectral envelope derived from log area ratio, spectral envelope
year ago.” Time sequences of the envelope for the first 100 frames derived from LPC-cepstmm, and spectral envelope derived from FFT-
(1 s long) are shown. .
cepstrum. (a) For the sound /i/ in “We . . .” (b) For the sound lo/
in “. . . ago.”
cepstrum computation time which includes two Fourier trans- TABLE XI11
formations is almost twice that of the LPC-cepstrum. AVERAGEERRORRATES.UTTERANCESET:No. (6) (FFT-CEPSTRUM).
The distance ratio for each parameter derived from the FFT- FR. FALSE (FALSE
REJECTION ALARM).FA FALSE ACCEPTANCE
(MISS RATE).
cepstrum was calculated. The result was similar to that for the
LPC-cepstrum except that the speaker dependent information
Threshold FR FA
in the FFT-cepstrumtends to concentrate in the first-order 2
spectively, for LPC-cepstrum, and 2.07, 1.83, and 1.37, Priori Fixed 0.86% 0.74% 0.80%
respectively, for FFT-cepstrum. LPC-cepstrum has slightly
A Posteriori 0.02%
largervalues than FFT-cepstrum, but the difference is very L
small.
Table XI11 shows the results of a speaker verification experi- same results in speaker verification as the conventional FFT-
ment using FFT-cepstrumunder the same condition as that cepstrum, while it takes only half the time to calculate the
using LPG-cepstrum whose results were presented in Table 111. LPG-cepstrum compared with the FFT-cepstrum.
The difference in error rates between these two experiments is
very small. Speaker verification results using FFT-cepstrum C. Effectiveness of Polynomial Coefficients
when the interval between training andtest utterances was In order to study the effectiveness of the use of polynomial
long was similar to the results obtained using LPC-cepstrum. coefficients on speaker verification, an additional experiment
It is apparent that the LPC-cepstrum produces almost the was performed in which the polynomial coefficients were
FURUI: AUTOMATICSPEAKERVERIFICATION 267
TABLE XIV
ERROR
AVERAGE RATES.TRAININGUTTERANCES:PROCESSED BY
PREEMPHASIS. UNPROCESSED
TESTUTTERANCES: BY PREEMPHASIS.
FR FALSE (FALSEALARM).
REJECTION F A FALSEACCEPTANCE A PRIORI THRESHOLD
(MISSRATE). 1.5 (FR + FA112
Feature
\
Parameters Threshold
Estimated
FR
2.29%
I 1
FA
1.69%
FR + FA
1.99%
!t
W
0
I \
\
Cepstrum A
A Posteriori 0.64%
Estimated 0
0 50 100 (50 200
Cepstrum A SEGMENT
LENGTH (mS)
and Priori Fixed Fig. 16. Error rate versus the length of the speech segment for orthogo-
'
nal polynomial expansion. Training utterances were processed with
Polynomial preemphasis, whereas test utterances were not.
Coefficients
7
dated every seventh access by the customer for the remaining \
\
testutterances(method 1).Decision thresholds were set \
Mean error rates for this utterance set using the UEGI pro- \
TABLE XVI
AVERAGE
ERRORRATES.UTTERANCE SET:No. (1). TIMEREGISTRATION
CONSTRAINED
END-POINT DYNAMIC TIMEWARPING. NUMBER OF
TRAINING 3 . F R FALSE
UTTERANCES: REJECTION (FALSE ALARM).
F A FALSEACCEPTANCE (MISSRATE).
-
.
40
.
0
&
,
,,
30-
,
,, e
.
A Posterior 0.14%
utterances which were tested in the speaker verification ex- Fig. 20. Error rate versus b parameter in the threshold estimation equa-
tion. The effects of parameter variation on the average of false rejec-
periment using themixedtransmissionsystem(Section V) tion andfalse acceptance errorratesare shown using the estimated
were scannedby varying theparameter b. Theparameter a decision threshold. Results for nine training systemtestingsystem
was set to 0.6, which was determinedexperimentally. The pairs are plotted.
number of errors was tabulated at each step by comparing the
actual overall distances with the estimatedthreshold using of b. The optimum value of the parameter bywhich produces
the varied parameter value b. False acceptance rate and false the minimum average error rate, is seven almost irrespective of
rejection rate wereaveraged and plotted in Fig. 20. Results the experimental condition. This is the value consistently used
for nineconditions,which are ninecombinations of three in this paper to estimate an optimum threshold for each cus-
training systems and three testing systems, are shown. As the tomer. As there is sometradeoffbetweenthe two kinds of
result of increase in false acceptance rate for large threshold error rates, if it is desirable to keep the false acceptance rate at
values and increase in false rejection rate for small threshold a much lower value, the parameter b should be set at a value
values, the averaged error rate has a concave slopeas a function smaller than seven, even though it increases the false rejection
270 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING,VOL. ASSP-29, NO. 2, APRIL 1981
7r TEST TRAINING
-FALSE
REJECTION --- FALSE
ACCEPTANCE
UTTERANCE SET (1) x UTTERANCESET (61
CLEAR o CLEAR
0 UTTERANCE SET(31 A UTTERANCE
SET (51
~
5-
-3 1.0-
-E W
w 4- 5
a
a
a
B 3-
a
P
05-
L.-.
6
2-
0-
I-
WITHHOLDING
THRESMLD
(a)
---
24 25 26 27 28
-FALSE ACCEPTANCE
''jl
FALSE REJECTION
THRESHOLD
UTTERANCESET (11
Fig. 21. Error rate versus decision threshold. The effects of threshold 0 UTTERANCESET (31
variation on the average of false rejection and false acceptance error X UTTERANCE SET(6)
rates are shown using the threshold. Results for nine txaining system-
testing system pairs are plotted.
0
I. Withholding Decision
Another tabulation was carried out to assess the effect on -SHORTTIMEINTERVALBETWEENTRAINING
error rate ofwithholdingdecision(sequentialdecision)on ---LONG ) AND TEST
UTTERANCESET ( 1 I
UTTERANCES
trials for which I D T - 0 I < A, where DT and 0 are the overall 0 UTTERANCESET (31
a UTTERANCE
SET (61
distanceandthreshold, respectively. When the decisionis UTTERANCESET (51
centageofwithheld trials, which is thepercentage of addi- Fig. 23. Percentage of withheld trials versus withholding threshold.
R FURUI: AUTOMATIC 271
VII. SUMMARY
WITHHOLDING
RATE I%)
A new system for automatic speaker verification has been
(a) implemented on a 16-bit .laboratory computer and evaluated.
--- FALSE ACCEPTANCE A fixed, sentence-long utterance is analyzed by cepstrum co-
-FALSE REJECTION efficients by means of LPC analysis. Frequency-response dis-
UTTERANCE SET ( 1 tortionsintroducedbytransmissionsystemsare removed
o UTTERANCESET (3)
1.0r R automatically. Time functionsofcepstrum coefficients are
expanded by orthogonal polynomial representations and com-
- 0.8 - paredwithstoredreferencefunctions.Afterdynamictime
W
% warping, a decision is made to acceptor reject anidentity
01
01
claim. Referencefunctionsanddecisionthresholds are up-
0.6-
L
dated for each customer. The total processing time is approxi-
0
W
mately 40 times real time in this computer simulation.
i 0.4 - In the first part of the experiment, three sets of utterances
s were used fortheevaluationof the system.The first and
0
z second sets eachcomprises 50 utterancesbytencustomers
o.2 t I 0
each and a single utterance by 40 impostors recorded over a
conventionaltelephoneconnection. The third set comprises
O LL- O 20
WITHHOLDINGRATE
30
I
(%)
40
I KJ50
26 utterances by 21 customers each and a single utterance by
55 impostors recorded over ahigh-qualitymicrophone.The
(b) first and third sets were uttered by male speakers, whereas the
Fig. 24. Normalized error rates versus percentage of withheld trials. second set was uttered by female speakers. The evaluation in-
(a) and (b) correspond to (a) and (b) in Fig. 22, respectively. dicated mean error rates of 0.19 percent, 0.36 percent, and
0.77 percent for each utterance set,respectively.
Second, the first utterance set was processed by an ADPCM
tional trials, as afunctionofthewithholdingthreshold A. codingsystem and an LPC codingsystem.Theseutterance
Fig. 24 shows the relation between the percentage of withheld sets were used for a speaker verification experiment together
trials and the error rate normalized by the error rate obtained with an unprocessed utterance set. Experimental results indi-
without withholding. Part (a) and part (b) correspond to part cate that the transmission system affects the verification ac-
(a) and part (b) in Fig. 22, respectively. Fig. 24 indicates that curacy only slightly even if the reference and test utterances
at least 30 percent improvement in error rate can be obtained are subjected to different transmission conditions.
with decisions withheld on five percent of the trials and an Third,thetime interval betweenreference andtestutter-
average 73 percentimprovement can beobtainedwith de- ances was changed from six days to six weeks. Results of the
cisions withheld on ten percent of the trials. experiment indicate no significant increase of verification error
with the increase of time interval.
J. Combination with Pitch and Intensity Contours These results verify the robustness of the new speaker verifi-
Speaker verification systemsbasedonpitchandintensity cationsystempresented inthis paper.Some discussions on
contours have beenevaluated in Bell Laboratories using the new techniques used in this system are also included in this
same utterance sets used in this paper [l] -[5]. paper.
As theinformationconveyedbypitchandintensitycon- Further investigations, current or projected, include a large-
tours is considered to be almost independent of that conveyed scale and long-term evaluation over telephone lines permitting
by the cepstrum, the combination of these two kinds of in- direct customer access and on-line response,and specialized
formation will improve the performance. hardware processing to improve response time.
212 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-29, NO. 2 , APRIL 1981