Professional Documents
Culture Documents
Chinese Acoustic Model
Chinese Acoustic Model
()
SCSPM
1(SCSPM) HMM
MGCPM
SCSPM
MGCPM SCSPM
2 HTK HTK
3Context Dependent, CD
4CD-Phone
CD-Phone
10%
5CD Initial/Final,
CD-IF
Extended Initial/Final, XIF
XIF IF CD-XIF
80% 25%
SCSPM
-I-
- II -
Abstract
Acoustic Modeling is one of the key problems in the field of speech recognition.
In this paper, the techniques of acoustic modeling and parameter tying strategy are
deeply studied. Two main aspects are focused on: the proposition and
implementation of the Semi-Continuous Segmental Probability Model (SCSPM);
The study of Context Dependent (CD) acoustic modeling with decision tree based
state tying, and the implementation of CD phone modeling and Initial/Final (IF)
modeling respectively. Including:
1The SCSPM is proposed and implemented. It is based on the traditional
Hidden Markov Model (HMM) and the modified HMM namely Mixed Gaussian
Continuous Probability Model (MGCPM), the Vector Quantizaztion (VQ) technique
and the feature of continuous probability density distribution are integrated, and the
method of Tied Mixture is adopted to describe the probability distribution of each
state. Moreover, the mixture weight reduction method is studied and analyzed, and a
new effective method which prunes the small tying weight through the iterative
training process is proposed. Compared with the MGCPM, SCSPM can reduce the
model scale and computational complexity significantly with little degradation in
recognition accuracy.
2The HMM Tool Kit (HTK) platform is studied and analyzed. Based on HTK,
an effective method is implemented for acoustic model training and performance
evaluation.
3The Decision Tree (DT) based state tying strategy in the Context Dependent
(CD) acoustic modeling is deeply studied. Two different DT design methods are
analyzed, the design of question set and the DT node splitting strategy are discussed.
Furthermore, the method for treating the silence model is studied to enhance the
robustness.
4The CD-Phone model with decision tree based state tying is implemented.
The phone question set and different DT structure are designed. Compared with the
baseline syllable model, CD-Phone model reduces the syllable error rate (SER) by
about 10%.
5The CD Initial/Final (IF) model with decision tree based state tying is
implemented. To maintain the connection between Initial and Final, the Extend IF
(XIF) set is proposed by adding the Zero Initials to the prior standard IF set.
Experiments show that the XIF model outperforms the IF model. The syllable
correct rate of the implemented CD-XIF model can achieve over 80%, and the
CD-XIF model can reduce the SER by over 25% compared with the baseline
syllable model.
Keyword: Semi-Continuous Segmental Probability Model, Parameter Tying
Strategy, Decision Tree based State Tying, Context Dependent Phone
Model, Context Dependent Initial/Final Model
- III -
- IV -
1.1 1
1.1.1 1
1.1.2 2
1.1.3 2
1.2 4
1.2.1 4
1.2.2 5
1.3 6
1.4 7
1.5 8
11
2.1 HMM 11
2.1.1 HMM 11
2.1.2 HMM 12
2.1.3 HMM 15
2.1.4 HMM 17
2.2 MGCPM 18
2.3 20
23
3.1 SCSPM 23
3.1.1 (SPM) 23
3.1.2 SCHMM 23
3.1.3 SCSPM 24
3.2 24
3.2.1 25
3.2.2 27
3.3 27
3.4 SCSPM 28
3.5 29
-V-
3.5.1 29
3.5.2 30
3.5.3 SCSPM MGCPM 30
3.6 31
33
4.1 33
4.1.1 33
4.1.2 34
4.2 35
4.2.1 35
4.2.2 37
4.2.3 37
4.3 40
4.3.1 40
4.3.2 40
4.3.3 CD-Phone 41
4.4 42
43
5.1 43
5.2 44
5.3 45
5.4 46
5.4.1 IF XIF 46
5.4.2 SP 47
5.4.3 47
5.4.4 48
5.5 48
51
6.1
51
6.2 51
HTK 53
HTK 53
54
55
- VI -
57
61
63
63
1-1 ............................................................................4
3-1 SCSPM .........................................................................24
3-1 ......................................................................................30
3-2 ......................................................................................30
3-2 SCSPM MGCPM ............................................................31
4-1 ......................................................................................33
4-2 ..........................................................................34
4-3 ..................................................................................35
4-4 ..................................................................................35
4-1 ...............................................................................38
4-2 I ...................................................................39
4-3 II .................................................................39
4-5 CI-Phone ...............................................................................40
4-4 ..................................................41
4-6 CD-Phone .........................................................42
5-1 (Initial/Final, IF)...........................................................43
5-2 Extended Initial/Final, XIF ..................................44
5-1 sp ........................................................................46
5-3 IF XIF ......................................................................46
5-4 sp ..........................................................................47
5-5 ..................................................................47
5-2 ..................................................................................48
-1 .....................................................................55
-2 .....................................................................56
- VII -
1.1
1.1.1
50
1952 Davis
10 50
1960 Denes
70
LPC
DTW[Vintsjuk 1968]VQ
-1-
1.1.2
Speaker Dependent, SD
Speaker Independent, SI
Small VocabularyMedium VocabularyLarge
VocabularyIsolated Word
Connected WordContinuous Speech
1.1.3
-2-
[Wilpon 1991]
2 HMM
HMM
Viterbi [Viterbi
1967]
HMM
HMM
[Franzini 1990][Morgan 1990]
HMM
3
Statistical Language ModelKnowledge-based Language
Model
N-Gram
Jelinek [Jelinek 1983] N-Gram
Nadas
[Nadas 1985]Katz [Katz 1987]
4
-3-
HMM Viterbi
[Lee 1989]
1.2
1.2.1
1.1
A
A
L
1-1
A w
-4-
w*
w* = arg max P( w A) = arg max
w
P( A w) P( w)
P( A)
(1.1)
Bayes P(A|w)
P(w)
w P(A)
w* = arg max P( w A) P( w)
(1.2)
1.2.2
HMM
HMM
ANN
ANN [Verhasselt
1998]
-5-
ANN ANN
ANN
ANN
1.3
[Zheng 1996]
1
(Phoneme) (Triphone)
Triphone
(Syllable)
-6-
(Phoneme)
Coarticulation
Context Dependent,
CD triphone
triphone
(Initial/Final, IF)
[Li 2000]
Context Dependent
Initial/Final, CD-IF
1.4
HMM
Semi-Continuous HMM, SCHMM HMM Continuous HMM,
CHMM
-7-
SCHMM
EasyTalk [Zheng
1999]
[Reichl 2000]
[Breiman 1984]
1.5
(1)
Semi-Continuous Segmental Probability ModelSCSPM
(2) HTK
HTK (3)
-8-
-9-
2.1 HMM
HMM
HMM HMM
[Rabiner 1989]
2.1.1 HMM
HMM {S (t ): t T }
t st = S (t ) t
t
{S (t ): t T }
T
I T {S (t ) :t = 0,1,2,L}
= P{S (t + 1) = s S (t ) = s t }
(2-1)
aij (t ) = P S (t + 1) = j S (t ) = i
(2-2)
t i j
aij (t ) 0, i , j I
(2-3)
(2-4)
j I
ij
(t ) = 1, i I
(HMM)
HMM t
- 11 -
st
R Q
{ }
HMM aij
HMM
NN
{ }
(3) B = b j ()
N 1
( )
b j ( x ) = Pd output = x state = j Pd {}
(4) = { i } N 1 i = P{s(1) = i}
HMM
{ }
A = aij
NN
N HMM = {, A, B}
2.1.2 HMM
HMM
HMM O
O
- 12 -
M {O ( m ) , m = 1 ~ M } HMM
HMM
O
O O
(Forward-Backward) Baum-Welch [Baum 1972]
( 1 j N )
N
(2-5)
1 ( j ) = i bi (o1 )
(2-6)
i =1
( 1 i N )
N
(2-7)
T (i ) = 1
(2-8)
j =1
1 (i ) 1 (i )
i =1
T 1
aij =
(2-9)
t =1
(i )
(i )aij b j (ot +1 ) t +1 ( j )
(2-10)
T 1
t =1
(i ) t (i )
bi ( x ) = f ( , O) ( O )
(2-11)
bi () HMM
P{O } = T (i )
(2-12)
i =1
1 ( j ) = j b j (o1 ), 1 j N
(2-13)
( 2 t T ,1 j N )
(2-14)
(2-15)
(2-16)
)
st( ML ) = t +1 ( st(+ML
1 ), 1 t T 1
(2.17)
1i N
t =2
t 1
} {
= s ( ML ) a s ( ML ) s ( ML ) bs ( ML ) (ot ) = P S ( ML ) Pd O S ( ML ) ,
t
t
1
t 1
t =2
t =1
(2-18)
()
Pd {O } = Pd {O, S } = Pd {O , S} P{S }
S
t =2
S
(2-19)
t =2
S
S = {st 1 t T } Viterbi
2.1.3 HMM
b j ( x ) VQ
HMM HMM (Discrete HMM, DHMM) HMM (Continuous
t =2
S
T
(2-20)
Pd {O , S} = bst (ot )
(2-21)
t =1
bst (ot )
2.1.3.1 HMM
DHMM (Vector QuantizationVQ)
(codeword)(codebook)
bst (ot ) bst (V (ot )) V ()
bst (v)
T
(i) (i)
bi (v) =
t =1
V ( ot ) = v
(2-22)
(i) (i)
t =1
DHMM
- 15 -
DHMM
2.1.3.2 HMM
DHMM CHMM CHMM
1989][Huang
1989]
bn ( x ) = g nm N ( x; nm , nm )
(2-23)
m =1
g
m =1
=1
nm
(2-24)
M n
M MGD
2.1.3.3 HMM
MGD ( nm
nm g nm ) M bn ( x ) M
SCHMM VQ
} { }
bst (ot ) = f ot st = f ot Vl , st P Vl st
L
l =1
= f ot Vl , st b
l =1
( D)
st
{V } = f {o V } b {V }
l
l =1
( D)
st
(2-25)
{ }
f Vl Vl
bn (ot ) = g nl f ot Vl
l =1
(2-26)
2.1.4 HMM
HMM
HMM
HMM
HMM
HMM
HMM
HMM HMM
- 17 -
LDA
HMM
HMMDHMM
HMMCHMM SCHMM
2.2 MGCPM
HMM Viterbi
HMM MGCPM
CHMM
K k
Z k = {z1 , z 2, ..., zT } , z j
M
p( z j | ) = { pi p( z j | i )
(2-27)
i =1
z j Z d j =1,2,, T M
1
1
p( z j | ) = (2 ) d / 2 | Ri |1/ 2 exp ( z j ui )T Ri ( z j ui )
2
i u i Ri
- 18 -
(2-28)
pi i p i = 1, p i 0
j =1
pi i i =1,2,, M
Z
Z p( Z | ) Dempster EM
[Dempster 1977]
Z
T
p( Z | ) = p( z j | )
(2-29)
j =1
p( Z | ) p( Z | )
f = ( ln p( Z | ) ) = 0
(2-30)
i Ri Pi i 1,2,L , M i
f = (ln p ( Z | ) )
i
= p( z j | ) 1 pi p( z j | i )
i i =1
j =1
= p( z j | ) 1 pi p( z j | i )
(2-31)
j =1
T
= p( z j | ) 1 pi p( z j | i ) Ri ( z j i )
1
j =1
=0
Pi , j = p ( z j | ) 1 p i p ( z j | i )
(2-32)
i = (Pi , j z j )
T
j =1
P
j =1
i, j
(2.33)
)
)
i i i i
)
)
pi Ri pi Ri
- 19 -
i = (Pi , j z j )
T
j =1
P
j =1
(2-34)
i, j
T
)
Ri = Pi , j ( z j i )( z j i ) T
j =1
) P
T
j =1
(2-35)
i, j
T
)
pi = T 1 Pi , j
(2-36)
j =1
(2-36) pi = 1, pi 0.
j =1
(2-29)
2.3
1
418
60 40
40 64,000
192,000
(State Tying)Tied
- 20 -
Mixture HMM
3
Decision Tree
- 21 -
HMM HMM
MGCPM
3.1 SCSPM
3.1.1 (SPM)
HMM HMM
[Juang 1985][Zheng 1997]
Segmental Probability Model, SPM
MGDEqual
1989]
HMM SPM
3.1.2 SCHMM
HMM(CHMM) HMM(SCHMM)
pdf CHMM SCHMM
CHMM
HMM HMM
SCHMM
CHMM
SCHMM
3.1.3 SCSPM
SCSPM SCHMM
MGCPM SCSPM
SCSPM
3-1 SCSPM
3.2
SCSPM
SCSPM
- 24 -
3.2.1
K-
(1) S
(2) L
(3)
(4) M C1(0)C2(0) CM(0)
(5) D(0)=
(6) m=1
(7) S M S1(m)S2(m) SM(m)Sl(m)
d ( X , Yl ( m 1) ) d ( X , Yi ( m 1) ), i
(3-1)
(8) D(m)
M
D ( m) =
l =1
d ( X ,Y
( m 1)
(3-2)
X Sl( m )
(9) D(m)(m)
( m)
D ( m1) D ( m)
D ( m)
= ( m) =
D
D ( m)
(3-3)
1
Ni
(3-4)
X Si( m )
(11) (m)<(13)
(12)
(12) m<L?m=m+1(7)(13)
(13) C1(m)C2(m) CM(m) D(m)
(14)
- 25 -
L
0 1 (m)<
L
L
20
K-
LBG LBG
K- S
X X0 1.01
1.01 X0X1
K- 2
X 2 B
M=2B
LBG
LBG
MGD MGCPM
1 n1
2 LBG n n2
n n2
3 MGD
4 nN 2
N4,096
4,096
- 26 -
3.2.2
MGCPM MGCPM
MGCPM
v
v
D(G 1 , G 2 ) =|| 1 2 ||
(3-5)
v
Gt t t Gt
2 Gi G j Gi G j
v
v
G
= ( i + j ) / 2 , = ( i + j ) / 2
(3-6)
3 n n = N 1
3.3
M m (m=1,2,..,M) m
Z J Z Zj Z
j(j=1,2,..J) g m m
MGD, Z j
M
p( z j | ) = {g m ( p( z j | m )}
(3-7)
m =1
g
m =1
=1
(3-8)
- 27 -
j =1
j =1
h( ) = ln( p ( Z | )) = ln( p( z j | )) = ln p (z j | )
(3-9)
3-9
M
f = h( ) + ( g m 1)
(3-10)
m =1
t=1,2,M,
J
j=1
m =1
p( z j | ) 1 gt ( g m p( z j | m )) + = 0
(3-11)
g t = p ( z j | ) 1 g t p ( z j | t )
(3-12)
j =1
= p ( z j | ) 1 g t p ( z j | t ) = J
(3-13)
t =1 j =1
3-123-13 g t
J
(
g t = J 1 p ( z j | ) 1 g t p ( z j | t )
(3-14)
j =1
3.4 SCSPM
SCHMM
1
0
2 N
3
0
- 28 -
(3-8)
SCSPM
MGD MGD
4 MGD
1 4
(3-8)
3.5
863
13 520
10 3
16KHz 16 MFCC
6
3.5.1
SCSPM
1,0242048 4,096
4
4,096 3-1
1,024 4,096
36.6%
- 29 -
3-1
1,024
2,048
4,096
60.39
61.33
69.50
70.21
75.49
(%)
5
10
85.74
91.56
86.08
91.64
88.79
92.76
88.93
92.94
90.48
94.42
3.5.2
2,048
(%)
50
1
2
3
4
100
95
90
85
80
75
70
65
60
(th=0.005)
(th=50)
(th=90%)
(th=0.00001)
10
3-2
SCSPM
3-2 SCSPM MGCPM
MGCPMs
(4)
MGCPMs
(8)
SCSPMs
(%)
10
9,669
74.35
89.15
93.78
18,685
75.87
90.62
94.55
4,096
75.49
90.48
94.42
SCSPM 4
3.6
SCSPM
MGCPM SCSPM
SCSPM
- 31 -
MGCPM SCSPM
418
73,000,000
4.1
4.1.1
[Ma 2000]
4-1
/aI/
/a/
/Ie/
/eI/
/eN/
/e/
/Ci/
/CHi/
/Bi/
/oU/
/o/
/u/
/v/
ai,ana
a
iee
eie
ene
e
cisizii
chishizhii
i
ouo
o
u
yu
- 33 -
13
/b/, /c/, /ch/, /d/, /f/, /g/, /h/, /j/, /k/, /l/, /m/, /n/, /ng/, /p/, /q/, /r/, /s/, /sh/, /t/,
/x/, /z/, /zh/
/ng/ 22
/er//sil/ 37
4.1.2
4-2
a
ang
ei
er
ou
ian
ie
iong
ua
uang
ueng
van
io
/a/
/a/ng/
/eI/Bi/
/er/
/oU/u/
/Bi/Ie/n/
/Bi/Ie/
/Bi/u/ng/
/u/a/
/u/a/ng/
/u/eN/ng/
/v/Ie/n/
/Bi/o/
ai
ao
en
o
i
iang
in
iou
uai
uei
uo
ve
/aI/Bi/
/a/u/
/eN/n/
/o/
/Bi/
/Bi/a/ng/
/Bi/n/
/Bi/oU/u/
/u/aI/Bi/
/u/eI/Bi/
/u/o/
/v/Ie/
- 34 -
an
e
eng
ong
ia
iao
ing
u
uan
uen
v
vn
/aI/Bi/
/e/
/eN/ng/
/u/ng/
/Bi/a/
/Bi/a/u/
/Bi/ng/
/u/
/u/aI/n/
/u/eN/n/
/v/
/v/n/
4.2
4.2.1
[wu 1989]
4-3
(High)
(Medium)
(Low)
(Top Vowel)
(Front)
(End)
(Unrounded)
(Rounded)
(Apical Vowel)
E1(Evowel)
E2(Evowel2)
I(Ivowel)
O(Ovowel)
UV(u and v)
4-4
(Stop)
(Aspirated Stop)
(Unaspirated Stop)
(Affricate)
(Aspirated Affricate)
- 35 -
(Unaspirated Affricate)
(Fricative)
2(Fricative2)
(Voiceless Fricative)
(Voice Fricative)
(Nasal)
2(Nasal2)
3(Nasal3)
(Labial)
2(Labial2)
(Apical)
(Apical Front)
1(Apical1)
2(Apical2)
3(Apical3)
1(Apical End)
2(Apical End2)
(Tongue Top)
(Tongue Root)
2(Tongue Root2)
37
3
Stop 3
Left Question
QS L_Stop
Right Question
QS "R_Stop"
Central Question
QS "C_Stop"
- 36 -
4.2.2
[Gao 1998]
L ( S ) = log P ( X | S ) S X = {X 1 , X 2 ,L , X N }
N X 1 = {X 11 , X 21 ,L, X 1N 1} , X 2 = {X 12 , X 22 , L , X N2 2 }
X = X 1U X 2 , X 1 I X 2 =
L parent , L1child L 2child
= L1child + L2child L parent
L (S )
[Reichl 2000]
Q( s ) = s ( xt ) log N ( xt | ( S ), ( S ))
(4-1)
xt sS
s ( xt ) xt s N(|,)
Q (S ) L (S )
Q( S ) Q( S ) L( S ) L( S )
(4-2)
Q (S )
4.2.3
37
l-c+r
c l r
8,757
(tie)
- 37 -
4-1
1 5
23 4
4-2 /aI/
2
II
4-3 2
- 38 -
b-aI+f.s[2]
f-aI+m.s[2]
j-aI+z.s[2]
s-aI+t.s[2]
R_Affricate?
n
Root
R_Nasal?
y
L_Stop?
n
L_Affricate?
y
4-2 I
*.s[2]
C_Affricate?
n
Root
y
R_Nasal?
y
L_Stop?
n
C_Affricate?
y
4-3 II
I 373111
- 39 -
4.2.1
II 3
4.2.1
4.3
863 80
520
16KHz 13 MFCC 1
42
4.3.1
CI-Phone
CI-Phone
4-5 CI-Phone
1
2
4
8
(%)
31.30
37.69
43.42
47.37
8 47%
4.3.2
4.2.3 I II
- 40 -
I (=350 6,587)
II (=350 6,285)
II (=100 10,528)
(%)
80
75
70
65
60
1
4-4
II
I II
II
I 3 II
350 100
I I
4.3.3 CD-Phone
CD-Phone
6
4-6
4-6
CD-Phone
8 CD-Phone
10%
- 41 -
4-6 CD-Phone
1
2
4
6
8
(%)
CD-Phone
60.51
68.92
65.04
72.35
69.92
74.59
72.37
75.62
73.83
76.47
4.4
I
CD-Phone
CD-Phone
10% CD-Phone
- 42 -
5.1
21 38 [Li 2000]
5-1 (Initial/Final, IF)
(21)
b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh,
ch, sh, z, c, s, r
(38)
ang
angzhangzhang
- 43 -
_a, _o_e_i_u_v
ang_aang
(27)
b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh,
ch, sh, z, c, s, r, _a, _o, _e, _i, _u, _v
(38)
XIFs
IFs
5.2
Affricate
QS R_Affricate
QS L_Affricate
+-
AffricateAspirated
Aspirated Affricate
- 44 -
QS R_Type_A
QS L_Type_A
61
32 29
QS R_p
{*+p }
QS L_p
{p-*}
IFs XIF
a
QS L_Type_A
5.3
1. sil 2 4
0.2
2. Short Pause, sp sp 3
1 3 2 sil
3
3. sp 1 3 0.3
4. sil 3 sp 2
- 45 -
Sil
1
sp
5-1 sp
5.4
4.3
5.4.1 IF XIF
IF XIF
5-3 IF XIF
1
2
4
CI-Phone
31.30
37.69
43.42
(%)
CI-IF
43.71
50.13
54.25
CI-XIF
47.09
55.26
58.67
XIF IF 4
- 46 -
XIF
5.4.2 SP
sp CD-XIF
5-4 sp
1
2
4
(%)
SP
SP
74.85
75.24
76.83
77.28
78.77
79.30
sp 0.5
5.4.3
1
5-5
100
200
350
21,222
14,630
9,311
(%)
76.23
76.19
75.24
42,255
- 47 -
5.4.4
(CI-Syllable)CD-Phone
CD-IF CD-XIF
I350
CI-Syllable
CD-Phone
CD-IF
CD-XIF
85
80
75
70
65
60
55
1
5-2
8 8
CD
CI-Syllable CD-Phone CD-IF CD-XIF
CD-IF 4
80.43% CI-Syllable 25%
5.5
- 48 -
(CD-IF) IF
6 XIF XIFs
CD-XIF CD-IF 4 CD-XIF
80.43% 25%
- 49 -
6.1
1(SCSPM) HMM
MGCPM
SCSPM
MGCPM SCSPM
2 HTK HTK
4CD-Phone
CD-Phone
10%
5CD Initial/Final,
CD-IF
Extended Initial/Final, XIF
XIF IF CD-XIF
80% 25%
6.2
- 51 -
- 52 -
HTK
1. HTK
HTK Hidden Markov Model HMM
HTK
Hbuild HTK
HCopy
HDMan
HLEd
HList HTK
HLStats HTK
HSGen HTK
HSLab
HCompV
(Embedded Training)
z
HHEd HMM
HInit HMM
HQuant HTK VQ
HSmooth HMM
HResultsHTK
HVite Viterbi
2.
HTK
HERest
HInit HRest HERest
Baum-Welch
- 54 -
HMM
(Proto)
HCompV
HInit
(mlf)
HRest
HERest(5)
-1
3.
HLEd HHEd
HERest
- 55 -
(mlf)
HLEd
(mlf)
HHEd(
)
HERest(
5)
HHEd(
)
HERest(
5)
-2
- 56 -
[1] Bahl, L.R., Brown, P.F., Souza, P.V., et al, (Bahl 1986) Maximum mutual information
estimation of hidden Markov model parameters for speech recognition, ICASSP, pp.49-52,
1986
[2] Baum, L.E., (Baum 1972)
[10] Gao,S., Xu,B., Huang, T.Y., (Gao 1998) Class-triphone Acoustic Modeling Based on
Decision Tree for Mandarin Continuous Speech Recognition, International Symposium on
Chinese Spoken Language Processing (ISCSLP 98), ASR-A1, Singapore , 1998
- 57 -
[11] Huang, X.D., Jack, M.A., (Huang 1989) Semi-continuous hidden Markov models for
speech signals, Computer Speech and Language, vol.3, pp.239-251, 1989
[12] Jelinek, F., Mercer, R.L., Bahl, L.R., (Jelinek 1983) A maximum likehood approach to
continuous speech recognition, IEEE Trans. On Pattern Analysis and Machine Intelligence,
Vol. PAMI-5, pp179-190, 1983
[13] , (Jiang 1989), , [
]. , 1989
[14] Juang, B.H., Rabiner, L.R., (Juang 1985) A probabilistic distance measure for hidden
Markov models, AT&T Technical Journal, 64(2), pp.391~408, 1985
[15] Katz, S., (Katz 1987) Estimation of Probabilities from Sparse Data for the Language
Model Component of a Speech Recognizer, IEEE Trans. Acoust., Speech, Signal
processing, vol. ASSP-35, pp. 400-401, 1987
[16] Lee, C.H., Rabiner, L.R., (Lee 1989) A frame synchronous network search algorithm for
connected word recognition, Proc. of IEEE, 77(2), pp.257-285, 1989
[17] Li, J., Zheng, F., and Wu, W.H., (Li 2000) Context-Independent Chinese Initial-Final
Acoustic Modeling, International Symposium on Chinese Spoken Language Processing
(ISCSLP00), pp. 23-26, Beijing, 2000
[18] Ma, B., and Huo, Q., (Ma 2000) Benchmark results of triphone-based acoustic modeling
on HKU96 and HKU99 Putonghua corpora, International Symposium on Chinese Spoken
Language Processing (ISCSLP00), pp.359~362, Beijing, 2000
[19] Morgan, N., Bourlard, H., Morgan 1990Continuous Speech Recognition Using
Multilayer Perception with Hidden Markov Models, ICASSP, 1990
[20] (Mou 1998) []
1998
[21] Nadas, A., (Nadas 1985) On Turings formula for word probabilities, IEEE Trans.
Acoust., Speech, Signal processing, vol. ASSP-33, pp.1414-1416, 1985
- 58 -
[22] Reichl, W. and Chou, W., (Reichl 2000) Robust Decision Tree State Tying for
Continuous Speech Recognition, IEEE Trans. Speech and Audio Proc., Vol.8, No.5,
pp.555-566, 2000.
[23] Rabiner, L.R., (Rabiner 1989) A Tutorial on Hidden Markov Models and Selected
Applications in Speech recognition, IEEE Proceedings, Vol. 77, No.2, pp.257-286, 1989
[24] Sadaoki, F., (Sadaoki 1985) Digital Speech Processing, Synthesis, and Recognition,
Tokai University Press, 1985
[25] Verhasselt, J., Martens, J.P., (Verhasselt 1998) Context modeling in hybrid
segment-based/neural network recognition systems, Proceedings of ICASSP, Vol.1,
pp.501-504, 1998
[26] Vintsjuk, T.K, (Vintsjuk 1968) Recognition of Words of Oral Speech by Dynamic
Programming, Kibernetica, Vol. 81, No. 8, 1968.
[27] Viterbi, A.J., (Viterbi 1967)
- 59 -
[34] Yong, S., Kershaw, D., Odell, J., Ollason, D., et al, (Yong 1999) The HTK Book (for
HTK Version 2.2), Cambridge University, 1999
[35] (Zheng 1996)
NCMMSC-96pp.32~35, 1996
[36] , , , Zheng 1997
3 (NCCIIIA). : ,
pp. 98-103, 1997
[37] Zheng, F., Mou, X.L., Wu, W.H., et al, (Zheng 1998) On the embedded multiple-model
scoring scheme for speech recognition, International Symposium on Chinese Spoken
Language Processing (ISCSLP98), ASR-A3, pp49-53, Singapore, 1998
[38] Zheng, F., Song, Z.J, and Xu, M.X, (Zheng 1999) EASYTALK: A large-vocabulary
speaker-independent Chinese dictation machine, EuroSpeech'99, Vol.2, pp.819-822,
Hungary, 1999
- 60 -
- 61 -
1977 1 1994
1999 6
1.
Beijing
2.
Jiyong ZHANG, Fang ZHENG, Mingxing XU, Shuqing LI, Intra-Syllable Dependent
Phonetic Modeling for Chinese Speech Recognition, International Symposium on
Chinese Spoken Language Processing (ISCSLP2000), pp. 73-76, 2000, Beijing
3.
4.
Jiyong ZHANG, Fang ZHENG, Jing LI, Chunhua LUO, and Guoliang ZHANG,
Improved Context-Dependent Acoustic Modeling for Continuous Chinese Speech
Recognition, EuroSpeech, 2001 ()
- 63 -