Research on Vietnamese Speech Recognition and Applications
CHAPTER 2. THEORETICAL BACKGROUND
2.1. Overview of Acoustics and Speech
2.1.1. Acoustics
2.1.1.1. Concept
When a sound source (such as a drum, a musical instrument or a human voice) emits sound, we hear and perceive it. An object that produces sound is called a sound source. Sound is the mechanical vibration of the particles of a medium, which propagates through that medium and reaches our ears, at which point we perceive it. In a medium with no matter, such as a vacuum, there are no mechanical waves and therefore no sound. In social life, sound is the most widespread and the oldest means by which humans communicate and convey information.
When studying sound, one is usually concerned with two kinds of characteristics: physical characteristics and biological (perceptual) characteristics.
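2.1.1.2. Representing sound signals in the time and frequency domains
A sound is usually represented in the time domain by a mathematical function x(t), where:
- t: time
- x: the varying amplitude, also called the displacement.
x(t) can therefore be plotted as a graph against time. For a pure tone:
$x(t) = A\sin(\omega t) = A\sin(2\pi F_0 t)$
where A is the peak amplitude and F_0 is the frequency ($\omega = 2\pi F_0$).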
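As a small illustration of the pure-tone formula, the sketch below samples x(t) = A sin(2πF0 t) at the instants t = n/fs, which is how such a signal appears once it has been digitized. The function name and the amplitude, frequency and sampling rate in the example are illustrative assumptions, not values from the text.

```python
import math


def sample_sinusoid(amplitude, f0, fs, n_samples):
    """Sample x(t) = A*sin(2*pi*F0*t) at the instants t = n / fs."""
    return [amplitude * math.sin(2.0 * math.pi * f0 * n / fs) for n in range(n_samples)]


# Illustrative values: a 100 Hz tone sampled at 8 kHz for 10 ms (exactly one period).
x = sample_sinusoid(amplitude=1.0, f0=100.0, fs=8000.0, n_samples=80)
```

With these values, sample 20 falls a quarter of a period into the tone, where the sine reaches its peak A.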
types of sound:
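2.1.1.4. Units of sound measurement
Human perception of loudness is not proportional to sound intensity; it grows roughly logarithmically with it. Sound levels are therefore expressed as logarithms of ratios:
$L_{B} = \lg\frac{P_2}{P_1}$ (the bel, B)
$L_{dB} = 10\lg\frac{P_2}{P_1} = 20\lg\frac{A_2}{A_1}$ (the decibel, dB)
where P denotes a power (or intensity) and A an amplitude quantity such as sound pressure.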
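The level formulas translate directly into code. This is a minimal sketch using the standard definitions (power ratios use 10·lg, amplitude ratios use 20·lg because power is proportional to amplitude squared); the function names are illustrative.

```python
import math


def level_bel(p2, p1):
    """Level difference in bels for a power (or intensity) ratio: lg(P2/P1)."""
    return math.log10(p2 / p1)


def level_db_power(p2, p1):
    """Level difference in decibels for a power ratio: 10*lg(P2/P1)."""
    return 10.0 * math.log10(p2 / p1)


def level_db_amplitude(a2, a1):
    """Level difference in decibels for an amplitude ratio (e.g. sound pressure).

    Power is proportional to amplitude squared, so 20*lg(A2/A1) == 10*lg((A2/A1)**2).
    """
    return 20.0 * math.log10(a2 / a1)
```

Doubling the power raises the level by about 3 dB, while doubling the amplitude raises it by about 6 dB.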
2.1.2. Speech
Speech is the sound produced by the human mouth. The study of speech covers:
- the human speech production apparatus;
- the perception of sound by the human ear;
- the classification of speech sounds.
The human speech production apparatus includes:
- The vocal folds (vocal cords): two muscles in the larynx that are joined together at one end, while their other ends vibrate at the fundamental frequency F0 (in English, pitch or fundamental frequency). F0 lies in the range 100-200 Hz for men, 300-400 Hz for women and 500-600 Hz for children.
sắc and nặng in writing. Instrumental analysis shows that a tone is a variation of F0, the fundamental frequency (pitch), during the articulation of vowels, and this variation is perceived by the human ear. Vietnamese has six tones, which gives it its richness and distinctiveness, whereas Mandarin Chinese has four. However, speakers in some regions of Vietnam do not distinguish the hỏi (?) and ngã (~) marks, and therefore often misspell them.
A high-pitched voice (high voiced pitch) or a deep voice (low voiced pitch) corresponds to a high or a low F0. F0 therefore plays a very important role in how humans perceive sound.
High and low sounds correspond to high and low frequency ranges. In practice, loudspeakers for low frequencies are called bass or subwoofer speakers, while tweeters are the speakers designed to reproduce sounds in the high-frequency (treble) range.
2.2. The Vietnamese Phonetic System
2.2.1. Characteristics of Vietnamese
Unlike languages such as English or French, Vietnamese is a monosyllabic language: each written word is pronounced as a single syllable, and no native Vietnamese word is pronounced with two or more syllables. A word is built from two kinds of parts, vowels V and consonants C, which combine in three ways:
- C+V (consonant + vowel). Example: ba
- C+V+C (consonant + vowel + consonant). Examples: con, mong
- V+C (vowel + consonant). Example: an
Besides the two main components, vowels and consonants, Vietnamese has other components that make the classification within a syllable clearer, such as diphthongs, triphthongs, single consonants and double consonants. When learning Vietnamese, one must memorize from the very start the vowels, consonants, diphthongs, triphthongs, single consonants, double consonants, and the rules
Table: Vietnamese vowels (spelling and pronunciation)
Table: Vietnamese diphthongs and triphthongs (spelling and pronunciation)
Figure: Structure of the Vietnamese syllable, whose rhyme (vần) consists of the medial (âm đệm), the nucleus (âm chính) and the coda (âm cuối)
a. Onset (âm đầu)
In the first position of the syllable, the onset opens the syllable. Syllables whose spelling shows no onset, such as an, still begin with an onset: the glottis closes completely and then opens abruptly, producing a plosive sound. This opening gesture has the value of a consonant and is called the glottal stop (symbol: /ʔ/). Every Vietnamese syllable therefore always has an onset (initial consonant). In syllables that begin with the glottal stop, as just described, the onset is not written: its position in the written syllable is zero, that is, it is represented by the absence of a letter.
b. Medial (âm đệm)
The medial is the second element of the syllable. It distinguishes rounded syllables (such as toán) from unrounded ones (such as tán). The Vietnamese medial takes two forms: the semivowel phoneme /u/ (in toán) and the zero phoneme (in tán). In writing, the zero medial is represented by the absence of a letter, while the medial /u/ is written with the letter u (as in tuân) or the letter o (as in loan).
c. Nucleus (âm chính)
Standing in the third position of the syllable, the nucleus is regarded as the peak of the syllable; it carries the main timbre of the syllable and is always a vowel. Because it is the core of the syllable, no word can exist without a nucleus. Within the syllable, the nucleus is also the element that carries the tone.
d. Coda (âm cuối)
The coda is the last element of the syllable and closes it. In Vietnamese syllables we commonly observe contrasts between different ways of ending
Figure 2.3 shows the structure of a speech recognition system. The speech signal is first preprocessed and its features are extracted. The result of this stage is a set of acoustic features, arranged into one or more vectors called feature vectors.
To perform the comparison, the system must first be trained to build the reference features; only then can these be compared with the input parameters for recognition.
During training, the system uses the input feature vectors to estimate the parameters of models called reference templates. A reference template is the pattern used for comparison and recognition; each template models a word, a syllable or even a phoneme.
During recognition, the sequence of feature vectors is compared with the reference templates built above. The system computes the likelihood of the feature-vector sequence given each template, using algorithms of proven efficiency such as the Viterbi algorithm (for Hidden Markov Models). The template with the highest likelihood is taken as the recognition result.
2.3.2. Classification of speech recognition systems
2.3.2.1. Continuous and isolated word recognition
A speech recognition system can be of one of two types: continuous recognition or isolated-word recognition. Continuous recognition means recognizing speech that is uttered continuously
make the difference between two speech signals clearer. The figure below illustrates the feature extraction process.
There are many feature extraction methods; two of them are MFCC and LPC.
2.4.3. Word segmentation
After pre-emphasis, the speech signal s(n) goes through word segmentation, the stage that divides the whole recorded signal into segments that contain only the sound of a word.
There are many methods for separating the start and end points of a word from the rest of the speech signal; the short-time energy method is one of the most widely used. For a window of N samples ending at sample m, the short-time energy E(m) is defined as:
$E(m) = \sum_{n=m-N+1}^{m} s^2(n)$ (2.3)
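The short-time energy can be computed straight from its definition, and the start and end of a word can then be taken as the first and last positions where E(m) exceeds a threshold. The sketch below is a minimal single-threshold version; the function names, window length and threshold are illustrative assumptions, and practical endpoint detectors usually add further cues that this sketch does not attempt.

```python
def short_time_energy(s, m, n_window):
    """E(m) = sum of s(n)^2 over the n_window samples of a window ending at sample m."""
    start = max(0, m - n_window + 1)  # the first windows are truncated at the signal start
    return sum(v * v for v in s[start:m + 1])


def detect_endpoints(s, n_window, threshold):
    """Return (first, last) indices m with E(m) > threshold, or None if there are none.

    Because each window ends at m, the detected end point lags the true end of the word
    by up to n_window - 1 samples.
    """
    above = [m for m in range(len(s)) if short_time_energy(s, m, n_window) > threshold]
    if not above:
        return None
    return above[0], above[-1]


# Illustrative signal: silence, a burst of constant amplitude, silence.
signal = [0.0] * 100 + [1.0] * 50 + [0.0] * 100
endpoints = detect_endpoints(signal, n_window=10, threshold=5.0)
```

Here the burst occupies samples 100-149; the detector reports (105, 153), the offsets coming from the window length and threshold.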
$x_\ell(n) = \tilde{s}(M\ell + n), \quad n = 0, 1, \ldots, N-1, \quad \ell = 0, 1, \ldots, L-1$ (2.4)
That is, the first frame $x_0(n)$ contains the speech samples $\tilde{s}(0), \tilde{s}(1), \ldots, \tilde{s}(N-1)$; the second frame $x_1(n)$ contains the samples $\tilde{s}(M), \tilde{s}(M+1), \ldots, \tilde{s}(M+N-1)$; and the L-th frame $x_{L-1}(n)$ contains the samples $\tilde{s}(M(L-1)), \tilde{s}(M(L-1)+1), \ldots, \tilde{s}(M(L-1)+N-1)$.
$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$ (2.7)
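Frame blocking and windowing can be sketched in a few lines: frame ℓ starts at sample Mℓ and holds N samples, so consecutive frames overlap when M < N, and each frame is then multiplied by a Hamming window, the window commonly used at this step. Function names and the tiny example signal are illustrative; only complete frames are kept, which is one possible convention for the tail of the signal.

```python
import math


def frame_signal(s, frame_len, frame_shift):
    """Split s into frames x_l(n) = s(M*l + n), n = 0..N-1, keeping only complete frames."""
    if len(s) < frame_len:
        return []
    n_frames = 1 + (len(s) - frame_len) // frame_shift
    return [s[l * frame_shift: l * frame_shift + frame_len] for l in range(n_frames)]


def hamming_window(frame_len):
    """w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]


def window_frames(frames):
    """Multiply every frame sample by sample with a Hamming window of the same length."""
    if not frames:
        return []
    w = hamming_window(len(frames[0]))
    return [[x * wn for x, wn in zip(frame, w)] for frame in frames]


frames = frame_signal(list(range(10)), frame_len=4, frame_shift=2)
windowed = window_frames(frames)
```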
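$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, 2, \ldots, N-1$ (2.8)
The inverse transform (used to resynthesize the signal) is:
$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j2\pi kn/N}, \quad n = 0, 1, 2, \ldots, N-1$ (2.9)
where $x(n) = a(n) + jb(n)$ is, in general, complex.
The result of the FFT is the sequence $X_t(k)$, which is fed into the Mel-scale filterbank.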
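The transform pair can be checked numerically with a direct O(N²) evaluation; an FFT computes exactly the same values in O(N log N) and is what a real front end would use. The sketch below is illustrative: `power_spectrum` returns |X(k)|², although some front ends pass the magnitude |X(k)| to the filterbank instead.

```python
import cmath
import math


def dft(x):
    """X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N), k = 0..N-1 (direct O(N^2) evaluation)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(X):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/N), n = 0..N-1."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def power_spectrum(x):
    """|X(k)|^2 for every bin k."""
    return [abs(v) ** 2 for v in dft(x)]


# A cosine completing exactly 2 cycles in N = 8 samples: energy only in bins 2 and 6.
tone = [math.cos(2.0 * math.pi * 2 * n / 8) for n in range(8)]
spectrum = dft(tone)
```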
g. Filtering with the Mel-scale filterbank
Speech recognition research requires us to understand and accurately model how the human ear perceives sound frequencies. For this reason, researchers built a frequency scale, the Mel scale, from repeated experiments on human auditory perception. The Mel scale is defined from the physical frequency by:
$\mathrm{Mel}(f) = 2595\log_{10}\left(1 + \frac{f}{700}\right)$ (2.10)
where f is the physical frequency in Hz.
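A common way to realize the Mel filterbank is to place triangular filters whose edges are equally spaced on the Mel scale and sum the weighted spectral power under each triangle. The sketch below is one such construction, assumed for illustration; exact filter shapes and edge placement differ between toolkits, so it is not necessarily HTK's implementation. The 26 filters in the test mirror NUMCHANS = 26 in the HCopy configuration later in the document; the 16 kHz rate is an assumption.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank_energies(power, fs, n_filters, f_low=0.0, f_high=None):
    """Apply n_filters triangular filters, equally spaced on the Mel scale, to a power spectrum.

    `power` holds |X(k)|^2 for k = 0..N-1 of an N-point DFT; only bins up to N/2 are used.
    """
    n_fft = len(power)
    f_high = fs / 2.0 if f_high is None else f_high
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 edge frequencies, equally spaced in Mel and converted back to Hz.
    edges = [mel_to_hz(m_low + (m_high - m_low) * i / (n_filters + 1)) for i in range(n_filters + 2)]
    energies = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for k in range(n_fft // 2 + 1):
            f = k * fs / n_fft  # frequency of DFT bin k
            if left < f <= center:
                total += power[k] * (f - left) / (center - left)
            elif center < f < right:
                total += power[k] * (right - f) / (right - center)
        energies.append(total)
    return energies
```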
i. Discrete cosine transform
The spectrum of human speech in the frequency domain is fairly smooth, so the energies taken from neighbouring filters are strongly correlated, which makes the extracted features less distinctive. We therefore apply the DCT (Discrete Cosine Transform) to decorrelate these values and make the parameters more discriminative. The values obtained from this step are called cepstral coefficients:
$c_n = \sum_{k=1}^{N} \log(S_k)\cos\left[\frac{n\left(k - \frac{1}{2}\right)\pi}{N}\right]$ (2.11)
where $S_k$ is the energy at the output of the k-th filter and N is the number of filter channels.
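The DCT step is a weighted sum over the log filter energies. The sketch below evaluates the cosine sum directly, without the √(2/N) normalization that some toolkits (HTK among them) apply; the floor value guarding against log(0) is an illustrative assumption.

```python
import math


def cepstral_coefficients(filter_energies, n_ceps, floor=1e-10):
    """c_n = sum_{k=1}^{N} log(S_k) * cos(n*(k - 1/2)*pi/N), for n = 1..n_ceps.

    S_k are the N filterbank output energies; a small floor avoids log(0).
    """
    N = len(filter_energies)
    log_s = [math.log(max(e, floor)) for e in filter_energies]
    return [
        sum(log_s[k - 1] * math.cos(n * (k - 0.5) * math.pi / N) for k in range(1, N + 1))
        for n in range(1, n_ceps + 1)
    ]
```

A flat log spectrum produces zero cepstral coefficients for n ≥ 1, which illustrates how the DCT removes the constant, strongly correlated part of the filter outputs.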
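$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$ (2.13)
where the coefficients $a_1, a_2, \ldots, a_p$ (p is the predictor order) are assumed constant over the whole speech analysis frame. Adding an excitation term $G\,u(n)$ turns this approximation into an equality:
$s(n) = \sum_{i=1}^{p} a_i s(n-i) + G\,u(n)$ (2.14)
$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z)$ (2.15)
which leads to the transfer function:
$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)}$ (2.16)
The excitation $G\,u(n)$ drives the all-pole system $\frac{1}{A(z)}$, producing the speech signal s(n).
$s(n) = \sum_{k=1}^{p} a_k s(n-k) + G\,u(n)$ (2.17)
$\tilde{s}(n) = \sum_{k=1}^{p} a_k s(n-k)$ (2.18)
$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k s(n-k)$ (2.19)
with the error transfer function:
$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k}$ (2.20)
$r_\ell(m) = \sum_{n=0}^{N-1-m} \tilde{x}_\ell(n)\,\tilde{x}_\ell(n+m), \quad m = 0, 1, \ldots, p$ (2.21)
$E^{(0)} = r(0)$ (2.22)
$k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, r(i-j)}{E^{(i-1)}}, \quad 1 \le i \le p$ (2.23)
$\alpha_i^{(i)} = k_i$ (2.24)
$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\,\alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1$ (2.25)
$E^{(i)} = (1 - k_i^2)\,E^{(i-1)}$ (2.26)
$a_m = \alpha_m^{(p)}, \quad 1 \le m \le p$ (2.27)
where $k_m$ are the PARCOR coefficients.
$g_m = \log\frac{1 - k_m}{1 + k_m}$ (2.28)
$c_0 = \ln\sigma^2$ (2.29)
where $\sigma^2$ is the gain term of the LPC model.
$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p$ (2.30)
$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad p < m \le Q$ (2.31)
with
$\hat{c}_m = w_m\, c_m, \quad 1 \le m \le Q$ (2.32)
$w_m = 1 + \frac{Q}{2}\sin\left(\frac{\pi m}{Q}\right), \quad 1 \le m \le Q$ (2.33)
q. Computing the cepstral derivative
$\Delta\hat{c}_m(t) = \left[\sum_{k=-K}^{K} k\,\hat{c}_m(t+k)\right] G, \quad 1 \le m \le Q$ (2.34)
where G is a gain constant, with the following typical parameter values: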
Parameter | F_s = 6.67 kHz | F_s = 8 kHz | F_s = 10 kHz
Frame length N | 300 (45 ms) | 240 (30 ms) | 300 (30 ms)
Frame shift M | 100 (15 ms) | 80 (10 ms) | 100 (10 ms)
LPC order p | | 10 | 10
Cepstral order Q | 12 | 12 | 12
The LPC model is well suited to speech signal processing. For quasi-stationary voiced regions of speech, the all-pole LPC model gives a good approximation of the spectral envelope of the vocal tract. For unvoiced regions the LPC model is less effective than for voiced regions, but it remains a useful model for speech recognition purposes. The LPC model is simple and easy to implement in both hardware and software.
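Durbin's recursion and the LPC-to-cepstrum conversion described above are short enough to sketch directly. The sketch assumes the input frame is already windowed, and the function names and test inputs (an autocorrelation r(m) = 0.9^m and a single pole at z = 0.5) are illustrative, not taken from the text. For the single pole, the cepstrum of 1/(1 − 0.5z⁻¹) is known in closed form, c_m = 0.5^m / m, which gives an easy check.

```python
def autocorrelation(x, p):
    """r(m) = sum_{n=0}^{N-1-m} x(n) x(n+m), for m = 0..p, of an (already windowed) frame."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]


def levinson_durbin(r, p):
    """Durbin's recursion on r(0..p).

    Returns (a, k, E): LPC coefficients a_1..a_p, PARCOR coefficients k_1..k_p and the
    final prediction error energy E^(p).
    """
    if r[0] <= 0.0:
        raise ValueError("r(0) must be positive (frame has no energy)")
    E = r[0]                       # E^(0) = r(0)
    alpha = [0.0] * (p + 1)        # alpha[j] holds alpha_j^(i-1) at the start of step i
    k = [0.0] * (p + 1)
    for i in range(1, p + 1):
        k[i] = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / E
        new_alpha = alpha[:]
        new_alpha[i] = k[i]        # alpha_i^(i) = k_i
        for j in range(1, i):      # alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1)
            new_alpha[j] = alpha[j] - k[i] * alpha[i - j]
        alpha = new_alpha
        E = (1.0 - k[i] ** 2) * E  # E^(i) = (1 - k_i^2) * E^(i-1)
    return alpha[1:], k[1:], E


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_Q of the all-pole model 1/A(z), with a = [a_1..a_p]."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        total = a[m - 1] if m <= p else 0.0
        for kk in range(max(1, m - p), m):  # only terms whose a_{m-kk} exists (1 <= m-kk <= p)
            total += (kk / m) * c[kk] * a[m - kk - 1]
        c[m] = total
    return c[1:]


# Illustrative inputs: r(m) = 0.9**m and a single pole at z = 0.5.
a_example, k_example, err_example = levinson_durbin([1.0, 0.9, 0.81, 0.729], 3)
cep_example = lpc_to_cepstrum([0.5], 4)
```

For a real frame one would call `levinson_durbin(autocorrelation(frame, p), p)` with p around 10, in line with the parameter table above.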
2.4.6. Formants
To better understand the concept of a formant, we first need to understand what resonance is. Resonance is a phenomenon that occurs in forced oscillation when a vibrating object is driven by a periodic external force whose frequency equals the object's natural frequency.
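$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ (2.35)
$p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\,\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$ (2.36)
$p(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$ (2.37)
$p_{GMM}(\mathbf{x}) = \sum_{i=1}^{M} w_i\, p_i(\mathbf{x}), \quad \sum_{i=1}^{M} w_i = 1$ (2.38)
where M is the number of Gaussian components used in the GMM. Thus, the larger the variance and the weight of a Gaussian component, the greater its influence on the output of the model. Figure 2.17 shows the influence of each Gaussian component on the GMM.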
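Evaluating a GMM is a weighted sum of Gaussian densities. The sketch below assumes diagonal covariance matrices, a common choice in speech recognition, under which the multivariate density factorizes into a product of 1-D densities; the function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """1-D normal density with mean mu and variance sigma^2."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def diag_gaussian_pdf(x, mean, var):
    """D-dimensional normal density with a diagonal covariance matrix (product of 1-D densities)."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= gaussian_pdf(xi, mi, vi)
    return p


def gmm_pdf(x, weights, means, variances):
    """p_GMM(x) = sum_i w_i * p_i(x); the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("GMM weights must sum to 1")
    return sum(w * diag_gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
```

For example, `gmm_pdf([0.0], [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])` mixes two unit-variance components centred at -1 and +1.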
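$A = \{a_{ij}\}, \quad a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N$, such that $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$
$\pi = \{\pi_i\}, \quad \pi_i = p(q_1 = S_i), \quad 1 \le i \le N$, with $\sum_{i=1}^{N} \pi_i = 1$
$A = \{a_{ij}\}, \quad a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N$, such that $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$
$B = \{b_j(k)\}, \quad b_j(k) = p(v_k \text{ at } t \mid q_t = S_j), \quad 1 \le j \le N,\ 1 \le k \le M$, with $\sum_{k=1}^{M} b_j(k) = 1$
$\pi = \{\pi_i\}, \quad \pi_i = p(q_1 = S_i), \quad 1 \le i \le N$, with $\sum_{i=1}^{N} \pi_i = 1$
For convenience of presentation, each HMM is denoted by its parameter set $\lambda = (\pi, A, B)$.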
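$\alpha_t(i) = p(O_1 O_2 \ldots O_t,\ q_t = S_i \mid \lambda)$ (2.39)
The variables $\alpha_t(i)$ can be computed inductively, step by step (dynamic programming), as follows:
1) Initialization:
$\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N$ (2.40)
2) Induction:
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \quad 1 \le t \le T-1,\ 1 \le j \le N$ (2.41)
3) Output:
$p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$ (2.42)
$\beta_t(i) = p(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda)$ (2.43)
1) Initialization:
$\beta_T(i) = 1, \quad 1 \le i \le N$ (2.44)
2) Induction:
$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1,\ 1 \le i \le N$ (2.45)
3) Output:
$p(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(O_1)\, \beta_1(j)$ (2.46)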
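The forward and backward procedures can be checked against each other: both must give the same P(O|λ), and on a tiny model that value can also be obtained by brute force over every state path. The 2-state, 3-symbol model below is an illustrative assumption, not a model from the text; Python lists are 0-based, so `alpha[t]` in the code corresponds to α at time t+1 in the equations.

```python
def forward(pi, A, B, obs):
    """Forward algorithm: returns (alpha, P(O | lambda)); alpha[t][i] is alpha_{t+1}(i)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]           # initialization
    for t in range(1, len(obs)):                                 # induction
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][obs[t]] for j in range(N)])
    return alpha, sum(alpha[-1])                                 # output


def backward(pi, A, B, obs):
    """Backward algorithm: returns (beta, P(O | lambda)) computed from beta_1."""
    N, T = len(pi), len(obs)
    beta = [[1.0] * N for _ in range(T)]                         # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                               # induction, backwards in time
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta, sum(pi[j] * B[j][obs[0]] * beta[0][j] for j in range(N))


# Illustrative 2-state model with 3 discrete observation symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2]
p_forward = forward(PI, A, B, OBS)[1]
p_backward = backward(PI, A, B, OBS)[1]
```

For long utterances these products underflow, so practical implementations rescale α and β at each step or work with log probabilities.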
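The Viterbi algorithm defines $\delta_t(i)$ as the highest probability of a partial state sequence that ends in state $S_i$ at time t and produces the observations $O_1, O_2, \ldots, O_t$, given the model λ:
$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} p(q_1 q_2 \ldots q_{t-1},\ q_t = S_i,\ O_1 O_2 \ldots O_t \mid \lambda)$ (2.47)
$\delta_{t+1}(j) = \left[\max_{i} \delta_t(i)\, a_{ij}\right] b_j(O_{t+1})$ (2.48)
1) Initialization:
$\delta_1(i) = \pi_i\, b_i(O_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N$ (2.49)
2) Recursion:
$\delta_t(j) = \max_{1 \le i \le N}\left[\delta_{t-1}(i)\, a_{ij}\right] b_j(O_t), \quad 2 \le t \le T,\ 1 \le j \le N$ (2.50)
$\psi_t(j) = \arg\max_{1 \le i \le N}\left[\delta_{t-1}(i)\, a_{ij}\right], \quad 2 \le t \le T,\ 1 \le j \le N$ (2.51)
3) Termination:
$p^* = \max_{1 \le i \le N}\left[\delta_T(i)\right], \quad q_T^* = \arg\max_{1 \le i \le N}\left[\delta_T(i)\right]$ (2.52)
$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1$ (2.53)
When the algorithm finishes, the state sequence $Q^* = q_1^* q_2^* \ldots q_T^*$ is the solution to problem 2.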
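The Viterbi recursion with its ψ back-pointers can be sketched as follows and compared with a brute-force search over all state paths. The toy model is the same kind of illustrative assumption as in a forward-algorithm check, and time indices are 0-based in the code.

```python
def viterbi(pi, A, B, obs):
    """Return (best state path, p*) for a discrete HMM lambda = (pi, A, B)."""
    N, T = len(pi), len(obs)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]    # delta_1(i)
    psi = [[0] * N]                                     # psi_1(i) = 0
    for t in range(1, T):
        new_delta, back = [], []
        for j in range(N):
            # best predecessor i of state j: argmax_i delta_{t-1}(i) * a_ij
            best_i = max(range(N), key=lambda i, j=j: delta[i] * A[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
        delta = new_delta
        psi.append(back)
    q_last = max(range(N), key=lambda i: delta[i])       # q_T*
    path = [q_last]
    for t in range(T - 1, 0, -1):                        # backtracking: q_t* = psi_{t+1}(q_{t+1}*)
        path.insert(0, psi[t][path[0]])
    return path, delta[q_last]


# Illustrative 2-state model with 3 discrete observation symbols.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2]
best_path, p_star = viterbi(PI, A, B, OBS)
```

Unlike the forward algorithm, which sums over predecessors, the recursion keeps only the best predecessor of each state, so p* is the probability of the single most likely path rather than of the whole observation sequence.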
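2.6.3.3. Problem 3: learning (Forward-Backward)
The goal of problem 3, the most complex of the three problems, is to re-estimate the model parameters λ = (π, A, B) so as to maximize p(O|λ), the probability of observing the signal sequence O from the model.
$\xi_t(i,j) = p(q_t = S_i,\ q_{t+1} = S_j \mid O, \lambda)$ (2.54)
Hence:
$\xi_t(i,j) = \frac{p(q_t = S_i,\ q_{t+1} = S_j,\ O \mid \lambda)}{p(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{p(O \mid \lambda)}$ (2.55)
Let $\gamma_t(i)$ be the probability of being in state $S_i$ at time t, given the observation sequence O and the model λ:
$\gamma_t(i) = p(q_t = S_i \mid O, \lambda)$ (2.56)
$\gamma_t(i) = \frac{p(q_t = S_i,\ O \mid \lambda)}{p(O \mid \lambda)}$ (2.57)
$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{p(O \mid \lambda)}$ (2.58)
$\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions out of } S_i$ (2.59)
$\sum_{t=1}^{T-1} \xi_t(i,j) = \text{expected number of transitions from } S_i \text{ to } S_j$ (2.60)
$\bar{\pi}_i = \gamma_1(i)$ (2.61)
$\bar{a}_{ij} = \frac{\mathrm{exnum}(S_i, S_j)}{\mathrm{exnum}(S_i)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$ (2.62)
$\bar{b}_j(k) = \frac{\mathrm{exnum\_in}(S_j, v_k)}{\mathrm{exnum\_in}(S_j)} = \frac{\sum_{t=1,\ O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$ (2.63)
where exnum denotes an expected number (of transitions, or of times spent in a state).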
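One re-estimation pass can be sketched for a discrete HMM and a single observation sequence: compute α and β, form γ and ξ, then apply the update formulas for π, a_ij and b_j(k). The toy model and observation sequence are illustrative assumptions. A useful sanity check is the EM property of Baum-Welch: the likelihood of the re-estimated model is never lower than that of the old one.

```python
def _forward(pi, A, B, obs):
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha


def _backward(A, B, obs):
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation of (pi, A, B) from one discrete observation sequence.

    Returns (new_pi, new_A, new_B, P(O | old model)). Assumes every state has non-zero
    occupancy; otherwise the denominators below would be zero.
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = _forward(pi, A, B, obs), _backward(A, B, obs)
    prob = sum(alpha[-1])
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]                                       # pi_i = gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) / sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_pi, new_A, new_B, prob


PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2, 2, 1, 0, 0, 2]
new_pi, new_A, new_B, old_p = baum_welch_step(PI, A, B, OBS)
new_p = sum(_forward(new_pi, new_A, new_B, OBS)[-1])
```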
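$b_j(O_t) = p_{GMM_j}(O_t)$ (2.64)
where $p_{GMM_j}$ is the output probability of the GMM in state $S_j$. The likelihood of observing the vectors in each state is thus governed by the GMM of that state. Figure 2.23 illustrates an MGHMM with 3 states.
As in the definition of the HMM, an MGHMM with N states and M Gaussian components per state is represented by the parameter set λ = {π, A, B}, where:
- A = {a_ij}, where a_ij is the probability of a transition from state S_i to state S_j.
- π = {π_i}, where π_i is the initial probability of state S_i.
- B = {b_j}, where b_j is the probability density function in state S_j.
a_ij and π_i are unchanged from the HMM; the main difference lies in b_j. From (2.38) we have:
$b_j(O_t) = \sum_{m=1}^{M} c_{jm}\, G(O_t, \mu_{jm}, U_{jm}), \quad \sum_{m=1}^{M} c_{jm} = 1$ (2.65)
$\bar{\pi}_i = \gamma_1(i)$ (2.66)
$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$ (2.67)
$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\,\beta_t(j)}{\sum_{j'=1}^{N} \alpha_t(j')\,\beta_t(j')}\right]\left[\frac{c_{jk}\, G(O_t, \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, G(O_t, \mu_{jm}, U_{jm})}\right]$ (2.68)
where $\mu_{jk}$ and $U_{jk}$ are the mean vector and the covariance matrix of the k-th Gaussian component in state $S_j$. With $\gamma_t(j,k)$, the components of $b_j$ are updated as follows:
$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T}\sum_{k=1}^{M} \gamma_t(j,k)}, \quad \bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, O_t}{\sum_{t=1}^{T} \gamma_t(j,k)}$ (2.69)
$\bar{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\,(O_t - \mu_{jk})(O_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j,k)}$ (2.70)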
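Input:
- model λ = {A, B, π}
- observation sequence O = O_1 O_2 ... O_T
Output:
- re-estimated model λ* = {A*, B*, π*}
Forward pass:
    α_{t+1}(j) = [Σ_{i=1..N} α_t(i) a_ij] b_j(O_{t+1})
Backward pass:
    for i = 1 ... N
        β_T(i) = 1
    for t = T-1 ... 1
        for i = 1 ... N
            β_t(i) = Σ_{j=1..N} a_ij b_j(O_{t+1}) β_{t+1}(j)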
Then, an unknown utterance passed through the HTK recognition tools yields the transcription of that speech.
HTK is used as a library that is easy to extend and develop. It is a
Figure 24 shows the general architecture of HTK, which consists of many modules divided by purpose, including tools that specifically support recognition. Figure 25 below presents the stages of building a speech recognition system and the tools that HTK provides for each stage: data preparation, training, testing, evaluation and, finally, analysis.
HTK uses various data files to pass data between its tools. These files may contain audio data (waveforms or sequences of acoustic feature vectors), labelled audio data (transcriptions), HMMs (the parameters that define HMM models) or recognition networks. HMM parameters can be shared between sets, states or HMMs. In HTK, the smallest recognition unit (defined by the user; it may be a phoneme, a syllable or a word) is modelled by one HMM and is called a phone. For combining models in sub-word based recognizers, HTK provides context-dependent HMMs, so that biphone or triphone models can be used for each possible context between phones.
The main tools in HTK are the following:
HVite: uses the Viterbi algorithm to recognize continuous speech, based on grammar (syntax) constraints and search.
Supports live or batch speech recognition, and can recognize both isolated and continuous speech.
CMU-Cambridge Statistical Language Modeling Toolkit.
4.1.3. Installing Sphinx
Create a folder named sphinx in the Home folder (in the Ubuntu virtual machine). Copy the files downloaded in the previous section (Sphinxbase, Sphinxtrain, Pocketsphinx, CMUclmtk) into it and extract them (remember to remove the version number from the folder names after extracting).
Open a Terminal window in Ubuntu: Ctrl+Alt+T.
Enter sudo apt-get update, then enter your user password when sudo asks for it (the password is not echoed on screen; type it carefully and press Enter). This command refreshes the package index used by apt-get. When the update has finished, enter cd sphinx to move into the sphinx folder you just created.
Install the required packages before installing SphinxBase by typing the commands:
sudo apt-get install bison
sudo apt-get install autoconf
sudo apt-get install automake
sudo apt-get install libtool
a. Installing SphinxBase
Enter cd sphinxbase to go into the sphinxbase folder.
Type the following commands and wait for each one to finish:
./autogen.sh
./configure
make
sudo make install
b. Installing SphinxTrain
From the sphinxbase folder above, switch to the sphinxtrain folder with cd ../sphinxtrain, then type the following commands and wait for each one to finish:
./configure
make
sudo make install
c. Installing PocketSphinx
Switch to the pocketsphinx folder, then type the following commands and wait for each one to finish:
./autogen.sh
./configure
make
sudo make install
Then type the following command in the Terminal so that the operating system refreshes its shared library cache: sudo ldconfig
Details of the installation process can be found in source [12].
wav
|___ speaker_1
|    |___ file_1.wav  (recording of one utterance by a training speaker)
|    |___
|___ speaker_2
     |___ file_2.wav
a. Phonetic Dictionary (your_db.dic)
This file contains the pronunciation of each word in the training set. For example, the word HELLO is pronounced as a combination of the following phonemes: HH AH L OW (following the example on the Sphinx homepage). The file then contains the line:
HELLO HH AH L OW
Each line in the file defines the pronunciation of one word.
The file is case-sensitive. Building it normally requires studying how words are pronounced in the given language; for English, this means learning the pronunciations of the English words in the dictionary. This is also an important step toward building a successful training set.
In Vietnamese, the pronunciation and the spelling of a word are almost inseparable, so no pronunciation guide is needed when learning Vietnamese; in English, pronunciation does not follow from spelling, e.g. lead (to be in front) and head (the body part). For example, to build this file for Vietnamese we can define the words in several ways:
BAN B A N
With this definition, the word BAN is a syllable made of 3 phonemes: B, A, N.
BAN B AN
With this definition, the word BAN is a syllable made of 2 phonemes: B, AN.
Sphinx does not support word-based definitions, that is, a word's pronunciation cannot be the word itself; for example, BAN BAN is not allowed. However, an equivalent workaround is possible if you want a word-based setup: the word must then be defined as one word with several pronunciations, for example:
BAN BAN BANG
Vocabulary | Hours in db | Senones | Densities | Example
20 | | 200 | |
100 | 20 | 2000 | |
5000 | 30 | 4000 | 16 |
20000 | 80 | 4000 | 32 |
60000 | 200 | 6000 | 16 |
60000 | 2000 | 12000 | 64 |
16-bit mono (for recognition on mobile devices); turn off the computer speakers while recording, and place the microphone slightly below the mouth so that breath from the nose does not add noise to the signal. The information needed to prepare the recordings is described in detail at VoxForge [14].
4.4. Training the model with Sphinx
First prepare a training folder as described above (the folder holding all the files prepared earlier together with the audio files; this training folder is called the task folder). Next, we use some SphinxTrain commands to generate the training scripts automatically. The training scripts carry out all the training stages, including preprocessing of the audio signal, acoustic feature extraction, and building and estimating the HMMs with the Baum-Welch algorithm.
To create the working folders (which Sphinx creates automatically and uses during training) and the training scripts, run the following commands on the Linux command line:
For SphinxTrain 1.0.7 and earlier:
../SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task [task_folder_name]
../pocketsphinx/scripts/setup_sphinx.pl -task [task_folder_name]
For the SphinxTrain snapshot version:
sphinxtrain -t [task_folder_name] setup
After these commands run, SphinxTrain automatically creates the following folders for training:
bin (may not appear in newer SphinxTrain versions)
bwaccumdir
etc
feat
logdir
model_parameters
model_architecture
python (may not appear in newer SphinxTrain versions)
scripts_pl (may not appear in newer SphinxTrain versions)
wav
b. Setting the paths to the prepared files
Check whether the following settings in etc/sphinx_train.cfg differ from the current folder layout. These are the paths that Sphinx generates to access the files we prepared, where $CFG_DB_NAME is the name of our task folder above.
$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";
4.6. Installing HTK
4.6.1. HTK framework installation guide
This guide was carried out on Linux with HTK version 3.4.
Prerequisites:
HTK-3.4.1.tar.gz
txt (other files): miscellaneous files such as the dictionary, file lists, wdnet, gram, train
wave: contains the wave files.
mfc: contains the mfc files.
Preparing the training files:
a. Preparing the dictionary file dict.txt
The dictionary file has two parts: the word and its corresponding phonetic transcription. The transcriptions deserve special care because they directly affect recognition performance. This file is created by hand, and the pronunciations are tuned to obtain the best results.
Note: because speech usually contains short pauses between words, to make recognition effective we additionally define in the dictionary transcriptions for the words
# dict.txt
anh      a nh
ba       ba
barn     b ar n
bary     b ar y
bawsn    b aws n
beejnh   b eej nh
boosn    b oos n
caf      c af
car      c ar
chaajm   ch aaj m
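The entries above follow a visible pattern: each Telex-style syllable is split into its onset consonant, its vowel together with the tone letter, and its coda (for example beejnh → b eej nh). The text says the file is written by hand; purely as an illustration, the sketch below reproduces that pattern automatically. The onset and coda inventories are hypothetical lists inferred from the examples, not the author's tool, and a real dictionary would still need manual checking.

```python
# Hypothetical onset and coda inventories in Telex-style spelling, longest first so
# that e.g. "ngh" is tried before "ng" and "ch" before "c".
ONSETS = ["ngh", "ng", "nh", "ch", "gh", "gi", "kh", "ph", "qu", "th", "tr", "dd",
          "b", "c", "d", "g", "h", "k", "l", "m", "n", "p", "r", "s", "t", "v", "x"]
CODAS = ["ng", "nh", "ch", "c", "m", "n", "p", "t", "i", "y", "o", "u"]


def split_syllable(word):
    """Split a Telex-style syllable into [onset], vowel+tone, [coda]."""
    onset = next((o for o in ONSETS if word.startswith(o) and len(word) > len(o)), "")
    rest = word[len(onset):]
    # Strip a coda only if something (the vowel) remains in front of it.
    coda = next((c for c in CODAS if rest.endswith(c) and len(rest) > len(c)), "")
    nucleus = rest[:len(rest) - len(coda)]
    return [part for part in (onset, nucleus, coda) if part]


def dict_line(word):
    """Format one HTK dictionary entry: the word followed by its phones."""
    return word + "\t" + " ".join(split_syllable(word))
```

For instance, `dict_line("chaajm")` yields the entry `chaajm` followed by `ch aaj m`, matching the hand-written line above.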
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
Explanation:
EX: replaces each word in words.mlf with its corresponding transcription from the dictionary dict.
IS: inserts the silence model (sil) at the start and end of each utterance.
DE: deletes all the short pauses (sp) inserted by the EX command.
//mkphones1.led
EX
IS sil sil
Input:
wordlist.txt: simply the list of words used in the word network, one word per line, structured as above.
Supervisor: Dr. Vũ Đức Lung
#!MLF!#
"*/hoa2_01.lab"
cuoojn
xuoosng
xosa
lijch
suwr
susng
guwri
thuw
haxy
ddosng
taast
..
Input:
mydict.txt: the dictionary prepared earlier.
words.mlf: created above.
mkphones#.led: contains the script commands that convert words.mlf into phones#.mlf
Output:
phones#.mlf
Note:
phones0.mlf does not contain sp, whereas phones1.mlf does. sp is added to make the later recognition stage more effective.
Create the file "listwavmfc.scp", which lists the paths of the wave files and of the corresponding mfc files:
perl pl/listwavmfc.pl train/wav train/listwavmfc.scp
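The script pl/listwavmfc.pl itself is not shown in the text. As an illustration only, here is a hypothetical Python equivalent that writes the format HCopy's -S option reads: one source file and one target file per line. The function names and the train/mfc target folder are assumptions; the scan is not recursive and paths containing spaces are not handled.

```python
from pathlib import Path, PurePosixPath


def hcopy_script_lines(wav_paths, mfc_dir):
    """One 'source target' pair per line, as expected by HCopy -S."""
    lines = []
    for wav in sorted(wav_paths):
        stem = PurePosixPath(wav).stem
        lines.append(f"{wav} {PurePosixPath(mfc_dir) / (stem + '.mfc')}")
    return lines


def write_hcopy_script(wav_dir, mfc_dir, script_path):
    """Scan wav_dir (non-recursively) for *.wav files and write the script file."""
    wavs = [p.as_posix() for p in Path(wav_dir).glob("*.wav")]
    lines = hcopy_script_lines(wavs, mfc_dir)
    Path(script_path).write_text("".join(line + "\n" for line in lines))
    return lines
```

A call such as `write_hcopy_script("train/wav", "train/mfc", "train/listwavmfc.scp")` would produce a file in the same role as the one created by the Perl command above.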
Input:
wave: folder containing the .wav files
Output:
listwavmfc.scp: text file listing the paths of the wave files.
Creating the .mfc files corresponding to the .wav files
In this step, features are extracted from the recorded audio files. HTK supports several feature types, including MFCC and LPC; here we use MFCC features. The other configuration settings are stored in the configuration file HCopy.cfg:
#config_HCopy.txt
#coding parameters - HCopy
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
Input:
HCopy.cfg: the configuration file containing the feature extraction parameters presented above.
Output:
listwavmfc.scp: contains the list of wave files and the corresponding mfc files.
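The HCopy settings above express times in HTK's unit of 100 ns, so TARGETRATE = 100000.0 is a 10 ms frame shift and WINDOWSIZE = 250000.0 a 25 ms analysis window. A small sketch of the conversion, assuming a 16 kHz sampling rate (the text does not state the rate of these recordings); the function names are illustrative. It also computes the feature dimension implied by TARGETKIND = MFCC_0_D_A.

```python
HTK_TIME_UNIT_S = 100e-9  # HTK gives TARGETRATE and WINDOWSIZE in units of 100 ns


def frame_geometry(targetrate, windowsize, sample_rate_hz):
    """Convert HCopy TARGETRATE / WINDOWSIZE into milliseconds and samples."""
    shift_s = targetrate * HTK_TIME_UNIT_S
    window_s = windowsize * HTK_TIME_UNIT_S
    return {
        "shift_ms": shift_s * 1000.0,
        "window_ms": window_s * 1000.0,
        "shift_samples": round(shift_s * sample_rate_hz),
        "window_samples": round(window_s * sample_rate_hz),
    }


def mfcc_0_d_a_dimension(numceps):
    """MFCC_0_D_A: NUMCEPS cepstra plus C0, each with delta and acceleration copies."""
    return 3 * (numceps + 1)


# Values from the HCopy configuration above; the 16 kHz sampling rate is an assumption.
geometry = frame_geometry(100000.0, 250000.0, 16000)
```

With NUMCEPS = 12, each frame therefore yields a 39-dimensional feature vector.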
Input:
mfc: folder containing the .mfc files.
Output:
train.scp: name of the file containing the list of .mfc files.
4.6.4. Training stage
1. Creating flat-start monophones
In this step we define a prototype for the HMMs. The values assigned to the prototype do not matter; the aim is mainly to build a skeleton. A good model suggested by the HTK Book is a 3-state left-to-right (sequential) model.
Input:
HCompV.cfg: configuration file used by HCompV (see the file contents).
-f 0.01: outputs a vFloors file containing variance floor vectors equal to 0.01 times the global variance.
-S train.scp: list of the mfc feature files.
-M hmm0: folder where HCompV stores the proto (it must be created beforehand).
proto.txt: file containing the prototype structure presented above (remember to save it in the hmm0 folder).
Input:
hmm0/vFloors: the vFloors file created by the HCompV command above.
Output:
hmm0/macros: the macros file to be created.
3. Creating hmmdefs automatically
perl pl/mkHmmdefsFile.pl hmm0/proto ph/monophones0.txt hmm0/hmmdefs
Input:
hmm0/proto: the proto file obtained in the previous step.
monophones0: the monophones0 file from the previous step.
Output:
hmm0/hmmdefs: name of the HMM file to be saved.
4. Estimating the parameters in hmmdefs
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/phones0.mlf -t 250.0 150.0 1000.0 -S train/train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 ph/monophones0.txt
Input:
-C HERest.cfg: configuration file
-I mlf/phones0.mlf: the MLF file created earlier.
-t 250.0 150.0 1000.0: pruning thresholds (initial beam, increment, limit).
-S train/train.scp: list of the .mfc files.
-H hmm0/macros: just created.
-H hmm0/hmmdefs: just created.
ph/monophones0: list of phones (excluding sp).
Output:
-M hmm1: contains the new hmmdefs and macros files.
Once hmm1 exists, we continue by training hmm2 and hmm3 with HERest. Note that each training step starts from the HMMs produced by the previous step.
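Because each HERest pass reads hmm(n-1) and writes hmm(n), the repeated commands can be generated rather than typed. The helper below is hypothetical: it only builds command strings with the same options as the command above (on a machine with HTK installed they could be run with `subprocess.run(shlex.split(cmd))`). As with hmm0, each output folder must exist before HERest writes into it.

```python
def herest_commands(first, last, mlf, phone_list,
                    config="cfg/HERest.cfg", scp="train/train.scp",
                    prune="250.0 150.0 1000.0"):
    """Build the HERest command lines for hmm{n-1} -> hmm{n}, n = first..last."""
    commands = []
    for n in range(first, last + 1):
        src, dst = f"hmm{n - 1}", f"hmm{n}"  # read the previous folder, write the next
        commands.append(
            f"HERest -A -D -T 1 -C {config} -I {mlf} -t {prune} -S {scp} "
            f"-H {src}/macros -H {src}/hmmdefs -M {dst} {phone_list}"
        )
    return commands


for cmd in herest_commands(1, 3, "mlf/phones0.mlf", "ph/monophones0.txt"):
    print(cmd)
```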
Input:
hmm4/macros: the macros file created in the previous steps
hmm4/hmmdefs: the hmmdefs file after adding the sp model
ins/sil.hed: file containing the commands for adjusting the new HMMs
ph/monophones1.txt: list of phones (including sp).
Output:
hmm5: folder containing the new hmmdefs and macros
6. Training the new models with the sp adjustment from the previous step
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/phones1.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 ph/monophones1.txt
8. Training two more iterations
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/aligned.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 ph/monophones1.txt
9. Creating the triphones and wintri.mlf from aligned.mlf
HLEd -A -D -T 1 -n ph/triphones1 -l '*' -i mlf/wintri.mlf ins/mktri.led mlf/aligned.mlf
Input:
ins/mktri.led: contains the commands that create triphones from monophones
mlf/aligned.mlf: the re-estimated (realigned) monophone transcriptions.
Output:
-n ph/triphones1: list of triphones.
-i mlf/wintri.mlf: triphone transcriptions.
10. Creating "mktri.hed"
perl pl/mkTriHed.pl ph/monophones1.txt ph/triphones1 ins/mktri.hed
Input:
-H hmm9/macros -H hmm9/hmmdefs: the monophone HMMs.
ins/mktri.hed: file containing the commands that tie the transition matrices of each
Input:
-H hmm12/macros -H hmm12/hmmdefs: the HMMs created in the previous step
tree.hed: a set of commands that search for suitable contexts for clustering.
ph/triphones1: list of triphones
Output:
-M hmm13: folder containing the new HMMs
15. Training two iterations
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/wintri.mlf -s stats -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist
2. Creating the .mfc files corresponding to each .wav file
HCopy -T 1 -C cfg/HCopy.cfg -S listwavmfc_test.scp
3. Creating the file "test.scp", which lists the paths of the .mfc files
perl pl/mkTrainFile.pl mfc test/test.scp
4. Testing
HVite -T 1 -C cfg/HVite.cfg -H hmm15/macros -H hmm15/hmmdefs -S test/test.scp -i test/recout.mlf -w wdnet txt/mydict.txt tiedlist
Explanation
-C cfg/HVite.cfg: input, the configuration file.
-H hmm15/macros -H hmm15/hmmdefs: input, the trained HMMs.
-S test/test.scp: input, the file listing the .mfc files to be recognized.
-i test/recout.mlf: output, the recognized transcriptions.
-w wdnet: input, the word network created in the first steps.
txt/mydict.txt: input, the pronunciation dictionary.
tiedlist: input, the list of phones produced by the CO tiedlist command in tree.hed.
Note
With triphones built word-internally, as described earlier, the configuration file HVite.cfg must contain two additional parameters: FORCECXTEXP = T and ALLOWXWRDEXP = F. For the reason, see chapter 12 of the HTK Book.
HVite has further parameters, such as -p (word insertion penalty) and -s (grammar scale factor), which the user can tune.
4.6.6. Results
With 500 wave files of training data and testing on 100 wave files, the recognition performance of the program was as follows:
------------------------ Overall Results --------------------------
SENT: %Correct=22.80 [H=114, S=386, N=500]
WORD: %Corr=99.78, Acc=87.55 [H=3991, D=0, S=9, I=489, N=4000]
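The two word scores printed by HResults differ only in how insertions are treated: %Corr = H/N × 100 ignores them, while Acc = (H − I)/N × 100 penalizes them, with N = H + D + S. The sketch below recomputes the figures above from the reported counts; the function names are illustrative.

```python
def htk_word_scores(hits, deletions, substitutions, insertions):
    """HResults-style word scores: %Corr = H/N*100 and Acc = (H - I)/N*100, N = H + D + S."""
    n = hits + deletions + substitutions
    corr = 100.0 * hits / n
    acc = 100.0 * (hits - insertions) / n
    return n, corr, acc


def sentence_correct(hits, total):
    """Percentage of sentences recognized without any error."""
    return 100.0 * hits / total


n_words, corr, acc = htk_word_scores(hits=3991, deletions=0, substitutions=9, insertions=489)
sent = sentence_correct(114, 500)
```

The 489 insertions account for the whole gap between 99.78% correct and 87.55% accuracy, which is why insertions are the error type to watch in these results.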
bn
chuyn
ca
cui
cun
u
i
ng
duyt
hy
kha
kim
kim
li
ln
lch
lu
m
mi
nghe
ngng
nhc
nh
phi
phng
qup
sang
sau
s
s
ti
tp
thu
th
tm
to
ti
tra
tri
trang
trnh
tr
trc
t
xa
xung
b. Program
The program is written in C# and combines the Julius.dll library with an acoustic model trained with the HTK toolkit. It consists of two main modules: the recognition module and the browser control module.
c. Remarks
The accuracy of the program depends above all on the result returned by the recognition module. Recognition sometimes yields a command that differs slightly from the one actually spoken; for example, the user says "mở tệp mới" (open a new file), but because of ambient noise the recognizer may return "hãy mở tệp mới". The simplest remedy is to measure the similarity between the recognized command and the command templates instead of requiring an exact word-by-word match. Overall, recognition is quite accurate (about 90%) in environments that are not too noisy.
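The remedy proposed above, matching by similarity instead of exact word-by-word comparison, can be sketched with Python's standard difflib: each template is scored by the ratio 2M/T between its word sequence and the recognized one, and the best template wins if it clears a threshold. The command templates and the 0.6 threshold are hypothetical; the text does not list the program's actual commands or say how it measures similarity.

```python
from difflib import SequenceMatcher


def closest_command(recognized, commands, min_ratio=0.6):
    """Return the template command most similar (word by word) to the recognized text.

    Similarity is difflib's ratio 2*M/T over word sequences; None if nothing reaches min_ratio.
    """
    words = recognized.split()
    best, best_score = None, 0.0
    for command in commands:
        score = SequenceMatcher(None, words, command.split()).ratio()
        if score > best_score:
            best, best_score = command, score
    return best if best_score >= min_ratio else None


# Hypothetical command templates for illustration.
COMMANDS = ["mở trang mới", "đóng trang", "tải lại trang"]
```

With these templates, an extra leading word such as "hãy" still maps "hãy mở trang mới" to "mở trang mới" (ratio 6/7), while unrelated input falls below the threshold and is rejected.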
4.7.2. Tank model control application
a. Introduction
This is a demo application that uses speech to control a remote-controlled tank model. The user speaks movement commands and other commands into the microphone, and through the demo program the tank model performs the action corresponding to the spoken command. The program supports more than 30 command forms, built from 25 single words.
ba
dng
bn
mi
quay
ti
li
ngng sang
tri
chy
ln
nng
su
trm
lui
phi
sng
va
lui
qua
tin
xoay
d. Program
The program consists of two modules: a speech command recognition module (written in Java) and a tank model control module (written in C#). The two modules, written in different languages, communicate with each other over a socket.
The speech recognition module works similarly to the previous demo, with an added component that opens a socket connection to the control module. After recognizing the speech, the program sends the result over the socket to the control module.
The control module, written in C#, works like a driver for a USB device: through it, the computer sends commands directly to the remote control of the tank model, and the remote in turn controls the tank from a distance.
Figure 4.8. The inside of the remote control, fitted with an added USB control circuit
Figure 4.9. Interface of the program for controlling the tank by voice
e. Remarks
The control results depend mainly on the speech recognition module. Word recognition accuracy under normal (not too noisy) conditions is 89%, somewhat lower than in the previous demo because of noise from the motor of the tank model.
4.8. Comparison with HTK
4.8.1. Introduction
As introduced in chapter 3, HTK and Sphinx are two of the most widely used open-source speech recognition frameworks in the world today. Many articles, reports and theses in Vietnam present HTK and demonstrate its capability for Vietnamese speech recognition applications. One of the laboratories in Vietnam that makes extensive use of HTK is the AILAB laboratory of the University of Science, managed by Dr. Vũ Hải Quân. Sphinx, in contrast, was developed later and is still fairly new in Vietnam. This section compares the two frameworks and their applicability to Vietnamese. Its aim is to show the basic differences between the two automatic speech recognition tools, as well as their relative performance.
System | Sentence correct rate (%) | Word correct rate (%) | Word accuracy (%)
HTK | 41.60 | 99.97 | 94.38
SPHINX | 68 | 98.2 | 96.7

System | Insertions | Deletions | Substitutions
HTK | 833 | 28
SPHINX | 206 | 43 | 227
[1]
[2]
[3]
[4]
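L. C. Mai, "Phát triển các kết quả tổng hợp, nhận dạng câu lệnh, chuỗi số tiếng Việt liên tục trên môi trường điện thoại di động" [Developing synthesis and recognition of commands and continuous Vietnamese digit strings on mobile phones], 2006.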
[5]
[6]
[7] "Vietnamese alphabet," Wikipedia, [Online]. Available:
"IPA for Vietnamese," Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Wikipedia:IPA_for_Vietnamese. [Accessed 2012].
[9] Mellon University, [Online]. Available:
[Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialam. [Accessed 7 2012].
[13] [Online]. Available: http://audacity.sourceforge.net/. [Accessed 7 2012].
[14] "Recording the Test Data," [Online]. Available:
[Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. [Accessed 7 2012].
[16] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw,
Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey,
Valtcho Valtchev, Phil Woodland, HTK Book, Cambridge University
Engineering Department, 2009.
[17] L. Rabiner, A Tutorial on Hidden Markov Models and Selected Application in