
SPEECH RECOGNITION TECHNIQUES AND THEIR APPLICATION IN CONTROL

Dr. Nguyễn Văn Giáp
Eng. Trần Việt Hồng
Department of Mechatronics, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology
nvgiap@dme.hcmut.edu.vn; tvhong@dme.hcmut.edu.vn

ABSTRACT

Research on speech recognition methods has attracted, and continues to attract, substantial investment and effort from scientists around the world. So far, however, the results have not fully satisfied researchers, because the object being recognized, human speech, is highly complex and variable. For Vietnamese in particular, the results are even more limited. This paper presents an approach to Vietnamese speech recognition that extracts speech features with the MFCC method and recognizes them with an HMM network. The results are verified in practice on a radio-controlled (RF) car model.

1 INTRODUCTION

1.1 Overview

Today, with the development of electronics and computing, automated machines have gradually replaced people in many stages of work. Machines can work far more efficiently and productively than people. Yet human-machine communication, although much improved, is still largely manual: it goes through keyboards and other data-entry devices. Communicating with machines by voice would be the most civilized and natural form of interaction: the sense of talking to a machine fades and is replaced by the feel of person-to-person communication. Once perfected, it would be the most convenient and effective way to communicate.

Because languages differ phonetically, recognition programs built for other languages cannot simply be applied to Vietnamese. A speech recognition system for Vietnam must be built on the basis of spoken Vietnamese.

1.2 State of research in Vietnam and abroad

Vietnamese speech recognition has attracted research interest only in recent years, and no complete recognition program has yet been published. Worldwide there are many speech recognition systems (for English) that have been, and are being, applied very effectively, such as IBM's ViaVoice and the Spoken Toolkit from CSLU (Center for Spoken Language Understanding), but for Vietnamese the results remain very limited.

1.3 Objectives of the study

This study investigates and tests an approach to Vietnamese speech recognition based on extracting speech features with the MFCC (Mel-Frequency Cepstral Coefficients) method and recognizing them with HMMs (Hidden Markov Models). In addition, a Vietnamese voice-control model is built with a small vocabulary, establishing a voice-control system with a fixed command set. This command set is used to control a robot, and the complete voice-controlled car model is the study's practical, experimental application.

2 BUILDING THE SPEECH RECOGNITION SYSTEM

A recognition system generally consists of two parts: a training phase and a recognition phase. Training is the process in which the system learns reference patterns supplied by different utterances (words or sounds), from which the system's vocabulary is formed. Recognition is the process of deciding which word was spoken, based on the trained vocabulary. The general block diagram of the speech recognition system is shown in Figure 1.

To make testing and evaluating the results easier, we divide the recognition program in the diagram into three separate modules:

- Module 1: records the speech signal, separates the speech from the background noise, and stores it in a database.
- Module 2: extracts features from the speech signal captured by Module 1 using the MFCC method, and vector-quantizes these feature vectors.
- Module 3: builds a hidden Markov model with 6 states, optimizes the HMM parameters for each word in the vocabulary, and recognizes a word spoken into the microphone.

[Figure 1: Module 1, Module 2, Module 3]
Figure 1 General block diagram of the speech recognition system.
2.1 Implementing Module 1

The task of this module is to capture the signal from the microphone and apply endpoint detection to distinguish the speech portion of the signal from the noise portion. The speech can then be separated from the background noise (only the speech signal is kept; the background noise is discarded).

Although there are many methods for isolating speech, through research and experiment the authors found that combining the short-time energy function with the zero-crossing rate gives better results.

This method relies on two properties: the energy of the speech signal is usually higher than that of the noise, and the zero-crossing rate of the noise is higher than that of the speech. Figure 2 shows the relationship between the captured signal, the value of the short-time energy function, and the zero-crossing rate.

[Figure 2: a captured signal with its noise and speech regions marked, shown together with the short-time energy function and the zero-crossing rate]
Figure 2 Relationship between the speech signal and the background noise.

For a window ending at sample m, the short-time energy function E(m) is defined by [4-6]:

\[ E(m) = \sum_{n=-\infty}^{\infty} \left[ s(n)\, w(m-n) \right]^{2} \tag{2.1} \]

The short-time energy of a signal segment is plotted in Figure 3.

[Figure 3: (a) Signal, amplitude versus time (s); (b) Short-Time Energy versus time (frame)]
Figure 3 Signal (a) and short-time energy (b).

The zero-crossing rate is a parameter that gives the number of times the signal amplitude crosses zero in a given time interval. It is defined by:

\[ Z_s(m) = \frac{1}{N} \sum_{n=m-N+1}^{m} \frac{\left| \operatorname{sgn}\{s(n)\} - \operatorname{sgn}\{s(n-1)\} \right|}{2}\, w(m-n) \tag{2.2} \]

where N is the length of the window w(m-n).

Many endpoint detection algorithms use the magnitude of the short-time energy and the zero-crossing rate to locate the endpoints as accurately as possible. The basic procedure is as follows. A short sample of the background noise is taken during the silence interval just before the speech begins. From it, a speech threshold is determined based on the silence energy and the peak energy. Initially, candidate endpoints are placed where the signal energy crosses this threshold; the distance between the two points is then checked against the expected length of a word. The same procedure is applied to the zero-crossing rate.

For example, a signal captured from the microphone, containing background noise and speech, is plotted in Figure 4.

[Figure 4: waveform of the captured signal]
Figure 4 Signal of the word "tới" (forward).

Processing this signal with the procedure above yields the pulse-shaped waveform shown in Figure 5.
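The endpoint detection procedure of Module 1 can be sketched in code. The following is a minimal Python sketch, not the authors' implementation: it assumes a rectangular window, non-overlapping frames, and hypothetical threshold rules (`energy_ratio`, `zcr_ratio`, `min_frames`) that the paper does not specify. It computes the short-time energy and zero-crossing rate per frame, turns them into a 0/1 pulse train, and keeps the first run of speech frames long enough to be a word.

```python
import math
import random

def frame_features(samples, frame_len):
    """Short-time energy and zero-crossing rate per frame.
    Assumes a rectangular window and non-overlapping frames."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        features.append((energy, crossings / frame_len))
    return features

def speech_pulse(features, noise_frames=5, energy_ratio=0.1, zcr_ratio=0.5):
    """Return 1 for frames that look like speech, 0 otherwise.
    Thresholds come from the leading silence and the peak energy;
    the two ratios are hypothetical tuning values."""
    noise_e = sum(e for e, _ in features[:noise_frames]) / noise_frames
    noise_z = sum(z for _, z in features[:noise_frames]) / noise_frames
    peak_e = max(e for e, _ in features)
    e_thr = noise_e + energy_ratio * (peak_e - noise_e)
    z_thr = zcr_ratio * noise_z
    return [1 if (e > e_thr and z < z_thr) else 0 for e, z in features]

def find_word(pulse, min_frames=3):
    """First run of 1s at least min_frames long, as (start, end_exclusive)."""
    start = None
    for i, p in enumerate(pulse + [0]):  # trailing 0 closes a final run
        if p and start is None:
            start = i
        elif not p and start is not None:
            if i - start >= min_frames:
                return (start, i)
            start = None
    return None

# Synthetic check: low-level noise, a 200 Hz tone at 8 kHz, then noise again.
rng = random.Random(0)

def noise(n):
    return [rng.uniform(-0.01, 0.01) for _ in range(n)]

tone = [0.5 * math.sin(2 * math.pi * n / 40 + 0.3) for n in range(800)]
signal = noise(400) + tone + noise(400)

pulse = speech_pulse(frame_features(signal, frame_len=80))
word = find_word(pulse)
print(pulse, word)
```

On real recordings the frame length and thresholds would need tuning, and frames would normally overlap.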
Figure 5 Pulse waveform after combined short-time energy and zero-crossing-rate processing.

Figure 5 shows that once the minimum length of a word is fixed, the word can be separated from the background noise. At this point Module 1 has completed its task. This is a very important part of a speech recognition system, and it strongly affects the recognition results.

2.2 Implementing Module 2

We now have the speech samples, largely separated from the background noise. Module 2 extracts features from the speech samples captured by Module 1. There are many feature extraction methods, such as wavelets, LPC, and MFCC. Here the MFCC method (feature extraction on the mel frequency scale) is chosen because it is fast to compute, highly reliable, and has been used very effectively in speech recognition programs around the world.

The MFCC algorithm proceeds as follows:

Figure 6 Computing the MFCC coefficients.

- Windowing

Classical spectral estimation methods are reliable only for stationary signals, i.e. signals whose characteristics do not change over time. For speech, this holds only over short time intervals, which is handled by windowing a signal x(n) into a sequence of successive windowed segments x_t(n), t = 1, 2, ..., T, called frames.

In automatic recognition systems the most common window is the Hamming window, whose impulse response is a raised cosine:

\[ w(n) = \begin{cases} 0.54 - 0.46 \cos\left( \dfrac{2\pi n}{N-1} \right), & n = 0, \dots, N-1 \\ 0, & \text{otherwise} \end{cases} \]

- Spectral analysis

If the frequency values are equally spaced, i.e. \(\omega = 2\pi k / N\), the discrete Fourier transform (DFT) of each frame of the signal is:

\[ X_t(k) = X_t\left(e^{j 2\pi k / N}\right), \quad k = 0, \dots, N-1 \]

In addition, if the number of samples N is a power of 2 (N = 2^p, p an integer), the computational complexity is reduced considerably by using the FFT (Fast Fourier Transform).

- Filter-bank processing

Physiological studies show that human perception of the frequency content of speech does not follow a linear scale. Each tone has a frequency f, measured in Hz. To describe accurately how the auditory system perceives frequency, a different scale was constructed: the mel scale. The mel frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. The mapping between the physical frequency scale (Hz) and the perceptual mel scale is given by:

\[ F_{mel} = \frac{1000}{\log_{10} 2}\, \log_{10}\left( 1 + \frac{F_{Hz}}{1000} \right) \]

or, in another widely used form,

\[ F_{mel} = 2595\, \log_{10}\left( 1 + \frac{F_{Hz}}{700} \right) \tag{2.3} \]

Spectral analysis reveals the features of the speech signal that are produced by the shape of the vocal tract. The spectral features of the speech signal are obtained after passing it through a bank of filters. On the mel scale, one filter is used for each desired frequency component (Figure 7). Each filter has a triangular frequency response, and its spacing and bandwidth are determined by a constant mel interval.

Figure 7 An example of a mel-scale filter bank.
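The windowing, spectral analysis, and mel filtering steps can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the frame length (256), sampling rate (8 kHz), and number of filters (12) are hypothetical choices, the mel mapping uses the common 2595·log10(1 + f/700) form, and a direct DFT stands in for the FFT to keep the code dependency-free.

```python
import cmath
import math

def hamming(N):
    """Hamming window: 0.54 - 0.46 cos(2*pi*n / (N - 1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def power_spectrum(frame):
    """|X_t(k)|^2 for k = 0..N/2 by direct DFT (an FFT would be used in practice)."""
    N = len(frame)
    spectrum = []
    for k in range(N // 2 + 1):
        X = sum(x * cmath.exp(-2j * math.pi * k * n / N)
                for n, x in enumerate(frame))
        spectrum.append(abs(X) ** 2)
    return spectrum

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, N, fs):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    top = hz_to_mel(fs / 2)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bins = [f * N / fs for f in edges_hz]  # fractional DFT bin of each edge
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        weights = []
        for k in range(N // 2 + 1):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

# One frame of a 1000 Hz test tone (hypothetical input).
fs, N = 8000, 256
frame = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(N)]
windowed = [x * w for x, w in zip(frame, hamming(N))]
spectrum = power_spectrum(windowed)
bank = mel_filterbank(12, N, fs)
energies = [sum(w * p for w, p in zip(filt, spectrum)) for filt in bank]
best = max(range(len(energies)), key=energies.__getitem__)
print("filter with the largest output:", best)
```

The filter outputs in `energies` are the quantities passed on to the later stages of the MFCC computation.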

- Computing the log energy (LOG)

The preceding steps smooth the spectrum and process it much as the human ear does. This step computes the logarithm of the squared magnitude of the coefficients at the filter outputs. Note that the human ear performs magnitude and logarithmic processing very well. Moreover, taking the magnitude discards unneeded information, while the logarithm performs dynamic compression, making the extracted features less sensitive to dynamic variations.

- Computing the mel-frequency cepstrum

The last step in computing the mel-frequency cepstral coefficients (MFCC) is to apply the inverse DFT to the log magnitudes of the filter outputs. Because the log power spectrum is real and symmetric, the inverse DFT reduces to the discrete cosine transform (DCT). The DCT has the property of producing highly uncorrelated features. It also smooths the spectrum if only the first coefficients are kept. In speech recognition the number of MFCC coefficients is usually less than 15. [6]

After feature extraction, each spoken word is characterized by a matrix of real coefficients. Because a discrete HMM is used for recognition, these feature vectors must be vector-quantized (VQ) into discrete codebook indices. A common algorithm for codebook design is LBG (Linde, Buzo, and Gray).

Figure 8 Vector quantization (VQ) in recognition.

The method used for vector quantization is K-means.

2.3 Implementing Module 3

After the two modules above have run, we have a database of feature vectors for each word. In this module we build a hidden Markov model whose training data are the feature vectors obtained from Module 2. Training and recognition with HMMs are shown in Figure 9 for a vocabulary of three words: tới (forward), lui (backward), trái (left).

[Figure 9: Training: training samples of "tới", "lui", and "trái" give the parameters λ_tới, λ_lui, λ_trái. Recognition: an observation sequence O is scored as P(O|λ_tới), P(O|λ_lui), P(O|λ_trái)]
Figure 9 HMM model diagram.

For each word to be recognized we have a database of features from different utterances (three recordings per word in the diagram). We then estimate the model parameters λ = (A, B, π) so that the probability P(O|λ) is maximized; each word has its own λ. To recognize a word, we compute the probability of its observation sequence under each trained λ and choose the model with the highest probability.

From the references and from information on successfully built recognition systems, we find that for speech recognition the HMM is usually chosen to be a left-right model with 5 to 6 states. In our experiments the 6-state model gave better results, so the authors built an HMM with 6 states; see Figure 10.

Figure 10 Left-right HMM with 6 states.

3 RADIO-CONTROLLED CAR SYSTEM MODEL

The block diagram of the radio-controlled car driven by voice from a computer is shown in Figure 11.
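[Figure 11: the remote control unit, with switches SW1 to SW4 (phải, trái, tới, lui: right, left, forward, backward), drives a transmitting antenna; a receiving antenna feeds the control unit on the car]
Figure 11 Overview of the experimental system.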
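In software, the recognition and decision stages of the chain in Figure 11 can be pictured as follows: score the quantized observation sequence against each word's HMM (the rule of Section 2.3), pick the most likely word, and write a command for the corresponding switch to the serial port. The Python sketch below is a toy illustration, not the authors' program. The two-state models, their probabilities, the two-symbol codebook, and the one-byte command codes are all made-up assumptions (the paper uses 6-state models and does not give the command format), and an in-memory stream stands in for the COM port.

```python
import io

def forward_probability(obs, A, B, pi):
    """P(O | lambda) for a discrete HMM lambda = (A, B, pi), forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def recognize(obs, models):
    """Return the word whose model gives the observation the highest probability."""
    return max(models, key=lambda w: forward_probability(obs, *models[w]))

# Toy left-right models: state 0 may stay or move on, state 1 is final.
A = [[0.7, 0.3],
     [0.0, 1.0]]
pi = [1.0, 0.0]
models = {
    "tới": (A, [[0.9, 0.1], [0.1, 0.9]], pi),  # symbol 0 first, then symbol 1
    "lui": (A, [[0.1, 0.9], [0.9, 0.1]], pi),  # symbol 1 first, then symbol 0
}

# Switch numbering follows Figure 11; the byte values are hypothetical.
SWITCH = {"phải": 1, "trái": 2, "tới": 3, "lui": 4}

def send_command(word, port):
    """Write the one-byte switch code for a recognized word to the port."""
    port.write(bytes([SWITCH[word]]))

port = io.BytesIO()            # stand-in for the serial (COM) port
observation = [0, 0, 1, 1]     # codebook indices from vector quantization
word = recognize(observation, models)
send_command(word, port)
print(word, port.getvalue())
```

For long observation sequences the forward probabilities underflow, so a practical version would scale `alpha` at each step or work with log probabilities.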


The radio-controlled car can be operated remotely by voice from a computer. A spoken command word is captured and recognized by the speech recognizer, which passes the recognized word sequence to a decision block that issues control commands through the COM port. An interface circuit connected to the computer through the serial port (RS232) was designed for this purpose. The interface circuit receives the signal and opens or closes switches, converting it into the signals of the remote control unit. Whenever a switch is closed or a key combination is pressed, the remote control unit encodes it appropriately and sends it to the transmitting antenna. The control signal is modulated and transmitted to the car by radio at a carrier frequency FC = 27 MHz. The control unit on the car then drives the car. The model works well with a vocabulary of 4 words (phải, trái, tới, lui: right, left, forward, backward), with good results (99%).

4 CONCLUSION

The experimental model for Vietnamese speech recognition combining MFCC and HMM still has many limitations but has met the objectives of the study. The program is used to control a robot with a small vocabulary (fewer than 16 words) and gives acceptable accuracy (above 90%). In the near future the authors will optimize the recognition program to achieve better results and faster processing.

REFERENCES

1. Prof. Phạm Văn Ất, Kỹ thuật lập trình C, Nhà xuất bản Khoa học và Kỹ thuật, 1999.
2. Nguyễn Hoàng Hải, Nguyễn Khắc Kiểm, Lập trình Matlab, Nhà xuất bản Khoa học và Kỹ thuật, 2003.
3. Assoc. Prof. Dr. Nguyễn Hữu Phương, Xử lý tín hiệu số, Nhà xuất bản Giao thông vận tải, 2000.
4. Lê Tiến Thường, Xử lý tín hiệu số và wavelets, Nhà xuất bản Đại học Quốc gia TP. Hồ Chí Minh, 2002.
5. Claudio Becchetti and Lucio Prina Ricotti, Speech Recognition: Theory and C++ Implementation, John Wiley & Sons, Ltd, 2000.
6. Gordon E. Pelton, Voice Processing, McGraw-Hill, 1992.
7. John R. Deller, John G. Proakis, and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, 1993.
8. F. J. Owens, Signal Processing of Speech, Macmillan, 1993.
