
SPEECH RECOGNITION TECHNIQUES AND THEIR APPLICATION IN CONTROL

Dr. Nguyễn Văn Giáp

Eng. Trần Việt Hồng
Department of Mechatronics, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology
nvgiap@dme.hcmut.edu.vn; tvhong@dme.hcmut.edu.vn
ABSTRACT
Research on speech recognition methods has attracted considerable investment and attention from scientists around the world. However, the results achieved so far do not fully satisfy researchers, because human speech, the object being recognized, is highly complex and variable. For Vietnamese speech in particular, results remain limited. This paper presents an approach to Vietnamese speech recognition in which speech features are extracted with the MFCC method and recognized with an HMM-based recognizer. The results are verified in practice on a remote-controlled (RF) model car.
1 PROBLEM STATEMENT
1.1 Introduction
Today, with the development of electronics and computer science, automated machine systems have gradually replaced humans in many stages of work. Machines can work far more efficiently and productively than people. Yet human-machine communication, although much improved, is still largely manual: it goes through the keyboard and other data-entry devices. Speaking to a machine would be the most natural form of communication: the sense of operating a machine gives way to the feeling of one person talking to another. Once perfected, it would be the most convenient and effective way to interact.
Because languages differ phonetically, recognition programs built for other languages cannot be applied to Vietnamese. A speech recognition system for Vietnam must be built on the Vietnamese spoken language.
1.2 State of research in Vietnam and abroad
Vietnamese speech recognition has attracted research interest only in recent years, and no complete recognition program has yet been published.
Worldwide, many speech recognition systems (for English) have been and are being applied very effectively, such as Via Voice from IBM and the Spoken Toolkit from CSLU (Center for Spoken Language Understanding), but for Vietnamese the results are still very limited.
1.3 Objectives of the project
This project experimentally investigates an approach to Vietnamese speech recognition based on extracting speech features with the MFCC (Mel-Frequency Cepstral Coefficients) method and recognizing them with HMMs (Hidden Markov Models). In addition, a Vietnamese voice-control model with a small vocabulary is built, giving a voice-control system with a fixed command set. This command set is used to control a robot, and the complete voice-controlled car model is the project's experimental real-world application.
2 BUILDING THE SPEECH RECOGNITION SYSTEM
A recognition system generally consists of two parts: a training phase and a recognition phase. Training is the process in which the system learns reference patterns supplied by different utterances (words or sounds), from which the system's vocabulary is formed. Recognition is the process of deciding which word was spoken, based on the trained vocabulary. The general block diagram of the speech recognition system is shown in Figure 1.
To make testing and evaluation easier, we divide the recognition program from the diagram above into three separate modules:
Module 1: Records the speech signal, separates the speech from the background noise and stores it in a database.
Module 2: Extracts features from the speech signal captured in Module 1 with the MFCC method, and vector-quantizes these feature vectors.
Module 3: Builds a hidden Markov model with 6 states, optimizes the HMM parameters for each word in the vocabulary, and recognizes a word spoken into the microphone.
Figure 1. General block diagram of the speech recognition system (Module 1, Module 2, Module 3).
2.1 Implementing Module 1
The task of this module is to capture the signal from the microphone and use endpoint detection to tell the speech portions of the signal from the noise portions. The speech can then be separated from the background noise (only the speech is kept; the background noise is discarded). Although there are many methods for isolating speech, through study and experiment the authors found that combining the short-time energy function with the zero-crossing rate gives better results.
This method relies on two properties: the energy of a speech signal is usually greater than that of noise, and the zero-crossing rate of noise is higher than that of speech. Figure 2 shows the relationship between the captured signal, the value of the short-time energy function and the zero-crossing rate.
Figure 3. Signal (a) and short-time energy (b); panel (b) plots the energy against time in frames.
For a window ending at sample m, the short-time energy function E(m) is defined by:

$$E(m) = \sum_{n} \left[ s(n)\, w(m-n) \right]^{2} \tag{2.1}$$

The short-time energy of a signal segment is plotted in Figure 3.
The zero-crossing rate is a parameter that gives the number of times the signal amplitude crosses zero in a given time interval. It is defined by:

$$Z(m) = \frac{1}{N} \sum_{n=m-N+1}^{m} \frac{\left| \operatorname{sgn}\{s(n)\} - \operatorname{sgn}\{s(n-1)\} \right|}{2}\, w(m-n) \tag{2.2}$$

where N is the length of the window w(m-n).
Figure 2. Relationship between the speech signal and the background noise (panels: noise and speech signal, short-time energy function, zero-crossing rate).
Many endpoint detection algorithms use the magnitude of the short-time energy and the zero-crossing rate to locate endpoints as accurately as possible. The basic procedure is as follows. A short sample of the background noise is taken during the silence before the start of the speech signal. From it, the speech threshold is determined based on the silence energy and the peak energy. Initially, endpoints are placed where the signal energy crosses this threshold; the distance between two endpoints is then computed to check whether it is long enough for a word. The same procedure is applied to the zero-crossing rate. [4-6]
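The short-time energy (2.1), the zero-crossing rate (2.2) and the threshold-based endpoint search can be sketched in Python as follows. This is a minimal illustration, not the authors' program: it assumes a rectangular window (w = 1 over the last N samples), non-overlapping frames, the convention sgn(0) = +1, and a hypothetical threshold rule (`energy_threshold`) that places the threshold a fixed fraction of the way from the silence energy to the peak energy. The test signal is synthetic.

```python
def sgn(x):
    """Sign convention assumed here: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1


def short_time_energy(s, m, N):
    """E(m) of (2.1) for a rectangular window of N samples ending at sample m."""
    start = max(0, m - N + 1)
    return sum(s[n] ** 2 for n in range(start, m + 1))


def zero_crossing_rate(s, m, N):
    """Z(m) of (2.2) for a rectangular window of N samples ending at sample m."""
    start = max(1, m - N + 1)
    crossings = sum(abs(sgn(s[n]) - sgn(s[n - 1])) / 2 for n in range(start, m + 1))
    return crossings / N


def frame_energies(s, N):
    """Energy of consecutive non-overlapping frames of N samples."""
    return [short_time_energy(s, m, N) for m in range(N - 1, len(s), N)]


def energy_threshold(silence_energy, peak_energy, fraction=0.1):
    """Hypothetical rule: a fixed fraction of the way from silence level to peak."""
    return silence_energy + fraction * (peak_energy - silence_energy)


def find_word(energies, threshold, min_frames):
    """First run of frames above threshold lasting at least min_frames, else None."""
    start = None
    for i, e in enumerate(energies):
        if e > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                return start, i - 1
            start = None
    if start is not None and len(energies) - start >= min_frames:
        return start, len(energies) - 1
    return None


# Synthetic example: low-level alternating "noise", a louder slower "word", noise again.
noise = [0.01, -0.01] * 20
word = [0.5, 0.5, -0.5, -0.5] * 10
signal = noise + word + noise

energies = frame_energies(signal, N=10)
threshold = energy_threshold(energies[0], max(energies))
endpoints = find_word(energies, threshold, min_frames=2)  # frame indices of the word
```

In this toy signal the noise frames have a low energy and a high zero-crossing rate, while the word frames show the opposite, which is the property the method relies on. The `min_frames` check plays the role of the minimum word length.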
For example, the signal captured from the microphone, containing background noise and speech, looks as follows:
Figure 4. Signal of the word "tới" (forward).
After processing by the procedure above, we obtain a pulse-shaped plot as follows:
Figure 5. Pulse shape after combined processing with the short-time energy function and the zero-crossing rate.
Figure 5 shows that, once the minimum length of a word is fixed, the word can be separated from the background noise. At this point Module 1 has completed its task. It is a very important part of a speech recognition system and has a strong influence on the recognition result.
2.2 Implementing Module 2
At this point we have speech samples from which the noise has been removed. Module 2 extracts features from the speech samples captured in Module 1. There are many feature extraction methods, such as wavelets, LPC and MFCC. MFCC (feature extraction on the Mel frequency scale) is chosen here because it is fast to compute, highly reliable, and has been used very effectively in speech recognition programs around the world.
The MFCC method follows the flow shown in Figure 6.
Figure 6. Computation of the MFCC coefficients.
Windowing
Classical spectral estimation methods are reliable only for stationary signals, that is, signals whose characteristics do not change over time. For speech this holds only over a short time interval, so the signal x(n) is windowed into a sequence of successive windows x_t(n), t = 1, 2, ..., T, called frames.
In automatic recognition systems the most common window is the Hamming window, whose impulse response is a raised cosine:

$$w(n) = \begin{cases} 0.54 - 0.46 \cos\left( \dfrac{2\pi n}{N-1} \right), & n = 0, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases}$$

Spectral analysis
If the frequency values are equally spaced, that is, $\omega = 2\pi k / N$, the discrete Fourier transform (DFT) of every frame of the signal is:

$$X_t(k) = X_t\left( e^{j 2\pi k / N} \right), \qquad k = 0, \ldots, N-1.$$

In addition, if the number of samples N is a power of 2 (N = 2^p, p an integer), the computational complexity drops considerably when the FFT (Fast Fourier Transform) is used.
Filter processing
Physiological studies have shown that human perception of the frequency content of speech does not follow a linear scale. Each tone has an actual frequency f, measured in Hz. To describe how the auditory system perceives frequency, a different scale was constructed: the Mel scale. The Mel frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. The mapping between the physical frequency scale (Hz) and the perceptual Mel scale is given by:

$$F_{mel} = \frac{1000}{\log_{10} 2} \log_{10}\left( 1 + \frac{F_{Hz}}{1000} \right)$$

or, in another widely used approximation,

$$F_{mel} = 2595 \log_{10}\left( 1 + \frac{F_{Hz}}{700} \right) \tag{2.3}$$

Spectral analysis reveals the features of the speech signal that are produced by the shape of the vocal tract. The spectral features of the speech signal are obtained after passing it through a bank of filters. On the Mel frequency scale there is one filter for each desired frequency component (Figure 7). Each filter has a triangular frequency response, and the spacing as well as the bandwidth is determined by a constant Mel interval.
Figure 7. An example of a Mel-scale filter bank.
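The windowing and spectral analysis steps, together with the Hz-to-Mel mapping, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the frame length, hop size and test tone are arbitrary, a direct O(N^2) DFT is written out for clarity where a real system would use an FFT, and `hz_to_mel` uses the 1000 / log10(2) form of the mapping.

```python
import cmath
import math


def split_frames(x, N, hop):
    """Cut the signal x(n) into successive frames x_t(n) of N samples, advancing by hop."""
    return [x[i:i + N] for i in range(0, len(x) - N + 1, hop)]


def hamming(N):
    """Hamming window: w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)) for n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def dft(frame):
    """Direct DFT: X(k) = sum_n x(n) exp(-j 2 pi k n / N), k = 0..N-1."""
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def hz_to_mel(f_hz):
    """Mel mapping in the 1000 / log10(2) form, which sends 1000 Hz to 1000 mel."""
    return 1000.0 / math.log10(2.0) * math.log10(1.0 + f_hz / 1000.0)


# Illustrative input: a tone that falls exactly on DFT bin 4 of a 16-sample frame.
N = 16
signal = [math.cos(2 * math.pi * 4 * n / N) for n in range(64)]
window = hamming(N)
frames = split_frames(signal, N, hop=8)
spectra = [dft([s * w for s, w in zip(frame, window)]) for frame in frames]

# Magnitudes of the non-redundant half of the first frame's spectrum.
magnitudes = [abs(X) for X in spectra[0][:N // 2 + 1]]
peak_bin = magnitudes.index(max(magnitudes))
```

The window spreads a little energy into the neighbouring bins, but the largest magnitude stays at the bin of the tone. Choosing N as a power of 2, as here, is what allows an FFT to replace the direct DFT.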
Computing the logarithmic energy (LOG)
The preceding steps smooth the spectrum and process it in a way similar to the human ear. This step computes the logarithm of the squared magnitude of the coefficients at the filter outputs. Note that the human ear handles magnitude and logarithmic processing very well. Moreover, taking the magnitude discards unnecessary information, while the logarithm performs dynamic compression, making the extracted features less sensitive to dynamic variations.
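A triangular Mel filter bank followed by the logarithm of each filter's output energy can be sketched as below. The sketch makes several assumptions that the paper does not specify: the number of filters, the number of spectrum bins and the sampling rate are arbitrary, the filter centres are spaced equally on the Mel scale between 0 Hz and half the sampling rate, and a small `floor` guards against log(0). The Mel mapping is the 1000 / log10(2) form and `mel_to_hz` is its algebraic inverse.

```python
import math


def hz_to_mel(f_hz):
    """Mel mapping in the 1000 / log10(2) form."""
    return 1000.0 / math.log10(2.0) * math.log10(1.0 + f_hz / 1000.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: f = 1000 * (2 ** (mel / 1000) - 1)."""
    return 1000.0 * (2.0 ** (mel / 1000.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale.

    n_bins is the number of spectrum bins covering 0 .. sample_rate / 2.
    Returns n_filters rows of n_bins weights, each weight in [0, 1].
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, expressed as fractional bin positions.
    edges = [mel_to_hz(top * i / (n_filters + 1)) / nyquist * (n_bins - 1)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_filter_energies(power_spectrum, bank, floor=1e-10):
    """Natural log of the energy collected by each filter."""
    return [math.log(max(floor, sum(w * p for w, p in zip(row, power_spectrum))))
            for row in bank]


# Illustrative use: 8 filters over a flat 65-bin power spectrum sampled at 8 kHz.
bank = mel_filterbank(n_filters=8, n_bins=65, sample_rate=8000.0)
log_energies = log_filter_energies([1.0] * 65, bank)
```

Because the edges are equally spaced in Mel, the triangles become wider in Hz as frequency increases, which mirrors the reduced frequency resolution of hearing at high frequencies.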
Computing the Mel cepstrum
The final step in computing the Mel-frequency cepstral coefficients (MFCC) is to apply the inverse DFT to the log magnitude of the filter outputs. Because the log power spectrum is real and symmetric, the inverse DFT reduces to the discrete cosine transform (DCT). A property of the DCT is that it produces highly uncorrelated features. The DCT also smooths the spectrum if only the first coefficients are kept. In speech recognition the number of MFCC coefficients is usually less than 15. [6]
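The DCT step can be sketched as follows. This is a minimal sketch that assumes the common unnormalized DCT-II form over M log filter energies and keeps 12 coefficients, an illustrative choice consistent with "fewer than 15"; the paper does not state the exact form or count it used. The input values are synthetic.

```python
import math


def dct_cepstrum(log_energies, n_coeffs):
    """c(i) = sum_j log_energies[j] * cos(pi * i * (j + 0.5) / M), for i = 1..n_coeffs."""
    M = len(log_energies)
    return [sum(e * math.cos(math.pi * i * (j + 0.5) / M)
                for j, e in enumerate(log_energies))
            for i in range(1, n_coeffs + 1)]


# A flat log spectrum carries no spectral shape, so every coefficient is (numerically) zero.
flat = dct_cepstrum([2.0] * 20, n_coeffs=12)

# A log spectrum shaped like the third cosine basis function excites only c(3).
shaped = [math.cos(math.pi * 3 * (j + 0.5) / 20) for j in range(20)]
coeffs = dct_cepstrum(shaped, n_coeffs=12)
```

The second example illustrates the decorrelating behaviour: a single spectral shape maps to a single coefficient, and keeping only the low-order coefficients retains the smooth part of the spectrum.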
After feature extraction, each spoken word is characterized by a matrix of real coefficients. Because a discrete HMM is used for recognition, these feature vectors must be vector-quantized (VQ) into discrete codebook indices. The algorithm commonly used to design the codebook is LBG (Linde, Buzo and Gray).
Figure 8. Vector quantization (VQ) in recognition.
The method used here for vector quantization is K-means.
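Vector quantization with K-means can be sketched as below. It is a plain K-means under illustrative assumptions (random initial codewords drawn from the training vectors, a fixed number of iterations, squared Euclidean distance, a tiny two-dimensional data set); the paper does not give its codebook size or initialization.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v; this index is the discrete HMM symbol."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], v))


def train_codebook(vectors, size, iterations=20, seed=0):
    """Plain K-means: alternate nearest-codeword assignment and centroid update."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def quantize(codebook, vectors):
    """Replace each feature vector by the index of its nearest codeword."""
    return [nearest(codebook, v) for v in vectors]


# Toy feature vectors forming two well-separated groups.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = train_codebook(features, size=2)
symbols = quantize(codebook, features)
```

Each word then becomes a sequence of integer symbols, which is the form of observation a discrete HMM expects.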
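2.3 Implementing Module 3
After the two modules above have run, we have a database of feature vectors for each word. In this module we build a hidden Markov model whose training data are the feature vectors obtained from Module 2. The training and recognition scheme with HMMs is shown in Figure 9 for a vocabulary of 3 words: "tới" (forward), "lui" (backward) and "trái" (left).
Figure 9. HMM training and recognition scheme. Training: the samples of each word ("tới", "lui", "trái") go through parameter estimation to give the models λ_tới, λ_lui and λ_trái. Recognition: for an observation sequence O, the probabilities P(O|λ_tới), P(O|λ_lui) and P(O|λ_trái) are computed.
For each word to be recognized we have a database of features from several utterances (3 samples per word in the diagram). We then estimate the model parameters λ = (A, B, π) so that the probability P(O|λ) is maximized; each word corresponds to one specific λ. To recognize a word, we simply compute the probability of its observation sequence under each trained λ and choose the model with the highest probability.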
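The recognition rule, scoring the observation sequence under each word model and picking the maximum, can be sketched with the forward algorithm for a discrete HMM. The two toy models below are hypothetical (2 states, 2 symbols, hand-picked probabilities) and only illustrate the computation; the dictionary keys are ASCII stand-ins for the command words.

```python
def forward_probability(obs, A, B, pi):
    """P(O | lambda) for a discrete HMM lambda = (A, B, pi), via the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
                 for j in range(n)]
    return sum(alpha)


def recognize(obs, models):
    """Return the word whose model gives the observation sequence the highest probability."""
    return max(models, key=lambda word: forward_probability(obs, *models[word]))


# Hypothetical toy models sharing the same left-right transitions.
A = [[0.5, 0.5],
     [0.0, 1.0]]
pi = [1.0, 0.0]
models = {
    "toi": (A, [[0.9, 0.1], [0.2, 0.8]], pi),  # tends to emit symbol 0 first, then 1
    "lui": (A, [[0.1, 0.9], [0.8, 0.2]], pi),  # tends to emit symbol 1 first, then 0
}

best = recognize([0, 0, 1, 1], models)
```

For long observation sequences the raw probabilities become very small, so practical implementations scale the forward variables or work with logarithms; that refinement is omitted here.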
From the references and from information about successfully built recognition systems, we found that for speech recognition the HMM usually chosen is a left-right model with 5 to 6 states. In our experiments the 6-state model gave better results, so the authors' program uses an HMM with 6 states (see Figure 10).
Figure 10. Left-right HMM with 6 states.
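One possible way to initialize a 6-state left-right HMM before training is sketched below. It assumes that each state may only stay where it is or move one state to the right (no skip transitions), that the model always starts in the first state, and that emissions start uniform over a hypothetical codebook of 64 symbols. The paper states none of these details.

```python
def left_right_hmm(n_states, n_symbols):
    """Initial (A, B, pi) for a left-right HMM without skip transitions."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = 0.5       # stay in the same state
        A[i][i + 1] = 0.5   # advance one state to the right
    A[-1][-1] = 1.0         # the final state can only loop on itself
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_states)]
    pi = [1.0] + [0.0] * (n_states - 1)  # always start in the first state
    return A, B, pi


# 6 states as in the paper; 64 symbols is an assumed codebook size.
A, B, pi = left_right_hmm(n_states=6, n_symbols=64)
```

Entries of A that start at zero remain zero under the usual re-estimation formulas, so the left-right structure is preserved during training.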
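3 MODEL OF THE CONTROLLED CAR SYSTEM
The diagram of the radio-controlled car driven by voice from a computer is shown in Figure 11.
Figure 11. Overview of the experimental system (labels: backward, forward, right, left; remote control unit with switches SW1 to SW4; transmitting antenna; receiving antenna; on-board controller).
The radio-controlled car can be driven remotely by voice from the computer. The spoken command word is captured and recognized by the speech recognizer, which passes the recognized word sequence to a decision unit; the decision unit issues the control command through the COM port. An interface circuit connected to the computer through the serial port (RS232) was designed for this purpose. The interface circuit receives the signal and opens or closes switches, which act as the inputs of the remote control unit. Whenever a switch is closed or a combination of keys is pressed, the remote control unit encodes it appropriately and sends it to the transmitting antenna. The control signal is modulated and transmitted to the car by radio with a carrier frequency F_C = 27 MHz. The controller on the car then drives the car accordingly. The model works well with a vocabulary of 4 words, "phải" (right), "trái" (left), "tới" (forward) and "lui" (backward), with good results (99%).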
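The decision step, turning a recognized word into a command for the interface circuit, might look like the sketch below. Everything specific in it is hypothetical: the paper does not give the wiring of SW1 to SW4, the byte format or the serial settings, so the bit assignment and the one-byte mask are invented purely for illustration. The dictionary keys are ASCII stand-ins for the four command words.

```python
# Hypothetical assignment of the remote-control switches SW1..SW4 to bits 0..3.
# The actual wiring and byte format are not given in the paper.
SWITCH_BIT = {"toi": 0, "lui": 1, "trai": 2, "phai": 3}


def command_byte(word):
    """Encode a recognized command word as a one-byte switch mask.

    An unknown word yields 0, meaning all switches open, so the car is not driven.
    """
    bit = SWITCH_BIT.get(word)
    return bytes([0 if bit is None else 1 << bit])


forward = command_byte("toi")
unknown = command_byte("xyz")
```

The resulting byte would then be written to the COM port by whatever serial library the host program uses. Mapping unknown words to an all-open mask is a conservative design choice for a moving vehicle.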
4 CONCLUSION
The experimental Vietnamese speech recognition model combining MFCC and HMM still has many limitations, but it meets the objectives of the project. The program has been used to control a robot with a small vocabulary (fewer than 16 words) at an acceptable accuracy (above 90%).
In the near future the authors will optimize the recognition program to achieve better results and faster processing.
REFERENCES
1. Phạm Văn Ất, Kỹ thuật lập trình C, Nhà xuất bản Khoa học và Kỹ thuật, 1999.
2. Nguyễn Hoàng Hải, Nguyễn Khắc Kiểm, Lập trình Matlab, Nhà xuất bản Khoa học và Kỹ thuật, 2003.
3. Nguyễn Hữu Phương, Xử lý tín hiệu số, Nhà xuất bản Giao thông vận tải, 2000.
4. Lê Tiến Thường, Xử lý tín hiệu số và wavelets, Nhà xuất bản Đại học Quốc gia TP. Hồ Chí Minh, 2002.
5. Claudio Becchetti and Lucio Prina Ricotti, Speech Recognition: Theory and C++ Implementation, John Wiley & Sons, Ltd, 2000.
6. Gordon E. Pelton, Voice Processing, McGraw-Hill, 1992.
7. John R. Deller, John G. Proakis and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, 1993.
8. F. J. Owens, Signal Processing of Speech, Macmillan, 1993.
