Graduation Project
o Th Thu Thy
INTRODUCTION
Prosody is what gives the human voice its distinctive timbre. The prosody of speech is closely tied to intonation, which is the rise and fall of the voice within a sentence. Vietnamese is a fairly complex language that involves both prosody and intonation, so speech recognition for it requires a great deal of investment and research. To date, however, the results remain imperfect because the object being recognized, human speech and Vietnamese in particular, is complex and variable.
Many speech recognition approaches exist today. The Fujisaki model is widely used in systems for Japanese, the MFGI model (Mixdorff Fujisaki model of German Intonation) is used for German, and there is also the HMM (hidden Markov model), among others.
Within these models, many different recognition methods are applied. Each method has its own characteristics and advantages.
LPC (linear predictive coding): its drawback is that words with similar pronunciations are frequently confused.
AMDF (average magnitude difference function): its advantages are a small number of inputs, a small training network, and low dependence on pronunciation, so its misrecognition rate is lower than that of LPC. Its drawbacks are that it does not distinguish tones and is hard to use when words are spoken continuously.
AMDF & LPC: given the complementary strengths and weaknesses of LPC and AMDF, the two methods need to be combined.
The fourth method is MFCC (mel-frequency cepstral coefficients).
Speech recognition is a pattern recognition process whose goal is to classify the information in a speech signal into a sequence of patterns that were learned beforehand and stored in the recognition system. The patterns are the recognition units; they can be words or phonemes. If these patterns were invariant, speech recognition would be simple: the speech data to be recognized would just be compared with the patterns learned and stored in the system.
Speech recognition is not a new field, but it is an extremely complex one. Research on it began worldwide more than 50 years ago, and the practical results achieved are very encouraging. Even so, it will be a long time before a system can be built that understands speech the way a human does.
Within the scope of this graduation project, we build a program that recognizes the ten Vietnamese digits using tools available in Matlab. The longer-term direction is a program that recognizes all Vietnamese words and sentences so that it can be applied in practice. However, because we have only just started working in this field, our ability and knowledge are still very limited, and with the constraints on time and equipment we could only build a small recognition system. In the future, given the opportunity to study this field in more depth, we hope to develop this project further so that it can be used in practice.
[Figure: diagram labels "Control and command", "Recognition"]
Because of these difficulties, speech recognition draws on knowledge from many related scientific disciplines:
Signal processing: methods for extracting characteristic, stable information from the speech signal and for reducing the effects of noise and of the time variation of speech.
Acoustics: the relationship between the physical speech signal and the physiological mechanisms of human speech production and hearing.
Pattern recognition: algorithms for classifying, training, and comparing data patterns.
Information theory: statistical and probabilistic models; algorithms for search, encoding, decoding, and estimation of model parameters.
Linguistics: the relationship between phonetics and the semantics, grammar, and context of speech.
Psychophysiology: the higher-level mechanisms of the neural system of the human brain in hearing and speaking.
Computer science: algorithms and methods for implementing and using recognition systems efficiently in practice.
Three basic principles of speech recognition:
The speech signal is represented accurately by its spectral values over a short time frame. We can therefore extract speech features from short time intervals and use these features as the data for recognition.
The content of speech can be represented in written form, as a sequence of phonetic symbols.
Speech recognition is a cognitive process. Spoken language carries meaning, so semantic information and inference are valuable during recognition, especially when the acoustic information is unclear.
[Figure: block diagram of the HMM-based recognizer. Block labels: input speech; analysis and parameter extraction; vector quantization; comparison with the HMM templates; selection rule; recognized word.]
The word to be recognized is divided into a time sequence of T frames and analyzed with one of several analysis algorithms such as MFCC, linear predictive coding (LPC) analysis, or the fast Fourier transform (FFT). This step yields the observation sequence O_t (t = 1, 2, 3, ..., T). The sequence O_t is then quantized against a set representing M speech patterns. Next, the system compares the input word with the M speech patterns. The input word …
[Figure: neural network structures (a) and (b), showing the input layer, hidden layer, and output layer]
There are several ways to integrate knowledge sources into a speech recognition system. The most common is bottom-up processing, in which the processing stages run sequentially from low level to high level. Analysis of the input signal, feature extraction, segmentation, and labeling come first, followed by sound classification and then word and sentence identification. Each stage requires its own knowledge source, and these knowledge sources are accumulated gradually through actual processing, much like human knowledge.
… directly, in either hardware or software. LPC processing also requires less computation than the filter-bank method.
- The LPC model works well in recognition applications. Experience shows that recognition systems using the LPC model give better results than systems using a filter bank.
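$$ s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,u(n) \qquad (1) $$

In the z-domain:

$$ S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z) $$

which leads to the transfer function of the model:

$$ H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} $$

Given the exact relation

$$ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) $$

the linear predictor is

$$ \tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) $$

The prediction error is $e(n) = s(n) - \tilde{s}(n)$, with the error transfer function:

$$ A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} $$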
The basic problem of linear prediction analysis is to determine the set of predictor coefficients {a_k} directly from the speech signal so that the spectral properties of the filter match those of the speech waveform within the analysis window.
Because the spectral characteristics of speech vary over time, the predictor coefficients at a given time n must be estimated from a short segment of the speech signal occurring around n. The basic approach is therefore to find the set of predictor coefficients that minimizes the mean squared prediction error over a short segment of the speech waveform. Typically, the speech signal is analyzed over successive frames about 10 ms long.
This problem is solved with the autocorrelation method, in which the estimated coefficients a_k are the solution of the equation:
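$$ \sum_{k=1}^{p} r_n(|i-k|)\,\hat{a}_k = r_n(i), \qquad 1 \le i \le p $$

$$ r(k) = \sum_{n=-\infty}^{\infty} x(n)\,x(n+k) $$

The equation is solved with the Levinson-Durbin recursion.
Initialization:

$$ E_0 = r(0), \qquad a_1(1) = K_1 = \frac{r(1)}{r(0)}, \qquad E_1 = (1 - K_1^2)\,E_0 $$

Recursion: for p = 2, 3, ..., P
Compute the coefficient K_p (the PARCOR coefficient):

$$ K_p = \frac{r(p) - \sum_{i=1}^{p-1} a_{p-1}(i)\, r(p-i)}{E_{p-1}} $$

Compute the predictor coefficients of order p:

$$ a_p(k) = a_{p-1}(k) - K_p\, a_{p-1}(p-k), \qquad k = 1, 2, \ldots, p-1 $$

$$ a_p(p) = K_p $$

Compute the mean squared error of order p:

$$ E_p = (1 - K_p^2)\, E_{p-1} $$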
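The recursion above can be sketched in a few lines of Matlab. This is only an illustration under stated assumptions: the toy frame `x`, the order `P = 2`, and all variable names are made up here and do not come from the project code. The update `E = (1 - K^2) * E` is the standard Levinson-Durbin error update, and the result is cross-checked by solving the normal equations directly.

```matlab
% Levinson-Durbin recursion for the autocorrelation method (sketch).
% r(k+1) stores r(k), because Matlab indices start at 1.
x = [1 2 3 4 3 2 1 0];            % toy frame, illustration only
P = 2;                            % prediction order (assumed)
N = numel(x);
r = zeros(1, P+1);
for k = 0:P
    r(k+1) = sum(x(1:N-k) .* x(1+k:N));   % r(k) = sum_n x(n) x(n+k)
end

a = zeros(1, P);                  % a(k) holds a_p(k) at the current order
E = r(1);                         % E_0 = r(0)
for p = 1:P
    acc = r(p+1);
    for i = 1:p-1
        acc = acc - a(i) * r(p-i+1);      % r(p) - sum_i a_{p-1}(i) r(p-i)
    end
    K = acc / E;                          % PARCOR coefficient K_p
    a_prev = a;
    for k = 1:p-1
        a(k) = a_prev(k) - K * a_prev(p-k);
    end
    a(p) = K;                             % a_p(p) = K_p
    E = (1 - K^2) * E;                    % E_p = (1 - K_p^2) E_{p-1}
end

% Cross-check: solve the normal equations R a = [r(1) ... r(P)]' directly.
R = toeplitz(r(1:P));
a_direct = (R \ r(2:P+1)')';
disp(max(abs(a - a_direct)));     % difference should be near machine precision
```

The recursion needs on the order of P^2 operations, whereas a general linear solve needs on the order of P^3, which is why it is the usual choice for the autocorrelation method.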
Pre-emphasized signal
Step 3: The signal is divided into frames of N samples each, with consecutive frames offset by M samples (so adjacent frames overlap by N - M samples); M = N/3.
Choose the sampling frequency
Choose N and M
Step 4: Window each frame to reduce the signal discontinuities at the beginning and end of the frame. In other words, taper the signal toward 0 at the start and end of each frame.

$$ \tilde{x}_l(n) = x_l(n)\,W(n), \qquad 0 \le n \le N-1 $$
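$$ c_0 = \ln \sigma^2 $$

($\sigma^2$ is the gain term G of the LPC model)

$$ c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p $$

$$ c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad p < m \le Q $$

Usually Q is taken as $Q = \tfrac{3}{2}\,p$.
Where the cepstral weighting window is:

$$ W_m = \left[ 1 + \frac{Q}{2} \sin\!\left( \frac{\pi m}{Q} \right) \right], \qquad 1 \le m \le Q $$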
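As a rough illustration of the LPC-to-cepstrum recursion and the weighting window W_m, here is a minimal Matlab sketch. The coefficient values are invented for the example (they correspond to A(z) = (1 - 0.5 z^-1)(1 - 0.4 z^-1)), and the variable names are assumptions, not names from the project code.

```matlab
% LPC coefficients -> cepstral coefficients, then cepstral weighting (sketch).
a = [0.9 -0.2];          % toy predictor coefficients a_1..a_p (made up)
p = numel(a);
Q = 3;                   % number of cepstral coefficients, Q = 3p/2
c = zeros(1, Q);
for m = 1:Q
    acc = 0;
    for k = max(1, m-p):m-1              % terms with a_{m-k} defined (m-k <= p)
        acc = acc + (k/m) * c(k) * a(m-k);
    end
    if m <= p
        acc = acc + a(m);                % the a_m term exists only for m <= p
    end
    c(m) = acc;
end

w = 1 + (Q/2) * sin(pi * (1:Q) / Q);     % weighting window W_m
c_weighted = c .* w;
disp(c);
```

For this toy filter the cepstrum is also known in closed form, c_m = (0.5^m + 0.4^m) / m, which gives a convenient check on the recursion.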
[Table: typical LPC analysis parameter values for Fs = 8 kHz and Fs = 10 kHz. Surviving entries: 300 (45 msec); 100 (15 msec); 80 (10 msec); 10; 10; 12; 12; 12.]
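Mel frequency:

$$ m = 1127.01048 \,\ln\!\left( 1 + \frac{f}{700} \right) \quad \text{or} \quad \mathrm{Mel}(f) = m = 2595 \,\log_{10}\!\left( 1 + \frac{f}{700} \right) \qquad \text{(Mel)} $$

1.3.2.2 Feature extraction with the MFCC method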
a) Frame Blocking
The signal is cut into frames of N samples, with consecutive frames offset by M samples. Usually M = N/3.
(We take N = 512, which is convenient for computing the FFT, and M = 100.)
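Frame blocking can be sketched as follows. The signal and the small values of `N` and `M` are chosen only so the result is easy to inspect; the variable names are assumptions and this is not the framing routine used inside VOICEBOX.

```matlab
% Frame blocking sketch: frames of N samples, consecutive frames offset by M.
x = (1:20)';                 % toy signal, illustration only
N = 6;                       % frame length (the text uses N = 512)
M = 2;                       % frame offset, here N/3
numFrames = floor((numel(x) - N) / M) + 1;
frames = zeros(N, numFrames);
for l = 1:numFrames
    first = (l-1)*M + 1;                   % first sample of frame l
    frames(:, l) = x(first:first+N-1);     % one frame per column
end
disp(size(frames));
```

Each column of `frames` is one frame; adjacent columns share N - M samples, and samples that do not fill a whole final frame are dropped.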
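b) Windowing
Each frame is windowed to reduce the signal discontinuities at the beginning and end of the frame. In other words, the signal is tapered toward 0 at the start and end of each frame.
The window commonly used is the Hamming window.

$$ \tilde{x}_l(n) = x_l(n)\,W(n), \qquad 0 \le n \le N-1 $$

c) Fast Fourier transform (FFT)

$$ X_n = \sum_{k=0}^{N-1} x_k\, e^{-2\pi j k n / N}, \qquad n = 0, 1, 2, \ldots, N-1 $$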
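The windowing and transform steps can be illustrated with a short Matlab sketch. The Hamming window is written out with its standard formula, w(n) = 0.54 - 0.46 cos(2 pi n / (N-1)), which the text does not state explicitly; the toy frame and variable names are assumptions.

```matlab
% Hamming window followed by the DFT of one frame (sketch).
N  = 8;                                   % toy frame length
n  = (0:N-1)';
w  = 0.54 - 0.46 * cos(2*pi*n / (N-1));   % Hamming window W(n)
xl = ones(N, 1);                          % toy frame, illustration only
xw = xl .* w;                             % windowed frame
X  = fft(xw);                             % X_n = sum_k x_k exp(-2*pi*j*k*n/N)

% Direct evaluation of the same sum, to show what fft computes.
k  = 0:N-1;
Xd = exp(-2j*pi*(n*k)/N) * xw;
disp(max(abs(X - Xd)));                   % should be near machine precision
```

The direct sum costs on the order of N^2 operations while the FFT costs on the order of N log N, which is why a power-of-two frame length such as N = 512 is convenient.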
d) Mel-frequency conversion
The conversion follows formula (Mel):

$$ B(f) = 1127.01048 \,\ln\!\left( 1 + \frac{f}{700} \right) $$
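A minimal sketch of the Hz-to-mel mapping and its inverse is given below. The function handles `hz2mel` and `mel2hz` are names introduced here for illustration only; the inverse is obtained by solving the formula for f.

```matlab
% Hz <-> mel conversion using the formula above (sketch).
hz2mel = @(f) 1127.01048 * log(1 + f/700);
mel2hz = @(m) 700 * (exp(m/1127.01048) - 1);   % inverse mapping

f   = [0 700 1000 4000];              % sample frequencies in Hz
m   = hz2mel(f);
m10 = 2595 * log10(1 + f/700);        % base-10 form of the same formula
disp([f; m; m10]);
```

The natural-log and base-10 forms agree to within rounding of the constants, since 1127.01048 * ln(10) is approximately 2595, and 1000 Hz maps to roughly 1000 mel.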
$$ S_m = \log\!\left[ \sum_{k=1}^{N} |X(k)|^2\, H_m(k) \right], \qquad m \in \{1, 2, \ldots, M\} $$

Then the discrete cosine transform (DCT) is applied, which yields the MFCC coefficients:
MFCC = DCT(S_m)
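Discrete cosine transform:

$$ C(u) = \alpha(u) \sum_{x=0}^{N-1} f(x) \cos\!\left[ \frac{\pi u}{N} \left( x + \frac{1}{2} \right) \right], \qquad u = 0, 1, 2, \ldots, N-1 $$

Inverse transform:

$$ f(x) = \sum_{u=0}^{N-1} \alpha(u)\, C(u) \cos\!\left[ \frac{\pi u}{N} \left( x + \frac{1}{2} \right) \right], \qquad x = 0, 1, 2, \ldots, N-1 $$

With

$$ \alpha(u) = \begin{cases} \sqrt{\dfrac{1}{N}}, & u = 0 \\[2mm] \sqrt{\dfrac{2}{N}}, & u \ne 0 \end{cases} $$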
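The DCT pair above can be written as a matrix product, which makes the forward and inverse transforms easy to check against each other. In this sketch the vector `S` merely stands in for the log filter-bank outputs S_m; its values and all variable names are assumptions made for illustration.

```matlab
% Orthonormal DCT and its inverse as matrix products (sketch).
N = 8;
S = log((1:N)');                          % stand-in for log filter-bank outputs
x = 0:N-1;                                % sample index (row vector)
u = (0:N-1)';                             % coefficient index (column vector)
alpha = [sqrt(1/N); sqrt(2/N) * ones(N-1, 1)];
D = diag(alpha) * cos(pi * u * (x + 0.5) / N);   % D(u+1, x+1) = alpha(u) cos(...)

C      = D * S;                           % forward transform C(u)
S_back = D' * C;                          % inverse transform f(x)
disp(max(abs(S_back - S)));               % should be near machine precision
```

Because of the alpha(u) scaling the matrix `D` is orthogonal, so the inverse transform is simply its transpose. In MFCC extraction only the first few coefficients of `C` are normally kept.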
1.3.2.3 Other issues
a) Determining the start and end points of the signal (speech detection)
The purpose of speech detection is to separate the speech segments of interest from the other parts of the signal (background, noise, and so on). This is essential in many fields. For automatic speech recognition, speech detection is needed to isolate the portion of the signal that is speech and to build from it the patterns used for recognition.
The question is how to locate the speech signal accurately so as to provide the best possible patterns for recognition. When the signal is recorded under near-ideal conditions (almost no noise), locating the speech accurately is not difficult. In practice, however, several problems usually make accurate detection harder. One of the most typical is the speaker's manner of articulation. For example, while speaking, people often produce artifacts such as lip smacks, breath noise, or clicking sounds in the mouth.
The second factor that makes speech detection difficult is the environment in which the speech is produced. An ideal environment with almost no noise or interference is unrealistic, so speech produced in noisy environments must be considered (machinery, fans, the murmur of people nearby), including environments where the background is not stationary (a door slamming, traffic, and so on).
The last factor that degrades signal quality is loss in the transmission system, such as the quality of the communication channel or losses caused by modulation (quantization, digitization, and so on).
Speech detection is particularly important for recognition methods based on pattern comparison, and it also improves pattern quality for the HMM and neural network methods. However, because this project focuses on HMMs and neural networks, it does not go deeply into speech detection; the speech signal is detected at a 5% threshold.
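A threshold-based endpoint detector can be sketched as follows. The text only states that a 5% threshold is used, so the exact rule here is an assumption: samples are kept from the first to the last point where the absolute amplitude exceeds 5% of the peak. This is not the project's `endcut` or `detector` routine, whose internals are not shown, and the synthetic signal is made up.

```matlab
% Amplitude-threshold endpoint detection (sketch, assumed rule).
% Toy signal: near-silence, a burst of "speech", near-silence.
x = [0.001*ones(1,50), 0.8*sin(2*pi*(1:100)/20), 0.001*ones(1,50)]';

thr = 0.05 * max(abs(x));          % 5% of the peak amplitude (assumption)
idx = find(abs(x) > thr);          % samples above the threshold
y   = x(idx(1):idx(end));          % keep everything between first and last

fprintf('kept samples %d to %d of %d\n', idx(1), idx(end), numel(x));
```

A per-sample rule like this is sensitive to clicks and breath noise, which is one reason practical detectors usually work on short-time energy over frames rather than on single samples.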
Discrete time: moving from one state to another takes exactly one unit of time.
Memoryless observations: the probability of each observation in the sequence depends only on the immediately preceding state (so little memory is needed).
2.2 Trnh
Information-theoretic approach to recognition
Figure 1
Recognition means finding the most likely linguistic sequence W given the acoustic evidence A. Formula:

$$ \hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)} $$
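Example 1:
Figure 5.2
Figure 5.3

$$ \pi = \{a_{01}, a_{02}\}, \qquad A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} b_1(B) & b_1(W) \\ b_2(B) & b_2(W) \end{bmatrix} $$

Number of states
Some common models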
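$$ P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda) $$

Consider a fixed state sequence $Q = q_1 q_2 \ldots q_T$:

$$ P(O \mid Q, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T) $$

$$ P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T} $$

Therefore:

$$ P(O \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1} b_{q_1}(o_1)\, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T) $$

Number of operations required: $2T \cdot N^T$ (there are $N^T$ such sequences).
Example: N = 5, T = 100 gives $2 \cdot 100 \cdot 5^{100} \approx 10^{72}$ operations.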
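The order of magnitude in this example can be checked with base-10 logarithms. The comparison with the forward algorithm in the last line is added here for context and uses the commonly quoted cost of about N^2 T multiplications, which is not stated in the text.

```latex
\begin{aligned}
\log_{10}\!\left(2 \cdot 100 \cdot 5^{100}\right)
  &= \log_{10} 200 + 100 \log_{10} 5 \\
  &\approx 2.301 + 69.897 = 72.198, \\
2 \cdot 100 \cdot 5^{100} &\approx 10^{72.198} \approx 1.6 \times 10^{72}, \\
N^{2} T &= 5^{2} \cdot 100 = 2500 .
\end{aligned}
```

Direct enumeration is therefore infeasible even for this small model, which is the motivation for a recursive procedure.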
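2.3.1 Forward algorithm and backward algorithm:
a) Forward algorithm:
The forward variable $\alpha_t(i)$ is the probability of the partial observation sequence up to time t together with state $s_i$ at time t, given the model $\lambda$:

$$ \alpha_t(i) = P(o_1 o_2 \ldots o_t,\; q_t = s_i \mid \lambda) $$

It is easy to see that:

$$ \alpha_1(i) = \pi_i\, b_i(o_1), \qquad 1 \le i \le N $$

$$ \alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \qquad 1 \le t \le T-1,\; 1 \le j \le N $$

$$ P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) $$

Figure 5.7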
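The forward recursion can be sketched for a small discrete HMM and compared with brute-force enumeration of all N^T state sequences. Every number in the model below is made up for illustration, and the variable names are assumptions; this is not the project's HMM code, which uses continuous densities.

```matlab
% Forward algorithm for a toy discrete HMM, checked by brute force (sketch).
Pi = [0.6 0.4];                   % initial state probabilities (made up)
A  = [0.7 0.3; 0.4 0.6];          % A(i,j) = a_ij
B  = [0.5 0.5; 0.1 0.9];          % B(i,k) = b_i(symbol k)
O  = [1 2 1];                     % observation sequence as symbol indices
N = numel(Pi);  T = numel(O);

alpha = zeros(T, N);
alpha(1,:) = Pi .* B(:, O(1))';                        % alpha_1(i) = pi_i b_i(o_1)
for t = 1:T-1
    alpha(t+1,:) = (alpha(t,:) * A) .* B(:, O(t+1))';  % induction step
end
P_forward = sum(alpha(T,:));                           % P(O | lambda)

% Brute force: sum P(O,Q | lambda) over all N^T state sequences Q.
P_brute = 0;
for code = 0:N^T-1
    q = zeros(1, T);  c = code;
    for t = 1:T
        q(t) = mod(c, N) + 1;  c = floor(c / N);       % decode sequence number
    end
    pq = Pi(q(1)) * B(q(1), O(1));
    for t = 2:T
        pq = pq * A(q(t-1), q(t)) * B(q(t), O(t));
    end
    P_brute = P_brute + pq;
end
fprintf('forward %.6f, brute force %.6f\n', P_forward, P_brute);
```

For long sequences the alpha values underflow, so practical implementations rescale each row of `alpha` or work with logarithms.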
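b) Backward algorithm:
Similarly, the backward variable $\beta_t(i)$ is defined as the probability of the partial observation sequence from time t+1 to the end, given state $s_i$ at time t and the model $\lambda$:

$$ \beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = s_i, \lambda) $$

$$ \beta_T(i) = 1, \qquad 1 \le i \le N $$

$$ \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, T-2, \ldots, 1;\; 1 \le i \le N $$

$$ P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i) $$

Illustration of the backward procedure:
Figure 5.8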
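A matching sketch of the backward recursion follows, using an invented two-state model and confirming that it yields the same P(O | lambda) as the forward pass. The model values and names are assumptions for illustration only.

```matlab
% Backward algorithm for a toy discrete HMM (sketch).
Pi = [0.6 0.4];                   % initial state probabilities (made up)
A  = [0.7 0.3; 0.4 0.6];          % A(i,j) = a_ij
B  = [0.5 0.5; 0.1 0.9];          % B(i,k) = b_i(symbol k)
O  = [1 2 1];                     % observation sequence as symbol indices
N = numel(Pi);  T = numel(O);

beta = ones(T, N);                                      % beta_T(i) = 1
for t = T-1:-1:1
    % beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    beta(t,:) = (A * (B(:, O(t+1)) .* beta(t+1,:)'))';
end
P_backward = sum(Pi .* B(:, O(1))' .* beta(1,:));

% Forward pass on the same model, for comparison.
alpha = Pi .* B(:, O(1))';
for t = 1:T-1
    alpha = (alpha * A) .* B(:, O(t+1))';
end
P_forward = sum(alpha);
fprintf('backward %.6f, forward %.6f\n', P_backward, P_forward);
```

Both passes are needed together later: the products alpha_t(i) beta_t(i) give the state posteriors used when re-estimating the model.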
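Finding the optimal state sequence:
One criterion for choosing the optimal state $q_t$ is to maximize the expected number of correct states.
The variable $\gamma_t(i)$ is the probability that the system is in state $s_i$ at time t, given the observation sequence O and the model $\lambda$:

$$ \gamma_t(i) = P(q_t = s_i \mid O, \lambda) $$

Note that it can be expressed in the following form:

$$ \gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} $$

Recursion:
Termination:
Figure 5.10
Baum-Welch re-estimation procedure
Figure 5.12
Re-estimation expressions for the Baum-Welch algorithm
If $\lambda = \{A, B, \pi\}$ is the original model and $\bar{\lambda} = \{\bar{A}, \bar{B}, \bar{\pi}\}$ is the re-estimated model, then it can be proved that either:
The original model $\lambda$ defines a critical point of the likelihood function, in which case $\bar{\lambda} = \lambda$.
Or: the model $\bar{\lambda}$ is more likely than $\lambda$, in the sense that $P(O \mid \bar{\lambda}) > P(O \mid \lambda)$.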
2) Disadvantages:
The assumption that all probabilities depend solely on the current state does not hold for speech applications. One consequence is that HMMs have difficulty modeling coarticulation, because acoustic distributions in practice depend strongly on past states. Another consequence is that durations are modeled inaccurately by an exponentially decaying distribution rather than by a more accurate Poisson distribution.
The independence assumption, which says there is no correlation between adjacent frames, also does not hold for speech applications. Under this assumption, HMMs examine only one frame of speech at a time.
The probability density models (discrete or continuous) all have suboptimal modeling accuracy. Discrete models in particular suffer from large errors.
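Where
y[n] is the current output sample of the filter
x[n] is the current input sample
x[n-1] is the previous input sample

Function                          Meaning
[x, fs] = wavread(wavfile);       Read a WAV file into the vector x, with sampling rate fs.
wavwrite(x, fs, wavfile);         Write the vector x to a WAV file at sampling rate fs.
sound(x);                         Play the vector x as audio.
x = wavrecord(n, fs)              Record n samples (from the microphone) at sampling rate fs. The result is the vector x.
x = filter([1 -0.9375], 1, x);    High-pass (pre-emphasis) filter.
y = detector(x);                  Endpoint detection.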
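The self-built data set consists of:
10 monosyllabic words, the Vietnamese digits (không, một, hai, ... chín; that is, zero, one, two, ... nine).
WAV files, 16 bit, 8 kHz, with four spoken recordings per word.
In total, 40 speech samples are used for training.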
[Flowchart: Start; combine all the features of each word into the training data set; End]
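clear all
clc;
addpath('VOICEBOX')
addpath('HMMs')
nc = 16;                 % number of cepstral coefficients
p = 32;                  % number of mel filters in the filter bank
M = 4;                   % mixtures per state (assumed); also files per digit
N = 4;                   % number of HMM states
SM_mat = M*ones(1,N);    % number of mixtures for each state
so_lan_lap = 5;          % number of training iterations
% Read the wav files
% (wavread is not available in recent Matlab releases; audioread replaces it)
traindata = cell(1,10);
for i=0:9
    temp = cell(1,M);    % was cell(1:M), which builds an M-dimensional array
    for j=1:M
        fname = sprintf('Train/s%dt%d.wav',i,j);
        [x,fs] = wavread(fname);
        x = endcut(x, 500, 0.1);        % cut silence
        x = filter([1 -0.9375],1,x);    % pre-emphasis
        temp{1,j} = x';
    end
    traindata{1,i+1} = temp;
end
% Train the models for digits 0 to 9
hmmdata = cell(1,10);
for i=1:10
    fprintf('\n\nHUAN LUYEN SO %d\n',i-1);
    sample = [];
    for k=1:length(traindata{i})
        % Already pre-emphasized when loaded; filtering a second time here
        % would not match the single pass used at recognition time.
        x = traindata{i}{k};
        % Extract the speech features
        sample(k).data = melcepst(x,fs,'M',nc,p,256,80);
    end
    hmmdata{i} = hmmtrain(sample,SM_mat,so_lan_lap);
end
save('hmmdata.mat')      % saves the whole workspace, including nc, p and fs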
addpath('VOICEBOX')
addpath('HMMs')
load hmmdata                               % restores hmmdata, nc, p, ...
[fname,pathname] = uigetfile('Test/*.wav');
[x,fs] = wavread([pathname,fname]);        % take fs from the file itself
axes(handles.axes1);                       % make the GUI axes current
plot(x);
grid minor;
x = endcut(x, 500, 0.1);                   % cut silence
x = filter([1 -0.9375],1,x);               % pre-emphasis
% Extract the speech features
m = melcepst(x,fs,'M',nc,p,256,80);
pout = zeros(1,10);
for j=1:10
    pout(j) = viterbi(hmmdata{j},m);
end
% Take the model with the highest observation-sequence probability
[d,n] = max(pout);
set(handles.So,'String',num2str(n-1));
Besides training and recognition, the program can also speak the digits (from zero to nine).
Test runs and results
The parameters were varied and the program was tested on a data set of 20 speakers: 13 male and 7 female.
[Figures: recognition rate (%) versus codebook size (32, 64, 128), and two further recognition-rate plots]
… codebook size 64, with an average recognition rate of 80%.
This cannot be called a good result: although some words are recognized fairly well (80-90%), the model gives poor results on others. In other words, recognition performance is uneven across words.
Some groups of words are often confused with one another: {dừng, dưới}; {trái, chạy}; (tiến …
… different)
The chosen parameters are not yet optimal.
Some words have very similar pronunciations.
Remarks on the results:
From the results obtained, it can be seen that the recognition method using a neural network …
CONCLUSION
General remarks on the results of the project
The project built speech recognition models, specifically for recognizing isolated control words: Tắt (off), Bật (on), Chạy (run), Dừng (stop), Tiến (forward), Lùi (back), Trái (left), Phải (right), Trên (up), Dưới (down). Experiments were run using LPC and MFCC as the signal feature analysis methods.
Based on the data collected, the most suitable recognition model was identified.
Some limitations were nevertheless unavoidable:
The number of samples is small, so the convergence of the algorithm cannot yet be confirmed.
Sample quality is low and inconsistent because recordings were made with personal computers in different locations.
The working method has some shortcomings due to lack of experience.
Converting the algorithm from Matlab to C for implementation on a DSP still introduces some numerical errors, so recognition results in practice are not as high as in the experiments run on a computer.
… học kỹ thuật, Hanoi, 2005.
Mel frequency cepstral coefficients, Wikipedia.org and the links referenced there
HMM toolbox for matlab - http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
Auditory toolbox for matlab - http://www.slaney.org/malcolm/pubs.html
ECE4703 Real-Time DSP Orientation Lab - D. Richard Brown III 2004
And several other documents.