ASR applications
Fernando Pacheco Santana and Rui Seara
Federal University of Santa Catarina
Brazil
N13DCDT107
N13DCDT119
1. Introduction
In the past few years, advances in automatic speech recognition (ASR) have motivated the development of several commercial applications. Automatic dictation systems and voice dialing applications, for instance, are becoming ever more common. Despite significant advances, we are still far from the goal of unlimited speech recognition, i.e., recognition of any word, spoken by any person, in any place, and using any acquisition and transmission system. In real applications, the speech signal can be contaminated by different sources of distortion. In hands-free devices, for instance, the effects of reverberation and background noise are significantly intensified by the larger distance between the speaker and the microphone. If such distortions are not compensated, the accuracy of ASR systems is severely hampered (Acero & Droppo, 2008). In the open literature, several research works have been proposed to cope with the harmful effects of reverberation and noise in ASR applications (de la Torre et al., 2007; Huang et al., 2008). In summary, current approaches to ASR robustness against reverberation and noise can be classified as model adaptation, robust parameterization, and speech enhancement.
The goal of this chapter is to provide the reader with an overview of the current state of the art in ASR robustness to reverberation and noise, as well as to discuss the use of a particular speech enhancement approach that tries to overcome this problem. For that purpose, we choose spectral subtraction, which has been proposed in the literature to enhance speech degraded by reverberation and noise (Boll, 1979; Lebart & Boucher, 1998; Habets, 2004). Moreover, since ASR systems share similar concerns about this problem, such an approach has also been applied successfully as a preprocessing stage in these applications.
This chapter is organized as follows. Section 2 characterizes the effects of reverberation and noise on the speech parameters. An overview of methods to compensate for reverberation and noise in ASR systems is briefly discussed in Section 3, including a classification and a comparison of the different approaches. A discussion of spectral subtraction applied to reverberation reduction is presented in Section 4. In that section, we examine how to adjust the parameters of the algorithm; we also analyze the sensitivity to estimation errors and to changes in the room response. The combined effects of reverberation and noise are also assessed. Finally, concluding remarks are presented in Section 5.
2. Reverberation and noise
Speech communication is so natural to humans that we usually do not perceive some of its effects. Before reaching a microphone or the listener's ears, speech signals may be modified by the medium in which they propagate (the enclosure). In an ideal anechoic chamber, the signal follows only one path from the source to the receiver. In typical rooms, however, surfaces (walls and furniture) reflect the emitted sound; the microphone receives a stream of reflected signals from multiple propagation paths. The whole set of reflections is termed reverberation. Although in this chapter we discuss methods to reduce this effect, reverberation is not detrimental at all times. It may give the listener a spatial impression of the enclosure (Everest, 2001); it also increases both the "liveness" and the "warmth" of the room, which is especially important in music. On the other hand, excessive reverberation causes loss of intelligibility and clarity, harming communication or musical performance. The effect of reverberation can be modeled as the processing of a signal by a linear time-invariant system. This operation is represented by the convolution between the room impulse response (RIR) and the original signal, expressed as
y(n) = x(n) * h(n)
(1)
where y(n) represents the degraded speech signal, x(n) the original (undegraded) speech signal, h(n) denotes the room impulse response, and * denotes the linear convolution operation. In this approach, room reverberation is completely characterized by the RIR. Fig. 1 shows a typical impulse response measured in a room. An RIR can usually be separated into three parts: the direct response, the initial reflections, and the late reverberation.
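To make the convolution model in (1) concrete, the following minimal Python sketch applies a toy RIR to a short signal. The RIR values (a direct path followed by two weaker, delayed reflections) and the helper name `reverberate` are hypothetical and chosen only for illustration; a measured RIR would be much longer and denser.

```python
import numpy as np

def reverberate(x, h):
    """Return y(n) = x(n) * h(n), truncated to the length of x."""
    return np.convolve(x, h)[: len(x)]

# Toy RIR (hypothetical values): direct response plus two delayed reflections.
h = np.zeros(8)
h[0] = 1.0    # direct response
h[3] = 0.5    # initial reflection
h[7] = 0.25   # later, weaker reflection

# Two isolated pulses stand in for two short speech events.
x = np.array([1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = reverberate(x, h)
```

In `y`, the reflections of the first pulse overlap with the second pulse and its own reflections, which is a small-scale version of the smearing in time that reverberation causes.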
The amount of energy and the delay of each reflection cause different psychoacoustic effects. Initial reflections (or early reverberation) are acoustically integrated by the ears, reinforcing the direct sound. Since the initial reflections do not present a flat spectrum, a coloration of the speech spectrum occurs (Huang et al., 2008). Late reverberation (or the reverberation tail) causes a different effect called overlap masking. Speech signals exhibit natural dynamics, with regions presenting noticeably different energy levels. It is worth examining a practical example here. Fig. 2(a) and (b) illustrate, respectively, the speech signal corresponding to the utterance "enter fifty one" and the associated spectrogram. Notice in these figures the dynamics over time and the clear harmonic structure, with the speech resonances marked by the darker lines in the spectrogram [see Fig. 2(b)]. Reverberation is artificially incorporated into the original speech signal by convolving this speech segment with the RIR displayed in Fig. 1. Fig. 2(c) and (d) show, respectively, the reverberant version and the corresponding spectrogram. Now, observe in Fig. 2(c) that the signal is smeared in time, with virtually no gap between phonemes. In addition, notice how difficult it is to identify the resonances in Fig. 2(d).
So, how can the level of reverberation be measured, or how can the acoustic quality of a room be assessed? Much research has been carried out to define objective parameters correlated with the overall quality and the subjective impression exhibited by a room. In this chapter, we present two important parameters used to measure the level of reverberation of an enclosure: the reverberation time and the early to late energy ratio.
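As a rough illustration of how these two parameters can be obtained from an RIR, the sketch below estimates the reverberation time from an energy decay curve and computes an early to late energy ratio with a 50 ms threshold. Several choices here are assumptions and not taken from this chapter: the decay curve is computed by backward (Schroeder) integration, the slope is fitted between -5 dB and -35 dB and extrapolated to 60 dB, the sampling rate is 8 kHz, and the test RIR is synthetic (exponentially decaying noise with a known decay time of 0.5 s).

```python
import numpy as np

def energy_decay_curve_db(h):
    """Backward-integrated RIR energy, normalized to 0 dB at n = 0."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    return 10.0 * np.log10(edc / edc[0])

def estimate_t60(h, fs, hi_db=-5.0, lo_db=-35.0):
    """Fit a line to the decay curve and extrapolate it to a 60 dB decay."""
    edc_db = energy_decay_curve_db(h)
    idx = np.where((edc_db <= hi_db) & (edc_db >= lo_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)   # dB per second, negative
    return -60.0 / slope

def early_to_late_ratio_db(h, fs, t_ms=50.0):
    """Ratio (in dB) of RIR energy before and after the threshold t_ms."""
    n_t = int(round(t_ms * 1e-3 * fs))
    return 10.0 * np.log10(np.sum(h[:n_t] ** 2) / np.sum(h[n_t:] ** 2))

fs = 8000                                   # assumed sampling rate (Hz)
t60_true = 0.5                              # decay time of the synthetic RIR (s)
n = np.arange(fs)                           # one second of samples
decay = np.exp(-3.0 * np.log(10.0) * n / (t60_true * fs))
rng = np.random.default_rng(0)
h = rng.standard_normal(n.size) * decay     # synthetic RIR, not a measured one

t60_est = estimate_t60(h, fs)
c50 = early_to_late_ratio_db(h, fs)
```

The envelope is built so that the energy falls by exactly 60 dB after `t60_true` seconds, so `t60_est` should land close to 0.5 s; a measured RIR with a noise floor would need a more careful choice of the fitting range.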
Fig. 2. Reverberation effect on a speech signal. (a) Original speech signal corresponding to the utterance "enter fifty one" and (b) associated spectrogram. (c) Reverberant version of the same signal and (d) corresponding spectrogram.
Reverberation time (T60, RT60 or RT) is defined as the time interval required for the reverberation to decay 60 dB from the level of a reference sound. It is physically related to the room dimensions as well as to the acoustic properties of the wall materials. The reverberation time is computed from the decay curve obtained from the RIR energy (Everest, 2001). The result can be expressed either as a broadband measure or as a set of values corresponding to frequency-dependent reverberation times (for example, RT500 corresponds to the reverberation time in the frequency band centered at 500 Hz). To give the reader an idea of typical values, office rooms present T60 between 200 ms and 600 ms, while large churches can exhibit T60 on the order of 3 s (Everest, 2001). Another objective indicator of speech intelligibility or music clarity is called the early to late energy ratio (speech) or clarity index (music) (Chesnokov & SooHoo, 1998), which is defined as
$$C_T = 10 \log_{10} \left( \frac{\int_0^T p^2(t)\,dt}{\int_T^\infty p^2(t)\,dt} \right) \text{ dB}$$
where p(t) denotes the instantaneous acoustic pressure and T is the time instant taken as the threshold between early and late reverberation. For speech intelligibility evaluation, it is usual to consider C50 (T = 50 ms), while C80 is a measure of music clarity (Chesnokov & SooHoo, 1998). Now, considering the separation between early and late reverberation, h(n) can be expressed as
Classification of methods used in speech recognition applications for reverberation and noise compensation.
3.1 Speech enhancement
Speech enhancement methods attempt to cope with reverberation and noise before the signal reaches the front-end. They therefore act as a preprocessing stage in ASR systems. Methods in this category can be classified by the number of microphones they need to operate, leading to two classes: single-microphone and multi-microphone (the latter termed microphone array). We briefly describe here some current techniques: beamforming, inverse filtering, kurtosis-based adaptive filtering, harmonicity-based dereverberation, and spectral subtraction.
Beamforming is a classical microphone array approach (Darren et al., 2001). The signals from each microphone are accurately delayed and combined (by simple summation algorithms). As a consequence, the algorithm steers the array towards the speech source, reinforcing the speech signal and reducing reverberation and noise coming from other directions. Although the recognition rate increases for noisy speech, the same benefit is not achieved for reverberant speech, because conventional microphone array algorithms assume that the target and the undesired signals are uncorrelated (which is not true for reverberation). In a recent approach, called likelihood maximizing beamforming (LIMABEAM) (Seltzer et al., 2004), the beamforming algorithm is driven by the speech recognition engine. This approach has demonstrated a potential advantage over standard beamforming techniques for ASR applications.
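The delay-and-combine idea behind classical beamforming can be sketched in a few lines. This is only an illustration under strong assumptions that are not part of the chapter: the per-microphone delays of the target are known, they are whole numbers of samples, and the disturbance is independent noise at each microphone. The function name `delay_and_sum` and all numeric values are hypothetical.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel on the target and average the channels.

    signals : (n_mics, n_samples) array of microphone signals
    delays  : integer delay (in samples) of the target at each microphone
    """
    signals = np.asarray(signals, dtype=float)
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for m in range(n_mics):
        d = delays[m]
        # Advance channel m by d samples so the target lines up across channels.
        out[: n_samples - d] += signals[m, d:]
    return out / n_mics

rng = np.random.default_rng(0)
n_samples = 2000
s = rng.standard_normal(n_samples)          # stand-in for the target signal
delays = [0, 3, 5, 8]                       # hypothetical propagation delays

clean_mics = np.zeros((len(delays), n_samples))
for m, d in enumerate(delays):
    clean_mics[m, d:] = s[: n_samples - d]  # delayed copy of the target
noisy_mics = clean_mics + 0.5 * rng.standard_normal(clean_mics.shape)

enhanced = delay_and_sum(noisy_mics, delays)
```

Because the target adds coherently while the independent noise does not, averaging four channels reduces the noise power by roughly a factor of four in this toy setting. Reverberation is correlated with the target, which is why the same gain is not obtained for reverberant speech.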
Methods based on inverse filtering have two processing stages: estimation of the impulse responses between the source and each microphone, and application of a deconvolution operation. Among other approaches, the estimation can be carried out by cepstral techniques (Bees et al., 1991) or by a grid of zeros (Pacheco & Seara, 2005); however, some practical difficulties have been noted in real applications, impairing correct operation.
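$$Y(k) = X(k) + V(k) \qquad (5)$$
where Y(k), X(k), and V(k) denote the short-time discrete Fourier transforms (DFT) of y(n), x(n), and v(n), respectively. The central idea of spectral subtraction is to recover x(n) by modifying only the magnitude of Y(k). This process can be described as a spectral filtering operation
$$|\hat{X}(k)| = G(k)\,|Y(k)|$$
where $|\hat{X}(k)|$ is the magnitude of the enhanced spectrum and G(k) is a gain function.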
Fig. 3 shows a block diagram of a general spectral subtraction procedure. The noisy signal y(n) is windowed and its DFT is computed. The gain function is then estimated by using the current noisy magnitude samples, the previously enhanced magnitude signal, and the noise statistics. Note that the phase of Y(k) [represented by ∠Y(k)] remains unchanged and is an input to the inverse DFT (IDFT) block. The enhanced signal is obtained by combining the enhanced magnitude with the phase of Y(k) and processing them through the IDFT block followed by an overlap-add operation, the latter compensating for the windowing.
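$$|\hat{X}(k)| = |Y(k)| - |V(k)|$$
or even
$$G(k) = \begin{cases} \left[ 1 - \dfrac{1}{\mathrm{SNR}(k)} \right]^{1/2}, & \mathrm{SNR}(k) > 1 \\ 0, & \text{otherwise} \end{cases}$$
with
$$\mathrm{SNR}(k) = \frac{|Y(k)|^2}{|V(k)|^2}$$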
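The two subtraction rules above can be compared directly as gain functions. In the sketch below, writing magnitude subtraction as a gain clipped at zero is an assumption (the clipping is a common safeguard against negative magnitudes), and the function names and sample values are hypothetical.

```python
import numpy as np

def magnitude_subtraction_gain(y_mag, v_mag):
    """|X| = |Y| - |V| written as a gain on |Y|, clipped at zero (assumption)."""
    return np.maximum(1.0 - v_mag / np.maximum(y_mag, 1e-12), 0.0)

def power_subtraction_gain(y_mag, v_mag):
    """G(k) = sqrt(1 - 1/SNR(k)) for SNR(k) > 1 and 0 otherwise."""
    snr = y_mag ** 2 / np.maximum(v_mag ** 2, 1e-12)
    return np.where(snr > 1.0, np.sqrt(np.maximum(1.0 - 1.0 / snr, 0.0)), 0.0)

y_mag = np.array([2.0, 1.0, 0.5])   # |Y(k)| in three example bins
v_mag = np.array([1.0, 1.0, 1.0])   # |V(k)| in the same bins

g_mag = magnitude_subtraction_gain(y_mag, v_mag)
g_pow = power_subtraction_gain(y_mag, v_mag)
```

In the first bin (SNR of 4) the magnitude rule gives a gain of 0.5 and the power rule about 0.87, so magnitude subtraction attenuates more strongly at the same SNR. Both rules zero out bins where the disturbance estimate is at least as large as the observed magnitude.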
{[
v
1
2
G(k )= 1
SNR(k )
)]
1
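The autocorrelation of the reverberant signal y(n) is
$$r_y(l) = E\left[ \sum_{k=-\infty}^{n} x(k)\,h(n-k) \sum_{m=-\infty}^{n+l} x(m)\,h(n+l-m) \right]$$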
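Considering an RIR modeled by modulating a zero-mean random sequence with a decaying exponential (Lebart & Boucher, 1998), one can write
$$h(n) = w(n)\,e^{-\tau n}\,u(n) \qquad (14)$$
where $w(n)$ is the zero-mean random sequence, with variance $\sigma_w^2$, $u(n)$ is the unit step, and $\tau$ is a damping constant related to the reverberation time by
$$\tau = \frac{3 \ln 10}{T_{60}}.$$
Assuming that $w(n)$ is white and independent of $x(n)$,
$$E[h(n-k)\,h(n+l-m)] = \sigma_w^2\, e^{-2\tau (n-k)}\, \delta(k - m + l)$$
so that
$$r_y(l) = e^{-2\tau n} \sum_{k=-\infty}^{n} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k}.$$
Splitting the sum at $k = n - N_d$ gives
$$r_y(l) = e^{-2\tau n} \sum_{k=-\infty}^{n-N_d} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k} + e^{-2\tau n} \sum_{k=n-N_d+1}^{n} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k} \qquad (23)$$
$$S_y(n, k) = e \cdots$$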
using the well-known image method for modeling room acoustic responses (Allen & Berkley, 1979). The room configurations used in the simulation experiments are given in Table 1.
In the spectral subtraction stage, the degraded signal is segmented into 25 ms frames, with an overlap of 15 ms, and weighted by a Hamming window. The threshold T is fixed at 40 ms. We consider magnitude subtraction (with the corresponding parameter set to 1), since previous research works have obtained very good results with this configuration (Habets, 2004). From the modified magnitude spectrum and the (original) phase signal, the enhanced signal is recovered by an overlap-and-add algorithm.
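The settings above (25 ms frames, 15 ms overlap, Hamming window, T = 40 ms, magnitude subtraction, original phase, overlap-add) can be put together in a rough single-channel sketch. It is not the implementation evaluated in this chapter and relies on several assumptions: the sampling rate is 8 kHz; the reverberation time is known; and the late reverberant magnitude of a frame is approximated by the magnitude of the frame observed T earlier, attenuated according to the exponential decay model of Lebart & Boucher (1998), without any smoothing over time. The function name `dereverberate` and the test signal are hypothetical.

```python
import numpy as np

def dereverberate(y, fs, t60, frame_ms=25.0, hop_ms=10.0, delay_ms=40.0):
    """Subtract an estimate of the late reverberant magnitude from each frame."""
    frame_len = int(round(frame_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))            # 25 ms frames, 15 ms overlap
    delay_frames = int(round(delay_ms / hop_ms))    # T = 40 ms -> 4 frames
    tau = 3.0 * np.log(10.0) / (t60 * fs)           # decay rate per sample
    atten = np.exp(-tau * delay_frames * hop)       # magnitude decay over T
    window = np.hamming(frame_len)

    starts = list(range(0, len(y) - frame_len + 1, hop))
    spectra = [np.fft.rfft(y[s:s + frame_len] * window) for s in starts]

    out = np.zeros(len(y))
    norm = np.zeros(len(y))
    for i, (s, Y) in enumerate(zip(starts, spectra)):
        mag = np.abs(Y)
        if i >= delay_frames:
            late_mag = atten * np.abs(spectra[i - delay_frames])
        else:
            late_mag = np.zeros_like(mag)           # no history yet
        gain = np.maximum(1.0 - late_mag / np.maximum(mag, 1e-12), 0.0)
        frame = np.fft.irfft(gain * Y, n=frame_len) # phase of Y is kept
        out[s:s + frame_len] += frame * window      # overlap-add
        norm[s:s + frame_len] += window ** 2
    # Trailing samples not covered by a full frame are left at zero.
    return out / np.maximum(norm, 1e-12)

fs = 8000                                           # assumed sampling rate (Hz)
t60 = 0.5                                           # assumed known (s)
n = np.arange(fs)
rng = np.random.default_rng(0)
# Response of a synthetic exponentially decaying RIR to a unit impulse.
y = rng.standard_normal(n.size) * np.exp(-3.0 * np.log(10.0) * n / (t60 * fs))
out = dereverberate(y, fs, t60)
```

In this toy case the input is a pure reverberation tail, so most of its energy after the first frames is removed. With real speech, the quality of the result depends on how well the reverberation time and the late reverberant spectrum are estimated.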
Parameter                  Room 1         Room 2         Room 3
Dimensions (m)             7 × 7 × 3.5    6 × 8 × 3      9 × 8 × 3
Speaker position
Microphone position
Reflection coefficients
  Walls                    0.9            0.9            0.9
  Floor and ceiling        0.6            0.6            0.9
Resulting T60 (s)          0.68           0.73           0.83
nh
Ne
100
Ns
Previous research works (Lebart & Boucher, 1998; Habets, 2004) assume a value equal to 1. In contrast to them, we use the general formulation given by (11).
Results are presented in Table 2. From the column "without processing", we see that even small changes in the speaker position affect the ASR performance.
Room response                                   SER (%)
                                                Without spectral subtraction    With spectral subtraction
Reference response                              64.4                            41.2
Speaker position shifted along the x axis by
  -0.50 m                                       64.4                            44.8
  -0.25 m                                       60.8                            45.6
  +0.25 m                                       47.2                            26.8
  +0.50 m                                       40.0                            29.6
Speaker position shifted along the y axis by
  -0.50 m                                       46.8                            29.6
  -0.25 m                                       52.0                            33.6
  +0.25 m                                       65.6                            48.8
  +0.50 m                                       69.2                            51.2
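To read the table in relative terms, the short sketch below computes the relative SER reduction obtained with spectral subtraction for each room response. The numbers are copied from the table; the dictionary keys and the helper name `relative_reduction` are only illustrative.

```python
# (SER without spectral subtraction, SER with spectral subtraction), in percent
ser = {
    "reference": (64.4, 41.2),
    "x -0.50 m": (64.4, 44.8),
    "x -0.25 m": (60.8, 45.6),
    "x +0.25 m": (47.2, 26.8),
    "x +0.50 m": (40.0, 29.6),
    "y -0.50 m": (46.8, 29.6),
    "y -0.25 m": (52.0, 33.6),
    "y +0.25 m": (65.6, 48.8),
    "y +0.50 m": (69.2, 51.2),
}

def relative_reduction(before, after):
    """Relative error reduction in percent."""
    return 100.0 * (before - after) / before

reductions = {name: relative_reduction(b, a) for name, (b, a) in ser.items()}

for name, value in reductions.items():
    print(f"{name:>10s}: {value:5.1f} % relative SER reduction")
```

For the reference response, the SER drops from 64.4% to 41.2%, a relative reduction of about 36%. The reduction is positive for every speaker position in the table.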
Test condition           SER (%)
                         Without spectral subtraction    With spectral subtraction
Reverberation only       64.4                            41.2
                         75.6                            59.6
                         85.6                            75.2
                         97.6                            92.4
                         84.0                            66.8
                         91.6                            80.0
                         98.4                            95.6
5. Concluding remarks
This chapter has characterized the effects of reverberation and noise on the performance of ASR systems. We have shown the importance of coping with such degradations in order to improve ASR performance in real applications. An overview of current approaches to dereverberation and noise reduction has been presented, classifying the methods according to their point of operation in the speech recognition chain. The use of spectral subtraction for dereverberation and noise reduction in ASR systems has been discussed, giving rise to a consistent formulation for treating this challenging problem. We assessed the approach in terms of the sentence error rate on a digit string recognition task, showing that the recognition rate can be significantly improved by using spectral subtraction. The impact of the choice of algorithm parameters has been assessed under different environmental conditions. Finally, it is important to mention that reverberation and noise in ASR systems continue to be a challenging subject for the signal processing community.