
Dereverberation and Noise Reduction Techniques for ASR Applications

Fernando Santana Pacheco and Rui Seara
Federal University of Santa Catarina
Brazil

Group members:
V Thnh Trung (N13DCDT107)
Nguyn Thanh Vin (N13DCDT119)

1. Introduction
Over the last few years, advances in automatic speech recognition (ASR) have driven the development of several commercial applications. Automatic dictation systems and voice dialing applications, for instance, are becoming ever more common. Despite significant advances, we are still far from the goal of unlimited speech recognition, i.e., recognition of any word, spoken by any person, in any place, and through any acquisition and transmission system. In real applications, the speech signal can be contaminated by different sources of distortion. In hands-free devices, for instance, the effects of reverberation and background noise are significantly intensified by the larger distance between the speaker and the microphone. If such distortions are not compensated, the accuracy of ASR systems is severely hampered (Droppo & Acero, 2008). In the open literature, several research works have been proposed to cope with the harmful effects of reverberation and noise in ASR applications (de la Torre et al., 2007; Huang et al., 2008). In summary, current approaches to ASR robustness against reverberation and noise can be classified as model adaptation, robust parameterization, and speech enhancement.
The goal of this chapter is to give the reader an overview of the current state of the art in ASR robustness to reverberation and noise, and to discuss the use of one particular speech enhancement approach to overcome this problem. To that end, we choose spectral subtraction, which has been proposed in the literature to enhance speech degraded by reverberation and noise (Boll, 1979; Lebart & Boucher, 1998; Habets, 2004). Moreover, since ASR systems share similar concerns about this problem, such an approach has also been applied successfully as a preprocessing stage in these applications.
This chapter is organized as follows. Section 2 characterizes the effects of reverberation and noise on speech parameters. Section 3 briefly reviews methods to compensate for reverberation and noise in ASR systems, including a classification and a comparison of the different approaches. Section 4 discusses spectral subtraction applied to reverberation reduction. In that section, we examine how to adjust the parameters of the algorithm; we also analyze the sensitivity to estimation errors and to changes in the room response. The combined effects of reverberation and noise are also assessed. Finally, concluding remarks are presented in Section 5.

2. Reverberation and noise
Speech communication is so natural to humans that we usually do not notice some of its effects. Before reaching a microphone or the listener's ears, a speech signal may be modified by the medium in which it propagates (the enclosure). In an ideal anechoic chamber, the signal follows only one path from the source to the receiver. In typical rooms, however, surfaces (walls and furniture) reflect the emitted sound, so the microphone receives a stream of reflected signals from multiple propagation paths. The whole set of reflections is termed reverberation. Although this chapter discusses methods to reduce this effect, reverberation is not always detrimental. It may give the listener a spatial impression of the enclosure (Everest, 2001); it also increases both the "liveness" and the "warmth" of the room, which is especially important in music. On the other hand, excessive reverberation impairs intelligibility and clarity, harming communication or musical performance. The effect of reverberation can be modeled as the processing of a signal by a linear time-invariant system. This operation is represented by the convolution between the room impulse response (RIR) and the original signal, expressed as

y(n) = x(n) * h(n)    (1)

where y(n) represents the degraded speech signal, x(n) the original (undegraded) speech signal, h(n) the room impulse response, and * the linear convolution operation. In this approach, room reverberation is completely characterized by the RIR. Fig. 1 shows a typical impulse response measured in a room. An RIR can usually be separated into three parts: the direct response, the initial reflections, and the late reverberation. The amount of energy and the delay of each reflection cause different psychoacoustic effects. Initial reflections (or early reverberation) are acoustically integrated by the ear, reinforcing the direct sound. Since the initial reflections do not present a flat spectrum, a coloration of the speech spectrum occurs (Huang et al., 2008). Late reverberation (or the reverberation tail) causes a different effect, called overlap masking. Speech signals exhibit natural dynamics, with regions of noticeably different energy levels, as occurs between vowels and consonants. The reverberation tail reduces these dynamics, smearing the energy over a long interval and masking lower-energy sounds.
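To make the convolution model in (1) concrete, the following minimal Python sketch applies a toy impulse response to a short signal. The three-tap "RIR" (direct path, one early reflection, one weak late echo) and the input samples are illustrative assumptions, not values from the chapter.

```python
def convolve(x, h):
    """Linear convolution y(n) = sum_k x(k) h(n - k)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, x_val in enumerate(x):
        for j, h_val in enumerate(h):
            y[i + j] += x_val * h_val
    return y


# Hypothetical toy RIR: direct path, an early reflection, a weak late echo.
h = [1.0, 0.0, 0.5, 0.0, 0.0, 0.25]
# Hypothetical short "speech" burst.
x = [1.0, -1.0, 0.5]

y = convolve(x, h)
print(y)  # the output is longer than x: energy is smeared in time
```

The output has len(x) + len(h) - 1 samples, which is the time smearing that the text attributes to the reverberation tail.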

It is worth examining a real-world example here. Fig. 2(a) and (b) show, respectively, the speech signal corresponding to the utterance "enter fifty one" and the associated spectrogram. Notice in these figures the dynamics in time and the clear harmonic structure, with the speech resonances marked by darker lines in the spectrogram [see Fig. 2(b)]. Reverberation is artificially added to the original speech signal by convolving this speech segment with the RIR displayed in Fig. 1. Fig. 2(c) and (d) show, respectively, the reverberant version and the corresponding spectrogram. Observe in Fig. 2(c) that the signal is smeared in time, with virtually no gap between phonemes. In addition, notice how difficult it is to identify the resonances in Fig. 2(d).
So, how can the level of reverberation be measured, and how can the acoustic quality of a room be assessed? Much research has been carried out to define objective parameters that correlate with the overall quality and the subjective impression of a room. In this chapter, we present two important parameters used to measure the level of reverberation of an enclosure: the reverberation time and the early-to-late energy ratio.

Fig. 2. Effect of reverberation on a speech signal. (a) Original speech signal corresponding to the utterance "enter fifty one" and (b) associated spectrogram. (c) Reverberated version of the same signal and (d) corresponding spectrogram.
Reverberation time (T60, RT60, or RT) is defined as the time interval required for the reverberation to decay 60 dB from the level of a reference sound. It is physically associated with the room dimensions as well as with the acoustic properties of the wall materials. The reverberation time is computed from the decay curve obtained from the RIR energy (Everest, 2001). The result can be expressed either as a broadband measure or as a set of frequency-dependent reverberation times (for example, RT500 corresponds to the reverberation time in the frequency band centered at 500 Hz). To give the reader an idea of typical values, office rooms present T60 between 200 ms and 600 ms, while large churches can exhibit T60 in the order of 3 s (Everest, 2001). Another objective indicator of speech intelligibility or music clarity is the early-to-late energy ratio (speech) or clarity index (music) (Chesnokov & SooHoo, 1998), which is defined as

$$C_T = 10 \log_{10} \left( \frac{\int_0^T p^2(t)\,dt}{\int_T^\infty p^2(t)\,dt} \right) \text{ dB} \tag{2}$$

where p(t) denotes the instantaneous acoustic pressure and T is the time instant taken as the threshold between early and late reverberation. For speech intelligibility evaluation, it is usual to consider C50 (T = 50 ms), while C80 is a measure of music clarity (Chesnokov & SooHoo, 1998). Now, considering the separation between early and late reverberation, h(n) can be expressed as

$$h(n) = \begin{cases} h_d(n), & 0 \le n < N_d \\ h_r(n), & n \ge N_d \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
where h_d(n) denotes the part of the impulse response corresponding to the direct response plus early reverberation, h_r(n) the remaining part of the response, related to the late reverberation, N_d = f_s T the number of samples of the response h_d(n), and f_s the sampling rate. Besides reverberation, sound is also degraded by additive noise. Noise sources such as fans and motors, among others, may compete with the signal source in an enclosure. Thus, we should also include the effect of noise in the model of the degraded signal y(n), rewriting (1) as

y(n) = x(n) * h(n) + v(n)    (4)

where v(n) represents the additive noise signal.
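The early-to-late energy ratio and the split index N_d = f_s T can be illustrated with a short sketch. It assumes that the squared samples of a discrete impulse response stand in for p²(t), and it uses a made-up two-level response, so the numbers are only illustrative.

```python
import math


def early_to_late_ratio_db(h, fs, t_split):
    """Early-to-late energy ratio C_T (dB) of a sampled impulse response.

    Assumption: squared RIR samples approximate p^2(t), and the response
    has non-zero energy after the split point.
    """
    n_d = int(round(fs * t_split))          # N_d = f_s * T
    early = sum(v * v for v in h[:n_d])     # energy of h_d(n)
    late = sum(v * v for v in h[n_d:])      # energy of h_r(n)
    return 10.0 * math.log10(early / late)


fs = 8000                                   # sampling rate in Hz
# Hypothetical response: 50 ms at amplitude 1.0, then 50 ms at 0.1.
h = [1.0] * 400 + [0.1] * 400
c50 = early_to_late_ratio_db(h, fs, 0.050)
print(round(c50, 3))
```

In this toy case the early part carries 100 times the energy of the late part, so C50 comes out at about 20 dB.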

3. Overview of dereverberation and noise reduction methods

Before introducing methods to tackle reverberation and noise, we give a brief overview of current ASR technology. The current generation of ASR systems is based on a statistical framework (Rabiner & Juang, 2008). This means that, before the system is deployed (or tested), a first, mandatory training phase is required. During training, a set of models is estimated from labeled utterances (speech recordings and the associated transcriptions). Each model thus represents a reference pattern for a base unit (a word or a phoneme, for instance). To recognize a given speech signal, the system evaluates the similarity between the signal and each previously trained model. Combining these scores yields the recognition result.
During training, the models also incorporate acoustic characteristics of the recordings, such as the levels of reverberation and noise. If the system is deployed under similar conditions, training and test are said to be matched, and a high recognition rate can be expected. Unfortunately, conditions can differ between the training phase and actual use, leading to an acoustic mismatch that impairs ASR performance. As a practical case, if the models are trained with clean speech recorded in a studio and the system is then used in a noisy office, the recognition rate may be degraded. Therefore, to improve system robustness, the mismatch between training and test must be tackled. Over the last few decades, considerable research effort has been devoted to reducing the mismatch caused by reverberation and additive noise.
There are several ways to classify the approaches that reduce the problems caused by reverberation and noise in ASR systems. In this chapter, we group the methods according to their location in the speech recognition chain. ASR processing can be roughly separated into two parts: front-end and back-end. Speech parameters are extracted from the input signal by the front-end module, whereas the likelihood between the input signal and the acoustic models is computed at the back-end (or decoder). Under this classification, the mismatch caused by reverberation and noise can be reduced before the front-end, during front-end processing, or even at the back-end module, as shown in Fig. 3. Methods are therefore grouped into the following classes: speech enhancement, robust parameterization, and model adaptation. Each group is discussed below.

Classification of the methods used in speech recognition applications for reverberation and noise compensation.
3.1 Speech enhancement
Speech enhancement methods attempt to cope with reverberation and noise before the signal reaches the front-end. They therefore work as a preprocessing stage in the ASR system. Methods in this category can be classified by the number of microphones they need, leading to two classes: single-microphone and multi-microphone methods (the latter are also termed microphone arrays). We briefly describe some current techniques here: beamforming, inverse filtering, kurtosis-based adaptive filtering, harmonicity-based dereverberation, and spectral subtraction.
Beamforming is a classical microphone-array approach (Darren et al., 2001). The signals from each microphone are accurately delayed and combined (by a simple sum or a filtering algorithm). As a consequence, the array is steered toward the speech source, reinforcing the speech signal and reducing reverberation and noise from other directions. Although the recognition rate increases for noisy speech, the same benefit is not obtained for reverberant speech, because conventional microphone-array algorithms assume that the target and the undesired signals are uncorrelated (which is not true for reverberation). In a more recent approach, called likelihood maximizing beamforming (LIMABEAM) (Seltzer et al., 2004), the beamforming algorithm is driven by the speech recognition engine. This approach has shown a potential advantage over standard beamforming techniques for ASR applications.
Methods based on inverse filtering have two stages: estimation of the impulse responses between the source and each microphone, and application of a deconvolution operation. Among other approaches, the estimation can be carried out by cepstral techniques (Bees et al., 1991) or by a grid of zeros (Pacheco & Seara, 2005); however, some practical difficulties have been noted in real applications, which impair the correct operation of this technique. Regarding the inversion of the RIR, an efficient approach has been proposed by Radlović & Kennedy (2000), which overcomes the drawbacks due to the nonminimum-phase characteristics present in real-world responses.
Another interesting approach is presented by Gillespie et al. (2001), in which characteristics of the speech signal are used to improve the dereverberation process. There, the authors show that the residual from a linear prediction analysis of clean speech exhibits peaks at each glottal pulse, whereas these peaks are dispersed in reverberant speech. An adaptive filter can be used to minimize this dispersion (measured by kurtosis), reducing the reverberation effect. The same algorithm is also used as a first processing stage by Wu & Wang (2006), with satisfactory results in reducing reverberation effects when T60 is between 0.2 and 0.4 s.
The harmonic structure of the speech signal can be exploited in harmonicity-based dereverberation (HERB) (Nakatani et al., 2007). In this approach, it is assumed that the original signal is preserved at multiples of the fundamental frequency, so that an estimate of the room response can be obtained. The main drawback of this technique is the amount of data needed to achieve a good estimate. Spectral subtraction is another speech enhancement technique; it is discussed in detail in Section 4.
3.2 Robust acoustic features
In this class of techniques, the central idea is to represent the signal with parameters that are less sensitive to changes in acoustic conditions such as reverberation and noise. A very simple and widespread approach is cepstral mean normalization (CMN). In this technique, the mean of the parameter vectors is first computed. The resulting mean vector is then subtracted from each parameter vector. The normalized parameters therefore have a long-term average equal to zero. It can be shown that this approach improves robustness to the linear filtering effects introduced on the speech signal by microphones and transmission channels (Droppo & Acero, 2008). It has also been verified experimentally that CMN reduces the effect of additive noise, even though it is ineffective for dereverberation (Droppo & Acero, 2008).
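The CMN steps described above (compute the mean vector, subtract it from every frame) can be sketched in a few lines of Python. The feature values below are invented for illustration; the constant offset added to them mimics, as an assumption, the additive cepstral shift produced by a fixed linear channel.

```python
def cepstral_mean_normalization(frames):
    """Subtract the per-coefficient mean over all frames from every frame."""
    n_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Hypothetical 2-dimensional cepstral vectors for three frames.
clean = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
# The same frames after a fixed channel, modeled as a constant cepstral offset.
offset = [0.5, -1.0]
filtered = [[c + o for c, o in zip(f, offset)] for f in clean]

norm_clean = cepstral_mean_normalization(clean)
norm_filtered = cepstral_mean_normalization(filtered)
print(norm_clean)
print(norm_filtered)
```

Both calls return the same normalized vectors, which is the sense in which CMN removes a time-invariant channel. A long reverberation tail spreads over many frames and does not reduce to a constant offset, which is consistent with the remark that CMN is ineffective for dereverberation.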
Regarding reverberation, some authors suggest that it cannot be identified within short analysis frames (in the order of 25 ms), since RIRs usually have a long duration. Two approaches have been proposed to overcome this problem: relative spectra (RASTA) (Hermansky & Morgan, 1994) and the modulation spectrogram (MSG) (Kingsbury, 1998). They consider slow variations in the spectrum, which can be observed when a frame size in the order of 200 ms is used. ASR assessments have shown that such approaches improve recognition accuracy for moderately reverberant conditions (Kingsbury, 1998). An alternative parameterization technique, named the missing feature approach (Palomäki et al., 2004; Raj & Stern, 2005), represents the input signal in a time-frequency grid. Unreliable or missing cells (due to degradation) are identified and discarded, or even replaced by an estimate of the clean signal. In the case of reverberation, the reliable cells are those in which the direct signal and the initial reflections are stronger. Training is carried out with clean speech, and there is no need to retrain the acoustic models for each kind of degradation. The identification of unreliable cells is thus performed only during recognition.

3.3 Model adaptation
The main objective of model adaptation approaches is to minimize the mismatch between the training and test phases by applying some kind of compensation to the reference models. The first approach is to include reverberation and noise in training, i.e., to contaminate the training material with the same kind of degradation expected in deployment. Reverberation and noise can be recorded during the acquisition of the speech corpora or even added artificially. International projects have recorded training material under different conditions, such as inside cars in the SpeechDat-Car project (Moreno et al., 2000) or in different environments in the SpeeCon project (Iskra et al., 2002). Artificial inclusion of reverberation makes it possible to generate models with different levels of reverberation (Couvreur & Couvreur, 2004), so that the best-matching model can be selected during deployment.
As an alternative to retraining the models for each noise condition, the parallel model combination (PMC) technique can be applied. This approach attempts to estimate a noisy speech model from two other models: a previously trained one, based on clean speech, and a noise model, obtained by an online estimate from noise segments (Gales & Young, 1995). Promising adaptation results can be achieved with a small amount of data, whereas the main drawback of the PMC approach is its large computational burden. A better adjustment can also be achieved with a set of adaptation data in a maximum a posteriori estimation approach (Omologo et al., 1998). A significant increase in recognition rate is achieved, even though a single microphone is used for signal acquisition; however, robustness to changes in environmental conditions remains a challenging issue (Omologo et al., 1998).
4. Spectral subtraction
Spectral subtraction is a well-known speech enhancement technique that belongs to the class of short-time spectral amplitude (STSA) methods (Kondoz, 2004). What makes spectral subtraction attractive is its simplicity and low computational complexity, which is advantageous for platforms with limited resources (Droppo & Acero, 2008).
4.1 Algorithm

Before introducing spectral subtraction as a dereverberation approach, we review its original formulation as a noise reduction technique. Disregarding the effect of reverberation, the noisy signal in (4) can be expressed in the frequency domain as

Y(k) = X(k) + V(k)    (5)

where Y(k), X(k), and V(k) denote the short-time discrete Fourier transforms (DFT) of y(n), x(n), and v(n), respectively. The central idea of spectral subtraction is to recover x(n) by modifying only the magnitude of Y(k). The process can be described as a spectral filtering operation

$$|\hat{X}(k)|^{\nu} = G(k)\,|Y(k)|^{\nu} \tag{6}$$

where ν denotes the spectral order and G(k) is a gain function.
Fig. 3 shows a block diagram of a general spectral subtraction procedure. The noisy signal y(n) is windowed and its DFT is computed. The gain function is then estimated from the current noisy magnitude samples, the previously enhanced magnitude signal, and the noise statistics. Note that the phase of Y(k) [represented by ∠Y(k)] remains unchanged and is an input to the inverse DFT (IDFT) block. The enhanced signal is obtained by combining the enhanced magnitude with the phase of Y(k) and processing them through the IDFT block, followed by an overlap-and-add operation that compensates for the windowing.

Fig. 3. Block diagram of a general spectral subtraction procedure.

The gain and noise estimation blocks are the most critical part of the process, and the success of the technique depends strongly on determining adequate gains. In the following, we discuss this approach through a power spectral subtraction example. Processing the signals in the power spectral domain, i.e., ν = 2, and assuming that signal and noise are uncorrelated, we have
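$$|\hat{X}(k)|^2 = |Y(k)|^2 - |\hat{V}(k)|^2 \tag{7}$$

or even

$$|\hat{X}(k)|^2 = G(k)\,|Y(k)|^2 \tag{8}$$

for which the simplest estimate of the gain G(k) is given by

$$G(k) = \begin{cases} 1 - \dfrac{1}{\mathrm{SNR}(k)}, & \mathrm{SNR}(k) > 1 \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

with

$$\mathrm{SNR}(k) = \frac{|Y(k)|^2}{|\hat{V}(k)|^2} \tag{10}$$

where SNR(k) is the a posteriori signal-to-noise ratio and V̂(k) is the noise estimate.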
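As a small worked illustration of (7) to (10), the sketch below computes the gain for a single frequency bin from the noisy power and the estimated noise power. The function name and the numeric values are hypothetical; the zero gain for SNR(k) ≤ 1 is the clamping in (9).

```python
def power_subtraction_gain(noisy_power, noise_power):
    """Gain G(k) of basic power spectral subtraction for one frequency bin.

    noisy_power : |Y(k)|^2
    noise_power : estimated |V(k)|^2, assumed strictly positive here
    """
    snr = noisy_power / noise_power          # a posteriori SNR
    if snr > 1.0:
        return 1.0 - 1.0 / snr
    return 0.0                               # clamp: avoids negative power


# Hypothetical bin where speech dominates: |Y|^2 = 4, |V|^2 = 1.
g_speech = power_subtraction_gain(4.0, 1.0)
enhanced_power = g_speech * 4.0              # equals |Y|^2 - |V|^2

# Hypothetical bin where the noise estimate exceeds the observation.
g_noise = power_subtraction_gain(0.5, 1.0)
print(g_speech, enhanced_power, g_noise)
```

In the first bin the gain reproduces the subtraction in (7) exactly; in the second bin the subtraction would be negative, so the gain is clamped to zero.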


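Although necessary to prevent |X̂(k)|² from becoming negative, the clamping introduced by the conditions in (9) causes some drawbacks. Note that the gains are estimated for every frame and at each frequency index independently. Looking at the distribution of these gains over a time-frequency grid, one notes that neighboring cells can display different levels of attenuation. This irregularity gives rise to tones at random frequencies that appear and disappear rapidly (Droppo & Acero, 2008), leading to an annoying effect called musical noise. More elaborate estimates of G(k) have been proposed in the literature with the aim of reducing musical noise. An improved approach to estimating the required gain was introduced by Berouti et al. (1979) and is given by

$$G(k) = \begin{cases} \left[ 1 - \alpha \left( \dfrac{1}{\mathrm{SNR}(k)} \right)^{\nu/2} \right]^{1/\nu}, & \left( \dfrac{1}{\mathrm{SNR}(k)} \right)^{\nu/2} < \dfrac{1}{\alpha + \beta} \\[2ex] \left[ \beta \left( \dfrac{1}{\mathrm{SNR}(k)} \right)^{\nu/2} \right]^{1/\nu}, & \text{otherwise} \end{cases} \tag{11}$$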

where α and β are, respectively, the oversubtraction and spectral floor factors. The oversubtraction factor controls the reduction of residual noise. Lower noise levels are attained with higher α; however, if α is too large, the speech signal will be distorted (Kondoz, 2004). The spectral floor factor reduces musical noise by smearing it over a wider frequency band (Kondoz, 2004). A trade-off is also required in the choice of β. If β is too large, other undesired artifacts become more evident.
It is important to point out that speech distortion and residual noise cannot be reduced simultaneously. Moreover, parameter adjustment depends on the application. It has been determined experimentally that a good trade-off between noise reduction and speech quality is achieved with power spectral subtraction (ν = 2) using α between 4 and 8 and β = 0.1 (Kondoz, 2004). This setting is considered adequate for human listeners since, as a general rule, human beings can tolerate some distortion but are sensitive to the fatigue caused by noise. We show in Section 4.4 that ASR systems are usually more susceptible to speech distortion, so α < 1 can be a better choice for reducing the recognition error rate.
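The role of α and β can be explored with a short sketch of the generalized gain in (11). It assumes that SNR(k) is the power ratio in (10), that it is strictly positive, and that the resulting gain scales the magnitude |Y(k)|; the function name and test values are illustrative only.

```python
def generalized_gain(snr, alpha, beta, nu=2.0):
    """Gain with oversubtraction (alpha) and spectral floor (beta).

    snr : a posteriori power SNR, |Y(k)|^2 / |V(k)|^2 (assumed > 0)
    nu  : spectral order (2 = power subtraction, 1 = magnitude subtraction)
    """
    ratio = (1.0 / snr) ** (nu / 2.0)        # (|V(k)| / |Y(k)|)^nu
    if ratio < 1.0 / (alpha + beta):
        return (1.0 - alpha * ratio) ** (1.0 / nu)
    return (beta * ratio) ** (1.0 / nu)      # spectral floor branch


# Magnitude subtraction with undersubtraction, as in the ASR experiments.
g_high_snr = generalized_gain(4.0, alpha=0.7, beta=0.2, nu=1.0)
g_low_snr = generalized_gain(0.25, alpha=0.7, beta=0.2, nu=1.0)

# A setting in the range quoted for human listeners (power subtraction).
g_listener = generalized_gain(100.0, alpha=4.0, beta=0.1, nu=2.0)
print(g_high_snr, g_low_snr, g_listener)
```

With ν = 1 and SNR = 4, the magnitude ratio is 0.5 and the gain is 1 − 0.7 × 0.5 = 0.65. With SNR = 0.25 the floor branch applies and the output magnitude equals β times the noise magnitude, which keeps a low level of masking noise instead of a hard zero.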
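4.2 Application of spectral subtraction to dereverberation
An adaptation of spectral subtraction has recently been proposed to enhance speech degraded by reverberation (Lebart & Boucher, 1998; Habets, 2004). It is discussed in detail below.
To tackle room reverberation with spectral subtraction, some fundamental relations must be established. First, the autocorrelation r_y(l) of the reverberant signal is defined. Disregarding the additive noise effect, we get

$$r_y(l) \equiv r_y(n, n+l) = E[y(n)\,y(n+l)] = E\left[ \sum_{k=-\infty}^{n} x(k)\,h(n-k) \sum_{m=-\infty}^{n+l} x(m)\,h(n+l-m) \right] \tag{12}$$

Given the nature of the speech signal and of the RIR, one can consider x(n) and h(n) as independent statistical processes. Thus,

$$r_y(l) = \sum_{k=-\infty}^{n} \sum_{m=-\infty}^{n+l} E[x(k)\,x(m)]\; E[h(n-k)\,h(n+l-m)] \tag{13}$$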

Considering an RIR modeled by modulating a zero-mean random sequence with a decaying exponential (Lebart & Boucher, 1998), one can write

$$h(n) = w(n)\, e^{-\tau n}\, u(n) \tag{14}$$

where w(n) represents a white zero-mean Gaussian noise with variance σ_w², u(n) denotes the unit step function, and τ is a damping constant related to the reverberation time, expressed as (Lebart & Boucher, 1998)

$$\tau = \frac{3 \ln 10}{T_{60}} \tag{15}$$
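The exponential-decay model in (14) and (15) can be checked numerically. The sketch below assumes that T60 is expressed in samples (seconds multiplied by the sampling rate) so that the product τn is dimensionless; the sampling rate, duration, and random seed are arbitrary illustrative choices.

```python
import math
import random


def damping_constant(t60_samples):
    """tau = 3 ln(10) / T60, with T60 given in samples (assumption)."""
    return 3.0 * math.log(10.0) / t60_samples


def synthetic_rir(t60_s, fs, length_s, seed=0):
    """White Gaussian noise shaped by the envelope exp(-tau * n)."""
    rng = random.Random(seed)
    tau = damping_constant(t60_s * fs)
    n_samples = int(round(length_s * fs))
    return [rng.gauss(0.0, 1.0) * math.exp(-tau * n) for n in range(n_samples)]


fs = 8000
t60_s = 0.68                                  # Room #1 value used later on
tau = damping_constant(t60_s * fs)

# Envelope level, in dB, reached after T60 seconds.
decay_db = 20.0 * math.log10(math.exp(-tau * t60_s * fs))
rir = synthetic_rir(t60_s, fs, length_s=1.0)
print(round(decay_db, 6), len(rir))
```

The amplitude envelope falls to 10⁻³ of its initial value after T60, i.e., the energy decays by 60 dB, which matches the definition of the reverberation time given in Section 2.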

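Thus, the last term on the right-hand side of (13) is written as

$$E[h(n-k)\,h(n+l-m)] = e^{-2\tau n}\, \sigma_w^2\, e^{\tau(k+m-l)}\, \delta(k-m+l) \tag{16}$$

where δ(n) represents the unit sample sequence.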


Then, substituting (16) into (13), we obtain

$$r_y(l) = e^{-2\tau n} \sum_{k=-\infty}^{n} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k} \tag{17}$$

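Now, considering the threshold N_d defined in (3), one can split the summation in (17) into two parts. Thereby, with r_y(l) standing for r_y(n, n+l),

$$r_y(l) = e^{-2\tau n} \sum_{k=-\infty}^{n-N_d} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k} + e^{-2\tau n} \sum_{k=n-N_d+1}^{n} E[x(k)\,x(k+l)]\, \sigma_w^2\, e^{2\tau k} \tag{18}$$

Denoting the first and second terms by r_{y_r}(n, n+l) and r_{y_d}(n, n+l), this reads

$$r_y(n, n+l) = r_{y_r}(n, n+l) + r_{y_d}(n, n+l) \tag{20}$$

and, comparing the first term of (18) with (17) evaluated at time n − N_d,

$$r_{y_r}(n, n+l) = e^{-2\tau N_d}\, r_y(n-N_d,\, n-N_d+l) \tag{21}$$

where r_{y_r}(n, n+l) and r_{y_d}(n, n+l) are the autocorrelation functions associated with the signals y_r(n) and y_d(n), respectively. Signal y_r(n) is related to the late reverberation and results from the convolution of h_r(n) and x(n). Signal y_d(n) is associated with the direct signal and the initial reflections, and is obtained through the convolution of h_d(n) and x(n).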
Now, from (20), the short-time power spectral density (PSD) of the degraded signal, S_y(n, k), is expressed as

$$S_y(n, k) = S_{y_r}(n, k) + S_{y_d}(n, k) \tag{23}$$

where S_{y_r}(n, k) and S_{y_d}(n, k) are the PSDs corresponding to the signals y_r(n) and y_d(n), respectively. From (21), the estimate of S_{y_r}(n, k) is obtained by weighting and delaying the PSD of the degraded speech signal. Thus,

$$\hat{S}_{y_r}(n, k) = e^{-2\tau N_d}\, S_y(n-N_d,\, k) \tag{24}$$

Then, assuming that y_d(n) and y_r(n) are uncorrelated, the late reverberant signal can be treated as additive noise, and the direct signal can be recovered through spectral subtraction.
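The weighting-and-delay rule in (24) can be sketched for a sequence of short-time PSD frames. The sketch assumes that the delay N_d is an integer number of frame hops and that frames without a sufficiently old predecessor receive a zero estimate; both choices, like the function name and the toy PSD values, are illustrative assumptions.

```python
import math


def late_reverb_psd(psd_frames, tau, n_d, hop):
    """Estimate the late-reverberation PSD as a weighted, delayed copy.

    psd_frames : list of frames, each a list of PSD values per bin
    tau        : damping constant per sample
    n_d        : early/late threshold in samples
    hop        : frame shift in samples (assumed to divide n_d)
    """
    delay = n_d // hop                        # delay measured in frames
    weight = math.exp(-2.0 * tau * n_d)
    n_bins = len(psd_frames[0])
    estimate = []
    for m in range(len(psd_frames)):
        if m < delay:
            estimate.append([0.0] * n_bins)   # no past frame available yet
        else:
            estimate.append([weight * p for p in psd_frames[m - delay]])
    return estimate


fs = 8000
tau = 3.0 * math.log(10.0) / (0.68 * fs)      # T60 = 0.68 s, in samples
n_d = int(round(0.040 * fs))                  # T = 40 ms -> 320 samples
hop = 80                                      # 10 ms frame shift

# Hypothetical PSD track: 6 frames, 2 bins each.
psd = [[float(m + 1), 2.0 * (m + 1)] for m in range(6)]
late = late_reverb_psd(psd, tau, n_d, hop)
weight = math.exp(-2.0 * tau * n_d)
print(round(weight, 4), late[4])
```

With these values the delay is 4 frames and the weight is 10^(−6 × 320 / 5440), about 0.44. The resulting estimate can then play the role of the noise PSD in the gain functions of Section 4.1.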
4.3 Estimation of reverberation time

To implement the procedure presented above, one of the first steps is to obtain the parameter τ, since it is used to estimate the power spectral density of the late reverberant signal in (24). Given that τ is related to the reverberation time, T60 must be estimated from the captured signal.
Several approaches have recently been proposed for blind estimation of the reverberation time. Here, blind means that only the captured signal is available. Maximum-likelihood (ML) approaches for T60 estimation have been proposed by Ratnam et al. (2003) and Couvreur & Couvreur (2004). The main difficulty in estimating T60 is the requirement of silence regions between spoken words. In short utterances in particular, this condition may not be fulfilled, leading to a considerable error in the T60 estimate.
In this chapter, instead of evaluating a specific estimation algorithm, we choose to assess the sensitivity of an ASR system to errors in the T60 estimate. Experimental results showing the performance of the spectral subtraction algorithm under such errors are presented in the next section.
4.4 Performance in ASR systems
We have used spectral subtraction as a preprocessing stage in an ASR system. The chosen task consists of recognizing digit strings representing phone numbers in Brazilian Portuguese. Speech data recorded over the telephone and sampled at 8 kHz are used as the original signal. In this experiment, we use 250 recordings taken from several speakers. The reverberant speech signals are generated through a linear convolution between the original speech data and a given RIR. We consider three different impulse responses, obtained with the well-known image method for modeling room acoustic responses (Allen & Berkley, 1979). The room configurations used in the simulation experiments are given in Table 1.
In the spectral subtraction stage, the degraded signal is segmented into 25-ms frames, with an overlap of 15 ms, and weighted by a Hamming window. The threshold T is fixed at 40 ms. We consider magnitude subtraction (ν = 1), since previous research works have obtained very good results with this configuration (Habets, 2004). From the modified magnitude spectrum and the (original) phase signal, the enhanced signal is recovered by an overlap-and-add algorithm.
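For the frame set-up just described, the arithmetic at 8 kHz gives 200-sample frames, a 120-sample overlap, and therefore an 80-sample (10 ms) frame shift. The sketch below segments a signal accordingly; it uses the common Hamming definition with coefficients 0.54 and 0.46 and drops any incomplete final frame, both of which are assumptions about details the chapter does not state.

```python
import math


def hamming(n_len):
    """Hamming window of length n_len (0.54 / 0.46 coefficients)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1))
            for n in range(n_len)]


def frame_signal(signal, fs, frame_s=0.025, overlap_s=0.015):
    """Split a signal into overlapping, Hamming-weighted frames."""
    frame_len = int(round(frame_s * fs))
    hop = frame_len - int(round(overlap_s * fs))
    window = hamming(frame_len)
    frames = []
    # Incomplete trailing samples are dropped in this sketch.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames, frame_len, hop


fs = 8000
one_second = [1.0] * fs                       # placeholder signal
frames, frame_len, hop = frame_signal(one_second, fs)
print(frame_len, hop, len(frames))
```

One second of signal yields 98 complete frames. With a 10 ms shift, the 40 ms threshold T corresponds to a delay of 4 frames.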
Parameter                    Room #1            Room #2            Room #3
Dimensions (m)               7 x 7 x 3.5        6 x 8 x 3          9 x 8 x 3
Speaker position             (2.5, 3.8, 1.3)    (2.0, 3.0, 1.5)    (4.5, 5.0, 1.0)
Microphone position          (3.3, 3.0, 0.7)    (3.0, 3.5, 0.6)    (5.5, 6.5, 0.5)
Reflection coefficients
  Walls                      0.9                0.9                0.9
  Floor and ceiling          0.6                0.6                0.9
Resulting T60 (s)            0.68               0.73               0.83

Table 1. Parameters used to obtain the room impulse responses.


The assessment is carried out with a speaker-independent HMM-based speech recognition system. Experiments are performed with word-based models, one for each of the 11 digits (0 to 9 in Portuguese plus the word "meia").
Acoustic features are extracted by a mel-cepstrum front-end developed for distributed speech recognition (DSR) (ETSI, 2002). This front-end includes a noise reduction preprocessing stage based on a Wiener filter (ETSI, 2002). Feature extraction is carried out on frames of 25 ms, with an overlap of 15 ms. From each segment, 12 mel-frequency cepstral coefficients (MFCC) and the energy are computed, along with their first- and second-order derivatives. The final parameter vector therefore has 39 elements.

Recognition is performed by a Viterbi decoder with beam search and word-end pruning (Young et al., 2002).
The results of the speech recognition task are presented in terms of the sentence error rate (SER), defined as

$$\mathrm{SER}(\%) = \frac{N_e}{N_s} \times 100$$

where N_e is the number of sentences incorrectly recognized and N_s is the total number of sentences in the test (250 in this evaluation). We have decided to use SER because, in digit string recognition (phone numbers, in our case), an error in a single digit invalidates the result for the whole string. Note that SER is always greater than or equal to the word error rate (WER).
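A minimal sketch of the SER computation follows. The digit strings are made up for illustration, and a sentence is counted as wrong whenever the recognized string differs from the reference in any position.

```python
def sentence_error_rate(references, hypotheses):
    """Percentage of sentences that are not recognized exactly."""
    errors = sum(1 for ref, hyp in zip(references, hypotheses) if ref != hyp)
    return 100.0 * errors / len(references)


# Hypothetical reference and recognized digit strings.
references = ["1 2 3", "4 5 6", "7 8 9", "0 0 0"]
hypotheses = ["1 2 3", "4 5 5", "7 8 9", "0 0 0"]

ser = sentence_error_rate(references, hypotheses)
print(ser)
```

In this toy set one digit out of twelve is wrong, so only about 8.3% of the words are in error, but one sentence out of four fails, giving an SER of 25%. This illustrates why SER is the stricter measure for digit strings.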
For the original speech data, the SER is equal to 4%. For the reverberant data, obtained by convolving the original speech with the RIRs, the SER increases to 64.4%, 77.6%, and 93.6% for Room #1, Room #2, and Room #3, respectively. This result reinforces the importance of coping with reverberation effects in ASR systems.
To evaluate spectral subtraction applied to reverberation reduction in ASR systems, we present the following simulation experiments:
i) Selection of the oversubtraction factor α and the spectral floor factor β. Here, we determine the best combination of parameters for a speech recognition application.
ii) Sensitivity to errors in the estimate of T60. Since an exact estimate of the reverberation time can be difficult to obtain, we assess the sensitivity of ASR to such errors.
iii) Effect of RIR variation. We evaluate the effect of speaker movement, which implies changes in the RIR.
iv) Effect of both reverberation and noise on ASR performance. In real environments, reverberation is usually combined with additive noise. We also assess this effect here.

4.4.1 Selection of the oversubtraction factor and spectral floor

The first parameter we evaluate is the oversubtraction factor α. Previous research works (Lebart & Boucher, 1998; Habets, 2004) have kept α equal to 1. In contrast, we use the general formulation given by (11). We evaluated α for different values of β, and here we show the best results, obtained using β = 0.2 and the particular value of T60 for each room (see Table 1). Fig. 4 shows the SER as a function of the oversubtraction factor α between 0.4 and 1.3.
For Room #1 and Room #2, the best result is obtained with α = 0.7, which corresponds to a level of undersubtraction. For Room #3, the best result is also obtained for α < 1.
These results for reverberation reduction agree with those obtained in the noise reduction studies discussed by Virag (1999) and Chen et al. (2006). Virag (1999) verified that the oversubtraction parameter should be lower in ASR systems than for human listeners. Chen et al. (2006) use a Wiener filter for denoising with a factor smaller than unity, leading to a satisfactory reduction of the distortion level in the resulting signals.
The influence of the spectral floor factor β, the parameter that controls the level of masking of the musical noise, is shown in Fig. 5. For the three room responses assessed, the best results are obtained for β = 0.2, which suggests that it is important to maintain a certain level of masking noise. Note also that the SER increases when no spectral floor is used (β = 0). These results indicate that ASR systems tolerate residual noise better than the distortion inherent to spectral subtraction processing, provided that the noise level is not too high.
4.4.2 Sensitivity to errors in the estimate of reverberation time
As discussed in Section 4.3, the reverberation time must be estimated for the spectral subtraction algorithm. Since this estimate is subject to errors, it is important to evaluate the effect of such errors on ASR performance. The sensitivity to errors in the estimate of T60 is assessed at the operating point α = 0.7 and β = 0.2. We use the same set of RIRs as in Table 1. In the spectral subtraction algorithm, errors in T60 are introduced by varying its value from 0.3 to 1.3 s in steps of 0.2 s. Fig. 6 presents the SER under this variation. Ideally, the approach should have low sensitivity to errors in the estimate of T60, since a blind estimate is very hard to obtain in practice. The results indicate that, even with an inexact estimate of T60, the performance degradation is still acceptable.

Fig. 4. Variation of SER as a function of α for β = 0.2 and the corresponding T60. (a) Room #1. (b) Room #2. (c) Room #3.

Fig. 6. Variation of SER as a function of T60 using α = 0.7 and β = 0.2. (a) Room #1. (b) Room #2. (c) Room #3.

4.4.3 Effect of room impulse response variation caused by a moving speaker
Variations of the RIR within an enclosure can be analyzed by considering Room #1 (Table 1) and a moving speaker. The reference position of the speaker shown in Table 1 is shifted by ±0.5 m (in steps of 0.25 m) along both dimensions (length and width). Fig. 7 shows the ground plan of the enclosure, marking the positions of the microphone and of the (moving) speaker. With this configuration, eight different RIRs are obtained. A set of reverberated audio signals is generated by convolving each room response with the input signals of the test set. The spectral subtraction algorithm is configured with α = 0.7, β = 0.2, and T60 = 0.68 s.

Results are presented in Table 2. From the column "without processing", we see that even small changes in the speaker position affect ASR performance.

                                               SER (%)
Test condition                                 Without processing    With spectral subtraction
Reference response                             64.4                  41.2
Speaker position shifted along the x axis by
  -0.50 m                                      64.4                  44.8
  -0.25 m                                      60.8                  45.6
  +0.25 m                                      47.2                  26.8
  +0.50 m                                      40.0                  29.6
Speaker position shifted along the y axis by
  -0.50 m                                      46.8                  29.6
  -0.25 m                                      52.0                  33.6
  +0.25 m                                      65.6                  48.8
  +0.50 m                                      69.2                  51.2

Table 2. SER as a function of room impulse response changes.


Although some reduction in performance is expected, the effect of changing the speaker position on the recognition rate is still considerable. In general, we verify that the larger the distance between speaker and microphone, the larger the error rate. These results confirm the need for dereverberation techniques that are robust to changes in the impulse response.
Spectral subtraction improves the recognition rate for all the conditions considered. The error rate is reduced by 10 to 20 percentage points with respect to the standard front-end. Ideally, the error rate should be less than or equal to the reference error rate (see Table 2). Although this has not been achieved, no instability has been observed in the technique discussed here, in contrast to some approaches presented in the open literature (Bees et al., 1991).
4.4.4 Combined effect of reverberation and noise
The combined effect of reverberation and additive noise is assessed by adding noise to the reverberant audio signals of Room #1 (Table 1). The noise samples are taken from the set available in Hansen & Arslan (1995). We consider two types of noise: the first is named large city noise (LCI) and the other is white Gaussian noise (WGN), with three signal-to-noise ratio (SNR) levels: 5, 10, and 15 dB.
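One common way to mix noise into a signal at a prescribed SNR is to scale the noise so that the ratio of average powers matches the target. The sketch below does this for a synthetic tone and synthetic Gaussian noise. The chapter does not state how the SNR levels were set, so the power definition over the whole signal, the function names, and the test signals are all assumptions.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so that the overall SNR equals snr_db."""
    target_ratio = 10.0 ** (snr_db / 10.0)
    scale = math.sqrt(mean_power(signal) / (mean_power(noise) * target_ratio))
    noisy = [s + scale * v for s, v in zip(signal, noise)]
    return noisy, scale


rng = random.Random(1)
signal = [math.sin(0.1 * n) for n in range(1000)]      # stand-in for speech
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]     # white Gaussian noise

noisy, scale = add_noise_at_snr(signal, noise, snr_db=10.0)
achieved_db = 10.0 * math.log10(
    mean_power(signal) / mean_power([scale * v for v in noise]))
print(round(achieved_db, 6))
```

The achieved SNR matches the 10 dB target by construction. For speech, the power is often measured over active segments only, which would change the scale factor.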

Fig. 7. Ground plan of the room showing the speaker and microphone positions. The speaker position is shifted in steps of 0.25 m.

                                                          SER (%)
Test condition                                            Without spectral subtraction    With spectral subtraction
Reverberation only                                        64.4                            41.2
Reverberation + large city noise at SNR 15 dB             75.6                            59.6
Reverberation + large city noise at SNR 10 dB             85.6                            75.2
Reverberation + large city noise at SNR 5 dB              97.6                            92.4
Reverberation + white Gaussian noise at SNR 15 dB         84.0                            66.8
Reverberation + white Gaussian noise at SNR 10 dB         91.6                            80.0
Reverberation + white Gaussian noise at SNR 5 dB          98.4                            95.6

Table 3. Combined effect of reverberation and noise.


Table 3 shows the SER values. The column "without spectral subtraction" shows the harmful effect of reverberation and noise on speech recognition performance. The error rate increases significantly as the SNR decreases.
With spectral subtraction, the error is reduced in all the situations considered, although it remains high under severe noise conditions.

5. Concluding remarks
This chapter has characterized the effects of reverberation and noise on the performance of ASR systems. We have shown the importance of coping with such degradations in order to improve ASR performance in real applications. An overview of current dereverberation and noise reduction approaches has been given, with the methods classified according to their point of operation in the speech recognition chain. The use of spectral subtraction for dereverberation and noise reduction in ASR systems has been discussed, leading to a consistent formulation for treating this problem. We have assessed the approach in terms of the sentence error rate on a digit string recognition task, showing that the recognition rate can be significantly improved by using spectral subtraction. The impact of the choice of algorithm parameters on performance has been assessed under different environmental conditions. Finally, it is important to mention that reverberation and noise in ASR systems remain a challenging subject for the signal processing community.

6. References

ETSI (2002). Speech processing, Transmission and Quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms, European Telecommunications Standards Institute (ETSI) Std. ES 202 050 V.1.1.1, Oct. 2002.
Allen, J. B. & Berkley, D. A. (1979). Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America, Vol. 65, No. 4, Apr. 1979, pp. 943-950.
Bees, D.; Blostein, M. & Kabal, P. (1991). Reverberant speech enhancement using cepstral processing. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'91), Vol. 2, pp. 977-980, Toronto, Canada, Apr. 1991.
Berouti, M.; Schwartz, R. & Makhoul, J. (1979). Enhancement of speech corrupted by acoustic noise. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'79), Vol. 4, pp. 208-211, Washington, USA, Apr. 1979.
Boll, S. F. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 27, No. 2, Apr. 1979, pp. 113-120.
Chen, J.; Benesty, J.; Huang, Y. & Doclo, S. (2006). New insights into the noise reduction Wiener filter. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 4, July 2006, pp. 1218-1234.
