
Speech Enhancement Report

1. Theoretical overview

General block diagram of speech processing:

[Block diagram: noisy signal -> split the signal into frames -> FFT -> noise-reduction function (fed by a noise-estimation block) -> IDFT -> overlap and adding -> processed signal]

Figure 1.1 Block diagram of the two algorithms, SS and WF

The two algorithms, spectral subtraction (SS) and the Wiener filter (WF), differ only in the noise-reduction function block; all the other blocks are identical.

1.1 Spectral subtraction algorithm

1.1.1 General introduction

Spectral subtraction rests on one basic principle: it assumes that noise is present, estimates the noise spectrum, and subtracts that estimate from the spectrum of the noisy speech signal. The noise spectrum can be estimated and updated over several periods in which no speech is present. The method is only applicable to stationary or slowly varying noise, because only then does the noise spectrum stay nearly unchanged between updates.

1.1.2 Spectral subtraction on the magnitude spectrum

Let y[n] be the noisy input signal, the sum of the clean signal s[n] and the noise n[n]:

$$y[n] = s[n] + n[n] \tag{1.1}$$

Taking the discrete Fourier transform of both sides gives

$$Y(\omega) = S(\omega) + N(\omega) \tag{1.2}$$

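We can write $Y(\omega)$ in polar form:

$$Y(\omega) = |Y(\omega)|\, e^{j\phi_y(\omega)} \tag{1.3}$$

where $|Y(\omega)|$ is the magnitude spectrum and $\phi_y(\omega)$ is the phase spectrum of the noisy signal. The noise spectrum $N(\omega)$ can be written in the same form:

$$N(\omega) = |N(\omega)|\, e^{j\phi_n(\omega)} \tag{1.4}$$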

The noise magnitude spectrum $|N(\omega)|$ is unknown, but it can be replaced by its average value, computed while no speech is present (speech pauses), and the noise phase can be replaced by the phase of the noisy signal, $\phi_y(\omega)$. The spectrum of the clean signal can then be estimated as:

$$\hat{S}(\omega) = \left[\,|Y(\omega)| - |\hat{N}(\omega)|\,\right] e^{j\phi_y(\omega)} \tag{1.5}$$

Here $|\hat{N}(\omega)|$ is the estimated noise magnitude spectrum, computed while no speech is active. The symbol "^" marks a value as an estimate. The enhanced speech signal is obtained by taking the IDFT of $\hat{S}(\omega)$. Note that the magnitude spectrum of the enhanced signal, $|\hat{S}(\omega)| = |Y(\omega)| - |\hat{N}(\omega)|$, can come out negative because of errors in the noise estimate.

A magnitude spectrum cannot be negative, so the subtraction must guarantee that $|\hat{S}(\omega)|$ is always non-negative. The remedy used here is half-wave rectification of the difference spectrum: any spectral component that comes out negative is set to 0:

$$|\hat{S}(\omega)| = \begin{cases} |Y(\omega)| - |\hat{N}(\omega)|, & |Y(\omega)| > |\hat{N}(\omega)| \\ 0, & \text{otherwise} \end{cases} \tag{1.6}$$
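Equations (1.5) and (1.6) can be sketched for a single frame as follows. This is a minimal illustration in plain Python rather than the report's MATLAB code; the function name `estimate_clean_spectrum` and the sample values are assumptions made for the example.

```python
import cmath


def estimate_clean_spectrum(noisy_spectrum, noise_mag):
    """Magnitude spectral subtraction with half-wave rectification.

    noisy_spectrum: complex DFT bins Y(w) of one noisy frame.
    noise_mag: estimated noise magnitudes |N^(w)|, one per bin.
    Returns the estimated clean bins S^(w), reusing the noisy phase.
    """
    if len(noisy_spectrum) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_spectrum, noise_mag):
        # (1.6): negative differences are clipped to zero.
        mag = max(abs(y) - n, 0.0)
        # (1.5): keep the phase of the noisy signal.
        cleaned.append(cmath.rect(mag, cmath.phase(y)))
    return cleaned


# Hypothetical three-bin frame: the second bin is dominated by noise.
frame = [3 + 4j, 1j, -2 + 0j]
noise = [1.0, 2.0, 0.5]
estimate = estimate_clean_spectrum(frame, noise)
```

In the second bin the noise estimate exceeds the noisy magnitude, so the rectification sets that bin to zero instead of producing a negative magnitude.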

1.1.3 Spectral subtraction on the power spectrum

The magnitude spectral subtraction algorithm can be extended to the power-spectrum domain, since in some cases it works better on the power spectrum than on the magnitude spectrum. To obtain the short-time power spectrum of the noisy signal we square $|Y(\omega)|$:

$$\begin{aligned} |Y(\omega)|^2 &= |S(\omega)|^2 + |N(\omega)|^2 + S(\omega)N^*(\omega) + S^*(\omega)N(\omega) \\ &= |S(\omega)|^2 + |N(\omega)|^2 + 2\,\mathrm{Re}\{S(\omega)N^*(\omega)\} \end{aligned} \tag{1.7}$$

The terms $|N(\omega)|^2$, $S(\omega)N^*(\omega)$ and $S^*(\omega)N(\omega)$ cannot be computed directly and are approximated by $E\{|N(\omega)|^2\}$, $E\{S(\omega)N^*(\omega)\}$ and $E\{S^*(\omega)N(\omega)\}$. Normally $E\{|N(\omega)|^2\}$ is estimated while no speech is active and is denoted $|\hat{N}(\omega)|^2$. If the noise n[n] and the clean signal s[n] are uncorrelated, then $E\{S(\omega)N^*(\omega)\}$ and $E\{S^*(\omega)N(\omega)\}$ are taken to be 0. The power spectrum of the clean signal can then be estimated as

$$|\hat{S}(\omega)|^2 = |Y(\omega)|^2 - |\hat{N}(\omega)|^2 \tag{1.8}$$

Group 1 07DT4

The formula above is the power spectral subtraction algorithm. As it shows, the estimated power spectrum $|\hat{S}(\omega)|^2$ is not guaranteed to be positive, but the half-wave rectification described above can be applied. The enhanced signal is obtained by taking the IDFT of $|\hat{S}(\omega)|$ (the square root of $|\hat{S}(\omega)|^2$) combined with the phase of the noisy speech signal. In terms of a gain function $G(\omega)$ this can be written as:

$$|\hat{S}(\omega)|^2 = G^2(\omega)\,|Y(\omega)|^2 \tag{1.9}$$

where

$$G(\omega) = \sqrt{1 - \frac{|\hat{N}(\omega)|^2}{|Y(\omega)|^2}} \tag{1.10}$$

In the general case the spectral subtraction algorithm can be written as:

$$|\hat{S}(\omega)|^p = |Y(\omega)|^p - |\hat{N}(\omega)|^p \tag{1.11}$$

With p = 1 this is the classic magnitude spectral subtraction; with p = 2 it is power spectral subtraction.
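The generalized rule (1.11) and the gain form (1.9)-(1.10) can be checked numerically with a short sketch. It is written in plain Python for illustration; the names `spectral_subtract` and `power_gain` are hypothetical, and the half-wave rectification of section 1.1.2 is assumed for negative differences.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, p=1.0):
    """Generalized spectral subtraction of (1.11) on magnitude values.

    p = 1 gives magnitude subtraction, p = 2 gives power subtraction.
    Negative differences are half-wave rectified to zero.
    """
    result = []
    for y, n in zip(noisy_mag, noise_mag):
        diff = y ** p - n ** p
        result.append(max(diff, 0.0) ** (1.0 / p))
    return result


def power_gain(noisy_mag, noise_mag):
    """Gain G(w) of (1.10), clipped at zero; assumes noisy_mag > 0."""
    return [math.sqrt(max(1.0 - (n * n) / (y * y), 0.0))
            for y, n in zip(noisy_mag, noise_mag)]


# One bin with |Y| = 5 and |N^| = 3 (made-up values).
magnitude_estimate = spectral_subtract([5.0], [3.0], p=1)  # 5 - 3
power_estimate = spectral_subtract([5.0], [3.0], p=2)      # sqrt(25 - 9)
gain = power_gain([5.0], [3.0])                            # sqrt(1 - 9/25)
```

For p = 2, multiplying the gain by the noisy magnitude gives the same value as the direct subtraction, which is the content of (1.9).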

1.2 Wiener filtering algorithm

1.2.1 General introduction

The basic idea of the WF algorithm is to estimate the speech signal by minimizing the mean square error (MSE) between the true speech signal and the estimated speech signal.

1.2.2 Basic principle of Wiener filtering

Assume that y[n] is the noisy input signal, the sum of the clean signal s[n] and the noise signal n[n]:

$$y[n] = s[n] + n[n] \tag{2.1}$$

Taking the discrete Fourier transform of both sides gives

$$Y(\omega) = S(\omega) + N(\omega) \tag{2.2}$$

We can write $Y(\omega)$ in polar form:

$$Y(\omega) = |Y(\omega)|\, e^{j\phi_y(\omega)} \tag{2.3}$$

where $|Y(\omega)|$ is the magnitude spectrum and $\phi_y(\omega)$ is the phase spectrum of the noisy signal. The noise spectrum $N(\omega)$ can likewise be written in terms of magnitude and phase:

$$N(\omega) = |N(\omega)|\, e^{j\phi_n(\omega)} \tag{2.4}$$

The noise magnitude spectrum $|N(\omega)|$ is unknown, but it can be replaced by its average value, computed while no speech is present (speech pauses), and the noise phase can be replaced by the phase of the noisy signal, $\phi_y(\omega)$. The magnitude of the clean-signal spectrum $\hat{S}(\omega)$ can be estimated from $Y(\omega)$ through a nonlinear gain function defined as follows:

$$|\hat{S}(\omega)| = G(\omega)\,|Y(\omega)| \tag{2.5}$$

Define the a priori SNR and the a posteriori SNR as:

$$\mathrm{SNR}_{pri} = \frac{E\{|S(\omega)|^2\}}{E\{|N(\omega)|^2\}} \tag{2.6}$$

$$\mathrm{SNR}_{post} = \frac{E\{|Y(\omega)|^2\}}{E\{|N(\omega)|^2\}} \tag{2.7}$$

One difficulty in speech enhancement algorithms is that the clean signal s[n] is not available in advance, so its spectrum is unknown. $\mathrm{SNR}_{pri}$ therefore cannot be computed directly, even though it is an essential parameter for estimating the clean signal in speech enhancement systems. In practice we take $E\{\mathrm{SNR}_{post}\} = 1 + \mathrm{SNR}_{pri}$. The gain $G(\omega)$ of the WF is then given by:

$$G(\omega) = \frac{\mathrm{SNR}_{pri}}{1 + \mathrm{SNR}_{pri}}$$
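One way to see where the relation between the two SNRs comes from is sketched below. It assumes, as in section 1.1.3, that speech and noise are uncorrelated so that the cross terms of (1.7) vanish in expectation; the equivalent form of the gain in terms of the a posteriori SNR is a consequence of that assumption, not an additional result of the report.

```latex
\begin{aligned}
E\{|Y(\omega)|^2\} &= E\{|S(\omega)|^2\} + E\{|N(\omega)|^2\}
  && \text{(uncorrelated speech and noise)} \\
\mathrm{SNR}_{post} = \frac{E\{|Y(\omega)|^2\}}{E\{|N(\omega)|^2\}}
  &= \frac{E\{|S(\omega)|^2\}}{E\{|N(\omega)|^2\}} + 1
   = \mathrm{SNR}_{pri} + 1 \\
G(\omega) = \frac{E\{|S(\omega)|^2\}}{E\{|S(\omega)|^2\} + E\{|N(\omega)|^2\}}
  &= \frac{\mathrm{SNR}_{pri}}{1 + \mathrm{SNR}_{pri}}
   = 1 - \frac{1}{\mathrm{SNR}_{post}}
\end{aligned}
```

The gain tends to 1 when the a priori SNR is large (the bin is left almost untouched) and to 0 when it is small (the bin is strongly attenuated).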

1.2.3 Overlap and adding in speech signal processing

1.2.3.1 Splitting the signal into frames

The signal to be processed is continuous. If the FFT is applied directly to the whole time-domain signal without any pre-processing, the transformed signal varies rapidly: it has to be treated as non-stationary, and the noise-suppression algorithms cannot be applied to it. For this reason the signal must first be split into consecutive frames in the time domain and only then moved to the frequency domain with the FFT. Within each frame the signal varies slowly and can be regarded as stationary, and only then can the noise-suppression algorithms work effectively. The signal is therefore processed frame by frame. Splitting the signal into frames requires a suitable window; here we use the Hamming window.

1.2.3.2 Overlap and adding

After the signal has been split into consecutive time-domain frames with the Hamming window, frames that simply follow one another with no further condition would unintentionally attenuate the signal, because the Hamming window is not flat: it tapers towards the edges of each frame. The frames are therefore required to overlap one another; this is called overlap. The frames are overlapped by a suitable ratio, usually 40% or 50%. After the frames have been denoised in the frequency domain, they are joined back together by a method that matches the way the signal was split into frames at the input; this is called adding. The set of signal samples in one frame after splitting at the input is called a segment. When the frames are split and joined with the overlap and adding method, the signal obtained after noise suppression is not distorted.

Figure: the speech signal processing procedure
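The splitting and re-joining of frames can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the report's MATLAB code: the function names are hypothetical, a periodic Hamming window is used (MATLAB's `hamming(W)` is the symmetric variant), and the shift is set to 50% of the frame length because in that case the overlapped windows sum exactly to the constant 1.08.

```python
import math


def periodic_hamming(length):
    """Periodic Hamming window (assumption; differs slightly from hamming(W))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def split_frames(signal, frame_len, shift):
    """Cut the signal into overlapping frames and window each one."""
    window = periodic_hamming(frame_len)
    count = (len(signal) - frame_len) // shift + 1
    return [[signal[i * shift + n] * window[n] for n in range(frame_len)]
            for i in range(count)]


def overlap_add(frames, shift):
    """Join frames by adding them at their original positions."""
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * shift + frame_len)
    for i, frame in enumerate(frames):
        start = i * shift
        for n, value in enumerate(frame):
            out[start + n] += value
    return out


# A constant test signal, frames of 16 samples shifted by 8 (50%).
signal = [1.0] * 64
frames = split_frames(signal, 16, 8)
rebuilt = overlap_add(frames, 8)
interior = rebuilt[8:56]  # samples covered by exactly two frames
```

Away from the first and last half-frame every sample is covered by two windows whose weights add up to 1.08, so the rebuilt signal is a scaled copy of the input. Other windows or shift ratios can be checked the same way by inspecting the summed window weights.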

1.2.4 Noise estimation and update

The way the noise is estimated can strongly affect the quality of the enhanced signal. If the noise is underestimated, residual noise remains in the signal and is audible; if it is overestimated, the speech is distorted and its intelligibility suffers. The simplest approach is to estimate and update the noise spectrum in the stretches of signal that contain no speech, using a voice activity detection (VAD) algorithm. That approach is only adequate for stationary noise (for example white noise); it is not effective in real environments (a restaurant, for example) where the spectral characteristics of the noise change continuously. This section refers to a noise-estimation algorithm that tracks the noise continuously, including while speech is active, and is suited to environments with highly varying noise.

1.2.5 Voice activity detection

The process of deciding when speech is active and when it is absent (silence) is called voice activity detection (VAD). A VAD algorithm produces a binary decision on a frame-by-frame basis, with frames of roughly 20-40 ms. A stretch that contains active speech has VAD = 1; if speech is inactive, that is, the frame is noise only, VAD = 0. Several VAD algorithms have been proposed, based on various features of the signal. The earliest ones rely on features such as energy level, zero-crossing rate, cepstral features, the Itakura LPC spectral distance measure and periodicity measures. Most VAD algorithms struggle under low-SNR conditions, especially when the noise is non-stationary. Even a VAD algorithm that is accurate in a changing environment is not sufficient for speech enhancement applications, because an accurate noise estimate is needed at all times, including while speech is active.
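The VAD-gated noise update described in 1.2.4 and 1.2.5 can be sketched as a recursive average that only runs on frames flagged as non-speech. The sketch is plain Python for illustration; the function name `update_noise` is hypothetical, and the default smoothing factor 9 mirrors the `NoiseLength` value used in the MATLAB listings of this report.

```python
def update_noise(noise_mag, frame_mag, is_speech, noise_length=9):
    """Return the noise spectrum estimate after seeing one frame.

    noise_mag: current noise estimate, one value per frequency bin.
    frame_mag: magnitude spectrum of the current frame.
    is_speech: VAD decision for the frame (True = speech active).
    noise_length: smoothing factor; larger values adapt more slowly.
    """
    if is_speech:
        # Speech frames must not leak into the noise estimate.
        return list(noise_mag)
    return [(noise_length * n + y) / (noise_length + 1)
            for n, y in zip(noise_mag, frame_mag)]


noise = [1.0, 2.0]
noise = update_noise(noise, [2.0, 2.0], is_speech=False)  # moves towards the frame
noise = update_noise(noise, [9.0, 9.0], is_speech=True)   # left unchanged
```

With `noise_length = 9` each noise frame contributes one tenth of the new estimate, which is why a wrong VAD decision on a speech frame gradually contaminates the noise spectrum.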


2. Algorithm flowcharts

2.1 Flowchart of the spectral subtraction algorithm

- Begin
- Call the Segment function to split the input signal into frames
- Apply the DFT to the frames
- Average the signal magnitude
- Compute the initial average noise power (N)
- i = 1 (start from the first frame)
- VAD: is the current frame noise?
    - Yes: update the noise (N); update the maximum residual noise; attenuate the noise frame
    - No: perform the spectral subtraction YS(:,i) - N to obtain the estimated clean signal D(i); apply residual noise reduction to correct the signal (X(:,i)); apply half-wave rectification
- i < number of frames?
    - True: i = i + 1; load the next frame and return to the VAD step
    - False: continue
- Perform the IDFT and join the speech frames using the OverlapAdd2 function
- End


2.2 Flowchart of the Wiener filtering algorithm

- Begin
- Call the Segment function to split the input signal into frames
- Apply the DFT to the frames
- Compute the initial average noise power (N)
- i = 1 (start from the first frame)
- VAD: is the current frame noise?
    - Yes: update the noise (N); update the noise variance LambdaD
- Compute the a posteriori SNR and the a priori SNR
- Compute the gain function G
- Estimate the clean signal
- i < number of frames?
    - True: i = i + 1; load the next frame and return to the VAD step
    - False: continue
- Perform the IDFT and join the speech frames using the OverlapAdd2 function
- End


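2.3 Flowchart of the Segment function

- Segment (begin)
- Enough input arguments?
    - No: set the default parameters:
        - shift = 0.4 (40% of the window length)
        - number of samples per window = 256
        - Hamming window
- Convert the Hamming window to a column vector
- Compute the number of samples of the noisy signal
- Compute the number of samples in the shift SP
- Compute the number of segments produced
- Split the signal into frames with shift SP
- Multiply each frame by the Hamming window
- End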
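The arithmetic behind the "number of samples in the shift" and "number of segments" steps can be sketched as follows. The sketch is plain Python with a hypothetical function name; it mirrors the expressions `SP=fix(W.*SP)` and `N=fix((L-W)/SP +1)` from the `segment` listing in this report, and the signal length of 1000 samples is a made-up example.

```python
def segment_layout(num_samples, window_len=256, shift_fraction=0.4):
    """Return (shift in samples, number of segments, start index of each).

    Start indices are zero-based here, unlike the one-based MATLAB code.
    """
    shift = int(window_len * shift_fraction)  # fix(W * SP)
    if num_samples < window_len:
        return shift, 0, []
    count = (num_samples - window_len) // shift + 1  # fix((L - W) / SP + 1)
    starts = [i * shift for i in range(count)]
    return shift, count, starts


shift, count, starts = segment_layout(1000)
last_end = starts[-1] + 256  # one past the last sample used
```

With the default window of 256 samples and a 40% shift, the shift is 102 samples and a 1000-sample signal yields 8 segments; the trailing samples that do not fill a whole window are dropped.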


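2.4 Flowchart of the VAD function

- Begin
- Set the threshold spectral distance
- Set the threshold number of consecutive segments required to declare noise
- Compute the spectral distance of the input segment
- Spectral distance of the input segment < threshold spectral distance?
    - Yes: the segment is treated as noise; increment the noise segment counter (Noise Counter)
    - No: the segment is treated as speech + noise; reset the noise segment counter to 0
- Noise Counter > threshold number of consecutive noise segments?
    - Yes: the current segment is noise
    - No: the current segment is speech
- End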
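The decision logic of this flowchart can be sketched in plain Python. The sketch follows the `vad` MATLAB listing in this report (log-spectral distance, default `NoiseMargin` of 3 and `Hangover` of 8), but it is an illustration rather than a port: the Python names are hypothetical, and all magnitudes are assumed to be strictly positive because `math.log10` rejects zero.

```python
import math


def vad(signal_mag, noise_mag, noise_counter=0, noise_margin=3.0, hangover=8):
    """Spectral-distance VAD for one frame.

    Returns (noise_flag, speech_flag, noise_counter, dist).
    Assumes every magnitude is > 0.
    """
    # Per-bin distance in dB, with negative values clipped to zero.
    dists = [max(20.0 * (math.log10(s) - math.log10(n)), 0.0)
             for s, n in zip(signal_mag, noise_mag)]
    dist = sum(dists) / len(dists)
    if dist < noise_margin:
        noise_flag = True
        noise_counter += 1
    else:
        noise_flag = False
        noise_counter = 0
    # Speech is only declared absent after more than `hangover`
    # consecutive noise-like frames.
    speech_flag = noise_counter <= hangover
    return noise_flag, speech_flag, noise_counter, dist


noise_template = [1.0, 1.0]
counter = 0
for _ in range(9):  # nine noise-like frames in a row
    noise_flag, speech_flag, counter, dist = vad([1.0, 1.0], noise_template, counter)
```

The hangover keeps the speech flag raised for the first few noise-like frames after speech, which protects weak speech tails from being used in the noise update.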


2.5 Flowchart of the OverlapAdd2 function

- Begin
- Rebuild the frames of the estimated speech signal
- Overlap the frames according to the shift ratio
- End

3. Testing the algorithms on the given audio samples

The algorithms were tested on two files: the clean signal (sp01.wav) and the noisy signal (sp01_babble_sn5.wav).

Table 1: CCR (Comparison Category Rating)

Score   Meaning
 -3     Much worse
 -2     Worse
 -1     Slightly worse
  0     About the same
  1     Slightly better
  2     Better
  3     Much better

Pooling the listeners' opinions, the processed signal scored only 2 points relative to the unprocessed signal. The noise update may not yet be correct, which would explain why the result is not good. Some parameters in the code therefore need to be changed to improve the processing quality.

4. Parameters that affect the quality of the algorithms

4.1 Parameters common to both algorithms

SP: the shift used when splitting the signal into frames. If SP is not suitable, adding the frames back together causes distortion.

IS: the initial silence interval. If IS is too large, the initial speech may also be treated as noise. If IS is too small, too little noise is available for the initial estimate.

NoiseMargin and Hangover: the parameters that decide whether a frame is noise or speech. If a frame is misclassified, the noise update is wrong and the result is affected.

4.2 Parameters that affect the spectral subtraction algorithm

Noise length: the smoothing factor for the noise update.

Gamma: the factor that selects between magnitude spectral subtraction and power spectral subtraction.

4.3 Parameters that affect the Wiener algorithm

alpha: the smoothing factor for SNRprior.

5. Improving the quality

5.1 Changing the parameters common to both algorithms

5.1.1 Changing SP

SP = 0.3, 0.4 and 0.5 were tested in turn, with the following CCR results:
+ SP = 0.3
+ SP = 0.4
+ SP = 0.5
SP = 0.5 is therefore chosen.

5.1.2 Changing IS
+ IS = 2.5
+ IS = 2
+ IS = 0.15
IS = 0.15 is therefore chosen.

5.1.3 Changing NoiseMargin
+ NoiseMargin = 3
+ NoiseMargin = 4
Ratings listed for the SP, IS and NoiseMargin = 3, 4 tests: 2, 1, 0, 0, 1, -1, -1, 1
+ NoiseMargin = 3.5    rating: 1
NoiseMargin = 3 is therefore chosen.

5.1.4 Changing Hangover
+ Hangover = 8    rating: 1
+ Hangover = 4    rating: 2
+ Hangover = 6    rating: 1
Hangover = 4 is therefore chosen.


5.2 Changing the Gamma parameter of the SS algorithm
+ Gamma = 1    rating: 2
+ Gamma = 2    rating: 1
Gamma = 1 is therefore chosen.

5.3 Changing the alpha parameter of the WF algorithm
+ alpha = 0.99    rating: -2
+ alpha = 0.9     rating: 1
+ alpha = 0.95    rating: 2
alpha = 0.95 is therefore chosen.


Spectral Subtraction Matlab Code

function [output,Speech]=SSBoll79(signal,fs,IS)
% OUTPUT=SSBOLL79(S,FS,IS)
% Spectral Subtraction based on Boll 79. Amplitude spectral subtraction
% Includes Magnitude Averaging and Residual noise Reduction
% S is the noisy signal, FS is the sampling frequency and IS is the initial
% silence (noise only) length in seconds (default value is .25 sec)

if (nargin<3 || isstruct(IS))
    IS=.25; %seconds
end
W=fix(.025*fs); %Window length is 25 ms
nfft=W;
SP=.4; %Shift percentage is 40% (10ms); Overlap-Add method works well with this value (.4)
wnd=hamming(W);
% wnd=rectwin(W);

% IGNORE THIS SECTION FOR COMPATIBILITY WITH ANOTHER PROGRAM FROM HERE.....
if (nargin>=3 && isstruct(IS)) %This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% .......IGNORE THIS SECTION FOR COMPATIBILITY WITH ANOTHER PROGRAM TO HERE

NIS=fix((IS*fs-W)/(SP*W) +1); %number of initial silence segments
Gamma=1; %Magnitude Power (1 for magnitude spectral subtraction, 2 for power spectrum subtraction)

disp(' Segmentation');
y=segment(signal,W,SP,wnd);
disp(' FFT');
Y=fft(y,nfft);
YPhase=angle(Y(1:fix(end/2)+1,:)); %Noisy Speech Phase
Y=abs(Y(1:fix(end/2)+1,:)).^Gamma; %Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

disp(' Noise Initialization');
N=mean(Y(:,1:NIS)')'; %initial Noise Power Spectrum mean
NRM=zeros(size(N)); %Noise Residual Maximum (Initialization)
NoiseCounter=0;
NoiseLength=9; %This is a smoothing factor for the noise updating
Beta=.03;

disp(' Magnitude Averaged');
YS=Y; %Y Magnitude Averaged
for i=2:(numberOfFrames-1)
    YS(:,i)=(Y(:,i-1)+Y(:,i)+Y(:,i+1))/3;
end

disp(' Spectral Subtraction');
X=zeros(FreqResol,numberOfFrames);
for i=1:numberOfFrames
    %Magnitude Spectrum Distance VAD
    [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(Y(:,i).^(1/Gamma),N.^(1/Gamma),NoiseCounter);
    Speech(i,1)=SpeechFlag;
    if SpeechFlag==0
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); %Update and smooth noise
        NRM=max(NRM,YS(:,i)-N); %Update Maximum Noise Residue
        X(:,i)=Beta*Y(:,i); %Attenuate the noise-only frame
    else
        D=YS(:,i)-N; %Spectral Subtraction
        if i>1 && i<numberOfFrames %Residual Noise Reduction
            for j=1:length(D)
                if D(j)<NRM(j)
                    D(j)=min([D(j) YS(j,i-1)-N(j) YS(j,i+1)-N(j)]);
                end
            end
        end
        X(:,i)=max(D,0); %Half-wave rectification
    end
end

disp(' Synthesis');
output=OverlapAdd2(X.^(1/Gamma),YPhase,W,SP*W);

function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)
%Y=OverlapAdd(X,A,W,S);
%Y is the signal reconstructed from its spectrogram. X is a matrix
%with each column being the fft of a segment of signal. A is the phase
%angle of the spectrum which should have the same dimension as X. if it is
%not given the phase angle of X is used which in the case of real values is
%zero (assuming that it is the magnitude). W is the window length of time
%domain segments; if not given the length is assumed to be twice as long as
%fft window length. S is the shift length of the segmentation process (for
%example in the case of non overlapping signals it is equal to W and in the
%case of 50% overlap is equal to W/2). if not given W/2 is used. Y is the
%reconstructed time domain signal.
%Sep-04
%Esfandiar Zavarehei

if nargin<2
    yphase=angle(XNEW);
end
if nargin<3
    windowLen=size(XNEW,1)*2;
end
if nargin<4
    ShiftLen=windowLen/2;
end
if fix(ShiftLen)~=ShiftLen
    ShiftLen=fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end

[FreqRes, FrameNum]=size(XNEW);
Spec=XNEW.*exp(1j*yphase); %Combine magnitude and phase
%Rebuild the conjugate-symmetric upper half of the spectrum
if mod(windowLen,2) %if windowLen is odd
    Spec=[Spec;flipud(conj(Spec(2:end,:)))];
else
    Spec=[Spec;flipud(conj(Spec(2:end-1,:)))];
end
sig=zeros((FrameNum-1)*ShiftLen+windowLen,1);
weight=sig;
for i=1:FrameNum
    start=(i-1)*ShiftLen+1;
    spec=Spec(:,i);
    sig(start:start+windowLen-1)=sig(start:start+windowLen-1)+real(ifft(spec,windowLen));
end
ReconstructedSignal=sig;

function [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)
%[NOISEFLAG, SPEECHFLAG, NOISECOUNTER, DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
%Spectral Distance Voice Activity Detector
%SIGNAL is the current frame's magnitude spectrum which is to be labeled as
%noise or speech, NOISE is noise magnitude spectrum template (estimation),
%NOISECOUNTER is the number of immediate previous noise frames, NOISEMARGIN
%(default 3) is the spectral distance threshold. HANGOVER (default 8) is
%the number of noise segments after which the SPEECHFLAG is reset (goes to
%zero). NOISEFLAG is set to one if the segment is labeled as noise.
%NOISECOUNTER returns the number of previous noise segments, this value is
%reset (to zero) whenever a speech segment is detected. DIST is the
%spectral distance.
%Saeed Vaseghi
%edited by Esfandiar Zavarehei
%Sep-04

if nargin<4
    NoiseMargin=3;
end
if nargin<5
    Hangover=8;
end
if nargin<3
    NoiseCounter=0;
end

FreqResol=length(signal);
SpectralDist=20*(log10(signal)-log10(noise));
SpectralDist(SpectralDist<0)=0;

Dist=mean(SpectralDist);
if (Dist < NoiseMargin)
    NoiseFlag=1;
    NoiseCounter=NoiseCounter+1;
else
    NoiseFlag=0;
    NoiseCounter=0;
end

% Detect noise only periods and attenuate the signal
if (NoiseCounter > Hangover)
    SpeechFlag=0;
else
    SpeechFlag=1;
end

function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal to overlapping windowed segments
% A= SEGMENT(X,W,SP,WIN) returns a matrix which its columns are segmented
% and windowed frames of the input one dimensional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. the default window is hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei

if nargin<3
    SP=.4;
end
if nargin<2
    W=256;
end
if nargin<4
    Window=hamming(W);
end
Window=Window(:); %make it a column vector
L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP +1); %number of segments
Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;

Wiener Filter Matlab Code

function output=WienerScalart96(signal,fs,IS)
% output=WIENERSCALART96(signal,fs,IS)
% Wiener filter based on tracking a priori SNR using Decision-Directed
% method, proposed by Scalart et al 96. In this method it is assumed that
% SNRpost=SNRprior +1. based on this the Wiener Filter can be adapted to a
% model like Ephraims model in which we have a gain function which is a
% function of a priori SNR and a priori SNR is being tracked using Decision
% Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05

if (nargin<3 || isstruct(IS))
    IS=.25; %Initial Silence or Noise Only part in seconds
end
W=fix(.025*fs); %Window length is 25 ms
SP=.4; %Shift percentage is 40% (10ms); Overlap-Add method works well with this value (.4)
wnd=hamming(W);

%IGNORE FROM HERE ...............................
if (nargin>=3 && isstruct(IS)) %This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    %nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% ......................................UP TO HERE

pre_emph=0;
signal=filter([1 -pre_emph],1,signal);

NIS=fix((IS*fs-W)/(SP*W) +1); %number of initial silence segments

y=segment(signal,W,SP,wnd); %This function chops the signal into frames
Y=fft(y);
YPhase=angle(Y(1:fix(end/2)+1,:)); %Noisy Speech Phase
Y=abs(Y(1:fix(end/2)+1,:)); %Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

N=mean(Y(:,1:NIS)')'; %initial Noise Power Spectrum mean
LambdaD=mean((Y(:,1:NIS)').^2)'; %initial Noise Power Spectrum variance
alpha=.99; %used in smoothing xi (For Decision Directed method for estimation of A Priori SNR)
NoiseCounter=0;
NoiseLength=9; %This is a smoothing factor for the noise updating
G=ones(size(N)); %Initial Gain used in calculation of the new xi
Gamma=G;

X=zeros(size(Y)); %Initialize X (memory allocation)
h=waitbar(0,'Wait...');
for i=1:numberOfFrames
    %%%%%VAD and Noise Estimation START
    if i<=NIS %If initial silence ignore VAD
        SpeechFlag=0;
        NoiseCounter=100;
    else %Else Do VAD
        [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(Y(:,i),N,NoiseCounter); %Magnitude Spectrum Distance VAD
    end

    if SpeechFlag==0 %If not Speech Update Noise Parameters
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); %Update and smooth noise mean
        LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); %Update and smooth noise variance
    end
    %%%%%%%%%%%VAD and Noise Estimation END

    gammaNew=(Y(:,i).^2)./LambdaD; %A posteriori SNR
    xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); %Decision Directed Method for A Priori SNR
    Gamma=gammaNew;

    G=(xi./(xi+1)); %Wiener gain
    X(:,i)=G.*Y(:,i); %Obtain the new Cleaned value
    waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
end

close(h);
output=OverlapAdd2(X,YPhase,W,SP*W); %Overlap-add Synthesis of speech
output=filter(1,[1 -pre_emph],output); %Undo the effect of Pre-emphasis
output=0.999*(output/max(abs(output)));
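The per-frame core of the Wiener listing (a posteriori SNR, decision-directed a priori SNR, gain, cleaned magnitude) can be isolated in a short sketch. It is written in plain Python for illustration; the function name `wiener_step` and its argument names are hypothetical, the noise power is assumed to be strictly positive, and the initial values of 1 for the previous gain and previous a posteriori SNR follow the `G=ones(size(N)); Gamma=G;` initialization above.

```python
def wiener_step(frame_mag, noise_power, prev_gain, prev_post_snr, alpha=0.99):
    """Process one frame; returns (cleaned_mag, gain, post_snr).

    frame_mag: noisy magnitude spectrum |Y| of the current frame.
    noise_power: noise power estimate per bin (LambdaD), assumed > 0.
    prev_gain, prev_post_snr: gain and a posteriori SNR of the previous frame.
    alpha: smoothing factor of the decision-directed estimate.
    """
    # A posteriori SNR of the current frame.
    post_snr = [y * y / d for y, d in zip(frame_mag, noise_power)]
    # Decision-directed a priori SNR: blend of the previous frame's
    # estimate and the current instantaneous estimate max(post - 1, 0).
    prio_snr = [alpha * g * g * gp + (1.0 - alpha) * max(gn - 1.0, 0.0)
                for g, gp, gn in zip(prev_gain, prev_post_snr, post_snr)]
    gain = [x / (x + 1.0) for x in prio_snr]
    cleaned = [g * y for g, y in zip(gain, frame_mag)]
    return cleaned, gain, post_snr


# One bin, |Y| = 2 and noise power 1 (made-up values), with alpha = 0.5.
cleaned, gain, post = wiener_step([2.0], [1.0], [1.0], [1.0], alpha=0.5)
```

A value of alpha close to 1 makes the a priori SNR follow the previous frame closely and change slowly, while smaller values let the current frame dominate; this is the trade-off explored in section 5.3.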

The helper functions OverlapAdd2, segment and vad called by WienerScalart96 are identical to those listed after SSBoll79 in the Spectral Subtraction Matlab Code section.
