Final Project

Speech Enhancement Report
Group 1, 07DT4

1. Theory overview

General block diagram of speech processing:

[Figure 1.1: Block diagram for the SS and WF algorithms. Blocks: noisy signal, splitting the signal into frames, FFT, noise estimation, noise-reduction function, IDFT, overlap and adding, processed signal.]

The two algorithms, Spectral Subtraction (SS) and Wiener Filter (WF), differ only in the noise-reduction function block; all other blocks are identical.

1.1 Spectral Subtraction algorithm

1.1.1 General introduction

Spectral subtraction rests on one basic principle: it assumes that noise is present, estimates the noise spectrum, and subtracts that estimate from the spectrum of the noisy speech signal. The noise spectrum can be estimated and updated over the periods in which no speech is present. The method applies only to stationary or slowly varying noise, because only then does the noise spectrum stay essentially unchanged between updates.

1.1.2 Spectral subtraction on the magnitude spectrum

Let $y[n]$ be the noisy input signal, the sum of the clean signal $s[n]$ and the noise $n[n]$:

$$y[n] = s[n] + n[n] \tag{1.1}$$

Taking the discrete Fourier transform of both sides gives

$$Y(\omega) = S(\omega) + N(\omega) \tag{1.2}$$
$Y(\omega)$ can be written in polar form as

$$Y(\omega) = |Y(\omega)|\, e^{j\phi_y(\omega)} \tag{1.3}$$

where $|Y(\omega)|$ is the magnitude spectrum and $\phi_y(\omega)$ is the phase spectrum of the noisy signal. The noise spectrum $N(\omega)$ can be written in the same form:

$$N(\omega) = |N(\omega)|\, e^{j\phi_n(\omega)} \tag{1.4}$$

The noise magnitude spectrum $|N(\omega)|$ is unknown, but it can be replaced by its average value computed while no speech is present (speech pauses), and the noise phase can be replaced by the phase $\phi_y(\omega)$ of the noisy signal. The spectrum of the clean signal can then be estimated as

$$\hat{S}(\omega) = \left[\,|Y(\omega)| - |\hat{N}(\omega)|\,\right] e^{j\phi_y(\omega)} \tag{1.5}$$

Here $|\hat{N}(\omega)|$ is the estimated noise magnitude spectrum, computed while speech is inactive. The hat "^" marks an estimated (approximate) value. The enhanced speech signal is obtained by taking the IDFT of $\hat{S}(\omega)$. Note that the enhanced magnitude spectrum, $|\hat{S}(\omega)| = |Y(\omega)| - |\hat{N}(\omega)|$, can come out negative when the noise spectrum is estimated inaccurately.

A magnitude spectrum cannot be negative, however, so the subtraction must guarantee that $|\hat{S}(\omega)|$ is never negative. The remedy is to half-wave rectify the spectral difference: any spectral component that comes out negative is set to 0:

$$|\hat{S}(\omega)| = \begin{cases} |Y(\omega)| - |\hat{N}(\omega)|, & |Y(\omega)| > |\hat{N}(\omega)| \\ 0, & \text{otherwise} \end{cases} \tag{1.6}$$
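As an illustration of Eqs. (1.5) and (1.6), the following minimal MATLAB sketch applies magnitude spectral subtraction with half-wave rectification to a single frame. The four-bin spectra and the variable names are made up for the example and are not taken from the report's code.

```matlab
% Hypothetical single-frame example; all numeric values are made up.
Ymag   = [4.0; 2.5; 0.8; 0.3];      % |Y(w)|, noisy magnitude spectrum
Yphase = [0; 0.7; -1.2; 2.0];       % phi_y(w), noisy phase spectrum
Nhat   = [1.0; 1.0; 1.0; 1.0];      % estimated noise magnitude spectrum

Smag = max(Ymag - Nhat, 0);         % Eq. (1.6): negative differences are set to 0
Shat = Smag .* exp(1j * Yphase);    % Eq. (1.5): reuse the noisy phase

% Smag is [3; 1.5; 0; 0]: the last two bins fell below the noise estimate
```

The enhanced frame would then be obtained by mirroring `Shat` into a full conjugate-symmetric spectrum and taking its IDFT.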
1.1.3 Spectral subtraction on the power spectrum

Magnitude spectral subtraction can be extended to the power spectrum domain, because in some cases the method works better on the power spectrum than on the magnitude spectrum. To obtain the short-time power spectrum of the noisy signal, we square $|Y(\omega)|$:

$$|Y(\omega)|^2 = |S(\omega)|^2 + |N(\omega)|^2 + S(\omega)N^*(\omega) + S^*(\omega)N(\omega) = |S(\omega)|^2 + |N(\omega)|^2 + 2\,\mathrm{Re}\{S(\omega)N^*(\omega)\} \tag{1.7}$$

The terms $|N(\omega)|^2$, $S(\omega)N^*(\omega)$ and $S^*(\omega)N(\omega)$ cannot be computed directly and are approximated by $E\{|N(\omega)|^2\}$, $E\{S(\omega)N^*(\omega)\}$ and $E\{S^*(\omega)N(\omega)\}$. Normally $E\{|N(\omega)|^2\}$ is estimated while speech is inactive and is denoted $|\hat{N}(\omega)|^2$. If the noise $n[n]$ and the clean signal $s[n]$ are uncorrelated, $E\{S(\omega)N^*(\omega)\}$ and $E\{S^*(\omega)N(\omega)\}$ are taken to be 0. The power spectrum of the clean signal can then be estimated as

$$|\hat{S}(\omega)|^2 = |Y(\omega)|^2 - |\hat{N}(\omega)|^2 \tag{1.8}$$

This equation is the power spectral subtraction algorithm. The estimated power spectrum $|\hat{S}(\omega)|^2$ is not guaranteed to be non-negative, but half-wave rectification can be applied as described above. The enhanced signal is obtained by taking the IDFT of $|\hat{S}(\omega)|$ (the square root of $|\hat{S}(\omega)|^2$) combined with the phase of the noisy speech signal. The result can also be written in terms of a gain function $G(\omega)$:

$$|\hat{S}(\omega)|^2 = G^2(\omega)\,|Y(\omega)|^2 \tag{1.9}$$

where

$$G(\omega) = \sqrt{1 - \frac{|\hat{N}(\omega)|^2}{|Y(\omega)|^2}} \tag{1.10}$$

In the general case, spectral subtraction can be written as

$$|\hat{S}(\omega)|^p = |Y(\omega)|^p - |\hat{N}(\omega)|^p \tag{1.11}$$

With p = 1 this is the standard magnitude spectral subtraction; with p = 2 it is power spectral subtraction.
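The general form (1.11) and the gain form (1.9) and (1.10) can be compared numerically. The sketch below is only illustrative: `specsub` is a hypothetical helper name, the three-bin spectra are made up, and half-wave rectification is added so that the root is always real.

```matlab
% Generalized spectral subtraction, Eq. (1.11), with half-wave rectification.
% specsub is a hypothetical helper, not part of the report's code.
specsub = @(Ymag, Nhat, p) max(Ymag.^p - Nhat.^p, 0).^(1/p);

Ymag = [4; 2; 1];                   % made-up noisy magnitudes
Nhat = [1; 1; 2];                   % made-up noise magnitude estimate

S1 = specsub(Ymag, Nhat, 1);        % magnitude subtraction: [3; 1; 0]
S2 = specsub(Ymag, Nhat, 2);        % power subtraction: [sqrt(15); sqrt(3); 0]

% Equivalent gain form for p = 2, Eqs. (1.9)-(1.10), with the same rectification
G       = sqrt(max(1 - (Nhat.^2) ./ (Ymag.^2), 0));
S2_gain = G .* Ymag;                % agrees with S2 up to rounding error
```

In this example power subtraction removes less from each bin than magnitude subtraction does, which is one reason the choice of p is treated as a tunable parameter.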
1.2 Wiener Filtering algorithm

1.2.1 General introduction

The basic idea of the WF algorithm is to estimate the speech signal by minimizing the mean square error (MSE) between the true speech signal and the estimated speech signal.

1.2.2 Basic principle of Wiener filtering

Assume that $y[n]$ is the noisy input signal, the sum of the clean signal $s[n]$ and the noise signal $n[n]$:

$$y[n] = s[n] + n[n] \tag{2.1}$$

Taking the discrete Fourier transform of both sides gives

$$Y(\omega) = S(\omega) + N(\omega) \tag{2.2}$$

$Y(\omega)$ can be written in polar form as

$$Y(\omega) = |Y(\omega)|\, e^{j\phi_y(\omega)} \tag{2.3}$$

where $|Y(\omega)|$ is the magnitude spectrum and $\phi_y(\omega)$ is the phase spectrum of the noisy signal. The noise spectrum $N(\omega)$ can likewise be written in terms of magnitude and phase:

$$N(\omega) = |N(\omega)|\, e^{j\phi_n(\omega)} \tag{2.4}$$

The noise magnitude spectrum $|N(\omega)|$ is unknown, but it can be replaced by its average value computed while no speech is present (speech pauses), and the noise phase can be replaced by the phase $\phi_y(\omega)$ of the noisy signal. The magnitude of the clean-signal spectrum $\hat{S}(\omega)$ can be estimated from $Y(\omega)$ through a nonlinear gain function:

$$|\hat{S}(\omega)| = G(\omega)\,|Y(\omega)| \tag{2.5}$$

Define the a priori SNR and the a posteriori SNR as

$$\mathrm{SNR}_{pri} = \frac{E\{|S(\omega)|^2\}}{E\{|N(\omega)|^2\}} \tag{2.6}$$

$$\mathrm{SNR}_{post} = \frac{E\{|Y(\omega)|^2\}}{E\{|N(\omega)|^2\}} \tag{2.7}$$

One difficulty in speech enhancement algorithms is that the clean signal $s[n]$ is not available in advance, so its spectrum is unknown. $\mathrm{SNR}_{pri}$ therefore cannot be computed directly, even though it is an essential parameter for estimating the clean signal. In practice we use the relation $E\{\mathrm{SNR}_{post}\} = 1 + \mathrm{SNR}_{pri}$, which holds when speech and noise are uncorrelated. The WF gain $G(\omega)$ is then

$$G(\omega) = \frac{\mathrm{SNR}_{pri}}{1 + \mathrm{SNR}_{pri}}$$
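A small numeric sketch of the gain computation follows. It assumes a single-frame estimate of the a posteriori SNR and derives the a priori SNR directly from the relation above, floored at zero. The power values are made up, and no smoothing across frames is applied here.

```matlab
% Made-up power values for three frequency bins.
Ypow    = [10; 4; 1];               % |Y(w)|^2, noisy power spectrum
LambdaD = [ 2; 2; 2];               % estimated noise power, E{|N(w)|^2}

SNRpost = Ypow ./ LambdaD;          % Eq. (2.7), one-frame estimate: [5; 2; 0.5]
SNRpri  = max(SNRpost - 1, 0);      % from SNRpost = 1 + SNRpri, floored at 0: [4; 1; 0]
G       = SNRpri ./ (1 + SNRpri);   % Wiener gain: [0.8; 0.5; 0]

Smag = G .* sqrt(Ypow);             % Eq. (2.5): estimated clean magnitude
```

Bins with high SNR pass almost unchanged, while bins at or below the noise floor are suppressed completely.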
1.2.3 Overlap and adding in speech signal processing

1.2.3.1 Splitting the signal into frames

The signal to be processed is a continuous signal whose properties change over time. If the FFT is applied directly to the whole time-domain signal without any pre-processing, the transform describes a rapidly changing, non-stationary signal, and the noise-suppression algorithms cannot be applied to it. The signal must therefore be split into consecutive frames in the time domain before it is moved to the frequency domain with the FFT. Within each frame the signal varies slowly and can be treated as stationary, and only then can the noise-suppression algorithms work effectively. Processing is therefore done frame by frame. Splitting the signal into frames requires a suitable window; here we use the Hamming window.

1.2.3.2 Overlap and adding

If the Hamming-windowed frames are simply placed end to end with no further condition, the signal is unintentionally attenuated, because the Hamming window is not flat: it tapers toward the edges of each frame. The frames are therefore required to overlap one another. The overlap is set to a suitable ratio, usually 40% or 50%. After the frames have been denoised in the frequency domain, they are joined back together by a method that matches the way the input was split into frames; this step is called adding. The set of signal samples in one frame after splitting is called a segment. When frames are split and rejoined by overlap and adding, the signal obtained after noise suppression is not distorted in shape.

[Figure: The speech signal processing procedure.]
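The effect of overlapping can be checked numerically. The sketch below makes simplifying assumptions that differ from the report's code: the Hamming window is written out by hand in its periodic form (so no toolbox is needed), the shift is 50% of the frame, and the frame length of 8 samples is chosen only to keep the example small. Under these assumptions the overlapped windows add up to a constant.

```matlab
W   = 8;                                  % frame length in samples (small, for illustration)
SP  = W/2;                                % shift of 50% of the frame
n   = (0:W-1)';
wnd = 0.54 - 0.46*cos(2*pi*n/W);          % periodic Hamming window, written out by hand

x = ones(32, 1);                          % constant test signal
N = fix((length(x) - W)/SP + 1);          % number of frames, same formula as segment()
y = zeros(size(x));
for i = 1:N
    idx = (i-1)*SP + (1:W)';              % samples covered by frame i
    y(idx) = y(idx) + x(idx) .* wnd;      % window the frame, then overlap and add
end

interior = y(W/2+1 : end-W/2);            % skip the edges, where only one frame contributes
% every interior sample equals 1.08: a constant gain with no amplitude ripple
```

Without overlap (`SP = W`) the output would instead follow the shape of the window in every frame, which is the attenuation described above.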
1.2.4 Noise estimation and update

The noise estimation method can strongly affect the quality of the enhanced signal. If the noise is underestimated, residual noise remains in the signal and is audible; if it is overestimated, the speech is distorted and its intelligibility suffers. The simplest approach is to estimate and update the noise spectrum in the parts of the signal where no speech is present, using a voice activity detection (VAD) algorithm. That approach is adequate only for stationary noise (for example white noise); it is not effective in realistic environments (for example a restaurant), where the spectral characteristics of the noise change continuously. Noise estimation algorithms that update continuously, including while speech is active, are better suited to environments with highly variable noise.

1.2.5 Voice activity detection

The process of deciding when speech is active and when it is not (silence) is called voice activity detection (VAD). A VAD algorithm produces a binary decision on a frame-by-frame basis, where a frame is typically about 20 to 40 ms long. A segment that contains active speech gets VAD = 1; a segment with no active speech, that is, noise only, gets VAD = 0. Several VAD algorithms have been proposed, based on a variety of signal features. The earliest ones rely on features such as energy level, zero-crossing rate, cepstral features, the Itakura LPC spectral distance measure, and periodicity measures. Most VAD algorithms struggle under low-SNR conditions, especially when the noise is non-stationary. Even a VAD that is accurate in a non-stationary environment may not be sufficient for speech enhancement applications, because an accurate noise estimate is needed at all times, including while speech is active.
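The VAD-gated noise update described in 1.2.4 can be sketched as a recursive average that is refreshed only on frames flagged as noise. The smoothing factor 9 matches the `NoiseLength` value in the report's code; the spectra and the VAD flags below are made up for illustration.

```matlab
NoiseLength = 9;                    % smoothing factor, as in the report's code
N = [1; 1];                         % current noise magnitude estimate (made up)
frames     = [1.5 6.0 0.5;          % made-up magnitude spectra, one column per frame
              2.0 7.0 1.0];
speechFlag = [0 1 0];               % made-up VAD decisions: 1 = speech, 0 = noise only

for i = 1:size(frames, 2)
    if speechFlag(i) == 0           % update only on noise-only frames
        N = (NoiseLength*N + frames(:, i)) / (NoiseLength + 1);
    end
end
% N is now [0.995; 1.09]; the loud second frame was skipped because it was flagged as speech
```

A larger `NoiseLength` makes the estimate smoother but slower to follow changes in the noise.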
2. Algorithm flowcharts

2.1 Spectral Subtraction flowchart

1. Begin.
2. Call the Segment function to split the input signal into frames.
3. Apply the DFT to the frames.
4. Average the signal magnitude.
5. Compute the initial mean noise power (N).
6. Set i = 1 (start from the first frame).
7. Run the VAD. Is the current frame noise?
   - Yes: update the noise (N), update the maximum residual noise, and attenuate the noise frame.
   - No: subtract the spectra, YS(:,i) - N, to get the estimated clean signal D(i); apply residual noise reduction to correct the signal X(:,i); apply half-wave rectification.
8. Set i = i + 1 and take the next frame.
9. Is i < number of frames? If yes, return to step 7.
10. If no, perform the IDFT and join the speech frames with the OverlapAdd2 function.
11. End.
2.2 Wiener Filtering flowchart

1. Begin.
2. Call the Segment function to split the input signal into frames.
3. Apply the DFT to the frames.
4. Compute the initial mean noise power (N).
5. Set i = 1 (start from the first frame).
6. Run the VAD. Is the current frame noise?
   - Yes: update the noise (N) and update the noise variance LambdaD.
7. Compute the a posteriori SNR and the a priori SNR.
8. Compute the gain function G.
9. Estimate the clean signal.
10. Set i = i + 1 and take the next frame.
11. Is i < number of frames? If yes, return to step 6.
12. If no, perform the IDFT and join the speech frames with the OverlapAdd2 function.
13. End.
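Steps 7 and 8 can be sketched for a single frequency bin with the decision-directed update used in the report's WienerScalart96 listing. The power values are made up, `gammaPrev` is a name chosen here for the previous a posteriori SNR, and alpha = 0.95 is the smoothing value selected in section 5.3.

```matlab
alpha   = 0.95;                     % smoothing factor for the a priori SNR
LambdaD = 2;                        % noise power of one bin (made up)
Ypow    = [2 2 20 20 2];            % |Y|^2 of that bin over five frames (made up)

G = 1;  gammaPrev = 1;              % initial gain and previous a posteriori SNR
xi = zeros(size(Ypow));             % a priori SNR estimate per frame
for i = 1:numel(Ypow)
    gammaNew = Ypow(i) / LambdaD;                                     % a posteriori SNR
    xi(i) = alpha*(G^2)*gammaPrev + (1 - alpha)*max(gammaNew - 1, 0); % decision-directed estimate
    G = xi(i) / (xi(i) + 1);                                          % Wiener gain for this frame
    gammaPrev = gammaNew;
end
% xi(1) is 0.95; xi rises when the louder frames 3 and 4 arrive
```

With alpha close to 1 the estimate reacts slowly to the louder frames, which smooths the gain over time at the cost of a delayed response to speech onsets.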
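2.3 Segment function flowchart

1. Begin (Segment).
2. Are all input arguments supplied? If not, set the defaults:
   - Shift = 0.4 (40% of the frame)
   - Number of samples = 256
   - Hamming window
3. Convert the Hamming window to a column vector.
4. Compute the number of samples in the noisy signal.
5. Compute the number of samples in the shift SP.
6. Compute the number of segments to create.
7. Split the signal into frames with shift SP.
8. Multiply each frame by the Hamming window.
9. End.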
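As a worked example of steps 4 to 7, the arithmetic used in the report's segment() listing can be evaluated for a short signal. The sampling rate of 8000 Hz and the signal length of 1000 samples are assumptions made for the example.

```matlab
fs = 8000;                          % assumed sampling rate in Hz
W  = fix(0.025*fs);                 % 25 ms window -> 200 samples
SP = fix(W*0.4);                    % 40% shift -> 80 samples (10 ms)
L  = 1000;                          % made-up signal length in samples

N = fix((L - W)/SP + 1);            % number of frames -> 11
firstSample = (0:N-1)*SP + 1;       % start index of each frame: 1, 81, ..., 801
lastSample  = firstSample + W - 1;  % end index of each frame; the last one ends at 1000
```

Any samples left over after the last full frame are dropped, because the frame count is rounded down.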
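2.4 VAD function flowchart

1. Begin.
2. Set the spectral distance threshold.
3. Set the number of consecutive segments required before a segment is declared noise.
4. Compute the spectral distance of the input segment.
5. Is the spectral distance of the input segment < the spectral distance threshold?
   - Yes: the segment is treated as noise; increment the noise segment counter (Noise Counter).
   - No: the segment is treated as speech plus noise; reset the noise segment counter to 0.
6. Is Noise Counter > the required number of consecutive noise segments?
   - No: the current segment is speech.
   - Yes: the current segment is noise.
7. End.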
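The counter logic of steps 5 and 6 can be traced on a short sequence of spectral distances. The thresholds are the defaults from the report's vad() listing (NoiseMargin = 3, Hangover = 8); the distance values are made up.

```matlab
NoiseMargin = 3;  Hangover = 8;          % defaults from the report's vad()
dist = [1 1 1 1 1 1 1 1 1 1 5 1];        % made-up mean spectral distances in dB, one per frame
NoiseCounter = 0;
SpeechFlag = zeros(size(dist));
for i = 1:numel(dist)
    if dist(i) < NoiseMargin
        NoiseCounter = NoiseCounter + 1; % one more consecutive noise-like frame
    else
        NoiseCounter = 0;                % a speech-like frame resets the run
    end
    SpeechFlag(i) = ~(NoiseCounter > Hangover);
end
% SpeechFlag = [1 1 1 1 1 1 1 1 0 0 1 1]
```

A frame is declared noise only after more than `Hangover` consecutive noise-like frames, so short pauses inside speech are still treated as speech.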
2.5 OverlapAdd2 function flowchart

1. Begin.
2. Build the frames of the estimated speech signal.
3. Overlap the frames according to the shift ratio.
4. End.
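Step 2 relies on rebuilding a full spectrum from the half spectrum that the main routines keep. The sketch below shows that step for one even-length frame, following the same mirroring rule as the report's OverlapAdd2 listing; the six-sample frame is made up.

```matlab
x = [1; 2; 3; 4; 5; 6];                     % made-up real frame, even length
X = fft(x);
half = X(1:fix(end/2)+1);                   % bins 0..W/2, as kept by the report's code
full = [half; flipud(conj(half(2:end-1)))]; % rebuild the mirrored bins (even-length case)
xr = real(ifft(full));                      % back to the time domain
% max(abs(xr - x)) is at rounding-error level
```

Each reconstructed frame is then added into the output buffer at an offset of one shift length from the previous frame.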
3. Testing the algorithms on the provided audio samples

The algorithms were tested on two files: the clean signal (sp01.wav) and the noisy signal (sp01_babble_sn5.wav).

Table 1: CCR (Comparison Category Rating)

  Score   Meaning
   -3     Much worse
   -2     Worse
   -1     Slightly worse
    0     About the same
    1     Slightly better
    2     Better
    3     Much better

Pooling the listeners' opinions, the processed signal scored only 2 points relative to the unprocessed signal. The noise update may not yet be accurate, so the result is not yet satisfactory. Several parameters in the code therefore need to be changed to improve the processing quality.

4. Parameters that affect the quality of the algorithms

4.1 Parameters common to both algorithms

SP: the shift used when splitting the signal into frames. If SP is unsuitable, adding the frames back together causes waveform distortion.
IS: the length of the initial silence. If IS is too large, the beginning of the speech may also be treated as noise. If IS is too small, the initial noise estimate is based on too little data.
NoiseMargin and Hangover: the parameters that decide whether a frame is noise or speech. If a frame is misclassified, the noise update is wrong and the result is degraded.

4.2 Parameters that affect the Spectral Subtraction algorithm

NoiseLength: the smoothing factor for the noise update.
Gamma: the factor that selects magnitude spectral subtraction (Gamma = 1) or power spectral subtraction (Gamma = 2).

4.3 Parameters that affect the Wiener algorithm

alpha: the smoothing factor for SNRprior.

5. Quality improvement

5.1 Changing the parameters common to both algorithms

5.1.1 Changing SP

Values tested in turn: SP = 0.3, 0.4, 0.5.
Chosen: SP = 0.5

5.1.2 Changing IS

Values tested: IS = 2.5, 2, 0.15.
Chosen: IS = 0.15

5.1.3 Changing NoiseMargin

Values tested: NoiseMargin = 3, 4, 3.5.
+ NoiseMargin = 3.5    rating: 1
Chosen: NoiseMargin = 3

CCR ratings recorded for the other eight tests in 5.1.1 to 5.1.3, in the order they are printed in the report: 2, 1, 0, 0, 1, -1, -1, 1.

5.1.4 Changing Hangover

+ Hangover = 8    rating: 1
+ Hangover = 4    rating: 2
+ Hangover = 6    rating: 1
Chosen: Hangover = 4

5.2 Changing the Gamma parameter of the SS algorithm

+ Gamma = 1    rating: 2
+ Gamma = 2    rating: 1
Chosen: Gamma = 1

5.3 Changing the alpha parameter of the WF algorithm

+ alpha = 0.99    rating: -2
+ alpha = 0.9     rating: 1
+ alpha = 0.95    rating: 2
Chosen: alpha = 0.95
Spectral Subtraction Matlab Code

function [output,Speech]=SSBoll79(signal,fs,IS)
% OUTPUT=SSBOLL79(S,FS,IS)
% Spectral Subtraction based on Boll 79. Amplitude spectral subtraction
% Includes Magnitude Averaging and Residual noise Reduction
% S is the noisy signal, FS is the sampling frequency and IS is the initial
% silence (noise only) length in seconds (default value is .25 sec)

if (nargin<3 || isstruct(IS))
    IS=.25; %seconds
end
W=fix(.025*fs); %Window length is 25 ms
nfft=W;
SP=.4; %Shift percentage is 40% (10ms); Overlap-Add method works well with this value (.4)
wnd=hamming(W);
% wnd=rectwin(W);

% IGNORE THIS SECTION FOR COMPATIBILITY WITH ANOTHER PROGRAM FROM HERE.....
if (nargin>=3 && isstruct(IS)) %This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% .......IGNORE THIS SECTION FOR COMPATIBILITY WITH ANOTHER PROGRAM TO HERE

NIS=fix((IS*fs-W)/(SP*W) +1); %number of initial silence segments
Gamma=1; %Magnitude Power (1 for magnitude spectral subtraction, 2 for power spectrum subtraction)

disp(' Segmentation');
y=segment(signal,W,SP,wnd);
disp(' FFT');
Y=fft(y,nfft);
YPhase=angle(Y(1:fix(end/2)+1,:)); %Noisy Speech Phase
Y=abs(Y(1:fix(end/2)+1,:)).^Gamma; %Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

disp(' Noise Initialization');
N=mean(Y(:,1:NIS)')'; %initial Noise Power Spectrum mean
NRM=zeros(size(N)); %Noise Residual Maximum (Initialization)
NoiseCounter=0;
NoiseLength=9; %This is a smoothing factor for the noise updating
Beta=.03;

disp(' Magnitude Averaged');
YS=Y; %Y Magnitude Averaged
for i=2:(numberOfFrames-1)
    YS(:,i)=(Y(:,i-1)+Y(:,i)+Y(:,i+1))/3;
end

disp(' Spectral Subtraction');
X=zeros(FreqResol,numberOfFrames);
for i=1:numberOfFrames
    [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(Y(:,i).^(1/Gamma),N.^(1/Gamma),NoiseCounter); %Magnitude Spectrum Distance VAD
    Speech(i,1)=SpeechFlag;
    if SpeechFlag==0
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); %Update and smooth noise
        NRM=max(NRM,YS(:,i)-N); %Update Maximum Noise Residue
        X(:,i)=Beta*Y(:,i); %Attenuate the noise-only frame
    else
        D=YS(:,i)-N; %Spectral Subtraction
        if i>1 && i<numberOfFrames %Residual Noise Reduction
            for j=1:length(D)
                if D(j)<NRM(j)
                    D(j)=min([D(j) YS(j,i-1)-N(j) YS(j,i+1)-N(j)]);
                end
            end
        end
        X(:,i)=max(D,0); %Half-wave rectification
    end
end

disp(' Synthesis');
output=OverlapAdd2(X.^(1/Gamma),YPhase,W,SP*W);

function ReconstructedSignal=OverlapAdd2(XNEW,yphase,windowLen,ShiftLen)
%Y=OverlapAdd(X,A,W,S);
%Y is the signal reconstructed from its spectrogram. X is a matrix
%with each column being the fft of a segment of signal. A is the phase
%angle of the spectrum which should have the same dimension as X. If it is
%not given the phase angle of X is used which in the case of real values is
%zero (assuming that it is the magnitude). W is the window length of time
%domain segments; if not given the length is assumed to be twice as long as
%fft window length. S is the shift length of the segmentation process (for
%example in the case of non overlapping signals it is equal to W and in the
%case of 50% overlap it is equal to W/2). If not given W/2 is used. Y is the
%reconstructed time domain signal.
%Sep-04
%Esfandiar Zavarehei

if nargin<2
    yphase=angle(XNEW);
end
if nargin<3
    windowLen=size(XNEW,1)*2;
end
if nargin<4
    ShiftLen=windowLen/2;
end
if fix(ShiftLen)~=ShiftLen
    ShiftLen=fix(ShiftLen);
    disp('The shift length has to be an integer as it is the number of samples.')
    disp(['shift length is fixed to ' num2str(ShiftLen)])
end

[FreqRes, FrameNum]=size(XNEW);
Spec=XNEW.*exp(1j*yphase);
if mod(windowLen,2) %if windowLen is odd
    Spec=[Spec;flipud(conj(Spec(2:end,:)))];
else
    Spec=[Spec;flipud(conj(Spec(2:end-1,:)))];
end
sig=zeros((FrameNum-1)*ShiftLen+windowLen,1);
weight=sig;
for i=1:FrameNum
    start=(i-1)*ShiftLen+1;
    spec=Spec(:,i);
    sig(start:start+windowLen-1)=sig(start:start+windowLen-1)+real(ifft(spec,windowLen));
end
ReconstructedSignal=sig;

function [NoiseFlag,SpeechFlag,NoiseCounter,Dist]=vad(signal,noise,NoiseCounter,NoiseMargin,Hangover)
%[NOISEFLAG, SPEECHFLAG, NOISECOUNTER, DIST]=vad(SIGNAL,NOISE,NOISECOUNTER,NOISEMARGIN,HANGOVER)
%Spectral Distance Voice Activity Detector
%SIGNAL is the current frame's magnitude spectrum which is to be labeled as
%noise or speech, NOISE is the noise magnitude spectrum template (estimation),
%NOISECOUNTER is the number of immediate previous noise frames, NOISEMARGIN
%(default 3) is the spectral distance threshold. HANGOVER (default 8) is
%the number of noise segments after which the SPEECHFLAG is reset (goes to
%zero). NOISEFLAG is set to one if the segment is labeled as noise.
%NOISECOUNTER returns the number of previous noise segments; this value is
%reset (to zero) whenever a speech segment is detected. DIST is the
%spectral distance.
%Saeed Vaseghi
%edited by Esfandiar Zavarehei
%Sep-04

if nargin<4
    NoiseMargin=3;
end
if nargin<5
    Hangover=8;
end
if nargin<3
    NoiseCounter=0;
end

FreqResol=length(signal);
SpectralDist=20*(log10(signal)-log10(noise));
SpectralDist(SpectralDist<0)=0;

Dist=mean(SpectralDist);
if (Dist < NoiseMargin)
    NoiseFlag=1;
    NoiseCounter=NoiseCounter+1;
else
    NoiseFlag=0;
    NoiseCounter=0;
end

% Detect noise only periods and attenuate the signal
if (NoiseCounter > Hangover)
    SpeechFlag=0;
else
    SpeechFlag=1;
end

function Seg=segment(signal,W,SP,Window)
% SEGMENT chops a signal to overlapping windowed segments
% A=SEGMENT(X,W,SP,WIN) returns a matrix whose columns are segmented
% and windowed frames of the input one-dimensional signal, X. W is the
% number of samples per window, default value W=256. SP is the shift
% percentage, default value SP=0.4. WIN is the window that is multiplied by
% each segment and its length should be W. The default window is the hamming
% window.
% 06-Sep-04
% Esfandiar Zavarehei

if nargin<3
    SP=.4;
end
if nargin<2
    W=256;
end
if nargin<4
    Window=hamming(W);
end
Window=Window(:); %make it a column vector
L=length(signal);
SP=fix(W.*SP);
N=fix((L-W)/SP +1); %number of segments
Index=(repmat(1:W,N,1)+repmat((0:(N-1))'*SP,1,W))';
hw=repmat(Window,1,N);
Seg=signal(Index).*hw;

Wiener Filter Matlab Code

function output=WienerScalart96(signal,fs,IS)
% output=WIENERSCALART96(signal,fs,IS)
% Wiener filter based on tracking a priori SNR using the Decision-Directed
% method, proposed by Scalart et al 96. In this method it is assumed that
% SNRpost=SNRprior+1. Based on this the Wiener Filter can be adapted to a
% model like Ephraim's model in which we have a gain function which is a
% function of a priori SNR, and a priori SNR is being tracked using the
% Decision Directed method.
% Author: Esfandiar Zavarehei
% Created: MAR-05

if (nargin<3 || isstruct(IS))
    IS=.25; %Initial Silence or Noise Only part in seconds
end
W=fix(.025*fs); %Window length is 25 ms
SP=.4; %Shift percentage is 40% (10ms); Overlap-Add method works well with this value (.4)
wnd=hamming(W);

%IGNORE FROM HERE ...............................
if (nargin>=3 && isstruct(IS)) %This option is for compatibility with another programme
    W=IS.windowsize;
    SP=IS.shiftsize/W;
    %nfft=IS.nfft;
    wnd=IS.window;
    if isfield(IS,'IS')
        IS=IS.IS;
    else
        IS=.25;
    end
end
% ......................................UP TO HERE

pre_emph=0;
signal=filter([1 -pre_emph],1,signal);

NIS=fix((IS*fs-W)/(SP*W) +1); %number of initial silence segments

y=segment(signal,W,SP,wnd); %This function chops the signal into frames
Y=fft(y);
YPhase=angle(Y(1:fix(end/2)+1,:)); %Noisy Speech Phase
Y=abs(Y(1:fix(end/2)+1,:)); %Spectrogram
numberOfFrames=size(Y,2);
FreqResol=size(Y,1);

N=mean(Y(:,1:NIS)')'; %initial Noise Power Spectrum mean
LambdaD=mean((Y(:,1:NIS)').^2)'; %initial Noise Power Spectrum variance
alpha=.99; %used in smoothing xi (for the Decision Directed method for estimation of A Priori SNR)
NoiseCounter=0;
NoiseLength=9; %This is a smoothing factor for the noise updating
G=ones(size(N)); %Initial Gain used in calculation of the new xi
Gamma=G;

X=zeros(size(Y)); %Initialize X (memory allocation)
h=waitbar(0,'Wait...');
for i=1:numberOfFrames
    %%%%% VAD and Noise Estimation START
    if i<=NIS %If initial silence ignore VAD
        SpeechFlag=0;
        NoiseCounter=100;
    else %Else do VAD
        [NoiseFlag, SpeechFlag, NoiseCounter, Dist]=vad(Y(:,i),N,NoiseCounter); %Magnitude Spectrum Distance VAD
    end

    if SpeechFlag==0 %If not Speech, update Noise Parameters
        N=(NoiseLength*N+Y(:,i))/(NoiseLength+1); %Update and smooth noise mean
        LambdaD=(NoiseLength*LambdaD+(Y(:,i).^2))./(1+NoiseLength); %Update and smooth noise variance
    end
    %%%%% VAD and Noise Estimation END

    gammaNew=(Y(:,i).^2)./LambdaD; %A posteriori SNR
    xi=alpha*(G.^2).*Gamma+(1-alpha).*max(gammaNew-1,0); %Decision Directed Method for A Priori SNR
    Gamma=gammaNew;

    G=(xi./(xi+1));
    X(:,i)=G.*Y(:,i); %Obtain the new Cleaned value
    waitbar(i/numberOfFrames,h,num2str(fix(100*i/numberOfFrames)));
end

close(h);
output=OverlapAdd2(X,YPhase,W,SP*W); %Overlap-add Synthesis of speech
output=filter(1,[1 -pre_emph],output); %Undo the effect of Pre-emphasis
output=0.999*(output/max(abs(output)));

The helper functions OverlapAdd2, segment and vad called by WienerScalart96 are identical to those listed with the Spectral Subtraction code above.
