
Uppsala University Department of Engineering Sciences Adaptive Signal Processing 5p spring 2006

Acoustic Echo Cancellation

Authors: Aleksandar Jovanovic, Kalle Nilvér, Patrik Söderberg, Magnus Broberg

Abstract
This paper shows some implementations of acoustic echo cancellation algorithms in Matlab and the results of analysis of the broader systems involved. It is the result of a project in the course adaptive signal processing at Uppsala University. It focuses on Normalized Least Mean Square (NLMS) and the Variable Impulse Response Double Talk Detector (VIRE DTD). Discussions on Stereophonic Acoustic Echo Cancellation (SAEC) are carried out, and we recommend some topics for further work on this project.

Acknowledgements
Professor Andreas Jakobsson at Karlstad University has developed an assignment for a course in adaptive signal processing which clearly illustrates the effects of acoustic echo cancellation. It has implementations of the NLMS algorithm and of the Geigel DTD. We have used the Matlab code as a starting point for our work and have also used the sounds included, since our recorded sounds created strange results, perhaps due to some downsampling that we did. Andreas was also kind enough to mail an article about Fast Normalized Cross Correlation (FNCC) DTD and answer a question on the behavior of VIRE DTD.

Mikael Sternad has been our supervisor and we have received a lot of support from him.

Daniel Aronsson showed a useful way of plotting data in a timed sequence, a technique which we have since used to analyze how the filter adapted. The technique also clearly showed the effects of adaptation while double talk was present. He also suggested a loop method to extract a correct threshold for double talk detection.

Lars-Johan Brännmark helped out when we had problems measuring the room impulse response. After a short time of work on his side the measurements worked correctly.

Simon Mika, Simon Moritz and Carl-Johan Larsson made some progress with this project last year and some of our work is based on their results.

Åhgren, Per and Jakobsson, Andreas (2006) Course material for a course in adaptive signal processing at Karlstad University

Table of contents
1 Conclusion
2 Introduction
3 Background
4 System overview
5 Filter adaptation
  5.1 LMS
  5.2 NLMS
6 Talk detection
  6.1 Far-end talk detection
  6.2 Double talk detection
    6.2.1 Geigel
    6.2.2 VIRE DTD
    6.2.3 Other
7 Comfort noise generator
8 Measuring room impulse response
9 Stereophonic Acoustic Echo Cancellation (SAEC)
10 Real time implementation
11 Views on further development
12 Figure index
13 Matlab code segment index
14 Mathematical formula index
15 Subject index
16 Bibliography
17 Appendix
  17.1 aec.m - For acoustic echo cancellation
  17.2 ir.m - For calculating impulse response of a room

1 Conclusion
There are many ways of solving the acoustic echo cancellation problem, and the market is flooded with algorithms for both adaptation and double-talk detection. We opted for the VIRE DTD algorithm proposed by Per Åhgren, but a stable implementation was very difficult and its behaviour was somewhat unpredictable.

This year we improved last year's work substantially, making our solutions better in all previously implemented areas and implementing all remaining parts of a complete AEC solution. The results were very good, but there are of course things that could be improved in a complicated system like this. Maybe the next step would be to cut down the computation time and take the step to a full real-time system.

The results of our system can be viewed in Figure 1.



Figure 1: SPCLAB result window showing, from top to bottom, far-end talk, near-end talk, microphone pickup, filtered signal, double-talk detection, far-end talk detection and finally an indication on when adaptation is taking place.

Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation

spclab( xF(indV), v(indV), y(indV), e(indV), isDT(indV)*max(e)/2, isFET(indV)*max(e)/2, isAdapt(indV)*max(e)/2 );

Matlab code segment 1: Using spclab to plot the results.

The performance of our filter can be viewed in Figure 2 and Figure 3.

Figure 2: ERLE plot through Matlab command plot( smooth(-10*log10((((e(1:100000))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 5000) );

Figure 3: ERLE plot with NLP through Matlab command plot( smooth(-10*log10((((e(1:100000).*not(isNLP(1:100000)))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 1000) );
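For reference, ERLE (Echo Return Loss Enhancement) measures how much of the echo the filter removes: the ratio, in dB, between the power of the uncorrected microphone signal y and the residual error e. The Matlab expressions above correspond roughly to this Python sketch (our own illustration, not part of the project code):

```python
import numpy as np

def erle_db(y, e, eps=1e-12):
    """Sample-wise Echo Return Loss Enhancement in dB.

    y: microphone signal (uncorrected output), e: residual error
    after cancellation. Larger values mean more echo suppression;
    eps guards against log(0) just like Matlab's eps does above.
    """
    y = np.asarray(y, dtype=float)
    e = np.asarray(e, dtype=float)
    return -10.0 * np.log10((e**2 + eps) / (y**2 + eps))

# A residual equal to the microphone signal means 0 dB (no
# suppression); a residual 1000 times smaller means about 60 dB.
no_gain = erle_db([1.0], [1.0])[0]
good = erle_db([1.0], [0.001])[0]
```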

2 Introduction
This report is the result of a project course in adaptive signal processing at Uppsala University. The aim of the project is to improve last year's research on acoustic echo cancellation. More specifically, the tasks were to implement algorithms for adaptation in such a way that the results exceeded those of the previous group using one microphone and one loudspeaker, and to further examine the possibility of using two microphones and two loudspeakers. The tool used is primarily Matlab.

3 Background
The problem of acoustic echo cancellation is the result of hands-free telephony and teleconferencing systems. In early telephony the microphone and loudspeaker were separated and no sound could propagate between the speaker and the microphone. Therefore no echo would be transmitted back. Using a hands-free loudspeaker telephone, however, the sound from the loudspeaker will be picked up by the microphone and transmitted back to the sender, who will recognize this as an echo. This severely reduces conversation quality, even at very small echo delays.

Figure 4: A telephone conference using an IP-telephony system.

In a room with no propagation delay and no room impulse response (i.e. a studio with dampening walls and the microphone placed at no distance from the loudspeaker) the solution would simply be to subtract the input (far-end talk), which is readily available, from the output signal picked up by the microphone, which consists of both near-end talk and far-end talk. After the subtraction the output signal would consist of near-end talk only. This is not possible, however, since the room both alters the sound and spreads it over time. Using IP-telephony, as illustrated in Figure 4, this spread over time will vary according to the delay in the net, and therefore IP-telephony introduces even more problems. Due to these problems the input must be modified accordingly before we subtract it. The problem is that the parameters according to which it should be modified are unknown. This is where adaptive filtering technology comes in. The adaptive filter adjusts according to inputs and outputs to form the parameters according to which the input must be modified if the subtraction is to be useful.

Acoustic echo cancellation also introduces a second problem: to detect when there is nothing but far-end talk entering the microphone, and when other things are entering as well. The adaptation algorithm uses only one measurement, the difference between the modified input and the real input. If this difference is zero, and no near-end talk is present, the filter will be an exact copy of the room impulse response and hence work as intended. Now if, at this time, there is near-end talk, the difference will be equal to the near-end talk and the filter algorithm will treat this as an error in the filter. The filter will therefore adapt to cancel out the near-end talk as well, and as a result it will cease to work.

The same techniques are used in data networking where there are also problems with echoes.

4 System overview
To solve the acoustic echo problem the setup in Figure 5 was used.

Figure 5: System overview. The following notations are used: v(t) = white noise, g_hat(t) = adaptive coloring filter, n_hat(t) = comfort noise, NLP = Non-Linear Processor, e(t) = error signal, d(t) = estimated echoic signal, h_hat(t) = adaptive filter, y(t) = uncorrected output, s(t) = near-end talk, n(t) = ambient near-end noise.

The goal is to mimic h(t), which is the room impulse response, with the adaptive filter h_hat(t). The Comfort Noise Generator and the NLP are used to further improve the output, but are not an essential part of the adaptive filtering problem.

5 Filter adaptation
A filter is something that transforms data to extract useful information from a noisy environment. In digital filtering there are two primary types: infinite impulse response (IIR) and finite impulse response (FIR). IIR filters can normally achieve the same filter characteristics as a FIR filter using less memory and fewer calculations, at the cost of possibly becoming unstable. As the filters become more complex, though, IIR filters need more parameters and the advantages are reduced.

Because of the high complexity of a room impulse response, with its many strong and sharp peaks, the filters used in acoustic echo cancellation are usually of the FIR type.

For the filter in an acoustic echo cancellation to work in the real case, with changing parameters such as different room acoustics and people moving around in the room, a filter with adapting parameters (taps) is necessary.

Figure 6: Filter adaptation.

There are numerous adaptive algorithms that are applicable in acoustic echo cancellation, such as least mean squares (LMS), recursive least squares (RLS), the affine projection algorithm (APA) and different derivatives thereof. LMS is an old, simple and proven algorithm which has turned out to work well in comparison with newer, more advanced algorithms. In this project we use the normalized LMS (NLMS) for the main filter and LMS for the noise generation.

5.1 LMS

In 1959 Widrow and Hoff introduced the LMS algorithm. Over the years this has been by far the most used adaptive filtering algorithm, for several reasons: it was the first, it requires relatively little computation and it works well, at least for slow changes in the filter.

The LMS filter uses a gradient method of steepest descent to adapt its weights to minimize the cost function c(n) defined in Mathematical formula 1,

c(n) = E[e²(n)]

Mathematical formula 1: Function c(n) to be minimized.

Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models Offer Better Modeling Capabilities than Their FIR Counterparts?


where e(n) is defined in Figure 5. In comparison to other algorithms LMS is relatively simple, as it doesn't require correlation function calculations or matrix inversions; for each sample in the signal LMS only uses two multiplications and two additions per filter tap.

h_{n+1}(i) = h_n(i) + μ · e(n) · x(n − i)

Mathematical formula 2: Adjustment of taps using the LMS algorithm.

The taps are adapted as shown in Mathematical formula 2, where h, e and x are defined in Figure 5 and μ is the step length, between zero and one over the largest eigenvalue of the correlation matrix.

if(isNT(k)) % If there is No Talk, adapt comfort noise filter
    % Adapt using LMS
    CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end

Matlab code segment 1: LMS implementation.

5.2 NLMS

The primary disadvantage of the LMS algorithm is its slow convergence rate, which is the result of the static step length μ. In NLMS, μ is normalized by the energy of the signal vector as in Mathematical formula 3, and NLMS therefore achieves a much faster convergence rate than LMS at a low cost. To avoid division by zero a small number ε is often added to the energy.

μ_NLMS(n) = μ / (ε + xᵀ(n) x(n))

Mathematical formula 3: Step length adjustment using the NLMS algorithm.

h_{n+1}(i) = h_n(i) + μ / (ε + xᵀ(n) x(n)) · e(n) · x(n − i)

Mathematical formula 4: Adjustment of taps using the NLMS algorithm.
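To make the update concrete, here is a self-contained Python sketch of the NLMS recursion (our own toy example, not the project code): a known 4-tap "room" is identified from white noise with no near-end talk present.

```python
import numpy as np

def nlms_step(h, x_block, e_n, mu=0.5, eps=1e-8):
    """One NLMS tap update.

    h: current filter taps; x_block: the most recent len(h) input
    samples ordered x(n), x(n-1), ...; e_n: current error sample.
    The step is normalized by the input energy x'x, so loud and
    quiet passages converge at a comparable rate.
    """
    norm = eps + x_block @ x_block
    return h + (mu / norm) * e_n * x_block

# Identify a known 4-tap "room" from white noise (a sanity check).
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.2, 0.1])
h_hat = np.zeros(4)
x = rng.standard_normal(5000)
for n in range(4, len(x)):
    xb = x[n:n-4:-1]        # x(n), x(n-1), x(n-2), x(n-3)
    d = h_true @ xb         # microphone picks up the echo
    e = d - h_hat @ xb      # residual after cancellation
    h_hat = nlms_step(h_hat, xb, e)
```

With no near-end talk and no noise, h_hat converges to h_true, which is exactly the situation in which the real system is allowed to adapt.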

The performance of NLMS has been satisfying, as shown in Figure 7.


Figure 7: The error declines with time as the adaptive filter converges towards the room impulse response.

6 Talk detection
Talk detection is used for deciding when to activate the NLP and when to adapt the filter h_hat(t), amongst other things. There are two types of talk detection: far-end talk detection and double talk detection.

6.1 Far-end talk detection

The far-end talk is detected by measuring the power of the signal and comparing it to some threshold. The threshold is calculated dynamically as shown in Matlab code segment 2.
% Calculate the far-end detection threshold.
temppow = zeros(floor(length(xF)/fs)-1, 1);
for k = 1:length(temppow),
    temppow(k) = xF((k-1)*fs+1:k*fs)' * xF((k-1)*fs+1:k*fs) / fs;
end
% FE threshold is 1/10 of the largest one-second power
farendThres = max(temppow) * 1/10;

Matlab code segment 2: Dynamic calculation of the far-end detection threshold.

Measurement and comparison is then made as in Matlab code segment 3.


energyOfSignalBlock = signalBlock' * signalBlock;
powerOfSignalBlock = energyOfSignalBlock / filterLength;
isFET(k) = (powerOfSignalBlock > farendThres);

Matlab code segment 3: Measurement of far-end talk power and comparison to the calculated threshold.
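For readers without Matlab, the threshold calculation and the per-block comparison translate roughly to the following Python sketch (our own variable names; block-boundary handling is simplified):

```python
import numpy as np

def far_end_threshold(x_far, fs):
    """Dynamic threshold: 1/10 of the largest one-second average
    power found in the far-end signal."""
    n_blocks = len(x_far) // fs
    powers = [x_far[k * fs:(k + 1) * fs] @ x_far[k * fs:(k + 1) * fs] / fs
              for k in range(n_blocks)]
    return max(powers) / 10.0

def is_far_end_talk(block, threshold):
    """Flag a block whose average power exceeds the threshold."""
    return (block @ block) / len(block) > threshold

# Two seconds of loud 440 Hz "speech" followed by two near-silent seconds.
fs = 8000
t = np.arange(4 * fs) / fs
x = np.sin(2 * np.pi * 440 * t) * np.where(t < 2.0, 1.0, 0.01)

thr = far_end_threshold(x, fs)
loud = is_far_end_talk(x[:1024], thr)                  # far-end talk
quiet = is_far_end_talk(x[3 * fs:3 * fs + 1024], thr)  # background only
```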

This method results in the detection that is shown in Figure 8.



Figure 8: Far-end talk detection.

6.2 Double talk detection

One of the more difficult parts of acoustic echo cancellation is to know when to stop adapting the filter. The filter must only be adapted when there is far-end talk only, not when there is both far- and near-end talk. The near-end talk would make the system estimation process fail and produce extremely erroneous results. Therefore it is necessary to detect when there is both far-end and near-end talk. This is what is called double-talk, and for that a double-talk detector (DTD) is needed.

However, there is a problem in the real case, where near-end talk is not available by itself but only in combination with far-end talk in the microphone signal. The difficulty is to distinguish the different sub-signals and to know which is which.


There are several solutions to this problem and we have chosen to implement two of these and study them in terms of performance and computational complexity.

Figure 9: Double talk detection. When both far-end talk and near-end talk are present, a detection variable (marked with a blue line) is set.

6.2.1 Geigel
The Geigel algorithm is a very simple DTD with low computational complexity. It is based on the assumption that the far-end talk has lower power than the near-end talk when the signal is received in the microphone. The room will most likely have dampened the far-end signal, and the volume on the speaker is with any luck not turned up too much. Practically, we form a decision variable as shown in Mathematical formula 5.
d(t) = |y(t)| / max{ |x(t)|, |x(t − 1)|, ..., |x(t − n)| }

Mathematical formula 5: Geigel decision variable.

If d(t) becomes larger than some predetermined threshold there is double-talk.
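As an illustration (our own sketch, not the project code; the threshold value is an arbitrary tuning choice), the Geigel decision can be written as:

```python
import numpy as np

def geigel_dtd(y_n, x_recent, threshold=0.5):
    """Geigel decision for one sample.

    y_n: current microphone sample; x_recent: the last n far-end
    samples. If the microphone magnitude is large relative to the
    recent far-end peak, near-end speech must be contributing, so
    double talk is declared.
    """
    d = abs(y_n) / max(np.max(np.abs(x_recent)), 1e-12)
    return d > threshold

x_hist = np.array([0.2, -0.5, 0.1, 0.4])   # recent far-end samples
echo_only = geigel_dtd(0.2, x_hist)        # d = 0.4 -> no double talk
double_talk = geigel_dtd(0.6, x_hist)      # d = 1.2 -> double talk
```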


We implemented this and got it to work very well if the power of the far-end signal was significantly lower than that of the near-end signal. This was an acceptable solution to the double-talk problem, but the application areas we aimed at, with unknown speaker and microphone positions, demanded a more flexible solution.

6.2.2 VIRE DTD


The Variance Impulse Response algorithm (VIRE) calculates the variance of the maximum value of the recent taps in the adaptive filter.

Figure 10: The development of the taps during filter adaptation. When double talk is present the taps diverge from the average.

If the variance exceeds some threshold, which could be varied over time, we have double-talk. In other words, if the estimated room impulse response changes a lot, we assume that it is not the room that has changed, but that some other source of sound has appeared. The formula is somewhat complicated,


ψ(t) = max{ |ĥ₁(t)|, ..., |ĥₙ(t)| }
ψ̄(t) = λ · ψ̄(t − 1) + (1 − λ) · ψ(t)
σ²(t) = λ · σ²(t − 1) + (1 − λ) · (ψ(t) − ψ̄(t))²

Mathematical formula 6: The VIRE algorithm.

with ψ̄(t) being the running mean of ψ(t) and λ a forgetting factor, though the calculation is very lightweight as it only needs five multiplications.

forgettingFactor = 0.97; % Forgetting factor for VIRE.

Matlab code segment 4: Eventually the forgetting factor was set to 0.97. Tests showed promising results using 0.5 as well.

We got this algorithm to work very well, but it demanded certain tweaking that seemed to be input specific. Especially λ, the forgetting factor, was a challenge to understand and optimize. To get good results we needed to change it a lot, and we couldn't find a good way to estimate it over time.

% VIRE DTD
if k > 10,
    tap(k) = max(abs(tempFilter));
    tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
    variance(k) = forgettingFactor*variance(k-1) + (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
end
if(k > 10*filterLength && variance(k) > vireThres && k+DTMemory < length(xF))
    isDT(k : k+DTMemory) = 1;
end

Matlab code segment 5: VIRE DTD algorithm.
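The VIRE recursion is compact enough to restate as a Python sketch (our own illustration, with hypothetical variable names). While the filter is converged the maximum tap barely moves and the running variance stays small; a sudden tap jump, as caused by double talk, makes it shoot up.

```python
def vire_update(psi, psi_mean, var, lam=0.97):
    """One VIRE recursion step.

    psi: max absolute tap of the adaptive filter right now;
    psi_mean / var: running mean and variance from the previous
    step; lam: forgetting factor. Returns updated (mean, variance).
    """
    psi_mean = lam * psi_mean + (1 - lam) * psi
    var = lam * var + (1 - lam) * (psi - psi_mean) ** 2
    return psi_mean, var

# Converged filter: the max tap is essentially constant.
mean, var = 1.0, 0.0
for _ in range(200):
    mean, var = vire_update(1.0001, mean, var)
quiet_var = var

# Double talk: the taps suddenly diverge and the variance jumps.
mean, var = vire_update(3.0, mean, var)
```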

Because the calculation is very lightweight, however, it fitted our purpose best of the algorithms we read about, so we decided to go with it. Nonetheless there are some other algorithms that were worth looking at.

Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation


Figure 11: The VIRE variance. When it exceeds the threshold (marked with a red line) it triggers the detector (marked with a blue line).

6.2.3 Other
There are several other ways of detecting double talk. The Cheap Normalized Cross Correlation (CNCR) algorithm for example is based on the comparison of the variances of the estimated signal and the measured signal.

It might be a good idea to go with another DTD algorithm if the real-time implementation goal is skipped; it would have saved us a lot of time if we had chosen an easier algorithm, and that might have given us better results as well.

7 Comfort noise generator


The purpose of the Comfort Noise Generator (CNG) is to create synthetic noise while the controller unit has shut down the output from the system. The reason why one would want to shut down the output is that it is unnecessary to transmit data when the near end is silent, as it would mean that only far-end sound would be fed back to the sender. By having a Non-Linear Processor (NLP) it is possible to stop this sound from being transmitted. The NLP is activated when there is far-end talk but no double talk, and hence no near-end talk, as shown in Matlab code segment 6.


isNLP(k) = ( not(isDT(k)) && isFET(k) ); Matlab code segment 6: Setting status of NLP.

However, if one were to choose to not transmit anything, the user on the other end might suspect that the line has gone down. To avoid this, comfort noise is sent instead. This noise is colored according to the background noise in the room. It is done through an adaptive coloring filter g_hat(t), which is calculated by LMS with the error signal used by the main filter h_hat(t) as an adaptive parameter.


Figure 12: This figure shows the activation of the NLP according to far end talk and near end talk. The NLP should be activated when there is far end talk but no near end talk.

White noise is created with the WGN (White Gaussian Noise) command in Matlab. We set the strength of this noise statically (assigning -27 to a parameter specifying power in decibels relative to a watt) as shown in Matlab code segment 7, but we would rather set it dynamically according to the intensity of the ambient noise in the room. This has, however, proven very difficult, and therefore the noise level is adjusted to suit our equipment. If other equipment is to be used this parameter may have to be altered.


whiteNoise(k) = wgn(1,1,-27); Matlab code segment 7: The creation of white noise.

The coloring filter g_hat(t) is updated when there is no talk, as shown in Matlab code segment 8.

if(isNT(k))
    % Adapt using LMS
    CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
end

Matlab code segment 8: Adaptation of comfort noise coloring filter.

Over time the filter will adapt to model the noise that is present in the near end room as illustrated in Figure 13.
Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples, respectively. Notice the slight divergence at 158000.

Finally, the assembly of a number of white noise samples generated in Matlab code segment 7, whiteNoiseBlock, is filtered through the coloring filter CNGFilter if the NLP is activated,


if(isNLP(k))
    comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
    e(k) = comfortNoise(k);
end

Matlab code segment 9: Coloring of comfort noise.

and the comfort noise is set as the output. Figure 14 shows what the colored noise looks like at the times the NLP is active, that is, comfortNoise(k). This result is added to the output signal.


Figure 14: The generated comfort noise as the NLP turns off the microphone output. The noise filter diverges somewhat between 150000 and 165000 samples, when we have undetected near-end talk which creates a louder noise level, but then starts to converge.

8 Measuring room impulse response


When the adaptive filter works optimally it will converge towards the impulse response of the LEM system. A measured room impulse response is therefore good to have for comparisons with the filter. Determining an impulse response is essential to have control over all the factors of the simulation and also to be able to vary them.

To measure the impulse response of a filter (in our case a room) there are several methods that can be used: one can record the echo of an impulse such as a loud bang, one can use sine waves of all the different frequencies as input and see what the system does, or one can record what comes out of the system when white noise is used as input. In the latter two cases the response of the system is deconvolved with the input and the resulting signal is the impulse response.

Figure 15: Impulse response in room 1116 at Magistern.

Using sine waves will produce the best result, but going through all the relevant frequencies is a time consuming task. For the impulse method to work optimally an infinitely short pulse with infinite height would need to be used. A loud bang such as a clap or a balloon popping would work, but in most cases not give a very good result. The final method of recording white noise has the potential of giving a good result, while it quite easily can be realized using Matlab and doesn't require anything other than a computer equipped with a microphone and a speaker. This is the method that we have chosen.

To be able to use division in the frequency domain instead of deconvolution in the time domain, white noise with constant power is needed. This is accomplished by generating the noise in the frequency domain with random phase.

r = ones(1,fftLen/2+1);                       % constant amplitude
arg = [0 2*pi*rand(1,fftLen/2-1) 0];          % random phase
X_0_nyq = r.*exp(i*arg);                      % complex noise in freq. domain
X = [X_0_nyq conj(fliplr(X_0_nyq(2:end-1)))]; % mirror
x = ifft(X);                                  % transform to time domain

Matlab code segment 10: Noise generation courtesy of Lars-Johan Brännmark.

Further improvement of the result can be achieved by using the mean of multiple recordings. The periodic signal is then played back and recorded at the same time after which the recording is divided into periods again.


periods = 32;
for j=1:periods
    xx(((j-1)*fftLen+1):(j*fftLen)) = x;  % make signal periodic
end
r = audiorecorder(fs,16,1);
sound(xx, fs);                            % play noise
recordblocking(r,length(xx)/fs);          % record
noiserec = getaudiodata(r);
for j=1:periods
    x_rec(j,:) = noiserec(((j-1)*fftLen+1):(j*fftLen)).';  % split into blocks
end

Matlab code segment 11: Recording of noise.

Finally the impulse response is calculated by dividing recording with noise in frequency domain as presented in Matlab code segment 12.

for j=1:periods
    X_REC = fft(x_rec(j,:));   % transform recorded noise
    ir(j,:) = ifft(X_REC./X);  % determine one impulse response
end
impulse_response = mean(ir);

Matlab code segment 12: Calculate impulse response.
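The same measurement chain can be simulated end to end. The following Python sketch (our own illustration; the "room" here is a synthetic 4-tap filter rather than a real recording) generates flat-magnitude random-phase noise, filters it through a known impulse response, and recovers that response by division in the frequency domain:

```python
import numpy as np

rng = np.random.default_rng(1)
fft_len = 256

# Flat-magnitude, random-phase noise: constant power at every
# frequency makes the later spectral division well conditioned.
arg = np.concatenate(([0.0], 2 * np.pi * rng.random(fft_len // 2 - 1), [0.0]))
half = np.exp(1j * arg)
X = np.concatenate((half, np.conj(half[-2:0:-1])))  # conjugate mirror
x = np.fft.ifft(X).real                             # real time-domain noise

# Pretend this is the "room": a short known impulse response applied
# circularly, i.e. to one period of the periodic excitation.
h_true = np.zeros(fft_len)
h_true[:4] = [0.7, 0.3, -0.2, 0.05]
x_rec = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_true)).real

# Recover the impulse response by division in the frequency domain.
h_est = np.fft.ifft(np.fft.fft(x_rec) / X).real
```

Since |X(f)| = 1 everywhere, the division never amplifies noise at weak frequencies, which is exactly why constant-power excitation is required.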

9 Stereophonic Acoustic Echo Cancellation (SAEC)


In teleconferencing systems, stereo sound would offer a better user experience than a mono system. It would offer the users the possibility to distinguish between different voices by determining which speaker delivers the sound. But cancelling acoustic echo that comes from two speakers into two microphones, which is required if we want stereo sound, turns out to be a very complex problem.


Figure 16: A typical Loudspeaker Enclosure Microphone (LEM) setup in the stereophonic case.

One microphone signal, y1(t), can be modeled with the following equation:

y1(t) = h11 * x1(t) + h12 * x2(t) + n(t)

Mathematical formula 7: Microphone signal.

Where * denotes convolution, h11 and h12 are the room impulse responses from the different speakers to the microphone and n(t) is the noise of the room. The other microphone signal can be modeled similarly since the system is symmetrical. This will make it at least four times as computation heavy as the mono case. One big problem with SAEC is what is commonly referred to as the non-uniqueness problem. This problem arises from the fact that the signals x1(t) and x2(t) are highly correlated, since they originate from the same source, and from the fact that in a typical scenario where you would like stereo sound there are different people speaking alternately. The algorithms used must track both near-end and far-end changes in the echo paths, which is not easy since they can change drastically if another person starts talking. It is therefore important to keep the room impulse response estimate very close to the real room impulse response before the paths change, which would demand a fast adaptive filter, and that is of course very hard to accomplish.

Sundar G. Sankaran (1999), On ways to improve adaptive filter performance
Masahiro Yukawa, Noriaki Murakoshi, and Isao Yamada (2005), Efficient Fast Stereo Acoustic Echo Cancellation Based on Pairwise Optimal Weight Realization Technique
Åhgren, Per (2004) Stereophonic Acoustic Echo Cancellation
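To make the non-uniqueness problem concrete, the following Python sketch (our own toy model, not from the original code; all filters and names are hypothetical) builds one microphone signal according to Mathematical formula 7 and shows how strongly correlated the two loudspeaker signals are when they stem from a common far-end source:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000

# Echo paths from the two loudspeakers to microphone 1.
h11 = np.array([0.6, 0.25, -0.1])
h12 = np.array([0.4, 0.15, -0.05])

# The two loudspeaker signals are the SAME far-end source, only
# filtered differently (e.g. two microphones in the far-end room),
# so they are highly correlated: this is the non-uniqueness problem.
s = rng.standard_normal(N)
g1, g2 = np.array([1.0, 0.5]), np.array([0.8, 0.6])
x1 = np.convolve(s, g1)[:N]
x2 = np.convolve(s, g2)[:N]

# Microphone signal per Mathematical formula 7, with ambient noise.
n = 0.01 * rng.standard_normal(N)
y1 = np.convolve(x1, h11)[:N] + np.convolve(x2, h12)[:N] + n

corr = np.corrcoef(x1, x2)[0, 1]  # close to 1
```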


There are different ways to approach SAEC and the non-uniqueness problem. One way is to reduce the correlation of the input signals without adding distortion to the signal, for example by introducing a nonlinearity in one of the input signals. One could also add random noise to the input channels. A possible way to do this without destroying the sound completely would be to add noise that humans can't hear.

It seems very hard to solve the SAEC problem, and extremely hard to implement it in a practically usable way. It would be very computationally heavy, and the different user scenarios that could come up make SAEC a big, if not impossible, challenge for next year's group.

10 Real time implementation


A sampling rate of 8000 Hz was chosen because, by the Nyquist theorem, a band-limited analogue signal can be sampled at a frequency double its bandwidth and be recovered. Thus, by sampling at 8 kHz, a voice signal with a bandwidth of 4 kHz can be sampled and recovered. Simulations have shown that a filter with 1024 taps is suitable to model room impulse responses. Echo cancellation is evidently a very demanding process. We did not succeed in implementing a real-time software echo canceller running natively on a PC with the help of Matlab. In our case, using a 1024-tap adaptive filter and a sampling rate of 8 kHz, the time needed to calculate 25 seconds of speech was approximately 39 seconds, and the NLMS algorithm took about 10 seconds to converge.

Real time implementation is possible through the use of custom Very Large Scale Integration (VLSI) processors or Digital Signal Processors (DSP). These processors are specially designed for signal processing tasks and their computational power is very high. They provide parallel processing of commands. DSP programs work by hardware interrupts. Sampling at 8 kHz, the sampling program will be interrupted every 125 µs by the next sample. In the case of a 40 MHz DSP each instruction takes 25 ns to complete. Thus, there are 125 µs / 25 ns = 5000 machine cycles available for echo cancelling calculation before the next sample arrives.

Figure 17: A process diagram for handling incoming data continuously.

A multiplication has several times the complexity of an addition, so only multiplications are considered when choosing a suitable DSP. The division used by NLMS has a high complexity, but it is performed far less frequently than the multiplications in this algorithm and is therefore considered negligible in the analysis.
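To get a feel for the load, a rough per-sample multiplication count for NLMS can be sketched as follows (Python; the 2L count is our own back-of-the-envelope estimate, not a figure from any of the cited papers):

```python
L = 1024   # adaptive filter taps, as used in our simulations
fs = 8000  # sampling rate, Hz

# Per sample, NLMS needs roughly L multiplications for the convolution
# s(k) = w' * x(k), and L more for the weight update, plus a handful for
# the normalization term (ignored here).
mults_per_sample = 2 * L

mults_per_second = mults_per_sample * fs  # roughly 16.4 million multiplications/s
```

Comparing this figure against a candidate DSP's multiply-accumulate throughput gives a quick feasibility check before any code is written.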

Unlike Matlab, where we have floating-point data representation, DSP algorithms store data with finite precision. Unnecessarily large word lengths result in larger arithmetic blocks and larger memories, and such extra hardware consumes power without any performance gain. Therefore, all signals should have a minimum word length. On the other hand, since the word length also determines resolution and dynamic range, there is a trade-off between performance and power consumption. It is important to keep signals wide enough to avoid overflow and rounding errors, or at least to keep the probability of such events to a minimum.
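A minimal sketch of this word-length trade-off (Python, illustrative only, using the common Q15 16-bit fixed-point format): quantization introduces a bounded rounding error, while values outside the representable range must be saturated to avoid overflow:

```python
def to_q15(x):
    """Quantize a value to Q15 (16-bit signed fixed point), saturating on overflow."""
    n = int(round(x * 32768))
    n = max(-32768, min(32767, n))  # saturate instead of wrapping around
    return n / 32768.0

# Rounding error is bounded by half a quantization step (2^-16)
err = abs(to_q15(0.1) - 0.1)

# A value outside [-1, 1) saturates to the largest representable value
sat = to_q15(1.5)
```

A wider word shrinks the rounding error but costs memory and power; saturation logic trades a small hardware cost for much more graceful overflow behavior than wrap-around.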

11 Views on further development



- Keep algorithms as simple as possible
- Continue investigation and implement real-time adaptation
- Pay less attention to the Stereophonic Acoustic Echo Cancellation problem


12 Figure index
Figure 1: SPCLAB result window .......................................................... 5
Figure 2: ERLE plot ..................................................................... 6
Figure 3: ERLE plot with NLP ............................................................ 7
Figure 4: A telephone conference using an IP-telephony system ........................... 8
Figure 5: System overview ............................................................... 9
Figure 6: Filter adaptation ............................................................ 10
Figure 7: The error decline ............................................................ 12
Figure 8: Far-end talk detection ....................................................... 13
Figure 9: Double talk detection ........................................................ 14
Figure 10: The development of the taps during filter adaptation ........................ 15
Figure 11: The VIRE variance ........................................................... 17
Figure 12: Activation of NLP according to far end talk and near end talk ............... 18
Figure 13: The comfort noise filter at 1700, 35000, 79000 and 158000 samples ........... 19
Figure 14: The generated comfort noise as the NLP turns off the microphone output ...... 20
Figure 15: Impulse response in room 1116 at Magistern .................................. 21
Figure 16: A typical LEM setup in the stereophonic case ................................ 23
Figure 17: A process diagram for handling incoming data continuously ................... 25

13 Matlab code segment index


Matlab code segment 1: Using spclab to plot the results ................................. 6
Matlab code segment 2: Dynamic calculation of the far-end detection threshold .......... 12
Matlab code segment 3: Measurement and comparison of far-end talk power ................ 13
Matlab code segment 4: Setting the forgetting factor ................................... 16
Matlab code segment 5: VIRE DTD algorithm .............................................. 16
Matlab code segment 6: Setting status of NLP ........................................... 18
Matlab code segment 7: The creation of white noise ..................................... 19
Matlab code segment 8: Adaptation of comfort noise coloring filter ..................... 19
Matlab code segment 9: Coloring of comfort noise ....................................... 20
Matlab code segment 10: Noise generation courtesy of Lars-Johan Brännmark .............. 21
Matlab code segment 11: Recording of noise ............................................. 22
Matlab code segment 12: Calculate impulse response ..................................... 22

14 Mathematical formula index


Mathematical formula 1: Function c(n) to be minimized .................................. 10
Mathematical formula 2: Adjustment of taps using LMS algorithm ......................... 11
Mathematical formula 3: Step length adjustment using NLMS algorithm .................... 11
Mathematical formula 4: Adjustment of taps using NLMS algorithm ........................ 11
Mathematical formula 5: Geigel decision variable ....................................... 14
Mathematical formula 6: The VIRE algorithm ............................................. 16
Mathematical formula 7: Microphone signal .............................................. 23

15 Subject index
CNG, 17
DSP, 24, 25
DTD, 2, 3, 17
ERLE, 6, 7, 26
Geigel, 3, 15
LEM, 23
NLMS, 2, 3, 11, 18, 25
NLP, 17, 18, 20
Nyquist theory, 24
Real time implementation, 24
room impulse response, 3, 20, 23, 24
SAEC, 2, 22, 23, 24
WGN, 18
VIRE DTD, 3

16 Bibliography
1. Åhgren, Per and Jacobsson, Andreas (2006) Course material for course in adaptive signal processing at Karlstad University. http://www.it.kau.se/ee/utbildning/kurser/tel614/Download.html
2, 4, 10. Åhgren, Per (2004) On System Identification and Acoustic Echo Cancellation. http://www.ahgren.com/publications/phdthesis.pdf
3. Liavas, Athanasios P. and Regalia, Phillip A. (1998) Acoustic Echo Cancellation: Do IIR Models Offer Better Modeling Capabilities than Their FIR Counterparts? http://www.telecom.tuc.gr/Greek/Liavas/publications/Acoustic%20Echo%20Cancellation%20Do%20IIR%20Models%20Offer%20Better%20Modeling%20Capabilities%20than%20Their%20FIR%20Counterparts.pdf
5. Sankaran, Sundar G. (1999) On ways to improve adaptive filter performance. http://scholar.lib.vt.edu/theses/available/etd-122099-153321/unrestricted/Chapter07.pdf
6. Yukawa, Masahiro, Murakoshi, Noriaki and Yamada, Isao (2005) Efficient Fast Stereo Acoustic Echo Cancellation Based on Pairwise Optimal Weight Realization Technique. http://www.hindawi.com/GetPDF.aspx?doi=10.1155/ASP/2006/84797
7. Åhgren, Per (2004) Stereophonic Acoustic Echo Cancellation. http://www1.shellkonto.se/ahgren/research_saec.html
8. Berkeman, Anders and Öwall, Viktor. Architectural tradeoffs for a custom implementation of an acoustic echo canceller. http://www.norsig.no/norsig2002/Proceedings/papers/cr1125.pdf
9. Raghavendran, Srinivasaprasath (2003) Implementation of an Acoustic Echo Canceller Using Matlab. http://etd.fcla.edu/SF/SFE0000169/Raghavendran_thesis.pdf
11. Chong Chew, Wee and Boroujeny, Farhang (1997) Software Simulation and Real-time Implementation of Acoustic Echo Cancelling. http://www.ece.mtu.edu/ee/faculty/rezaz/index_files/Seminapapers2004/Kashulpatel.pdf


17 Appendix
17.1 aec.m - For acoustic echo cancellation
%clear;
close all;
disp('Initialize...')

filterLength = 1000;               % Length of adaptive filter.
forgettingFactor = 0.97;           % Forgetting factor for VIRE.
mu0 = 1;                           % Step size parameter.
DTMemory = 1000;                   % Number of samples of inactivity after a DT.
firstDoubleTalk = 10*filterLength; % Where to start double talk detection
noiseStrength = -27;               % Statically assigned noise strength

% Load data files.
%
% Uses our soundfiles
% OUR_SOUND_FROM = 5; %in seconds
% OUR_SOUND_TO = 15;  %in seconds
%
% xF = wavread( 'kalle.wav' ) * 100000;
% xE = wavread( 'kalle_room.wav' ) * 100000;
% v = wavread( 'magnus_tyst.wav' ) * 100000;
% xF = downsample(xF, 5);
% xE = downsample(xE, 5);
% v = downsample(v, 5);
%
% OUR_SOUND_FROM = 44100/5 * OUR_SOUND_FROM + 1;
% OUR_SOUND_TO = 44100/5 * OUR_SOUND_TO;
%
% xF = xF(OUR_SOUND_FROM:OUR_SOUND_TO);
% xE = xE(OUR_SOUND_FROM:OUR_SOUND_TO);
% v = v(OUR_SOUND_FROM:OUR_SOUND_TO);
% fs = 44100/5;

% Uses test soundfiles
xF = readData( 'FarEnd.pcm' );
h = [0 1 -0.8 0.3 -0.1 0.1 -0.5 0.3 0];
% xE = filter(h,[1],xF);
% filterLength = 9;
xE = readData( 'FarEndEcho.pcm' );
v = readData( 'NearEnd.pcm' );
fs = 8000;

y = xE + v;

% calculate the variance of the voice, in the real case we don't have this
% so we use a statically assigned value instead
varianceOfVoice = var(v);

% Rough estimate of a suitable threshold for VIRE DTD, should be done as an
% estimate in the loop since we don't have xF
vireThres = (mu0*varianceOfVoice / (2 - mu0))*mean(1/norm(xF)^2);

% the number of filters to save for plotting
numberOfSaves = 100;
saveAtSample = floor(length(xF)/numberOfSaves);

% Initialize adaptive filtering.
e = zeros( size(xE) ); % Error signal.
s = zeros( size(xE) ); % Estimated echo signal.


signalBlock = eps*ones(filterLength,1); % Adaptive filter state for time t.
if (exist('filterTaps') == 0 || not(length(filterTaps) == filterLength))
    filterTaps = eps*ones(filterLength,1); % Adaptive filter weights.
end
tempFilter = eps*ones(filterLength,1); % Adaptive filter weights.
saveFilter = eps*ones(filterLength,numberOfSaves);
saveTempFilter = eps*ones(filterLength,numberOfSaves);

tap = zeros( size(xE) );      % the maximum tap of the filter over time
tapmean = zeros( size(xE) );  % the mean of the max taps
variance = zeros( size(xE) ); % the variance of the max taps

isDT = logical(zeros( size(xE) ));    % is double talk
isFET = logical(zeros( size(xE) ));   % is far-end talk
isAdapt = logical(zeros( size(xE) )); % is adapt filter
isNLP = logical(zeros( size(xE) ));   % is non linear processor

comfortNoise = zeros( size(xE) );
CNGFilterLength = 100;
if (exist('CNGFilter') == 0 || not(length(CNGFilter) == CNGFilterLength))
    CNGFilter = eps*ones( CNGFilterLength,1 );
end
saveCNGFilter = zeros(CNGFilterLength,numberOfSaves);
whiteNoise = wgn(length(xF),1,noiseStrength); %zeros( size(xE) );
whiteNoiseBlock = eps*ones( CNGFilterLength, 1 );

% Calculate the far-end detection threshold, not for the real case;
% should be done in the loop using fethres = max(fethresh, power(last fs samples))
% or statically assigned
temppow = zeros(floor(length(xF)/fs)-1, 1);
for k = 1:length(temppow),
    temppow(k) = xF((k-1)*fs+1:k*fs)' * xF((k-1)*fs+1:k*fs) / fs;
end
farendThres = max(temppow) * 1/10; % FE threshold is 1/10 of the average power
% alternative way of calculating far end threshold
%farendThres = max(smooth(xF.^2, fs)) / 10;

% Perform the adaptive filtering.
disp('Perform adaptive filtering...')
q = waitbar(0,'Adapting filter... Please hold on...');
for k = 1:length(xF),
    % whiteNoise(k) = wgn(1,1,noiseStrength); % generate white noise continuously

    % Update the filter signalBlock.
    if k > filterLength,
        signalBlock = flipud(xF(k-filterLength+1:k));
        whiteNoiseBlock = flipud(whiteNoise(k-CNGFilterLength+1:k));
    end
    s(k) = signalBlock' * filterTaps; % Estimated echo value.
    e(k) = y(k) - s(k);               % Prediction error.
    energyOfSignalBlock = signalBlock' * signalBlock;
    powerOfSignalBlock = energyOfSignalBlock / filterLength;
    isFET(k) = (powerOfSignalBlock > farendThres);

    % Always adapt temp filter using NLMS
    tempFilter = filterTaps + (mu0/(energyOfSignalBlock + 1)) * e(k) * signalBlock;

    % VIRE DTD
    if k > 1,


        tap(k) = max(tempFilter);
        tapmean(k) = forgettingFactor*tapmean(k-1) + (1-forgettingFactor)*tap(k);
        variance(k) = forgettingFactor*variance(k-1) + (1-forgettingFactor)*(tap(k)-tapmean(k))^2;
    end
    if (variance(k) > vireThres && k > firstDoubleTalk)
        isDT(k : min([k+DTMemory length(xF)])) = 1;
    end

    isNLP(k) = isFET(k) && not(isDT(k));
    isAdapt(k) = isNLP(k);
    isNT(k) = not(isFET(k)) && not(isDT(k));

    if (isNT(k))
        % Adapt comfort noise filter using LMS
        CNGFilter = CNGFilter + mu0/2 * e(k) * whiteNoiseBlock;
    end
    if (isNLP(k))
        comfortNoise(k) = whiteNoiseBlock' * CNGFilter;
        e(k) = comfortNoise(k);
    end
    if ( isAdapt(k) )
        % use the temp filter on the signal (in the next iteration)
        filterTaps = tempFilter;
    end

    % save filters regularly for plotting
    if (mod(k, saveAtSample) == 0)
        saveCNGFilter(:,k/floor(length(xF)/100)) = CNGFilter;
        saveFilter(:,k/floor(length(xF)/100)) = filterTaps;
        saveTempFilter(:,k/floor(length(xF)/100)) = tempFilter;
        waitbar( k/length(xF), q ); % update progressbar
    end
end
close(q);

% Display the far-end, near-end, measured signal, and the result
% after removing the estimated echo.
indV = 1:length(xF);
spclab( xF(indV), v(indV), y(indV), e(indV), isDT(indV)*max(e)/2, ...
    isFET(indV)*max(e)/2, isAdapt(indV)*max(e)/2 );

figure;
plot (isFET*max(xF), 'DisplayName', 'Far-end voice detection');
title('Far-end voice detection');
hold all;
plot (xF, 'DisplayName', 'Far-end voice signal');
hold off;

figure;
plot (isDT*max(xF), 'DisplayName', 'Double talk detection');
title('Double talk detection');
hold all;
plot (abs(xF), 'DisplayName', 'Far-end voice signal');
plot (-1*abs(v), 'DisplayName', 'Near-end voice signal');
hold off;

figure;
plot (y, 'DisplayName', 'Recorded signal');
title('Resulting signal');
hold all;
plot (e, 'DisplayName', 'Echo cancelled signal');
hold off;


figure;
semilogy (isDT*mean(variance)*5, 'DisplayName', 'Double talk detector');
title('Double talk variance/threshold');
hold all;
semilogy (smooth(variance,1000), 'DisplayName', 'VIRE variance');
semilogy (vireThres*ones(length(xF),1), 'DisplayName', 'Double talk threshold');
hold off;

figure;
plot (isNLP*max(xF), 'DisplayName', 'Non linear processor');
title('NLP');
hold all;
plot (abs(xF), 'DisplayName', 'Far end signal');
plot (-abs(v), 'DisplayName', 'Near end signal');
hold off;

figure;
plot (comfortNoise, 'DisplayName', 'CNG');
title('CNG');
hold all;
plot (isNLP*max(comfortNoise), 'DisplayName', 'Non linear processor');
hold off;

figure;
plot (tap, 'DisplayName', 'Largest tap');
title('Tap development');
hold all;
plot (variance, 'DisplayName', 'Variance of largest tap');
hold off;

figure;
hold off;
for k=1:length(saveFilter(1,:))
    plot(saveFilter(:,k));
    axis([1 filterLength -1 1]);
    title(k);
    drawnow;
    pause(0.1);
end

figure;
hold off;
for k=1:length(saveCNGFilter(1,:))
    plot(saveCNGFilter(:,k));
    title(k);
    drawnow;
    pause(0.1);
end

% Plot ERLE using NLP:
plot( smooth(-10*log10((((e(1:100000).*not(isNLP(1:100000)))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 1000) );
% Plot ERLE not using NLP:
plot( smooth(-10*log10((((e(1:100000))).^2+eps) ./ ((y(1:100000)).^2+eps)+eps), 5000) );

17.2 ir.m - For calculating impulse response of a room


% Hi!
%
% A trick I usually use is to generate a periodic noise signal, where the period
% must be longer than the room impulse response (an ordinary living room typically
% has an impulse response of about 0.5 s). Then the FFT is computed blockwise, with
% block length = the period of the noise. To avoid losing accuracy, it is important
% that the magnitude of X is approximately 1 for all frequencies, which is solved by
% generating the noise in the frequency domain, with constant magnitude and random
% phase, e.g. like this:
pause(30)
fs=44100;    % Sampling frequency
fftLen=fs*1; % The impulse response is no longer than 0.5 s; the block length becomes fs/2

% Constant magnitude between 0 Hz and Nyquist
r=ones(1,fftLen/2+1);
% Phase angle 0 at 0 Hz and Nyquist, the rest random, uniformly
% distributed between 0 and 2*pi
arg=[0 2*pi*rand(1,fftLen/2-1) 0];
% Generate complex noise in the frequency domain
X_0_nyq=r.*exp(i*arg);
% Mirror and conjugate the spectrum between Nyquist and 2*pi
X=[X_0_nyq conj(fliplr(X_0_nyq(2:end-1)))];
x=ifft(X); % Compute the time sequence (should be real if everything works as intended)
x=0.99*x/max(abs(x));
periods = 32;
xx = zeros(1, periods*fftLen);
% Periodize, e.g. 12 periods
for j=1:periods
    xx(((j-1)*fftLen+1):(j*fftLen)) = x;
end

% For the identification, use fs/2 points in the FFT. Do as you have done before,
% but now for each block, and average. The noisiness can be reduced further by
% cutting away any initial silence at the beginning of the recorded noise, or
% alternatively by averaging over more periods.
r = audiorecorder(fs,16,1); % (note: r is reused here as the recorder object)
sound(xx, fs);
recordblocking(r,length(xx)/fs);
noiserec = getaudiodata(r);

x_rec = zeros(periods,fftLen); % initialize array for the periodic recording
ir = zeros(periods,fftLen);    % initialize array for the impulse responses
for j=1:periods
    x_rec(j,:) = noiserec(((j-1)*fftLen+1):(j*fftLen)).'; % split the recording into blocks
    X_REC = fft(x_rec(j,:));
    ir(j,:) = ifft(X_REC./X); % compute the impulse response
    % [c,b] = deconv(x_rec(j,:),x);
    % ir(i,:) = c;
end
% the final impulse response
ir_avg = mean(ir);
