
VoIP Tuning Guide for Nebula

April 15, 2018


Steven Yang
Prerequisites - Acoustics / Mechanics Performance

• The acoustic design guidelines are detailed in a separate document. Verify
that the prototype was designed and built to these guidelines
before attempting to tune our algorithms. This includes basic checks:
• Loudspeaker Enclosure: Reasonable frequency response, low distortion
at max volume, no chassis buzzing and rattling, good seal to the outside,
etc.
• Microphone seals: Reasonable frequency response with no large
resonances, good seals to the outside, consistent frequency response
from both channels, etc.
• Acoustic echo coupling: Play and record pink noise to get an idea of
echo coupling vs. frequency. Are there any regions with high echo
coupling, such as mechanical vibrations in the low frequencies or mic port
resonances in the high frequencies? Are there any obvious internal
coupling paths that should have been mitigated with better mechanical
design?
• The AEC performance is the gating factor on all designs, so the mechanics
and acoustics must be as ideal as possible for any given form factor.
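
The echo coupling check above can be sketched in a few lines. This is a minimal illustration, not part of the tuning tool: it assumes the played and recorded signals are available as time-aligned lists of float samples, and it estimates coupling at a single frequency with the Goertzel algorithm. The function names are hypothetical. With a broadband stimulus such as pink noise, the estimate would need to be averaged over many blocks and repeated across the band of interest.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Return the squared DFT magnitude at the bin nearest freq_hz."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)      # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def echo_coupling_db(played, recorded, sample_rate, freq_hz):
    """Speaker-to-mic coupling at freq_hz, in dB (negative means attenuation)."""
    p_play = goertzel_power(played, sample_rate, freq_hz)
    p_rec = goertzel_power(recorded, sample_rate, freq_hz)
    return 10.0 * math.log10(p_rec / p_play)

# Illustrative data: a 1 kHz tone that reaches the microphone at one tenth
# of its played amplitude.
RATE = 16000
played = [math.sin(2.0 * math.pi * 1000.0 * i / RATE) for i in range(1600)]
recorded = [0.1 * x for x in played]
coupling = echo_coupling_db(played, recorded, RATE, 1000.0)
```

Frequencies where the coupling is much higher than in neighbouring bands are the regions worth investigating for vibration or port resonances.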
Primary Factors for Good VoIP Performance

• Loudness (Level), Including Send Level at a Wide Range of Talker Distances


• Frequency Response (microphone and speaker)
• Distortion
• Delay (Send & Receive Latency)
• Echo during singletalk
• Echo during doubletalk
• Send attenuation during doubletalk
• Idle Channel Noise
• Speech Quality (does the voice sound natural or over-processed?)
• Noise Suppression
Microphone Gains
• The front-end microphone gain must be set to avoid input signal saturation even at
worst-case playback loudness. To check this, record from the microphones while
playing signals through the speakers at max volume. Add worst-case elements such as
external reflectors between the microphone and speaker, and even try full-scale
sine sweeps through the speaker. Adjust the hardware microphone gain so that no
saturation occurs. Target signal peaks of around -3 dBFS. (This is why
acoustic microphone resonances reduce available headroom.)
• The microphone gains in our tuning tool are ADC Boost and Dmic Gain.
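
A recording made during the worst-case playback test can be checked against the -3 dBFS target with a short script. This is a sketch only: it assumes the recording is available as float samples normalized to ±1.0 full scale, and the function names are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for float samples normalized to +/-1.0 full scale."""
    peak = max(abs(x) for x in samples)
    if peak == 0.0:
        return float("-inf")          # digital silence
    return 20.0 * math.log10(peak)

def headroom_ok(samples, target_dbfs=-3.0):
    """True if the recorded peak stays at or below the target level."""
    return peak_dbfs(samples) <= target_dbfs

quiet_capture = [0.5, -0.25, 0.1]     # peak of 0.5 is about -6 dBFS
hot_capture = [0.9, -0.95, 0.2]       # peak of 0.95 is above -3 dBFS
```

If `headroom_ok` returns False for the worst-case recording, reduce the hardware microphone gain and repeat the measurement.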

© 2017 Synaptics Incorporated 4


EQ in VOIP



Equalizer in VOIP
Operation: Use the EQ to adjust the microphone frequency response to remove as much
low-frequency energy (below the voice band) as possible and to flatten out the frequency
response.



Equalizer

Parameter   Explanation
type        One of five filter types: low-pass, high-pass, peak, high-shelf,
            or low-shelf.
freq_Hz     The center or knee frequency of the band, depending on the
            filter type.
Qfactor     Determines the spectral width of the filter and its overshoot /
            undershoot.
gain_dB     The gain of the filter. A value of 0 dB is equivalent to
            disabling the band if the filter type is low-shelf, high-shelf,
            or peak.
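
To make the role of freq_Hz, Qfactor, and gain_dB concrete, the sketch below computes coefficients for a single peak band. The guide does not say which filter design the tuning tool uses; this example assumes the widely used Audio EQ Cookbook (RBJ) peaking biquad, and the function names and sample rate are illustrative.

```python
import cmath
import math

def peaking_biquad(freq_hz, q_factor, gain_db, sample_rate):
    """Return normalized (b, a) coefficients for an RBJ-style peaking band."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q_factor)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * amp, -2.0 * cos_w0, 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * cos_w0, 1.0 - alpha / amp]
    a0 = a[0]
    return [c / a0 for c in b], [c / a0 for c in a]

def magnitude_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))

# A -6 dB cut at 200 Hz with Q = 1.0, at an assumed 16 kHz sample rate.
b_cut, a_cut = peaking_biquad(200.0, 1.0, -6.0, 16000.0)
# A 0 dB band: numerator and denominator coincide, so the band is a no-op.
b_flat, a_flat = peaking_biquad(200.0, 1.0, 0.0, 16000.0)
```

In this design the response at freq_Hz equals gain_dB exactly, and a larger Qfactor narrows the affected region around it.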



AEC in VOIP



Multi-Channel Acoustic Echo Canceller (AEC)
• Purpose
– The Multi-channel AEC eliminates the echo component from the capture signal, so that
the user can carry out hands-free speakerphone conversations that are full-duplex and
sound natural.



AEC
➢ The Multi-channel AEC is composed of two components:
➢ Linear AEC
➢ FDNLP (Full-Duplex Non Linear Processing)
➢ FDNLP addresses residual echo that the linear AEC has trouble converging on,
for example when the signal is too short to give the linear AEC enough time to
converge. It can also be used to address some non-linearities.
➢ The linear AEC reduces only echo that is linear, and it runs all the
time as long as the AEC is enabled.
➢ FDNLP aggressively reduces residual echo while the linear AEC is converging.
Ideally, the linear AEC has converged by the time the FDNLP releases the
signal.
➢ FDNLP cannot run independently of the linear AEC.
➢ The linear AEC does not affect the speech in the signal captured by the microphone,
but the FDNLP does. For this reason, enable the FDNLP only when it is really
needed, and be aware of its drawbacks.



AEC Reference Channel

• The “Minimum Band AEC Reference Channel” can be set to
accommodate different frequency band regions for AEC channel
0 and channel 1. This is for 1.1 or 2.2 designs where there are
separately amplified (bi-amped) woofers and tweeters. This allows
the AEC to be optimized for each speaker. For fullband or mono
devices, leave both channels at fullband.
FDNLP
• FDNLP Aggressiveness - Positive values increase aggressiveness and negative
values decrease it. The parameter controls how much of the residual echo is
removed. Large positive values remove residual echo aggressively, with the
possible side effect of more near-end distortion during doubletalk. Negative
values remove less residual echo: near-end speech distortion is avoided at the
cost of higher residual echo. Choose a value that strikes the right balance
between residual echo removal and near-end distortion for the application at hand.
• FDNLP Initial Convergence - The number of AEC bands (maximum 63) that must
meet the criterion given by nlp_cnvg_th before the NLP module exits the initial
processing mode. When this value is increased, more time is spent in the
initial processing mode.
SSP in VOIP



Selective Source Pickup

Parameter    Explanation
distance_mm  Microphone spacing in mm.
center       Defines the center of the beam, in units of π (pi) radians.
             0: directly in front of the array.
half_width   The angle between the center and the edge of the angular
             region where the target is picked up.
             It is expressed as a fraction of π (pi) radians.
             0.5: 180-degree pickup (wide-beam mode).
             0.056: focused pickup (narrow-beam mode).
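
Because center and half_width are given as fractions of π radians, a quick conversion to degrees helps when sanity-checking a beam setting. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def pickup_width_deg(half_width):
    """Total pickup angle in degrees for a half_width given as a fraction of pi."""
    return 2.0 * half_width * 180.0

def beam_edges_deg(center, half_width):
    """Lower and upper beam edges in degrees, relative to the array front."""
    center_deg = center * 180.0
    half_deg = half_width * 180.0
    return center_deg - half_deg, center_deg + half_deg

wide = pickup_width_deg(0.5)       # wide-beam mode
narrow = pickup_width_deg(0.056)   # narrow-beam mode, roughly 20 degrees
edges = beam_edges_deg(0.0, 0.5)   # beam centered in front of the array
```

A half_width of 0.5 corresponds to ±90 degrees around the center, which is the 180-degree pickup listed in the table. A half_width of 0.056 is about ±10 degrees.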



ERLE threshold

Operation:
• The ERLE threshold is the amount of echo return loss that the AEC is expected
to provide. Setting this value is key to balancing singletalk performance and
doubletalk performance.
• Lowering the ERLE threshold (below 20) may be required when the SER
(signal-to-echo ratio) is low, or when there are problems with residual echo
bursts and leakage during singletalk. The drawback of lowering this value is
that full-duplex performance is reduced. The near-end talker will be
attenuated or even gated out when the far-end talker is talking.
• Increasing the ERLE threshold (above 20) allows for full-duplex
performance with less send attenuation during doubletalk. However, there will
be an increased risk of echo during doubletalk or even echo leakage and
residuals during singletalk. A product design with a high SER and no distortion
is needed to achieve higher ERLE threshold settings. Greater
microphone-to-speaker distances are required for higher ERLE threshold settings.
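
When choosing a threshold, it helps to know how much echo return loss the AEC actually delivers on the device. The sketch below uses the standard ERLE definition (power at the AEC input over power at the AEC output, in dB) on a far-end singletalk recording. It assumes both signals are available as float sample lists, and it assumes, without confirmation from this guide, that the threshold setting is comparable to ERLE expressed in dB. The function names are hypothetical.

```python
import math

def mean_power(samples):
    """Average power of a block of samples."""
    return sum(x * x for x in samples) / len(samples)

def erle_db(mic_in, aec_out):
    """Echo return loss enhancement in dB, measured during far-end singletalk."""
    return 10.0 * math.log10(mean_power(mic_in) / mean_power(aec_out))

# Illustrative data: the AEC output carries one tenth of the echo amplitude.
mic_in = [1.0, -1.0, 1.0, -1.0]
aec_out = [0.1, -0.1, 0.1, -0.1]
measured = erle_db(mic_in, aec_out)
```

A device that measures well below the chosen threshold during singletalk is a candidate for the lower settings described above.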
ERLE threshold
• The ERLE threshold setting for the SSP is now also tied to the ERLE
threshold setting for the new NoiseGate.
• The tuning tool has not yet been updated to combine the ERLE threshold
sliders from the SSP window and the AEC window.
Noise Reduction
Parameter          Explanation
NR Aggressiveness  Minimum linear gain applied by the spectral gain in the SSP.
                   Directly affects the noise suppressed inside and outside
                   the speech.
                   Suggested values:
                   ASR: between 2 and 3, depending on the impact of the
                   filtering on the ASR engine.
                   VoIP: between 2 and 4.
Soft Noise Gating  Minimum linear gain applied by the temporal soft gain.
                   Controls the amount of noise further suppressed in frames
                   with low speech probability.
                   ASR: 0-2, depending on the ASR engine and the test
                   conditions. (Certain ASR engines require a steady
                   residual noise.)
                   VoIP: 0-4. For single-speaker VoIP, setting it to 0.0
                   works well for suppressing residual noise.



Noise Reduction
Operation:
• Adjust the NR Aggressiveness for the right amount of noise suppression.
The tradeoff is that the more aggressively this is set, the lower the potential
speech quality: the voice could sound less natural and more filtered.
• More aggressive SSP will also potentially slow the adjustment time to
switch from one dominant talker in the near-end room to another, and
potentially “lose” the dominant talker if they are moving around. For
conference mode, set the aggressiveness as low as possible.
• If the SSP aggressiveness is too low, then stationary background noise
might be allowed through, especially if modulated. The risk when this
happens is that the AGC will increase the output and cause a large noise
buildup.



Subband Post NLP in VOIP



Subband Post NLP
• Purpose
– The Subband Post NLP module suppresses residual echo not cancelled by the AEC and
not reduced by the NLP.
– It acts like a gate that opens when local speech is present and closes when local
speech is absent.
– The reference inputs are used to measure ERLE, which determines the gate state.
• Dependencies
– Subband analysis and synthesis, AEC, FDNLP.
• Parameters
– ERLE threshold.



Amplifier in VOIP



Amplifier

• The microphone gain was set in a previous step. Now, software gain must
be added so that the signal level is appropriate for both close talkers and far talkers.
• Before an AGC (automatic gain control) was available, this was done by
creating an extreme DRC curve (see next slides) that added significant gain to
low-level input signals and also limited the output for loud close-talker signals.
Then, the post-processing “output gain” (see below) was used to set the level so
that speech peaks were normalized close to 0 dBFS with no saturation.
DRC in VOIP



Dynamic Range Compression



Dynamic Range Compression
• Operation
– Boosts low-amplitude signals while preventing clipping distortion.
– Optionally, it can also be used to suppress hissing noise in the absence
of a significant sound signal.
– The DRC behaves as a signal-dependent amplifier:
– Low-level parts of movies or songs are amplified and therefore sound louder.
– High-level parts sound cleaner. However, attenuation may be needed if the
signal peaks before the DRC are above 0 dBFS. Such high peaks may occur if
the signal is amplified or its peak-to-average ratio is increased by effects
processing. In such cases, the DRC may cause the signal to sound softer.
– Optionally, parts with very low peaks are attenuated to suppress noise-only
intervals.



DRC



DRC commands and parameters
[Figure: DRC input/output curve. Labels: Knee 1, Knee 2, Expansion Region Slope, Middle Compression Region Slope, High Compression Region Slope.]
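
The knee and slope labels above suggest a piecewise-linear static curve in the dB domain. The sketch below is one possible reading, not the tool's actual implementation. It assumes that Knee 1 is the lower knee and Knee 2 the upper one, that slopes are expressed as dB of output per dB of input, and that the curve is pinned by the output level at Knee 2. All parameter names and values are hypothetical.

```python
def drc_output_db(in_db, knee1_db, knee2_db, out_at_knee2_db,
                  expansion_slope, middle_slope, high_slope):
    """Static DRC curve: output level (dB) for a given input level (dB)."""
    if in_db >= knee2_db:
        # High compression region: loud signals are held nearly constant.
        return out_at_knee2_db + high_slope * (in_db - knee2_db)
    if in_db >= knee1_db:
        # Middle compression region: quieter signals receive more gain.
        return out_at_knee2_db - middle_slope * (knee2_db - in_db)
    # Expansion region: very low levels fall away faster than the input.
    out_at_knee1_db = out_at_knee2_db - middle_slope * (knee2_db - knee1_db)
    return out_at_knee1_db - expansion_slope * (knee1_db - in_db)

# Illustrative curve settings only.
CURVE = dict(knee1_db=-60.0, knee2_db=-20.0, out_at_knee2_db=-6.0,
             expansion_slope=2.0, middle_slope=0.5, high_slope=0.1)

loud = drc_output_db(0.0, **CURVE)      # limited: stays below 0 dBFS
speech = drc_output_db(-40.0, **CURVE)  # boosted by the middle region
noise = drc_output_db(-70.0, **CURVE)   # pushed down by the expansion region
```

With these example values a -40 dBFS input is raised to -16 dBFS, while a full-scale input is held to -4 dBFS, which matches the boost-and-limit behavior described for the DRC.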



Automatic Gain Control
Purpose:
• Applies a soft gain to compensate for the input dynamics of the target speech.

Parameter     Explanation
fixed_gain    Sets the fixed gain (in dB) applied before the adaptive AGC
              gain. Needed to compensate for any fixed attenuation at the
              input of the DSP processing (e.g. to prevent microphone
              signal saturation due to loud echo playback).
Max_agc_gain  The maximum AGC gain (in dB).



AGC
The AGC has two parameters, agc_fixed_gain and agc_max_gain. The fixed gain
is now used instead of the “Output Volume” to set the level appropriately for a
close, loud talker. Then, the agc_max_gain is used to capture far talkers.
• Set the agc_max_gain to 0.
• Set the playback level to 100%. Play some audio signal, such as music. The
signal must be normalized to full scale so that the loudspeakers produce
their maximum loudness.
• Play the close-talker signal and adjust the agc_fixed_gain until the peaks are
normalized close to 0 dBFS with no saturation.
• Enable the AGC and verify that the output level is consistent over several
mic-to-speaker distances. The agc_max_gain needs to be set according to the
expected distance range. You can use the following rule of thumb, which allows
6 dB for each doubling of distance:
agc_max_gain = log2(max_distance / min_distance) * 6 dB
where max_distance and min_distance are the maximum (reasonable) and
minimum distances at which we expect to use the system (in meters).
• Avoid excessive agc_max_gain, since the potential for unwanted side-effects
(noise buildup in a quiet room with occasional non-stationary noise events) is
greater. Target <17 dB if possible.
• These controls will be added to the tuning GUI soon.
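
The agc_max_gain formula from the procedure above is easy to evaluate for a planned distance range. The helper below is a sketch with a hypothetical function name. The 17 dB figure is the target mentioned in the list, and the distances are illustrative.

```python
import math

TARGET_MAX_DB = 17.0  # "Target <17 dB if possible"

def agc_max_gain_db(max_distance_m, min_distance_m):
    """Rule-of-thumb AGC range: 6 dB per doubling of talker distance."""
    return math.log2(max_distance_m / min_distance_m) * 6.0

desk_range = agc_max_gain_db(2.0, 0.5)   # two doublings
room_range = agc_max_gain_db(4.0, 0.5)   # three doublings
room_exceeds_target = room_range >= TARGET_MAX_DB
```

A 0.5 m to 2 m range needs about 12 dB. Extending it to 4 m needs about 18 dB, which is above the suggested target and carries a higher risk of noise buildup.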
Subjective System Testing

• For subjective testing, you must be able to place a VoIP call from the DUT in one
test room to a reference system “far-end” user in another room.
• The test rooms must be completely isolated, or adverse echo coupling through
the building can occur. Adjacent rooms typically are not far enough apart,
unless one of them is a sealed chamber.
• The reference user at the far end should first use a USB headset, which
provides a highly controlled, full-duplex, and acoustically ideal system.
• Later, you can test two DUTs in separate rooms for system stability issues that
can be triggered with a speakerphone on both sides.
• Try to choose rooms and setup configurations that are representative of the
product requirements. (Conference room size, for example).
Sample Subjective Testing Scorecard

Test score definitions:

Test scoring will be done on a scale of 1 to 5, with a score of 1 being BAD
and 5 being GOOD.

5: No detectable flaws in the observed metric (Pass)
4: Some minor flaws detectable by an observant user (Pass)
3: Flaws detectable to the casual user (Possible Fail; further review required)
2: Serious flaws making the call difficult to continue (Fail)
1: Very serious flaws preventing the completion of the call (Fail)
© 2017 Synaptics Incorporated

Confidential

Synaptics, the Synaptics logo, TouchPad, ClickPad, SecurePad, ClearPad, ClearView, Synaptics TouchView, Natural ID,
ClearForce, SentryPoint, AudioSmart, VideoSmart, ImagingSmart, Design Studio, Image Studio and SafeSense are trademarks
or registered trademarks of Synaptics Incorporated or its affiliates in the United States and/or other countries. All other brands
and names may be trademarks of their respective owners.
