Manual VQ108 Scope - Eng

VQ108 Scope
Speech Quality Analysis
Operating Instructions
No part of the handbook or application may be reproduced in any form (print, photocopy, microfilm or any
other way), or manipulated, duplicated or distributed electronically without our prior written permission. We
hereby point out that all descriptions and brand names used in this book are generally subject to brand mark,
trademark and patent protection of the respective companies.
As of: 11/2007
Software-Version: 1.0.1.1
INDEX
1 Introduction
2 Compensation
  2.1. IRS Receive Filtering
  2.2. Calculation of the Active Speech Time Interval
  2.3. Time-Frequency Decomposition, Time Axis Modification
  2.4. Calculation of the Pitch Power Densities
  2.5. Compensation of Linear Frequency Response
  2.6. Compensation of the Time Varying Gain
  2.7. Calculation of the Loudness Densities
  2.8. Calculation of the Disturbance Density
  2.9. Modelling of the Asymmetrical Effect
  2.10. Aggregation of the Disturbance Densities over Frequency and Silent Interval Processing
  2.11. Realignment of Bad Intervals
  2.12. Aggregation of the Disturbances over Time
  2.13. Computation of the PESQ Score
3 Speech Samples
4 The “VQ108 Scope” Software
  4.1 System requirements
  4.2 The Program User Interface
    Reading In Speech Signal Files
    Graphical output of signals
    PESQ Calculation
    Language Selection
    Help
5 Active Measurement with the “SQ2500”
  5.1 System Requirements
  5.2 General Information
  5.3 The SQ2500 (Speech Quality Test Box)
  5.4 The Function “SQ2500-Test” of the Software VQ108 Scope
    5.4.1 Configuration
  5.5 The Program SQ2500 Remote
6 Appendix
A - Index
1 Introduction
Perceptual Evaluation of Speech Quality (PESQ) is a method of objective speech quality evaluation in
telephony in the frequency range 0.3-3.4 kHz. PESQ is described in ITU-T recommendation P.862 and is
based on real conditions for end-to-end speech communication. Amongst the factors taken into account
are packet loss, noise and the audio codec used.
PESQ returns an evaluation of speech quality in the range -0.5 to 4.5. Values nearer to -0.5 mean very bad
speech quality, while values nearer to 4.5 mean very good speech quality. Return values lie between 1 and 4.5
in most cases. This may be surprising at first, since the ITU scale for MOS extends up to 5, but the
explanation is simple. PESQ simulates a listening test and is optimised to reproduce the average
result of all listeners. Experience shows that the best average result one can generally expect from a
listening test is not 5 but rather around 4.5. Listeners tend to be cautious about giving a sample a
rating of 5, even if there is no degradation.
PESQ Value   MOS Value   Speech quality
4.5          5           excellent
4            4           good
3            3           fair
2            2           poor
1            1           bad
Table 1: Evaluation of Speech Quality
Diagram 1: Model of the PESQ Algorithm
As shown in Diagram 1, a reference signal and a degraded signal are input into the PESQ analysis system.
Both of these signals must be of type *.wav with a sampling rate of 8 kHz and a depth of 16 bit linear PCM. A
substantial amount of comparisons and calculations are performed in the PESQ system – explaining why
calculation of the results can take some time. The result delivered is the PESQ value, described above, as
well as a number of additional parameters which define speech quality.
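The input format stated above can be checked before a file is handed to the analysis. The following is a minimal sketch using Python's standard `wave` module; the function name `check_pesq_wav` is hypothetical and not part of the VQ108 Scope software.

```python
import wave


def check_pesq_wav(path):
    """Return a list of format problems; an empty list means the file
    matches the 8 kHz / 16 bit linear PCM format the manual asks for."""
    problems = []
    # wave.open only reads uncompressed PCM, so a successful open
    # already rules out compressed wave files.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 8000:
            problems.append(
                f"sampling rate is {wav.getframerate()} Hz, expected 8000 Hz")
        if wav.getsampwidth() != 2:
            problems.append(
                f"sample width is {8 * wav.getsampwidth()} bit, expected 16 bit")
    return problems
```

Calling `check_pesq_wav("reference.wav")` and `check_pesq_wav("degraded.wav")` (hypothetical file names) before starting a calculation shows early whether either file needs to be converted.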
Many of the stages in the PESQ algorithm are extremely complex and hence a description of the process is
not straightforward. For this reason the algorithm is described just briefly. Please refer to ITU-T
recommendation P.862 for more detailed information.
VQ108 Scope -4-

2 Compensation
The effect of the equipment under test on the speech signal is not known. It can vary substantially depending
on whether a LAN, ISDN or an analogue 2-wire connection is used for data capture. The original signal is
also not on a constant level. It therefore becomes necessary to align the original signal x(t) and the degraded
output signal y(t) to the same constant level. PESQ takes a constant level of around 79dB SPL (in
accordance with ITU-T recommendation P.830) and changes the signals to bring them into line with this
level. Besides a level alignment in the time domain, it is also necessary to align the level in the frequency
domain. This is carried out by generating a sine wave with a frequency of 1kHz and an amplitude of 40dB
SPL. This sine wave is transformed to the frequency domain using a windowed FFT with 32ms frame length.
After converting the frequency axis to a modified Bark scale, the peak amplitude of the resulting pitch power
density is then normalised with a power scaling factor Sp.
2.1. IRS Receive Filtering
It is assumed that listening is carried out using a handset with a frequency response that follows an IRS
receive or a modified IRS receive characteristic. A perceptual model of the human evaluation of speech
quality must take account of this for the actual signals to be processed. Therefore IRS-like receive filtered
versions of the original speech signal and the degraded speech signal are computed. In PESQ, this is
implemented by an FFT over the entire length of the speech file, resulting in the filtered signals X_IRSS(t) and
Y_IRSS(t).
2.2. Calculation of the Active Speech Time Interval
If the original and degraded output speech file start or end with large silent intervals, this could influence the
computation of certain average distortion values over the files. Therefore, an estimate is made of the silent
parts at the beginning and end of these files. The sum of five successive absolute sample values must
exceed 500 from the beginning and end of the original speech file in order for that position to be considered
as the start or end of the active interval. The interval between this start and end is defined as the active
speech time interval.
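The rule above can be sketched in a few lines. This is an illustration only, assuming 16 bit integer samples in a plain Python list; the function name and the convention that the returned end index is exclusive are choices made for this sketch.

```python
def active_speech_interval(samples, window=5, threshold=500):
    """Return (start, end) of the active interval as sample indices,
    with end exclusive, or None if no window exceeds the threshold."""
    n = len(samples)
    if n < window:
        return None

    def exceeds(first):
        # Sum of `window` successive absolute sample values.
        return sum(abs(s) for s in samples[first:first + window]) > threshold

    # Search forwards from the beginning of the file.
    start = next((i for i in range(n - window + 1) if exceeds(i)), None)
    if start is None:
        return None
    # Search backwards from the end of the file.
    for end in range(n, window - 1, -1):
        if exceeds(end - window):
            return start, end
    return None
```

For a file that is silent for its first and last second, the returned indices bracket the region in between, which is the part over which average distortion values are then computed.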
2.3. Time-Frequency Decomposition, Time Axis Modification
The human ear performs a time-frequency transformation. In PESQ this is modelled by a short term FFT with
a Hanning window over 32ms frames. The overlap between successive frames is 50%. The power spectra -
the sum of the squared real and imaginary parts of the complex FFT components - for both signals are
stored separately in real valued arrays. Phase information within a single frame is discarded in PESQ. All
calculations are based on only the power representations PX_WIRSS(f)n and PY_WIRSS(f)n. The start points of
the frames in the degraded speech signal are shifted by the observed delay. The time axis of the original
speech signal is left as is. If the delay increases, parts of the degraded speech signal are omitted from the
process, whilst for decreases in the delay parts of the degraded signal are repeated. This time-axis
modification gave the best results in terms of correlation with the subjectively perceived speech quality.
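The framing described above (32ms frames, Hanning window, 50% overlap, power spectrum without phase) can be sketched as follows. At 8 kHz a 32ms frame is 256 samples. For readability the sketch uses a slow, direct DFT from the standard library in place of an FFT; the result is the same, only slower. All names are illustrative.

```python
import cmath
import math

SAMPLE_RATE = 8000
FRAME_LEN = 256          # 32 ms at 8 kHz
HOP = FRAME_LEN // 2     # 50 % overlap between successive frames


def hann(n):
    """Hanning window of length n (periodic form)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / n)) for i in range(n)]


def power_spectrum(frame):
    """Squared real plus squared imaginary part per bin; phase is dropped."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(acc.real ** 2 + acc.imag ** 2)
    return spectrum


def framed_power_spectra(samples):
    """Split the signal into overlapping windowed frames and return one
    power spectrum per frame."""
    window = hann(FRAME_LEN)
    spectra = []
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP):
        frame = [samples[start + i] * window[i] for i in range(FRAME_LEN)]
        spectra.append(power_spectrum(frame))
    return spectra
```

Each bin is 8000 / 256 = 31.25 Hz wide, so a 1 kHz tone peaks in bin 32 of every frame.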
2.4. Calculation of the Pitch Power Densities
The Bark scale reflects that, at low frequencies, the human hearing has a finer frequency resolution than at
higher frequencies. This is implemented by binning FFT bands and summing the corresponding FFT bands
with a normalisation of the summed parts. The resulting signals are known as the “pitch power densities”,
PPX_WIRSS(f)n and PPY_WIRSS(f)n.
2.5. Compensation of Linear Frequency Response
To deal with filtering in the system under test, the power spectra of the original and degraded speech signal
are averaged over time. This average is calculated over speech active frames only using time-frequency
cells whose power is more than 30dB above the absolute hearing threshold level (limit of the sound level).
Per modified Bark bin, a partial compensation factor is calculated from the ratio of the degraded spectrum to
the original spectrum. The maximum compensation never exceeds 20dB. The original pitch power density,
PPX_WIRSS(f)n, of every frame n is then multiplied with this partial compensation factor to equalise the original
to the degraded signal. This results in a filtered version of the original pitch power density, PPX’_WIRSS(f)n.
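As a rough illustration of the bounded compensation factor, the sketch below takes the time-averaged power per Bark bin for both signals and limits the ratio to 20dB. Applying the limit symmetrically (between -20dB and +20dB) is an assumption of this sketch, as are the function name and the requirement that the averaged original power is greater than zero.

```python
def compensation_factors(avg_orig, avg_deg, max_db=20.0):
    """Per-bin ratio of degraded to original average power, limited to
    +/- max_db. Inputs are lists of positive power values, one per bin."""
    limit = 10.0 ** (max_db / 10.0)   # 20 dB as a power ratio is 100
    factors = []
    for p_orig, p_deg in zip(avg_orig, avg_deg):
        ratio = p_deg / p_orig
        # Assumption: the 20 dB limit applies in both directions.
        factors.append(min(max(ratio, 1.0 / limit), limit))
    return factors


def equalise_frame(pitch_power_orig, factors):
    """Multiply one frame of the original pitch power density by the
    per-bin compensation factors."""
    return [p * f for p, f in zip(pitch_power_orig, factors)]
```

The limit keeps a severe filter in the system under test from being fully hidden: only part of the frequency response difference is compensated, and the remainder still shows up as disturbance.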
2.6. Compensation of the Time Varying Gain
Short-term gain variations are partially compensated by processing the pitch power densities frame by frame. For
the original and the degraded pitch power densities, the sum in each frame of all values that exceed the
absolute hearing threshold is computed. The ratio of the power in the original and the degraded speech files
is calculated, smoothed with a first order low pass filter with a time constant of approximately 16ms, and
bounded to the range 3x10^-4 to 5. The distorted pitch power density in each frame is then
multiplied by this ratio, resulting in the partially gain compensated distorted pitch power density,
PPY’_WIRSS(f)n.
2.7. Calculation of the Loudness Densities
After partial compensation for filtering and short-term gain variations, the original and degraded pitch power
densities are transformed to a Sone loudness scale using Zwicker’s law. The resulting two-dimensional
arrays, LX(f)n and LY(f)n , are called loudness densities.
2.8. Calculation of the Disturbance Density
The signed difference between the distorted and original loudness densities is computed. When this
difference is positive, components such as noise have been added to the original signal. When this
difference is negative, components have been omitted from the original speech signal. The difference array
is called the raw disturbance density. A masking threshold is then applied to each time-frequency cell. The
net effect is that the raw differences are pulled towards zero. This represents a dead zone before an actual
time-frequency cell is perceived as being distorted. This models the process of small differences being
inaudible in the presence of loud signals in each time-frequency cell. The
result is a disturbance density, D(f)n, as a function of time and frequency.
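The dead zone can be pictured with a small sketch. The manual does not state how wide the dead zone is; the sketch assumes that the width in each cell is 0.25 times the smaller of the two loudness values, which is how ITU-T P.862 is generally understood to define its mask array. Treat that factor, and all names below, as assumptions to be checked against the recommendation.

```python
def disturbance_density(loud_orig, loud_dist, mask_factor=0.25):
    """Pull the signed loudness difference of each cell towards zero.

    loud_orig, loud_dist: loudness values of one frame, one per Bark bin.
    mask_factor: assumed width of the dead zone relative to the quieter
    of the two loudness values.
    """
    result = []
    for lx, ly in zip(loud_orig, loud_dist):
        raw = ly - lx                      # raw disturbance density
        mask = mask_factor * min(lx, ly)   # assumed dead zone width
        if raw > mask:
            result.append(raw - mask)      # added components
        elif raw < -mask:
            result.append(raw + mask)      # omitted components
        else:
            result.append(0.0)             # inside the dead zone
    return result
```

A difference of 0.5 Sone between two cells of loudness 4 and 4.5 falls inside the dead zone and yields 0, whereas a jump from 4 to 8 Sone yields 3.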
2.9. Modelling of the Asymmetrical Effect
The asymmetrical effect is caused by the fact that when a codec distorts the input signal, it will in general be
very difficult to introduce a new time-frequency component that integrates with the input signal, and the
resulting output signal will thus be decomposed into two different percepts, the input signal and the
distortion, leading to clearly audible distortion. When the codec leaves out a time-frequency component, the
resulting signal can not be decomposed in the same way and the distortion is less objectionable. This effect
is modelled by calculating an asymmetrical disturbance density DA(f)n per frame by multiplication of the
disturbance density D(f)n with an asymmetric factor. The asymmetric factor equals the ratio of the distorted
and original pitch power densities raised to the power of 1.2. If the factor is less than 3, it is set to 0. If it
exceeds 12, it is clipped at this value.
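The asymmetric factor follows directly from the three numbers given above (exponent 1.2, lower bound 3, upper bound 12). A minimal sketch, assuming both pitch power values are greater than zero; the function names are illustrative.

```python
def asymmetry_factor(pp_dist, pp_orig, exponent=1.2, lower=3.0, upper=12.0):
    """Ratio of distorted to original pitch power raised to `exponent`,
    set to 0 below `lower` and clipped at `upper`."""
    factor = (pp_dist / pp_orig) ** exponent
    if factor < lower:
        return 0.0
    return min(factor, upper)


def asymmetric_disturbance(disturbance, pp_dist, pp_orig):
    """Asymmetric disturbance of one time-frequency cell."""
    return disturbance * asymmetry_factor(pp_dist, pp_orig)
```

When the distorted power equals the original power the factor is 0, so the cell contributes nothing to the asymmetric disturbance. Only cells where the system under test has added a clearly audible amount of energy are counted.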
2.10. Aggregation of the Disturbance Densities over Frequency and Silent Interval Processing
The disturbance density, D(f)n, and asymmetric disturbance density, DA(f)n, are summed along the frequency
axis using two different Lp norms and a weighting on soft frames (having low loudness). After this
weighting, the frame disturbance values are limited to a maximum of 45. If the distorted signal contains a
decrease in the delay larger than 16ms, the repeat strategy from the time-frequency decomposition is
re-applied. The resulting frame disturbances are called D’n and DA’n.
2.11. Realignment of Bad Intervals
Consecutive frames with a frame disturbance above a threshold are called bad intervals. In a minority of
cases, the objective measure predicts large distortions over a minimum number of bad frames due to
incorrect time delays observed by the preprocessing. In this case a new delay value is estimated by locating
the maximum of the cross-correlation between the absolute original speech signal and the absolute
degraded speech signal pre-compensated with the delays observed by the preprocessing.
When the maximum value is below a certain threshold, it is concluded that the interval has been
compensated. This interval is then no longer considered “bad”. If the maximum value is not below the
threshold, the process is repeated. The results are the final frame disturbances, D’’n and DA’’n, which are
used to calculate the perceived overall speech quality.
2.12. Aggregation of the Disturbances over Time
First the frame disturbances are aggregated over split second intervals. Next the split second intervals are
aggregated over the complete active time interval. For the split second time aggregation, the frame
disturbance values and the asymmetric frame disturbance values are aggregated over 20 frames using an L6
norm. These split second intervals overlap by 50% and no window function is used. The split second values
within the active interval of the speech file are then aggregated using an L2 norm.
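The two-stage aggregation can be sketched as below. The sketch assumes a mean-based Lp norm, (mean of |v|^p)^(1/p), which is one common definition; whether the original implementation averages or sums is not stated in this manual. Function names are illustrative.

```python
def lp_norm(values, p):
    """Mean-based Lp norm: (mean(|v|**p)) ** (1/p)."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def aggregate_over_time(frame_disturbances, interval=20,
                        p_interval=6, p_total=2):
    """L6 norm over split second intervals of 20 frames with 50 % overlap,
    followed by an L2 norm over the resulting interval values."""
    if not frame_disturbances:
        raise ValueError("at least one frame disturbance value is required")
    hop = interval // 2
    last_start = max(len(frame_disturbances) - interval, 0)
    interval_values = [
        lp_norm(frame_disturbances[start:start + interval], p_interval)
        for start in range(0, last_start + 1, hop)
    ]
    return lp_norm(interval_values, p_total)
```

The high exponent inside a split second interval lets a short, loud burst of distortion dominate that interval, while the lower exponent across intervals behaves more like an average over the whole sample.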
2.13. Computation of the PESQ Score
The final PESQ score is a linear combination of the average disturbance value and the average
asymmetrical disturbance value. This linear combination was optimized on a large set of subjective
experiments.
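As an illustration, the linear combination can be written as a one-line function. The coefficients below (4.5, 0.1 and 0.0309) are the values commonly quoted for ITU-T P.862; they are not given in this manual and should be checked against the recommendation before being relied on.

```python
def pesq_score(d_sym, d_asym):
    """Linear combination of the average disturbance value (d_sym) and the
    average asymmetrical disturbance value (d_asym).

    Coefficients are assumed from ITU-T P.862, not taken from this manual.
    """
    return 4.5 - 0.1 * d_sym - 0.0309 * d_asym
```

With no disturbance at all the function returns 4.5, the upper end of the PESQ range described in the introduction.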
3 Speech Samples
A speech sample conforming to ITU-T recommendation P.862 (PESQ) must be sent. According to the
recommendation, the speech sample should consist of simple, short and meaningful sentences and should
be selected so as to be easily understood. The sentences should also be divided into two or three sections
with silent time intervals. These sections should be created in such a way that there are no associations
between the individual sentences within a section. Avoid very short and very long sentences. The aim is for
each spoken sentence to have a duration of 2-3 seconds. The speech samples should be 8-12 seconds in
length. Between 40 and 80% of the time should be speech. If long time intervals are to be tested, it is
reasonable to create several separate recordings of around 8-20 seconds.
When recording speech samples, the speakers should be in a room with a reverberation period of less than
500ms and a room noise level below 30dBA. High quality recording systems should be used for the
recordings. The speakers should articulate the sentences fluently but not dramatically and maintain a
constant, comfortable speech level, avoiding noises such as paper rustling. During the recording, the active
speech level should be monitored and should lie between 20dB and 30dB below the peak overload level of the
recording system. Every sentence which does not attain this level should be re-recorded.
The speech sample should be processed such that it is suitable for the “System Under Test” (SUT). Further
distortion by unnecessary quantisation, amplitude limitation or renewed sampling is to be avoided. The
preferred format for the saving of the original speech sample and the degraded speech sample is a sampling
rate of 8 kHz, 16 bit linear PCM.
Several pre-recorded speech samples, in *.wav format, are included with this software which may be used as
references when sending and comparing.
4 The “VQ108 Scope” Software
4.1 System requirements
Minimum:
• 1GHz PC
• 256 MByte memory
• CD-ROM drive
• Windows 2000
• 500 MByte free disc space
• Screen resolution of min. 1024x768
4.2 The Program User Interface
Diagram 2: User Interface of the VQ108 Scope Program (callouts [1] to [10] mark the areas referred to below)
The VQ108 Scope software for speech quality analysis has a clearly laid out, intuitive and user-friendly
dialogue based graphical user interface from which all functions can be called.
Reading In Speech Signal Files
The reference signals and degraded signals described above (wave files, 8 kHz sampling rate, 16 bit linear
PCM) are read into the system in the upper left-hand area of the user interface [1]. Select “Open” to read in
speech signal files. This opens the default Windows dialogue from which you can select the files. Once a
valid wave file has been loaded, the speech sample over time appears in the window in the lower half of the
user interface. The reference signal is shown in blue, the degraded signal in red. When selecting two files
(reference file and degraded file), ensure that they actually belong together (i.e. the reference file has
been sent and the degraded file has been received at the end of the measuring section).
Diagram 3 shows an error message which is displayed if the two files read in do not belong together or the
transmission quality is so bad that no correlation at all can be detected.
Diagram 3: PESQ Calculation Error Message
Select the “Listen” button to output the signals to the sound card so as to subjectively assess the speech
quality.
Graphical output of signals
The speech samples loaded are shown graphically in the lower part of the screen [2]. The signals can be
compared by eye to reveal delays, distortions and level differences.
Signals may be displayed as “linear”, as “level” or as a “spectrum”. Switch between the three by changing
the mode on the right of the display area [3]. Use [4] to change the resolution of the time axis (X-axis). With
the resolution set to 10s, the signals are usually visible in their entirety, albeit imprecisely, since not
every sample can be drawn. With smaller resolutions (5s, 1s and 0.1s) the signals are not visible in
their entirety in one screen, but the accuracy of the display is greater.
The reference and degraded signals can be turned on and off by selecting the boxes in [5]. This is useful
when just one of the signals is to be analysed in detail without being hindered by the other.
PESQ Calculation
Once the reference and degraded signals have been read in, the automatic, objective speech quality
analysis can be started by pressing the “Calculate PESQ” button [6]. This may take some time as
comprehensive and complex calculations are involved.
Once the calculation is complete, the most important value, the “PESQ value”, is displayed in large, bold
numbers [7]. The colour of the number depends on the result: good results are displayed in green,
medium results in yellow and bad results in red, giving the user an immediate indication
of the speech quality.
All calculable parameters are displayed in detail in the results window [8]. Select “Show Advanced Results”
[9] to show these values all together in a separate window (Diagram 4).
Diagram 4: Display of the Advanced Results from the PESQ Calculation
These values may be printed out by selecting “Print results” [10].

The following pages provide a brief description of the individual values returned.
Values derived from psychoacoustic model and PESQ utterance detector:

PESQ Score (P.862) - PESQ value in accordance with ITU-T P.862
MOS-LQO (P.862.1) - PESQ value on the MOS scale in accordance with P.862.1
PESQ Score (P.862) Noise - PESQ MOS in between speech (both signals silent)
PESQ Score (P.862) Speech - PESQ MOS during speech (both signals speech)
MOS (P.800) - (reworked) MOS on the P.800 scale
MOS Lq - (reworked) LQ MOS on the LQ scale
G.107 Rating - E-model / R-rating
Max Delay - Maximum delay in ms (±5 ms)
Avg Delay - Average delay in ms (±5 ms)
Min Delay - Minimum delay in ms (±5 ms)
Attenuation - Damping in dB
Background Noise - Loudness of the degraded signal during silent intervals in Sone
Reference Level - Loudness of the reference signals during speech in Sone
Degraded Level - Loudness of the degraded signal during speech in Sone
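The relationship between the first two values in this group, PESQ Score (P.862) and MOS-LQO (P.862.1), can be illustrated with the mapping function usually quoted for ITU-T P.862.1. The constants below are not given in this manual and are an assumption to be verified against the recommendation; the function name is illustrative.

```python
import math


def mos_lqo(pesq_score):
    """Map a raw P.862 score to the MOS-LQO scale.

    Constants assumed from ITU-T P.862.1, not taken from this manual.
    """
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * pesq_score + 4.6607))
```

Under these constants the raw range of -0.5 to 4.5 maps to roughly 1.02 to 4.55, so the mapped value never reaches 5 either.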
Values derived from PESQ utterance detector after IRS filter and level alignment:
Snr Speech - SNR between speech
Snr Silent - SNR between silence
Mean Utterance Correlation - Mean Utterance Correlation
Reference Length - Reference Length

Reference Active size - Reference Active size
Reference Silence at Start - Reference Silence at Start
Reference Silence at End - Reference Silence at End
Reference Speech Activity - Reference Speech Activity
Degraded Length - Length of the degraded signal

Degraded Active size - Active size of the degraded signal
Degraded Silence at Start - Silence at Start of the degraded signal
Degraded Silence at End - Silence at End of the degraded signal
Degraded Speech Activity - Speech Activity of the degraded signal
Standard deviation - Standard deviation
Values derived from modified PESQ utterance detector:

Nr Utterances - Number of utterances found in the reference signal
FEC - Vector with “front end clipping” – time for each utterance in ms
HOT - Vector with “hold over” – time for each utterance in ms
Start Ref: - Vector with start time of each utterance of the reference signal in ms
End Ref - Vector with end time of each utterance of the reference signal in ms
Nr Utterances Deg Aligned - Number of utterances found in the aligned degraded signal
StartDegAligned - Vector with start time of each utterance of the aligned, degraded signal in ms
EndDegAligned - Vector with end time of each utterance of the aligned, degraded signal in ms
Values derived from raw, unfiltered input signals:

Total Reference Level - Level of the reference file in dBov
Total Degraded Level - Level of the degraded file in dBov
Reference Background Noise - Level of the background noise in the reference signal in dBov
Degraded Background Noise - Level of the background noise in the degraded signal in dBov
Reference Speech Level - Level of speech in the reference signal in dBov
Degraded Speech Level - Level of speech in the degraded signal in dBov
Nr Clippings - Number of Clippings
Start Clipping Deg - Start of clipping of the degraded signal
Stop Clipping Deg - End of clipping of the degraded signal
Language Selection
The user can select the language used for the interface by pressing the “Select Language” button (see
Diagram 5).
This button is not available if VQ108 Scope was started from the VQ108 Analyser analysis software. The
user interface of the VQ108 Scope software then adjusts automatically to the language selected in VQ108
Analyser.
Diagram 5: “Select Language” Dialogue
Help
Press the “Help” button, if you require further information on the calculation of PESQ parameters or help on
the VQ108 Scope software.
5 Active Measurement with the “SQ2500”

An SQ2500 (Speech Quality Test Box) is required to use this feature. In contrast to the passive
analysis, this feature performs an active speech quality test to obtain a PESQ score. For a
speech quality analysis with the SQ2500, the D2500 or D2000Pro measurement equipment built
by Aethra is needed to set up a connection (ISDN, analog, etc.).
5.1 System Requirements
- PC with min. 1 GHz CPU
- Windows XP
- 256 MB RAM
- USB 2.0 or 1.1
The SQ2500 package includes:

- SQ2500 (Speech Quality Test Box)
- USB Cable
- Connection Cable between SQ2500 and end device
- SQ2500 Remote Software
Warning: The warranty expires if the device is opened or used other than as described!
Read the manual before using the device.
5.2 General Information
The measurement system makes it possible to perform an end-to-end speech quality analysis in voice
transmission systems. To do this, a standardized speech sample is fed into the end device. The incoming
signal is captured and saved in the measurement system. The received speech sample and the sent
one are then compared in an automatic speech quality analysis using PESQ (ITU-T P.862). The result
reflects a placement on the MOS scale from 1 to 5 (1 - bad, 2 - poor, 3 - fair, 4 - good, 5 - excellent).
Note
- The “Speech Quality Test Box” (SQ2500) includes a high quality USB audio adapter. Never
use or install another generic USB audio adapter.
- Never install generic device drivers for the USB audio adapter.
- No other running application should use the soundcard during a measurement. If possible, close
all other running applications.
- The given PESQ values have an inaccuracy of up to 0.05.
- Because of a small quality loss produced by the box, an additional 0.01 per direction is added to the
displayed PESQ value.
5.3 The SQ2500 (Speech Quality Test Box)
The following connections and indicators are found on the front of the SQ2500:
Diagram 5-1: Front of SQ2500 (front panel labels: Power/Busy, Standard, D2500, USB)
[1] Power/Busy LED: Steady green while powered on; flashes while sending and receiving
signals.
[2] USB connector: Connection between the box and the PC. The SQ2500 is automatically
recognized by Windows XP. Never install device drivers!
[3] Handset connector: Connection between the SQ2500 and the handset connector of the
end device. There are two jacks that are wired differently:
On the left it is wired TX/RX/RX/TX (Standard)
On the right it is wired TX/RX/TX/RX (D2500).
Use whichever jack matches the handset connector of your end device.
For the D2500/D2000Pro by Aethra use the jack on the right side.
5.4 The Function “SQ2500-Test” of the Software VQ108 Scope
Diagram 5-2: PESQScope Main User Interface
To make an active measurement with the SQ2500, first start the VQ108 Scope software. In
the main user interface, press the “SQ2500 Test” button at the bottom left. This opens the dialog shown in
Diagram 5-3.
Diagram 5-3: Active Measurement Dialog with SQ2500 (callouts [1] to [8] are described below)
[1] Options: Choose the kind of measurement while in Master mode.
[2] Number of Measurements: Enter the number of measurements, each of which sends and receives a
speech sample. Since the PESQ values sometimes fluctuate, it is recommended to run at least
10 measurements.
[3] START / STOP: Starts or stops the measurement. If stopped before the end of a measurement, no
values are evaluated. A regular measurement stops automatically after the given number of
measurements.
[4] Result: The current results of the measurement are shown here. The field is also used to show
the status.
[5] PESQ-Score: After a successful run, the average of all measurements is shown in this field.
[6] Print Result: Sends the results of the current measurement shown in [4] to your printer.
[7] Print in File: Writes the results to a text file instead.
[8] Close: Closes the application.
5.4.1 Configuration
Under “Config->Audio”, choose the mode “Extern 200..3400Hz” for all measurements with the
SQ2500 box connected to the D2500/D2000Pro.
a1) Measurement with the SQ2500 and Loopbox-Function
- On the Master side [M], connect the SQ2500 to your PC via USB.
- Connect the handset connector of the end device (D2500/D2000Pro) to the box.
- On the Slave side [S], set the end device (D2500/D2000Pro) to the Loopbox function, so that
received signals are mirrored and sent back.
- Set up a connection between both end devices (D2500/D2000Pro).
On the Master side [M]
- Start the application VQ108 Scope and close all other running applications if possible.
Make sure that no other application uses the soundcard.
- In the main user interface of VQ108 Scope, press the “SQ2500 Test” button at the
bottom left (see Diagram 5-2).
- Choose the mode “Master” and the first option “Remote Station in Loopbox Mode”.
- Choose the number of measurements.
- Press “Start”.
After the start, the status of the running measurements is shown in the result field. Once the
entered number of measurements has been completed, the results of each measurement, the number of valid
and invalid measurements, the minimum and maximum values and the average are given in the same field.
In addition, the average shown in the PESQ field is coloured: green indicates a good average, yellow
a medium average and red a bad average.
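The figures reported at the end of a run (valid and invalid count, minimum, maximum, average) are easy to reproduce when results are post-processed outside the program, for example from a printed or saved result list. The sketch below is purely illustrative: the function name and the use of `None` to mark an invalid measurement are assumptions and do not reflect the program's internal data format.

```python
def summarise_measurements(scores):
    """Summarise a run of PESQ measurements.

    scores: list with one entry per measurement; a float for a valid
    result, None for an invalid one (an assumption of this sketch).
    """
    valid = [s for s in scores if s is not None]
    summary = {
        "valid": len(valid),
        "invalid": len(scores) - len(valid),
        "min": None,
        "max": None,
        "average": None,
    }
    if valid:
        summary["min"] = min(valid)
        summary["max"] = max(valid)
        summary["average"] = sum(valid) / len(valid)
    return summary
```

Invalid measurements are counted but left out of the minimum, maximum and average, so a single failed transmission does not distort the reported average.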
a2) Measurement with SQ2500 and TraceSim VoIP
- On the Master side [M], connect the SQ2500 to your PC via USB.
- Connect the handset connector of the end device (D2500/D2000Pro) to the box.
- On the Slave side [S], connect the PC running the Aethra VoIP simulation software VQ108
Simulator in VQ-Receiver mode (a detailed description of the VQ108
Simulator is found in the VQ108_XP_Sim manual).
- Set up a connection between the D2500/D2000Pro and the VQ108 Simulator.
On the Master-Side [M]
- Choose the mode „Master“ and the first option, „Remote Station in Loopbox Mode“.
- Choose the number of measurements.
- Hit „Start“.
After the start, the status of the running measurements is shown in the result field. Once the
entered number of measurements has been completed, the same field lists the result of each
measurement, the number of valid and invalid measurements, the minimum and maximum values, and the
average. In addition, the average shown in the PESQ field is color-coded: green indicates a good,
yellow an average, and red a bad result.
b) Measurement with two SQ2500
[Figure: measurement setup with Master side [M] and Slave side [S]]
- On both sides, connect the SQ2500 box to the PC via USB.
- Connect the Handset Connectors of the end devices (D2500/D2000Pro) to the SQ2500
boxes.
- Set up a connection between the two end devices (D2500/D2000Pro).
On the Slave-Side [S]

- Choose the Mode „Slave“.
- Hit „Start“. The Remote-System is ready for the tests until you hit „Stop“.
On the Master-Side [M]
- Choose the „Master“ mode.
- Choose one of the following options:
„Remote station sends reference-signal“, if only the back channel is used to determine
the speech quality. The „Slave“ sends a reference speech sample, which the „Master“ saves
and evaluates.
„Remote station receives reference-signal and sends it back“, if both directions are
to be part of the measurement. The „Master“ sends the reference speech
sample. The „Slave“ saves it and sends it back when the „Master“ asks for it.
- Choose the number of measurements.
- Hit „Start“. The system sends a DTMF tone over the send line. Depending on the chosen option, the
speech sample is either sent to the „Slave“ and returned to the „Master“, or the „Slave“ sends its
speech sample directly to the „Master“. The „Master“ receives the degraded signal and saves it for
the later PESQ evaluation.
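The difference between the two options can be summarized as the path the reference speech sample
travels before the „Master“ evaluates it. The following sketch only illustrates that logic. The
names MasterOption, signal_path and measured_directions are hypothetical and are not part of the
VQ108 Scope software.

```python
from enum import Enum
from typing import List


class MasterOption(Enum):
    """The two options offered in „Master“ mode with two SQ2500 boxes."""
    REMOTE_SENDS_REFERENCE = "Remote station sends reference-signal"
    REMOTE_RETURNS_REFERENCE = "Remote station receives reference-signal and sends it back"


def signal_path(option: MasterOption) -> List[str]:
    """Return the stations the speech sample passes, ending at the evaluating side."""
    if option is MasterOption.REMOTE_SENDS_REFERENCE:
        # Only the back channel is measured.
        return ["Slave", "Master"]
    # Both directions are measured: out to the „Slave“ and back again.
    return ["Master", "Slave", "Master"]


def measured_directions(option: MasterOption) -> int:
    """Number of transmission directions that contribute to the degradation."""
    return len(signal_path(option)) - 1


if __name__ == "__main__":
    for option in MasterOption:
        print(option.value, "->", " -> ".join(signal_path(option)))
```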
After the measurement, all results are shown in the result field as described earlier (see Diagram 5-4).
Diagram 5-4: Results of a Measurement
The results of a successful measurement can be printed with the function „Print results“ or saved to a text
file with „Print in File“.
The degraded speech samples are saved automatically in the application's folder, so you can
analyze them further with the VQ108 Scope main application. Note that the PESQ values calculated
by the main application are slightly lower than the results of the SQ2500 test
application, because the test application compensates for the quality loss in the box. The
transmitted reference speech sample is named „OR105.wav“. The degraded signal of the first
measurement run is saved as „PesqProbe1.wav“, that of the second as „PesqProbe2.wav“, etc. Please
note that the wave files (PESQ samples) are overwritten with every new measurement.
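Because the wave files are overwritten by the next measurement, it can be useful to copy them
elsewhere first. The following is a minimal sketch of such a backup step. The function name
backup_samples and both folder arguments are assumptions made for illustration; the actual location
of the application folder depends on your installation. Only the file names „OR105.wav“ and
„PesqProbe<N>.wav“ are taken from this manual.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_samples(app_dir, backup_root):
    """Copy the PesqProbe<N>.wav files and OR105.wav into a new timestamped folder.

    app_dir:     folder in which the application stores the wave files (assumed path)
    backup_root: folder under which the timestamped backup folder is created
    Returns the backup folder and the list of copied file names.
    """
    app_dir = Path(app_dir)
    target = Path(backup_root) / datetime.now().strftime("%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    candidates = sorted(app_dir.glob("PesqProbe*.wav")) + [app_dir / "OR105.wav"]
    for wav in candidates:
        if wav.is_file():  # skip files that do not exist, e.g. a missing reference
            shutil.copy2(wav, target / wav.name)
            copied.append(wav.name)
    return target, copied
```

Run the function after a measurement series has finished and before the next one is started, for
example `backup_samples("path/to/application/folder", "path/to/backups")`, with both paths replaced
by the folders on your system.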
5.5 The Program SQ2500 Remote

The included program can be used as a remote station („Slave“) for the „Master“ side instead of
VQ108 Scope. It implements only the „Slave“ functionality and cannot initiate or evaluate tests.
Using the SQ2500 Remote software on the remote side requires no licenses (software
license, PESQ license, etc.). Only the „Master“ application needs a license.
Operation is simple. After connecting the SQ2500 to the end device (D2500/D2000Pro) and
the PC, open the program and hit „Start“. The remote system is then ready for measurements until you hit
„Stop“. No further settings are necessary, because these are handled by the „Master“ side.
Diagram 5-5: User interface SQ2500 Remote
6 Appendix
A - Index
A
Active Measurement .... 12, 16
B
Bad Intervals .... 6
C
codec .... 4, 6
compensation factor .... 5
D
deadzone .... 6
Disturbance Density .... 6
duration .... 7
E
evaluation .... 4, 5
F
frequency .... 4, 5, 6
G
Graphical output of signals .... 9
H
Handset Connector .... 14, 17, 18
I
invalid measurements .... 18, 19
L
Language Selection .... 11
M
MOS .... 4, 10, 13
N
number of measurements .... 16, 18, 19, 20
P
PESQ .... 4, 5, 7, 9, 10, 11, 12, 13, 16, 18, 19, 20, 21
Print in File .... 20
R
Remote-System .... 19
S
speech quality .... 4, 5, 6, 8, 9, 12, 20
Speech Quality Test Box .... 12, 13, 14
Speech Samples .... 7
speech signal .... 5, 6, 9
System requirements .... 8
U
USB-Connector .... 14
User Interface .... 8, 15
utterance detector .... 10, 11
V
VQ-Receiver .... 18
W
warranty .... 12
Z
Zwicker’s law .... 6
Changes and errors excepted