RADIO LINK PARAMETER BASED SPEECH QUALITY INDEX - SQI

A. Karlsson, G. Heikkilä, T. B. Minde, M. Nordlund, B. Timus, N. Wirén
Voice Processing and Radio Network Research, Ericsson R & D
Ericsson Erisoft AB, Box 920, SE-971 28 LULEÅ, SWEDEN
*SE-932 83 SKELLEFTEHAMN, SWEDEN

ABSTRACT

Measurement of cellular speech quality has applications ranging from equipment installation to daily network maintenance and benchmarking. The area is under development, driven by the cost and lead time of subjective listening tests. Lately, field tests have shown the usability of objective speech quality methods. The new SQI measure, based on radio link parameters, is one of these methods. It is independent of the transmitted signal and can provide better performance than PSQM. In this paper, we describe the SQI measure, categorize speech quality measurement methods to highlight the differences between measures, present performance comparison figures, and finally motivate why speech quality can be estimated given the radio link status. Additional information is available at: www.ericsson.se/BR/RQIS

INTRODUCTION

The needs of a cellular operator include network tuning, acceptance testing of delivered equipment, daily supervision, benchmarking both within its own subnets and against other operators, locating problems when performance degrades, and more. As markets open up and competition increases, it becomes more important both to provide a high service level and to communicate it to the customers. For a cellular customer, speech quality is an important parameter when choosing an operator, especially when accessibility and the services provided are comparable. This suggests speech quality as the performance criterion that an operator should use to address all of the needs stated above.

Different operators have solved this measurement problem in different ways. Some rely on the subjective opinions of listeners, despite shortcomings such as long processing time, cost, and sometimes lack of accuracy. Others use objective methods, with their limited scope, equipment needs, and also sometimes lack of accuracy.

SQI - SPEECH QUALITY INDEX

SQI is a new objective speech quality measure for cellular use, based on radio link parameters. These parameters quantify the amount and type of distortion that the speech has been subjected to during radio link transmission. No special measurement signal needs to be transmitted, so every call can potentially be monitored. The equipment required for the method is limited to the receiving side of a link. The score is presented in the absolute and standardized dBQ scale [3].

A simplified description of a cellular speech path can consist of five blocks (Figure 1). The first is a speech coder that compresses the speech, followed by a channel coder that adds redundancy before transmission. The channel then distorts the coded bitstream. On the receiver side, a channel decoder uses the redundancy in the bitstream to try to correct distortions, or at least to detect corruption. Finally, a speech decoder decompresses the bitstream back to audible speech. Depending on how the channel distorts the channel bitstream, and on how capable the channel and speech decoders, with their error concealment unit, are of producing correct speech, different levels of distortion are perceived by the listener.

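Figure 1. Speech path model with SQI extraction. [Block diagram. Speech path: Original Speech, Speech Coder, Channel Coder, channel with added Channel Errors, Channel Decoder, Speech Decoder, Received Speech. Extracted quantities: Radiolink Parameters, Speech Quality.]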

Figure 2. Schematics of the SQI model. [Block diagram. Stages in order: Radiolink Parameters, Temporal Processing, Correlative Processing + Nonlinear Transformation, Estimation Stage, SQI.]

The distortion of the channel is measured and presented as radio link parameters. Using these parameters together with knowledge of the speech coder capability, SQI measures the quality. The SQI algorithm itself starts off with radio link parameters such as bit error levels, erased frames, stolen frames, hand-off situations, DTX activity and the like. Temporal processing is performed to obtain new parameters that correlate better with timing effects in the decoders and with subjective judgements, for example by averaging a parameter over time. A correlative stage of parameter combining follows, which also contains some nonlinear transformations and produces yet another parameter set. An example is the combination of stolen frames with frame erasures, raised to some fractional power. This set forms the extracted features from which the speech quality is finally estimated. The estimator body is shown below, where $\hat{Q}$ is the estimated speech quality, $w$ are weighting factors, $p$ is a vector of 3 radio link parameters, and $z$ is a power corresponding to a parameter.

Examples are segmental SNR (Signal-to-Noise Ratio), PSQM [4], MNB (Measuring Normalizing Blocks [5]), and SQI.

What the measurement is based on can also be used for categorization. This pair consists of measures based on speech or other transmitted signals, as opposed to measures based on parameters describing the transmission. Most objective measures, such as PSQM and MNB, use transmitted speech, as do some subjective ones such as DMOS (Differential MOS [2]). A few use signals other than speech, like the method used in products for fixed telephony. In the other category, SQI and an early suggestion from DeTeMobil [8] use transmission parameters only.

Whether the measure takes both links of a conversation into account is the next category. Most methods, including SQI, are strictly simplex and measure one link only. If a duplex quality measure for both uplink and downlink is required, two simplex measures are usually aggregated. A subjective duplex example is the conversational test.

Where equipment is required is the next category. SQI requires equipment only at the receiver side of the simplex connection. All methods using measurement signals require equipment at both the transmitting side and the receiving/calculating side.

Time scale is yet another categorization. It is difficult to define groups of measures, but the time frame that a measure is based on, or is supposed to represent, is interesting when comparing measures. Objective measures like SegSNR can be calculated on very small segments, such as 5 ms. Subjectively, there is a lower limit on how short a time frame can be judged, close to a short utterance of about 2-3 seconds. The next level of time frame is the one used in MOS tests, approximately 5-8 seconds. Most objective methods produce scores per sentence. Conversational tests vary in length, but should mimic reality and represent a few minutes. Finally, benchmarking scores or other network aggregations can represent hours of speech. SQI is designed to give a score that represents a short utterance of about 2-3 seconds.

$$\hat{Q} = \sum_{i=1}^{3} w_i \, \bigl(p(i)\bigr)^{z_i} + \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij} \, \bigl(p(i)\,p(j)\bigr)^{z_i}$$
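As an illustration only, the temporal averaging step and the estimator expression can be sketched in Python as follows. The function names, the trailing-window form of the averaging, and all numeric values in the usage lines are assumptions made for this sketch; the paper does not give the actual SQI features, weights or powers.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Temporal processing example: trailing mean over up to `window` samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    for k in range(len(values)):
        chunk = values[max(0, k - window + 1):k + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def estimate_quality(
    p: Sequence[float],
    w: Sequence[float],
    w_cross: Sequence[Sequence[float]],
    z: Sequence[float],
) -> float:
    """Evaluate Q = sum_i w_i * p_i**z_i + sum_i sum_j w_ij * (p_i * p_j)**z_i.

    The parameters in `p` are assumed to be non-negative, so that
    fractional powers stay real-valued.
    """
    n = len(p)
    q = 0.0
    for i in range(n):
        q += w[i] * p[i] ** z[i]
        for j in range(n):
            q += w_cross[i][j] * (p[i] * p[j]) ** z[i]
    return q


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
features = [4.0, 1.0, 0.0]
weights = [1.0, 2.0, 3.0]
cross_weights = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
powers = [0.5, 1.0, 2.0]
q_hat = estimate_quality(features, weights, cross_weights, powers)  # 6.0
```

With these made-up inputs the single terms contribute 2 + 2 + 0 and the only non-zero cross term contributes (4 · 1)^0.5 = 2, giving 6.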

Today SQI is available for both GSM and DAMPS. Downlink measurements can be performed with the measurement tool TEMS. Uplink measurements are planned for release within the Ericsson Base Station product family in the near future.

CATEGORIZATION

Measures and methods can be separated using a number of categories. E.800 [1] lists three categories under the label Quality of Service: Accessibility, Retainability and Integrity. All measures of speech quality belong to the category Integrity. Objective and subjective measures form the first pair. Subjective measures are based on subjective opinions, mostly those of listening subjects. MOS (Mean Opinion Score [2]) is such a method. Objective methods are based on objective facts and are often mathematical in nature. Most objective speech quality methods are optimized to maximize the correlation with a subjective method.

The last categorization is usage, in the sense of performance measure versus error detector. Most measures, including SQI, belong to the performance group and give an indication of how the speech sounds. The error detector group indicates where the problem resides, for example an indicator of the existence of echo.

In a plot of subjective score against method score, the ideal is a straight line. However, ITU-T does not recommend PSQM for degraded cellular conditions. When the outliers in Figure 4 are examined, they are found to contain hand-overs and bursts of frame erasures. This type of distortion is normally out of scope for PSQM. Table 2 presents the metrics when outliers containing hand-overs are omitted:

Method         SQI     PSQM
Correlation    95.3    92.9
RMS PE         3.71    4.51
95% RMS CPE    1.45    2.63

Table 2. Performance metrics without handovers.

PERFORMANCE COMPARISON

Performance evaluation of an objective speech quality measure is by default performed as a comparison against a subjective measure. We present performance metrics of SQI compared to a subjective comparative listening test rated in dBQ, as well as comparable results for PSQM. Three evaluation metrics are presented: first the correlation, then the RMS prediction error, and finally the 95% inverse confidence interval weighted RMS prediction error as suggested in ITU-T [6] (see the formula below and footnote 1). Here N is the total number of scores in the database, PE_i is the prediction error for score i, and Conf_i is the 95% confidence interval of subjective score i.

$$95\%\,\mathrm{RMS\,CPE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{\mathrm{PE}_i}{\mathrm{Conf}_i + 0.5} \right)^{2}}$$
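The three evaluation metrics can be sketched in Python as below. This is a minimal illustration, not the evaluation code used for the paper: the function names are hypothetical, the prediction error is taken as predicted minus subjective score, and the outer square root of the weighted metric is assumed from the "RMS" in its name.

```python
import math
from typing import Sequence


def correlation(subjective: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between subjective and predicted scores."""
    n = len(subjective)
    mean_s = sum(subjective) / n
    mean_p = sum(predicted) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(subjective, predicted))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_s * var_p)


def rms_prediction_error(subjective: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square of the prediction errors PE_i."""
    errors = [p - s for s, p in zip(subjective, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def weighted_rms_cpe(
    subjective: Sequence[float],
    predicted: Sequence[float],
    conf: Sequence[float],
) -> float:
    """RMS of PE_i / (Conf_i + 0.5), with Conf_i the 95% confidence interval."""
    scaled = [(p - s) / (c + 0.5) for s, p, c in zip(subjective, predicted, conf)]
    return math.sqrt(sum(e * e for e in scaled) / len(scaled))
```

Scores with a wide confidence interval contribute less to the weighted metric, so a method is penalized mainly for missing clips on which the listeners agreed.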

The subjective listening itself shows a mean standard deviation of 3.20 dBQ per clip. This corresponds to a 95% confidence interval of 3.98 dBQ. The 95% confidence interval of the listening can be compared with the 95% RMS CPE, and the number of listeners required for the listening to reach the same performance as SQI and PSQM can then be estimated. This gives 13 listeners for SQI and 9 listeners for PSQM, using a comparative listening test in the dBQ scale. In other words, SQI achieves a performance improvement over PSQM comparable to 4 extra listeners, or 44%.

This comparison is performed on live recordings from a GSM network. Clips with an even distribution of speech quality distortion are selected from a large number of clips and processed with both SQI and PSQM. All clips are downlink and Full Rate; some contain varying degrees of DTX (Discontinuous Transmission) and some contain handovers. The clips are 2.5 seconds long. PSQM scores are transformed to the dBQ domain with a second-order polynomial.

Method         SQI     PSQM
Correlation    93.1    91.1
RMS PE         4.34    4.88
95% RMS CPE    1.90    2.57

Table 1. Performance metrics on all data.

METHOD MOTIVATION

In cellular systems, the major part of the speech distortion originates from the radio environment. This assumes that equipment, transmission errors and echo cancellation are under control. The cause of degraded speech quality is then the radio channel distortion. The effect is the degraded speech quality that is perceived after speech decoding. Recalling Figure 1, two stages are present between cause and effect in the simplified schematics. The first can be regarded as a transform of the channel errors, providing the speech decoder with its input as well as some radio link parameters. The second is the speech decoder, providing the effect, that is, the speech quality. If its function is known, it can be identified as a system that produces speech quality based on how degraded the channel is.

As can be seen in Table 1 and Figures 3 and 4, SQI is a better measure than PSQM for cellular applications. Figures 3 and 4 show subjective score against method score.
Footnote 1: Note! This metric is not suited for comparing scores obtained from databases with high confidence to scores obtained from databases with low confidence. Its strength is weighting many databases with varying confidence into one performance metric.

Figure 3. SQI vs Subjective score. [Scatter plot "SQI Performance, incl Hand-Over": SQI against subjective listening, both axes from -25 to 25.]

Figure 4. PSQM vs Subjective score. [Scatter plot "PSQM Performance, incl Hand-Over": PSQM against subjective listening, both axes from -25 to 25.]

According to [7], a system can be identified if the identification method provides parameters to a model structure, given enough samples taken during total excitation of the system, and results in a prediction error that approaches zero. In this case the system is the speech decoder, with channel-decoded data and radio link parameters as input and speech with quality Q as output. Care is taken with the sampling frequency of the input during identification. The system is monitored in its normal environment, so the number of samples used in model identification determines whether all cases of excitation are captured. A multidimensional, distribution-based clip selection process ensures that all excitation modes exist in a reduced identification set. Cross validation is used to exclude the possibility of time invariance. Detailed speech coding knowledge [9]-[10] ensures that all feedback is modeled in the temporal processing stage of SQI. When these factors are considered, a successful identification can be performed.

The prediction error that the SQI model produces is not zero, but it stays at the same level as, or better than, that of another accepted method, PSQM. SQI can thus be regarded as being as good as, or even better than, PSQM as a speech quality measure.
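The idea of identifying a model on one part of the data and checking its prediction error on the rest can be sketched generically as below. This is not the SQI identification procedure: the one-parameter linear model, the interleaved fold assignment and the function names are all stand-ins chosen to keep the example short.

```python
import math
from typing import Sequence


def fit_weight(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares weight for the stand-in model y ~ w * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def cross_validated_rms_error(x: Sequence[float], y: Sequence[float], folds: int) -> float:
    """RMS prediction error where every sample is predicted by a model
    that was fitted without it (requires 2 <= folds <= len(x))."""
    n = len(x)
    squared_errors = []
    for fold in range(folds):
        held_out = set(range(fold, n, folds))  # every `folds`-th sample
        train_x = [x[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        w = fit_weight(train_x, train_y)
        squared_errors.extend((y[i] - w * x[i]) ** 2 for i in held_out)
    return math.sqrt(sum(squared_errors) / n)
```

If the held-out error is close to the error on the identification set, the fitted parameters generalize to data that was not used to estimate them.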

CONCLUSION

SQI is an objective, simplex, transmission-based integrity measure for performance evaluation, requiring equipment at the receiver side only. The score is presented in dBQ and represents a time interval of 2-3 seconds. The performance of SQI is superior to that of PSQM (ITU-T P.861). Its metrics against a subjective listening measure are: correlation 0.932, RMS PE 4.34, 95% RMS CPE 1.90.

REFERENCES

[1] ITU-T E.800, "Terms and definitions related to quality of service and network performance including dependability"
[2] ITU-T P.800, "Methods for subjective determination of transmission quality"
[3] ITU-T P.810, "Modulated Noise Reference Unit", MNRU
[4] ITU-T P.861, "Objective quality measurement of telephone-band (300-3400 Hz) speech codecs", PSQM
[5] Appendix, P.861, "Measuring Normalizing Blocks", MNB
[6] ITU-T, Study Group 12, Delayed Contribution D.053, Geneva, 17-27 February 1998, "Evaluation of new methods for objective testing of video quality: objective test plan"
[7] Söderström, Stoica, "System Identification"
[8] Ingo Gaspard, DeTeMobil GmbH, "Efficient Methods for Evaluation and Prediction of Subjective Speech Quality in GSM Networks", IEEE 1994, ISBN 0-7803-1927-3/94
[9] A. Uvliden et al., "Adaptive Multi Rate Speech Coding", ASILOMAR 1998
[10] T. B. Minde et al., "Requirements on Speech Coders Imposed by Speech Service Solutions in Cellular Systems", IEEE Workshop on Speech Coding, September 1997