
Voice-Enhancement Devices and GSM Network Quality

Fredrik Pettersson
Product Marketing Manager, Ditech Networks

Mobile phone customers rank call quality and reliability as the most important criteria when choosing a provider. Global System for Mobile Communications (GSM) carriers must do what they can to preserve acceptable call quality, often under difficult circumstances: handset capabilities vary, network conditions vary, and the codecs used to process voice traffic vary as well. Under these conditions, carriers should learn all they can about the factors that affect GSM network quality and how best to manage them. This article looks at how voice quality is measured in GSM networks, the impact of traditional voice quality-enhancement solutions on GSM networks, and how a new type of device, the voice-enhancement device (VED), significantly improves voice quality by delivering a more consistent and tighter voice quality distribution. The article also addresses the VED's ability to enable a higher use of low-bit-rate speech codecs (employed by GSM half-rate [HR] and adaptive multi-rate [AMR] calls for cost-effective voice capacity during peak traffic conditions) while maintaining acceptable quality.

Measuring Voice Quality in GSM Networks

Current methods for measuring voice quality in GSM networks primarily use drive test tools based on the perceptual evaluation of speech quality (PESQ) algorithm [1], an objective method for assessing end-to-end speech quality developed by the International Telecommunication Union (ITU). Carriers typically use PESQ with clean speech files to measure the radio quality of their networks and to quantify how impairments related to codec type and frame loss (caused by poor coverage, handovers, and interference) affect listening quality. Another common method is based on Ericsson's speech quality index (SQI), which estimates how codec type and radio link parameters such as bit-error rate (BER), frame-erasure rate (FER), discontinuous transmission (DTX), and handover rates affect voice quality [2].
Limitations of Traditional Objective Voice-Quality Measurement Methods

Essentially, PESQ and SQI measure the impact of radio frequency (RF)-related impairments on listening quality. They therefore cannot capture and quantify how other important voice-quality impairments present in live calls (background noise, acoustic echo, and mismatched speech levels) affect customer experience. High levels of background noise are common in mobile calls placed on busy streets, in crowded places, and inside cars, buses, and trains. Acoustic echo is a nonlinear type of echo often generated by low-end handsets due to insufficient acoustic isolation between the speaker and the microphone. Acoustic echo can also be generated by headsets and Bluetooth devices, which are becoming increasingly common due to new laws that require motorists to use hands-free devices when talking on their phones while driving.

Moreover, PESQ is an intrusive method, which means that it can only measure the quality of a limited number of test calls. It cannot measure the quality of actual live calls or the quality of calls placed in locations other than those covered by the standardized drive test routes. In addition, PESQ and SQI can only estimate listening quality; they cannot measure conversational quality, which is greatly affected by acoustic echo and transmission delay.

Subjective Voice-Quality Test Methods

Subjective methods use human listeners to measure voice quality, so they can detect and evaluate all aspects of voice quality. The ITU has developed a number of recommendations for subjective testing, including ITU Telecommunication Standardization Sector (ITU-T) Recommendation P.800 [3], which defines the mean opinion score (MOS) as one important metric for subjective determination of transmission quality. The test subjects are asked to listen to prepared speech files and then rate the quality of the speech according to the following scale:

Excellent  5
Good       4
Fair       3
Poor       2
Bad        1

The MOS is then calculated as the average of all participants' scores. The ITU-T P.835 recommendation [4] is particularly suitable for subjectively evaluating speech communications systems that include noise suppression algorithms. The scoring method is similar to ITU-T P.800, except that participants are asked to make three separate evaluations: of the speech signal, of the background signal, and of the overall signal. A key limitation of these subjective test methods is that they are expensive, because they require a lot of testing time and a large number of participants. This means that carriers cannot employ them for large-scale network testing and continuous monitoring of voice quality.

The ITU-T G.107 E-Model

The ITU-T G.107 e-model [5] is an objective method that addresses many of the limitations of the traditional voice-quality measurement methods. The e-model is a comprehensive method that takes into account not only codec impairments and the impact of frame loss, but also environmental noise, mismatched speech levels, echo, and delay to determine perceived voice quality. Further, the e-model is a non-intrusive method that allows cost-effective, large-scale monitoring of all live calls passing through a network.

The e-model estimates overall voice quality using a metric called the transmission rating (R) factor. The R factor is a number from 0 to 100, where 100 represents an ideal call with no impairments. Connections with an R factor of less than 50 are considered to be of poor quality, with nearly all users dissatisfied, and are therefore not recommended by the ITU. The European Telecommunications Standards Institute (ETSI) originally specified the e-model, and it was later adopted by the ITU, which created the G.107 recommendation.

The e-model assumes that all impairments are additive, and calculates the R factor according to the following formula:

R = Ro - Is - Id - Ie + A

where Ro is the signal-to-noise ratio (SNR) and Is is the signal impairment occurring simultaneously with the speech, including loudness, codec quantization distortions, and nonoptimum side tones. Id captures impairments related to echo and delay, and Ie (equipment impairment factor) represents impairments due to low-bit-rate codecs and packet loss. Finally, the advantage factor, A, reflects the user's expectation of quality when making a phone call.

Table 1 shows maximum clean-speech R values for common codecs used in wireless networks [6]:

Codec      Bit rate     Maximum R value for clean speech
G.711      64 kbps      94
GSM-EFR    12.2 kbps    89
EVRC       8.55 kbps    88
GSM-FR     13 kbps      74
GSM-HR     5.6 kbps     71

Table 1: Maximum Clean-Speech R Values for Common Wireless Codecs

The e-model also provides a method for converting R values into MOS, according to Figure 1, provided in the G.107 recommendation [5].
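The additive formula and the R-to-MOS conversion can be sketched in a few lines of Python. This is a minimal illustration, not a G.107 implementation: the function names are our own, the impairment terms are taken as ready-made inputs rather than derived from network parameters, and the conversion uses the closed-form expression given in G.107 for the curve plotted in Figure 1.

```python
def r_factor(ro, i_s, i_d, i_e, a=0.0):
    """Additive E-model rating: R = Ro - Is - Id - Ie + A.

    All arguments are in R-scale points. How each term is derived from
    network parameters is defined in G.107 and is not reproduced here.
    """
    return ro - i_s - i_d - i_e + a


def r_to_mos(r):
    """Estimate MOS from R using the G.107 conversion formula."""
    if r <= 0:
        return 1.0   # floor of the estimated MOS scale
    if r >= 100:
        return 4.5   # ceiling of the estimated MOS scale
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


# Illustrative inputs only (made-up values, not measurements from this article).
example_r = r_factor(ro=90.0, i_s=5.0, i_d=10.0, i_e=20.0)
example_mos = r_to_mos(example_r)
```

Calling r_to_mos(50) returns roughly 2.6, the lower limit of acceptable quality on the MOS scale.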

Figure 1: MOS as a Function of Rating Factor R

For example, R = 50 translates to about 2.6 points on the MOS scale.

Figure 2 shows an example of how codec impairments, frame loss, background noise, acoustic echo, and mismatched speech levels affect the delivered voice quality according to the ITU-T G.107 model.

Figure 2: Voice-Quality Impairments and Actual Delivered Quality (Based on the ITU-T G.107 E-Model)

The R value of a perfect EFR call is 89, but due to the different impairments, the actual delivered quality is close to 55 on the R scale in this particular example. Figure 2 also illustrates the discrepancy between the voice quality measured by a PESQ-based drive test tool using clean speech files and the actual quality delivered to the subscriber. A clean-speech EFR call at 1 percent FER would be of good quality, but moderate background noise, a relatively weak acoustic echo, and a speech level that is somewhat too low (all common in practice) reduce the actual quality in this case to R = 55, which is quite close to the R = 50 limit for an acceptable call. The e-model is able to measure and quantify these additional impairments, which tend to have a significant impact on user experience.

Finally, the e-model has been adopted by the voice over Internet protocol (VoIP) industry, where carriers frequently use it to monitor voice quality. In Japan, it is even employed to regulate voice quality according to the following classes:

R > 80 and end-to-end delay < 150 ms for PSTN-equivalent and emergency calls
R > 50 and end-to-end delay < 400 ms for plain IP telephony
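As an illustration of how such classes could be applied to monitored calls, the following sketch assigns a call to a class from its R value and end-to-end delay. The thresholds are the ones quoted above; the function name, the returned labels, and the handling of calls that meet neither class are assumptions made for this example.

```python
def voip_quality_class(r, delay_ms):
    """Classify a call using the R and delay thresholds quoted above.

    The class labels and the fallback label are illustrative choices,
    not terms taken from any regulation.
    """
    if r > 80 and delay_ms < 150:
        return "PSTN-equivalent"
    if r > 50 and delay_ms < 400:
        return "plain IP telephony"
    return "below both classes"
```

Under these rules, a call with R = 85 but 200 ms of delay falls into the plain IP telephony class, because it misses the delay limit of the higher class.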

The ITU-T G.107 method is now also being integrated into some of the most advanced VEDs for non-intrusive monitoring of live wireless calls.

Traditional Voice-Enhancement Solutions

For many years, GSM carriers have used advanced drive test tools and methods to measure and mitigate radio network-related voice-quality impairments. Other important factors affecting voice quality in GSM networks are the so-called external impairments: background noise and nonlinear acoustic echo. As we have seen, these external impairments are not measured by traditional drive test tools. The impacts of background noise and nonlinear acoustic echo are therefore poorly understood in the GSM industry, and previous solutions addressing these impairments have not been optimal.

There are several typical approaches to voice enhancement in GSM networks. Some GSM handsets have basic noise suppression and acoustic echo cancellation built in. However, the performance of these handset-based solutions varies considerably depending on handset vendor and model, so the GSM carrier cannot optimize its delivered quality and capacity in the way it could with network-based solutions that process traffic from all handsets in the network. In addition, handsets have limitations in processing power and battery life that make it difficult and expensive to implement the most advanced noise-suppression and echo-cancellation algorithms.

During the standardization of the latest codec in the GSM family, AMR, 3GPP identified the impact of background noise on voice quality, especially the limitations of the lowest AMR codec rates in noisy conditions. 3GPP issued a set of minimum performance requirements for handset-based noise suppression in an attempt to address the problem [7]. These requirements are not mandatory, however, and are apparently not stringent enough. Large-scale measurements on live GSM-AMR calls in a North American network show that considerable amounts of noise remain in the calls, with significant impacts on voice quality and user experience. Based on these measurements, about 18 percent of live AMR calls have background noise impairments exceeding 24 points on the R scale, equivalent to a degradation on the order of 1.0 MOS points (Figure 3).

Figure 3: Background Noise Impairments Measured in Millions of Live Calls in North American GSM and CDMA Networks (100 Percent AMR Calls in GSM Networks)

Another approach is to build noise reduction into the codec itself. Unlike GSM, the CDMA standard has mandatory noise reduction integrated into the enhanced variable rate codec (EVRC). The effects of this built-in noise reduction can also be seen in Figure 3, where CDMA calls perform better than GSM calls for small and moderate impairments of less than 24 points. However, neither CDMA nor GSM noise suppression seems to be very effective for heavily impaired calls (where the impairment exceeds 24 points on the R scale), as there is no significant difference between GSM and CDMA calls in that range. This could be explained by the inherent processing power limitations of handset-based noise suppression discussed earlier.

Finally, carriers can always improve the quality of the radio network itself by deploying more base station sites and more transceivers. However, this option is very costly and does not address the external impairments that are the root cause of the problem.

Voice-Enhancement Devices

VEDs are a newer approach to reducing voice impairments in GSM networks. VEDs are network-based functions that typically include adaptive noise cancellation (ANC), acoustic echo cancellation (AEC), and automatic level control (ALC). A VED is typically deployed in the mobile switching center (MSC). VEDs process PCM-encoded speech and are typically installed on the standard A interface between the transcoder and the MSC. This allows the VED to process all call types, including mobile-to-mobile and mobile-to-PSTN calls, for all codec types such as EFR, AMR, and HR.

Basic VEDs typically remove noise and acoustic echo impairments in the uplink, while more advanced ones can remove impairments on the downlink as well. The ability to process downlink speech allows a VED to enhance off-net calls that originate in other mobile carriers' networks. More advanced VEDs have additional functions such as enhanced voice intelligibility (EVI), which processes and enhances the downlink speech path to further improve voice clarity for callers in noisy environments.

Impact of VED Deployment

VEDs add a layer of voice-quality processing to mobile calls, which enhances the customer experience. VEDs also enable GSM carriers to better control the delivered quality distribution and allow them to offer a more consistent voice service based on their minimum quality requirements. When a GSM network uses a VED, the noise impairments are significantly reduced, as shown in Figure 4.

Figure 4: GSM Network Noise Impairment Reduction with a VED (North American Network with 100 Percent AMR)

In addition, advanced VEDs are also effective at removing very large impairments exceeding 25 points on the R-factor scale. These impairments correspond to a quality degradation of 1.0 MOS points or more.

The delivered quality varies widely without a VED due to the impact of external impairments, which tend to generate a large tail of poor-quality calls. These calls are the ones most likely to drive churn and lost revenue, since they fall below the minimum allowable quality level (Figure 5).

Figure 5: Impact of Voice Quality on Churn and Lost Revenue

A VED removes these external impairments and reduces the percentage of calls that fall below the minimum quality limit. Thus, the VED helps a carrier to control the percentage of unacceptable calls in its network and provides greater flexibility and better quality margins.

VEDs also improve voice quality in crowded and busy urban locations by removing noise and echo impairments and adjusting speech levels for a comfortable listening experience. Subjective testing by Dynastat, a leading listening test lab in the United States, indicates that the voice-quality improvement from a VED is on the order of 0.4 to 0.5 MOS points for low-bit-rate codecs such as GSM-HR in noisy conditions (Figure 6).

Figure 6: Impact of VED on MOS Scores

Figure 7 shows the results of ITU-T G.107 measurements performed on millions of live calls in a GSM network in southern Europe. About 40 percent of the HR calls fall below 44 points on the R scale, a figure that corresponds to about 2.2 points on the MOS scale. After VED processing, less than 10 percent of calls have poor quality.
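The kind of statistic quoted here, the share of calls falling below a given R value, is straightforward to compute once per-call R values are available from non-intrusive monitoring. The sketch below is only an illustration: the function name is our own, and the sample values are made up to show the calculation, not taken from the measurements in Figure 7.

```python
def share_below(r_values, threshold):
    """Return the fraction of calls whose R value is below the threshold."""
    if not r_values:
        raise ValueError("at least one R value is required")
    below = sum(1 for r in r_values if r < threshold)
    return below / len(r_values)


# Made-up R values for five calls, used purely to demonstrate the calculation.
sample_r_values = [40, 43, 50, 60, 70]
poor_share = share_below(sample_r_values, threshold=44)  # 2 of 5 calls
```

Tracking this fraction before and after VED processing gives a carrier a direct view of how much of the poor-quality tail has been removed.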

Figure 7: Impact of VED on HR Calls

Thus, one important benefit of the VED is that it improves HR and AMR call quality, thereby enabling carriers to increase their use of HR and the lowest AMR codec rates in AMR-HR and AMR-FR modes to gain more voice capacity during busy hours, while at the same time limiting the percentage of calls with unacceptable quality.

Conclusion

With a better understanding of the factors affecting voice quality in their networks, GSM operators are better equipped to maintain acceptable quality levels for their subscribers. Traditional quality management methods address part of the problem, but the key issues are noise and nonlinear acoustic echo. By deploying VEDs, carriers can address voice-quality problems such as noise, echo, and level impairments. VEDs deliver better and more consistent voice quality for all calls, regardless of handset model or codec. VEDs combined with G.107 monitoring capabilities also allow carriers to aggressively employ HR and the lowest AMR codec rates to increase call capacity while maintaining acceptable voice-quality levels and controlling the percentage of calls with poor quality. Finally, VEDs enable a better test methodology that uses the ITU-T G.107 e-model, and they provide the ability to monitor the customer experience for all live calls on the network. Voice quality will always be an important selection criterion for GSM subscribers, and by deploying VEDs, carriers can ensure a quality calling experience.

References

[1] ITU-T Recommendation P.862 (02/2001), Perceptual Evaluation of Speech Quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
[2] Speech Quality Measurement with SQI, Ericsson Technical Paper, Revision B, August 9, 2006.
[3] ITU-T Recommendation P.800 (08/96), Methods for subjective determination of transmission quality.
[4] ITU-T Recommendation P.835 (11/2003), Subjective test methodology for evaluating speech communications systems that include noise suppression algorithm.
[5] ITU-T Recommendation G.107 (03/2005), The E-model, a computational model for use in transmission planning.
[6] ITU-T Recommendation G.113 (02/2001), Transmission Impairments due to Speech Processing.
[7] 3GPP TS 26.077: Minimum Performance Requirements for Noise Suppresser; Application to the Adaptive Multi-Rate (AMR) Speech Encoder.
