
http://www.voipsupply.com/hd-voice-codecs

HD voice codecs
What is a codec?
The word codec comes from mashing together the functions of compressing (co) and decompressing (dec) analog sound into digital bits for use by computers and networks. There are literally hundreds of audio codecs -- pieces of computer code -- available today, embedded in any device that plays sound, from a simple MP3 player to the hottest smartphones. Some are open source and free, while others are proprietary and/or patented and require licensing fees. Why are there so many different codecs? Over the years, people have created and optimized codecs for the specific environments they would be used in. The cellular community built codecs that made the most of scarce radio frequency (RF) bandwidth, while others wanted adaptable bit-rate codecs suited to a wired broadband environment, adjusting sound quality to however much bandwidth is available -- compress a little if there's a lot of bandwidth, crunch harder if there's less.

More recently, developers have been leveraging faster computer processors to develop better codecs. The tradeoff for using more CPU cycles is, of course, more power required to run them -- not an issue at a desktop, but definitely a concern for mobile devices. A number of codecs are ITU (International Telecommunication Union) standards, formalized for international use and incorporation into devices. If a codec name starts with a G and a period, such as G.711 or G.722, it's an ITU standard.

Popular HD voice codecs - G.722, AMR-WB, SILK, iSAC


You can't talk about HD voice codecs without first talking about baseline analog and digital voice quality. Established way back in 1972, G.711 is the standard for stock VoIP voice quality and equal to what you get out of a POTS analog phone call. It captures speech in a range of 3.4 kHz, has a sampling rate of 8 kHz, and needs 64 kbit/s of bandwidth to deliver a call.

G.722 is Old School when it comes to HD voice, formalized back in 1988. It captures sound in a range of 7 kHz and samples audio at a rate of 16 kHz -- double that of G.711. The result is superior quality and clarity far above a POTS analog phone call. Taking advantage of modern CPU processing speeds, G.722 can deliver double the quality of a G.711 phone session in the same amount of bandwidth -- 64 kbit/s. You'll find G.722 built into pretty much every desktop VoIP handset built today (2010), regardless of manufacturer or model -- yes, even the modest-looking $129 list price entry models support G.722. Patents on G.722 have expired, so there are no licensing fees, and the processing requirements are minimal on today's chips. At least one software shop (D2 Technologies) has implemented G.722 for the Android mobile operating system. Handset manufacturers who support G.722 include Aastra, ADTRAN, Allworx, AudioCodes, Avaya, Cisco, Panasonic, Polycom, Siemens and Snom.

Coming on strong out of Europe and the mobile community is AMR-WB, also known as G.722.2. Mobile operators wanted better sound quality delivered in less bandwidth, so AMR-WB should deliver G.722 quality at around 24 kbit/s. France Telecom and Ericsson have been leaders in promoting AMR-WB for mobile HD voice -- in part because they hold some of the patents in the standard -- and they would like to see AMR-WB appear in desktop phones and software clients so users can make end-to-end calls in AMR-WB, rather than having to translate (transcode) between G.722 and AMR-WB. You'll see more AMR-WB buzz for desktop handsets later in 2010 and into 2011.

SILK is Skype's "super wideband" voice codec. Optimized for real-time communications on the Internet, SILK is an adaptive bit-rate codec that supports multiple sampling rates ranging from 8 kHz narrowband to 24 kHz or more. If you have the CPU cycles and 40 kbit/s of bandwidth, SILK gives you the best performance possible. On a lower-powered machine and/or with less available bandwidth, SILK drops down and adjusts to the conditions involved. Unlike AMR-WB, SILK is available royalty-free. A few manufacturers, including AudioCodes, have discussed incorporating SILK into their products.

Finally, Global IP Solutions (GIPS) offers a proprietary wideband speech codec, iSAC, that has been incorporated into a large number of soft clients and applications, including AIM, Citrix Online, CommuniGate, Gizmo5, Google Talk, IBM Lotus, NimBuzz, QQ, WebEx, and Yahoo!
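To keep those numbers straight, here is a small Python summary of the figures quoted above (the structure and names are mine, not from any codec specification; the AMR-WB sampling rate is implied rather than stated, and the AMR-WB and SILK rates are approximate because both adapt to conditions):

```python
# Figures as quoted in this article; None = not stated here.
HD_VOICE_CODECS = {
    #  codec      audio range (kHz)   sampling (kHz)    bit rate (kbit/s)
    "G.711":  {"audio_khz": 3.4,  "sample_khz": 8,       "kbits": 64},  # narrowband baseline
    "G.722":  {"audio_khz": 7.0,  "sample_khz": 16,      "kbits": 64},  # royalty-free HD baseline
    "AMR-WB": {"audio_khz": 7.0,  "sample_khz": 16,      "kbits": 24},  # G.722.2, mobile HD voice
    "SILK":   {"audio_khz": None, "sample_khz": (8, 24), "kbits": 40},  # adaptive, up to ~40 kbit/s
}

for name, spec in HD_VOICE_CODECS.items():
    print(f"{name:7s} {spec}")
```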

The problem with too many different codecs


In order to have a successful HD voice call, both parties (or nearly all parties in a conference) need to use the same codec. If the two sides are using different HD codecs, either one side has to be transcoded -- translated -- into the other side's codec, or both sides have to shift to a mutually agreeable codec. Transcoding already takes place in the VoIP world on a daily basis, with calls being compressed before being sent out long distance and translations taking place between the POTS network and VoIP transport. The issues with transcoding between HD codecs are that it takes more horsepower (processing cycles) than with vanilla VoIP/POTS networks, and nobody is willing to say the end product is as good as a "pure" end-to-end HD voice call using a single codec.


If both sides can't find a mutually agreeable HD voice codec, they end up dropping down to the lowest common denominator -- G.711 -- which kills the primary point of using HD in the first place.
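As a rough illustration (a toy sketch, not any vendor's actual negotiation logic -- real endpoints work this out through SIP/SDP offer-answer), codec selection with a G.711 fallback looks something like this in Python:

```python
def pick_codec(caller_prefs, callee_supported):
    """Return the first codec the caller prefers that the callee also supports."""
    for codec in caller_prefs:
        if codec in callee_supported:
            return codec
    return "G.711"  # the lowest common denominator, assumed always available

# No shared HD codec: the call drops to narrowband G.711.
print(pick_codec(["G.722"], {"AMR-WB"}))          # G.711
# Both ends support G.722: the call stays HD end to end.
print(pick_codec(["G.722", "G.711"], {"G.722"}))  # G.722
```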

What is HD Voice?
HD voice is a technology that delivers at least twice the sound range of a typical voice phone call (i.e. "plain old telephone service," or POTS, to be hip; public switched telephone network, or PSTN, if you're more formal) delivered on a landline through the world's analog circuit-switched phone network. Real-world benefits of HD voice include:

1. Better comprehension and clarity, especially in long, detailed and/or technical discussions
2. Clarity in understanding acronyms
3. The ability to differentiate between and clearly identify others on a conference call
4. Clarity and easier understanding in multi-national/multilingual conversations where non-native speakers and native speakers are communicating in one or more languages
5. More accurate transcriptions (both human and automated)

In short, everything involving voice is better in HD voice, be it a simple person-to-person call, a 20-person international conference discussion, or a speech-to-text process.

The technology of HD voice


Sound is measured in hertz, or Hz. The human ear can typically hear everything between 20 Hz and 20,000 Hz. The higher the number, the higher (squeakier) the sound, until you move past 20,000 Hz and into ultrasound frequencies only a dog can pick up. A landline phone call captures and delivers sound in a range of 300 Hz to 3400 Hz, so there's a lot of sound information chopped off at both the low and high ends of the scale. For simplicity's sake, a POTS call has a range of about 3.4 kHz (3400 Hz). POTS calls are often called "narrowband" calls because they have such a restricted range compared to what the human ear can actually hear and process.

Since an HD voice call is defined as delivering at least twice the sound range of a traditional phone call, an HD voice call will have a range of about 7 kHz -- or more. Wideband voice and HD voice are often used interchangeably, since an HD voice call is "wider" -- covers more of a Hz range -- than a narrowband call.

In order to deliver twice-or-better sound than a POTS call, the first thing you need is a phone acoustically built to capture and deliver that extra information, so both the microphone(s) and the speaker/handset must be capable of receiving and delivering across a 7 kHz or greater range. Once sound is captured, it needs to be processed into digital form with a codec. The G.722 codec (more on codecs later) is generally considered the baseline for HD voice; it captures and delivers sound between 30 Hz and 7000 Hz. Interestingly, an HD voice call using G.722 can be delivered in the same amount of bandwidth as its digital POTS equivalent of G.711 -- 64 kbit/s. If you are currently using G.711 in a VoIP phone system, you can switch to HD voice without needing more bandwidth.
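The arithmetic behind that claim can be sketched in a couple of lines (a simplification of the figures above; G.722's roughly 4 bits per sample is an average across its sub-bands and is my gloss, not something the article spells out):

```python
# G.711: 8,000 samples/s at 8 bits per sample -> 64,000 bit/s
print(8_000 * 8)     # 64000

# G.722: 16,000 samples/s, but sub-band ADPCM averages about 4 bits per sample,
# so the wideband call still fits in the same 64,000 bit/s
print(16_000 * 4)    # 64000
```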

Finally, two (or more, if it's a conference call) parties need to be able to talk to each other using the same voice encoding (codec) scheme. Within an organization/PBX domain, this is pretty easy -- turn on G.722, reboot the phones, and you're done. Communicating between different HD voice groups, or "islands," is more difficult because there are Internet peering and interconnection issues involved, but service providers are working out the details to transparently provide HD voice calling.
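What "turn on G.722" amounts to on the wire is the phone listing G.722 first in its SDP offer. The sketch below builds an illustrative offer in Python (addresses and ports are hypothetical; payload types 9, 0 and 8 are the standard static RTP assignments for G.722, PCMU and PCMA, and G.722 is advertised with an 8000 Hz RTP clock rate for historical reasons even though it samples at 16 kHz):

```python
sdp_offer = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",    # hypothetical originating address
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 9 0 8",  # G.722 listed first = preferred codec
    "a=rtpmap:9 G722/8000",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
])
print(sdp_offer)
```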


What phones support HD voice?


A better question might be "What phones don't support HD voice?" All the major IP telephone handset manufacturers -- Aastra, Allworx, AudioCodes, Avaya, Cisco, ShoreTel, Panasonic, Polycom, Siemens, and Snom -- support G.722 in their current (2010) phone lines, going all the way down to the entry-level (i.e. cheapest) model.

Benefits of HD Voice
Simply put, HD voice makes everything (voice) better. With an HD voice call delivering twice the sound range of a narrowband one, there's much more audio information for the brain to process, resulting in less fatigue and better comprehension. Computer-based processes like voice recognition and speech-to-text also gain from HD voice, with better accuracy. Advocates of HD voice use the cliché of a call sounding as clear and natural as if you were talking to someone in the same room, and there's a laundry list of reasons for wideband goodness, ranging from being able to understand a three-year-old (higher voices get clipped) to public safety.

Specific HD Voice benefits for businesses include:


Reduction of fatigue
During a narrowband call, your brain is quietly working to "fill in the blanks," interpreting word sounds that have been clipped to fit into a sound range of 300 Hz to 3400 Hz. All the information you would normally hear between 20 Hz and 300 Hz and between 3400 Hz and 20,000 Hz is gone, so your brain has to figure out what is being said using contextual clues. For short and clear calls, this isn't a big headache, but the longer the call, the more work your brain ends up doing without you thinking about it. (Yes, there's a reason why you dread hour-long calls and long for them to be over.)

Better comprehension and clarity


Acronyms are "notorious" for being hard to understand during a narrowband call, says HD voice expert and Polycom CTO Jeff Rodman. In addition, similar-sounding words like "sail" and "fail" also cause confusion. A narrowband call can result in a lot of repetition and additional explanation -- or people just don't get it the first time through and have to seek clarification via email... or another phone call. Because HD voice provides more sound information, it's easy to hear the difference between FEC, FCC, SEC, and FTC on a call. In professions where accuracy and speed count -- such as medical, legal, and financial -- HD voice is a clear winner because information is communicated more accurately the first time around. Technical conversations are easier because terms can be clearly understood. As a result, it is rare that speakers are asked to repeat themselves -- an occurrence that happens all too often in narrowband. In addition, individual voices -- people -- stand out in HD voice, making it easier to know who is talking during a call.

Conference calls rock!


If there's just one "must have" app for HD voice, it is conferencing. The combination of reduced fatigue and better comprehension and clarity lets people focus on the content of discussions, rather than struggling to understand what is being said. Executives at Fortune 500 companies -- the "C-level" guys -- are starting to insist on conducting conference calls in HD voice for the efficiency it brings. Time is money, and HD voice enables people to focus on the job at hand and get it done more quickly.

Improved multi-national/multi-cultural communication


HD voice is a clear winner when it comes to international calls and another "must have" for businesses that regularly work with non-native speakers of a language. For most of us, it is a challenge to speak another language. There are accent issues, vocabulary issues, and even tone can be used differently to communicate nuances. Put all of those factors into a narrowband call and the ability to communicate clearly between offices in Europe and Asia becomes much more difficult. Using HD voice, non-native speakers will be able to more clearly understand what is being said and be more clearly understood when they speak. And if everyone on a call is a non-native speaker of the common language being used, HD voice might make the difference between communication and confusion.

More accurate transcriptions


HD voice provides much better raw information for both humans and computers to process when it comes to creating transcriptions. Human beings can more easily hear what is being said in a recording, saving time. Any automated speech-to-text process -- ranging from transcription to emailing phone messages -- benefits.


VoIP bandwidth fundamentals

Bandwidth requirements for Voice over IP can be a tricky beast to tame until you look at the method and factors involved. This guide investigates what bandwidth means for VoIP, how to calculate bandwidth consumption for a VoIP network, and how bandwidth can be saved by using voice compression.

Table of contents

1. What about bandwidth for VoIP? -- An introduction to bandwidth issues for Voice over IP and its different components.
2. Calculating bandwidth consumption for VoIP -- This section discusses how bandwidth can be calculated for VoIP transmissions and what strategies work best for the majority of situations.
3. How can voice compression save bandwidth? -- Using voice compression can be one of the best strategies when trying to save bandwidth. This section discusses how these 'savings' can be achieved.

What about bandwidth for VoIP?

Voice over IP (VoIP) is the descriptor for the technology used to carry digitized voice over an IP data network. VoIP requires two classes of protocols: a signaling protocol such as SIP, H.323 or MGCP that is used to set up, disconnect and control the calls and telephony features; and a protocol to carry the speech packets. The Real-Time Transport Protocol (RTP) carries the speech transmission. RTP is an IETF standard introduced in 1995, when H.323 was standardized. RTP works with any signaling protocol and is the commonly used protocol among IP PBX vendors.

An IP phone or softphone generates a voice packet every 10, 20, 30 or 40 ms, depending on the vendor's implementation. The 10 to 40 ms of digitized speech can be uncompressed, compressed and even encrypted; this does not matter to the RTP protocol. As you have already figured out, it takes many packets to carry one word. The shorter the packet, the shorter the delay. End-to-end (phone-to-phone) delay needs to be limited, and the shorter the packet creation delay, the more network delay the VoIP call can tolerate. Shorter packets also cause less of a problem if a packet is lost. Short packets require more bandwidth, however, because of increased packet overhead (this is discussed below). Longer packets that contain more speech bytes reduce the bandwidth requirements but produce a longer construction delay and are harder to fix if lost. Many vendors have chosen 20 or 30 ms packets.

RTP packet format

The RTP header contains a time stamp and sequence number for the digitized speech sample (20 or 30 ms of a word) and identifies the content of each voice packet. The content descriptor defines the compression technique (if any) used in the packet. The RTP packet format for VoIP over Ethernet is shown below.

Ethernet Trailer | Digitized Voice | RTP Header | UDP Header | IP Header | Ethernet Header

RTP can be carried on frame relay, ATM, PPP and other networks, with only the far right header and far left trailer varying by protocol. The digitized voice field and the RTP, UDP and IP headers remain the same. Each of these packets will contain part of a digitized spoken word. The packet rate is 50 packets per second for 20 ms voice samples and 33.3 packets per second for 30 ms voice samples, and the voice packets are transmitted at these fixed rates. The digitized voice field can contain as few as 10 bytes of compressed voice or as many as 320 bytes of uncompressed voice. The UDP header carries the sending and receiving port numbers for the call. The IP header carries the sending and receiving IP addresses for the call plus other control information. The Ethernet header carries the LAN MAC addresses of the sending and receiving devices. The Ethernet trailer is used for error detection purposes. The Ethernet header and trailer are replaced with a frame relay, ATM or PPP header and trailer when the packet enters a WAN.

'Shipping and handling'

In reality, there is no Voice over IP. It is really voice over RTP, over UDP, over IP and usually over Ethernet. The headers and trailers are required for the networks to carry the packets; their overhead can be thought of as the shipping and handling cost. The RTP plus UDP plus IP headers add 40 bytes. The Ethernet header and trailer account for another 18 bytes of overhead, for a total of at least 58 bytes of overhead before there are any voice bytes in the packet. This overhead can range from 20% to 80% of the bandwidth consumed over the LAN and WAN. Many implementations of RTP have no encryption, or the vendor has provided its own encryption facilities. An IP PBX vendor may offer a standardized secure version of RTP (SRTP). Shorter packets have higher overhead: the same fixed overhead is carried whether the voice field is small or large. As the voice field gets larger with longer packets, the percentage of overhead decreases -- and therefore the needed bandwidth decreases. In other words, bigger packets are more efficient than smaller packets.

Header compression

Cisco created a header compression technique, now standardized as RTP header compression (cRTP). This technique compresses the RTP, UDP and IP headers, significantly reducing their overhead from 40 bytes to between 4 and 6 bytes. The bandwidth consumption for compressed voice packets can be reduced by nearly 60%. The technique has less value for large uncompressed voice packets. Header compression is not recommended for LAN implementations, because there is typically more than enough bandwidth for voice calls on the LAN. It should be considered for WAN implementations, where bandwidth is limited and much more expensive.
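To make those byte counts concrete, here is a minimal sketch of the fixed 12-byte RTP header (RFC 3550) packed in Python, with the UDP, IPv4 and Ethernet overhead added on top. The field values are illustrative only:

```python
import struct

version, padding, extension, csrc_count = 2, 0, 0, 0
marker, payload_type = 0, 9            # payload type 9 = G.722
sequence_number, timestamp, ssrc = 1234, 160, 0x11223344

byte0 = (version << 6) | (padding << 5) | (extension << 4) | csrc_count
byte1 = (marker << 7) | payload_type
rtp_header = struct.pack("!BBHII", byte0, byte1, sequence_number, timestamp, ssrc)

print(len(rtp_header))                 # 12 bytes of RTP header
print(len(rtp_header) + 8 + 20)        # + 8 UDP + 20 IPv4 = 40 bytes
print(len(rtp_header) + 8 + 20 + 18)   # + 18 Ethernet header/trailer = 58 bytes
```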


Calculating bandwidth consumption for VoIP

The bandwidth needed for VoIP transmission depends on a few factors: the compression technology, packet overhead, the network protocol used and whether silence suppression is used. This tip investigates the first three considerations; silence suppression will be covered in a later tip. There are two primary strategies for improving IP network performance for voice: allocate more VoIP bandwidth (reduce utilization) or implement QoS. How much bandwidth to allocate depends on:

Packet size for voice (10 to 320 bytes of digital voice)
CODEC and compression technique (G.711, G.729, G.723.1, G.722, proprietary)
Header compression (RTP + UDP + IP), which is optional
Layer 2 protocols, such as Point-to-Point Protocol (PPP), Frame Relay and Ethernet
Silence suppression/voice activity detection

Calculating the bandwidth for a VoIP call is not difficult once you know the method and the factors to include. The chart "Calculating one-way voice bandwidth" demonstrates the overhead calculation for 20- and 40-byte compressed voice (G.729) transmitted over a Frame Relay WAN connection. Twenty bytes of G.729 compressed voice equals 20 ms of a word; forty bytes of G.729 compressed voice equals 40 ms of a word.
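That one-way overhead calculation can be sketched in a few lines of Python (the function name and defaults are mine; the overhead figures come from the notes under the table that follows):

```python
def one_way_voip_bandwidth_bps(payload_bytes, packet_ms, ip_overhead=40, l2_overhead=6):
    """payload_bytes: digitized voice per packet; packet_ms: packetization interval.
    ip_overhead: RTP + UDP + IP headers (40 bytes, or about 4 with cRTP).
    l2_overhead: 6 bytes for PPP/Frame Relay, 18 bytes for Ethernet."""
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + ip_overhead + l2_overhead) * 8 * packets_per_second

# G.729 over Frame Relay, as in "Calculating one-way voice bandwidth":
print(one_way_voip_bandwidth_bps(20, 20))                    # 26400.0 -> 26.4 Kbps
print(one_way_voip_bandwidth_bps(40, 40))                    # 17200.0 -> 17.2 Kbps
# G.711, 20 ms packets over Ethernet:
print(one_way_voip_bandwidth_bps(160, 20, l2_overhead=18))   # 87200.0 -> 87.2 Kbps
```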


The results of this method of calculation are contained in the next table, "Packet voice transmission requirements." The table demonstrates these points:

Bandwidth requirements are reduced with compression (G.711 vs. G.729).
Bandwidth requirements are reduced when longer packets are used, thereby reducing overhead.
Even though the voice compression is an 8 to 1 ratio, the bandwidth reduction is only about 3 or 4 to 1; the overhead negates some of the voice compression bandwidth savings (see the quick check after this list).
Compressing the RTP, UDP and IP headers (cRTP) is most valuable when the packet also carries compressed voice.
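A quick check of that compression-ratio point, using the 20 ms Ethernet figures from the table that follows (a back-of-the-envelope sketch, nothing more):

```python
print(64_000 / 8_000)    # 8.0  -- raw codec bit-rate ratio, G.711 vs. G.729
print(87_200 / 31_200)   # ~2.8 -- ratio of bandwidth actually used once packet overhead is added
```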

Packet voice transmission requirements (bits per second per voice channel)

Codec    Voice bit rate   Sample time   Voice payload   Packets per second   Ethernet (RTP)   PPP/Frame Relay (RTP)   PPP/Frame Relay (cRTP)
G.711    64 Kbps          20 msec       160 bytes       50                   87.2 Kbps        82.4 Kbps               68.0 Kbps
G.711    64 Kbps          30 msec       240 bytes       33.3                 79.4 Kbps        76.2 Kbps               66.6 Kbps
G.711    64 Kbps          40 msec       320 bytes       25                   75.6 Kbps        73.2 Kbps               66.0 Kbps
G.729A   8 Kbps           20 msec       20 bytes        50                   31.2 Kbps        26.4 Kbps               12.0 Kbps
G.729A   8 Kbps           30 msec       30 bytes        33.3                 23.4 Kbps        20.2 Kbps               10.7 Kbps
G.729A   8 Kbps           40 msec       40 bytes        25                   19.6 Kbps        17.2 Kbps               10.0 Kbps

Note: RTP assumes 40 octets of RTP/UDP/IP overhead per packet. Compressed RTP (cRTP) assumes 4 octets of RTP/UDP/IP overhead per packet. Ethernet overhead adds 18 octets per packet. PPP/Frame Relay overhead adds 6 octets per packet. This table provided courtesy of Michael Finneran.

The varying designs of packet size, voice compression choice and header compression make it difficult to calculate the bandwidth required for a continuous-speech voice call. The IP PBX or IP phone vendor should be able to provide tables like the one above for their products. Many vendors have selected 30 ms for the payload size of their VoIP implementations. A good rule of thumb is to reserve 24 Kbps of IP network bandwidth per call for 8 Kbps (G.729-like) compressed voice. If G.711 is used, then reserve 80 Kbps of bandwidth. If silence suppression/voice activity detection is used, the bandwidth consumption may drop 50% -- to 8 Kbps total per VoIP call. But the assumption that everyone will alternate between voice and silence without conflicting with each other is not always realistic. Silence suppression will be discussed in a later tip. Most enterprise designers do not perform these calculations; the vendor provides the necessary information. The designer does have some freedom, such as selecting the compression technique for voice payloads and headers, and may be able to vary the packet size.
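Applied to trunk sizing, the rule of thumb above turns into a one-line estimate (the per-call reservations are the figures quoted in the text; the call counts are hypothetical planning inputs):

```python
RESERVE_PER_CALL_BPS = {"G.729-like": 24_000, "G.711": 80_000}

def wan_bandwidth_needed(simultaneous_calls, codec="G.729-like"):
    return simultaneous_calls * RESERVE_PER_CALL_BPS[codec]

print(wan_bandwidth_needed(10))            # 240000 -> 240 Kbps for 10 compressed calls
print(wan_bandwidth_needed(10, "G.711"))   # 800000 -> 800 Kbps for 10 uncompressed calls
```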

How can voice compression save bandwidth?

The Public Switched Telephone Network (PSTN) started with the transmission of analog speech. This worked well for decades, until the areas under city streets became saturated with copper cables, one copper pair per call. Starting in the 1950s, AT&T Bell Labs developed a technique to carry more voice calls over copper wire: digitized voice, through which 24 digital calls can be carried on two pairs of copper wire, increasing the carrying capacity of the cables twelvefold. The voice is digitized into streams of 64,000 bps per call. The technology is called a T1 circuit, and the bandwidth for the 24 calls is 1.544 Mbps. This worked well for domestic connections, and T1 then became the mechanism for long-distance domestic transmission.

Most of the early voice compression technologies were designed for undersea cables, where bandwidth was limited and expensive; voice compression was created to reduce this bandwidth requirement. Voice compression is also used for digital cell calls, operating at about 8 Kbps instead of 64 Kbps. So voice compression is not new. As the PBX market has moved into an IP-based environment, voice compression has become attractive for WAN transmission. Voice compression can be used on a LAN, but since LANs have so much available bandwidth, it is not commonly applied there.

The quality of a PSTN voice call provides enough analog bandwidth to understand the speaker in any language. It is also enough bandwidth for speaker recognition. The analog bandwidth delivered by the PSTN is about 3.4 kHz. This is considered toll quality. Voice compression can reduce speech quality and may affect speaker recognition, so there is a limit to how much bandwidth reduction is possible before callers complain about voice quality.
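The T1 figures check out with simple arithmetic (the 8 kbps framing channel is my addition for completeness; the article only quotes the 64 kbps per call and the 1.544 Mbps total):

```python
channels, per_channel_bps, framing_bps = 24, 64_000, 8_000
print(channels * per_channel_bps)                 # 1536000 bps of voice
print(channels * per_channel_bps + framing_bps)   # 1544000 bps -> the 1.544 Mbps T1 rate
```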

The CODEC (COder/DECoder) is the component in an IP phone that digitizes the voice and converts it back into an analog stream of speech. The CODEC is the analog-to-digital-to-analog converter, and it may also perform the voice compression and decompression. There are several voice digitization standards and some proprietary techniques in use for VoIP transmission. Most vendors support one or more of the following ITU standards and avoid proprietary solutions:

G.711 is the default standard for IP PBX vendors, as well as for the PSTN. This standard digitizes voice into 64 Kbps. There is no voice compression.

G.729 is supported by many vendors for compressed voice operating at 8 Kbps, 8 to 1 compression. With quality just below that of G.711, it is the second most commonly implemented standard.

G.723.1 was once the recommended compression standard. It operates at 6.3 Kbps and 5.3 Kbps. Although this standard further reduces bandwidth consumption, voice is noticeably poorer than with G.729, so it is not very popular for VoIP.

G.722 operates at 64 Kbps but offers high-fidelity speech. Whereas the three previously described standards deliver an analog sound range of 3.4 kHz, G.722 delivers 7 kHz. This version of digitized speech has been announced by several vendors and will become common in the future.

It is important to note that all of the voice digitization transmission speeds are for voice only. The actual transmission speed required must include the packet protocol overhead.

The quality of a voice call is defined by the Mean Opinion Score (MOS). A score of 4.4 to 4.5 out of a possible 5.0 is considered to be toll quality. Voice compression affects the MOS; an MOS below 4.0 will usually produce complaints from callers. Cell phone calls average about 3.8 to 4.0 MOS. The following table presents the voice MOS for different standard CODECs:

Standard   Speed      MOS   Sampling delay per phone
G.711      64 Kbps    4.4   0.75 ms
G.729      8 Kbps     4.2   10 ms
G.723.1    6.3 Kbps   4.0   30 ms
G.723.1    5.3 Kbps   3.5   30 ms

This table illustrates two points. First, as the voice is compressed, the voice quality (MOS) decreases. The MOS in the table does not include network impairments such as jitter and packet loss; these impairments will further reduce the voice quality. The VoIP network designer should choose a compression technique with a higher MOS so that the network impairments will not reduce the voice quality to an unacceptable level. Second, voice compression also adds delay to the end-to-end call. The table shows the sampling delay for one phone; this delay is doubled for the two phones of a call. This end-to-end delay needs to be limited. As compression increases, the delay experienced in the IP network needs to decrease, which increases the cost of transmission over the WAN, but not the LAN. The delays shown in the table are the theoretical minimum. The actual delays experienced will probably exceed 30 ms no matter what compression technology is implemented, and will vary by vendor.

The conclusion is that digital voice compression is worth pursuing for VoIP transmission on a WAN, but it comes with some costs in voice quality reduction and increased end-to-end delay.

About the author: Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These have included local area, national and international networks, as well as VoIP and IP convergent networks in the U.S., Canada, Europe, Australia and Asia.

About G.711

ISDN audio telecommunication may in principle be accomplished in many ways, but most regular calls are compressed according to the G.711 recommendation of the CCITT (Comité Consultatif International Téléphonique et Télégraphique, which has since been integrated into the ITU). G.711 compresses by using logarithmic companding, which reduces the 14 most significant bits of each sample to 8. As the sampling rate is 8 kHz, the transmission rate equals the 64 kbps offered by one ISDN B-channel. There are two variants of G.711: A-law is dominant in Europe, whereas the United States and Japan commonly use µ-law.
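The logarithmic curve itself is easy to sketch. The snippet below implements the continuous µ-law companding formula in Python (the real G.711 µ-law codec uses a segmented 8-bit approximation of this curve, so this is an illustration of the idea, not a G.711 encoder):

```python
import math

MU = 255  # mu-law constant used in North America and Japan

def mulaw_compress(x):
    """Map a linear sample in [-1.0, 1.0] to a companded value in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# Quiet samples keep most of the resolution, which is the point of companding:
for sample in (0.01, 0.1, 0.5, 1.0):
    print(sample, round(mulaw_compress(sample), 3))   # 0.228, 0.591, 0.876, 1.0
```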

What is HD voice?

Your standard POTS call captures and delivers sound in an audio range of 300 hertz to 3400 hertz, with standards set back in 1937. The VoIP equivalent of POTS is G.711, and it takes up 64 kbit/s of bandwidth. The baseline definition for wideband voice, typically called HD voice, is G.722. It delivers audio in the range of 30 to 7000 hertz, about twice as good as a typical POTS call and G.711. Due to a little data compression on the fly, a G.722 phone call only takes up 64 kbit/s of bandwidth. The combination of upper and lower frequency sounds gives a much clearer and "richer" experience on voice calls, with the key marketing-speak phrase used to describe it being, "Conversations sound as clear and natural as if talking to someone in the same room." Additional/complementary buzz-phrases include "a dramatically improved communications experience" and "Conference calls will be easy to follow and much less exhausting."

Why is HD voice such a big deal?

The current quality of phone calls sucks compared to FM radio or CDs, and mobile calls suck more. Cellular tech heads started with a 1937-era audio standard and then ran the quality of experience through a data compression blender to cram more calls into radio frequency (RF) spectrum. Implementing HD voice should make everything revolving around voice -- conference calls, IVR, speech-to-text, calls to Mum and the wee ones -- a much better experience.

How do you deliver mobile HD voice?

First, forget about the POTS network and all that legacy analogue crap. You need an all-IP network with low latency and enough bandwidth to transport a wideband voice call, so you need the latest hot-rocking 3G and 4G-esque data networks. France Telecom is delivering HD voice over the latest GSM HSPA-alphabet-soup via a soft client, but you can do the same thing on a fast enough WiMAX or LTE network. Qualcomm has done some demos over CDMA, but given the worldwide love of 4G, mobile HD on that tech might be some wishful thinking. You also need end-user devices (i.e. phones) with a quality microphone to capture 7 kHz of sound, enough CPU horsepower to encode and decode that information on the fly, and a speaker/headphone to deliver the sound to the human ear. Nokia and Sony Ericsson have announced phones that support AMR-WB, the de facto standard for mobile HD voice. You can also do mobile HD voice with a softclient and a sufficiently powerful smartphone; expect to see HD voice clients for the iPhone being demoed by Global IP Solutions (GIPS) and Fraunhofer using codecs other than AMR-WB.

AMR-WB -- what the hell?

AMR-WB (AMR wideband) is the codec and heir-apparent replacement for the AMR used in "standard" GSM calls to provide mobile HD voice. Also called G.722.2, it is designed to provide an HD voice experience in 24 kbit/s -- a big deal to the cellular world, which wants to conserve both RF and network bandwidth. But there's no free beer when compared to G.722. AMR-WB requires more CPU cycles and number crunching for efficient compression, which translates to shorter battery life. Further, AMR-WB is a patented codec with intellectual property contributed by France Telecom/Orange, Nokia, Ericsson and VoiceAge. Alternatives to AMR-WB have been floated, ranging from implementing G.722 to Skype's SILK to Fraunhofer's "AAC Enhanced Low Delay" codec based on MPEG. G.722 has the advantages of being royalty-free and not such a CPU devourer, but it takes up 64 kbit/s. For the cellular RF heads, this is a theoretical show stopper, but since the mobile people are pimping their data networks to support two-way video calling with HD voice, the whole "conserve RF/conserve network bandwidth" argument is crap. Device manufacturers also like the fact that G.722 is a simple piece of code to implement relative to all the different profile flavors of AMR-WB. Skype wants everyone to use SILK and offers it as royalty-free and open source, but after the skeletons as to who owned what IP once eBay bought Skype, well... It doesn't stop people loading Skype clients on mobile phones and running SILK "natively."

And it works just like normal phone calling, eh? If I have an HD voice phone and my bud does...

Well, not really, not yet. Carriers and businesses running HD voice currently operate as islands -- you can communicate with someone within your network, but if carrier A has HD voice and carrier B has HD voice, you aren't going to be able to connect an HD voice phone call, because the higher-level SIP/IP connectivity isn't set up if you are using those old-fashioned phone numbers to "dial" another person. Some sort of HD voice interoperability/interconnection announcement is purportedly going to take place at Mobile World Congress, where a group of mobile carriers have agreed to exchange AMR-WB calls among themselves. If so, this is one of those Key Announcements which will get HD voice moving faster. HD voice interoperability is not technically hard, since mobile carriers already have ways to exchange MMS and picture mail and all those other multimedia-loaded services via IP; supporting AMR-WB calls is just another data type to exchange via IP. But the politics is another story.

Speaking of ugly, how do calls move between the PSTN, HD voice, AMR-WB, G.722, SILK, and whatever flavor-of-the-day codecs pop up?

Calls need to be transcoded -- translated -- between codecs. For example, France Telecom already has to transcode between mobile HD voice users and its own PSTN connections to the rest of the world. And you have to transcode between AMR-WB and G.722 (mobile HD voice and broadband HD voice), plus SILK, since Skype wants its due for HD voice.

Doug Mohney is Editor-in-Chief of HD Voice News (www.hdvoicenews.com) and is happy to cause heartburn in league with Mike Magee whenever he can.

Read more: http://news.techeye.net/mobile/the-two-minute-guide-to-mobile-hd-voice
