The Practical Guide to Audio over IP for Broadcast
Updated 3rd Edition
The essential introduction to audio over IP, brought to you by APT
Introduction & Contents
Over the last number of years, APT has gained extensive experience in the field of audio
over IP networking. We have supported many customers through the transition from
synchronous to IP and worked hard in standards bodies within the industry to ensure
interoperability of audio codecs over IP networks. Now, with the help of this booklet, we
would like to share our knowledge and experience with you.
This practical guide provides background information on IP networks, their implications
for real-time audio transfer and recommendations for those specifying both networks
and equipment for audio over IP broadcasting. We trust you will find the information
beneficial and will feel free to get in touch with any further questions you might have.
Table of Contents
Why IP for Broadcast Audio?
IP Network Considerations
Network Selection
Transmission Protocol Selection
Specifying Network Bandwidth and Packet Size
Dealing with Network Jitter
Dealing with Delay
Dealing with Packet Loss
Concealment
Forward Error Correction
Quality of Service
Service Level Agreements
Synchronous Back-up
Network Testing
Ping
Traceroute
IP Connection Verifier
Equipment Considerations
Design Philosophy
Hardware Redundancy
Configurability & Quality of Service
Audio Algorithms
Management and Monitoring
IP Audio Applications
Wireless IP
Unicast & Multicast
Summary and Checklist
APT’s IP Codec Solutions
Copyright APT 2009

Why IP For Broadcast Audio?
The foundation of the broadcast industry for many years, synchronous networks have
been considered the industry standard for audio transport worldwide. Balanced analog
copper circuits, microwave and synchronous based systems such as V.35/X.21 or T1/E1
have been the traditional choice for studio transmitter and inter-studio links in
professional audio broadcast networks. Readily available from all major service
providers, the popularity of synchronous links has been largely due to the fact that they
offer dedicated, reliable, point-to-point and bi-directional communication at guaranteed
data and error rates.
However, the reign of synchronous links as the preferred choice for STLs is currently
coming under threat from a new challenger, in the form of IP based network technology.
While IP technology does have some disadvantages for audio transport, the benefits
over existing synchronous networks are increasingly proving too persuasive for
broadcasters to ignore:
• Cost - IP links are generally much cheaper to install and operate
• Greater flexibility - in contrast to the finite bandwidth limits of synchronous networks
(e.g. T1 (1.5 Mbit/s) or E1 (2 Mbit/s))
• Greater scalability - the incremental cost of adding bandwidth is much lower with IP
• Widespread availability - IP links are generally available from all Service Providers
• Network efficiency - IP networks can offer flexible multipoint configurations
• Resource efficiency - use of IP for audio enables consolidation of Engineering and
IT skills
It is clear to see that the use of IP networks for audio delivery enables broadcasters both
to leverage their existing infrastructure and achieve
greater flexibility in terms of content sharing and
network configuration. It also provides them with a
scalable platform for future development in areas
such as HD-Radio, data services etc.
However, despite the scalability, cost-efficiency and
flexibility of IP networks, packetized audio
transport is not a perfect solution and broadcasters
must take care that the quality of their audio and
reliability of their audio delivery is not sacrificed in
the interest of cost savings.
This booklet provides an overview of the most
important issues that broadcasters need to
consider before embarking upon IP migration. The
issues are certainly not insurmountable and, if
correctly and intelligently implemented, IP audio
networking can have a significantly positive impact
on a radio station’s efficiency and output.
A Practical Guide to IP Audio Networking p1

IP Network Considerations
IP networks were designed to transport non-time sensitive data from point A to point B
within an acceptable timeframe. Should data get lost or delayed, files can be re-sent and
webpages refreshed. However, this is not possible in a broadcast environment where
reliable, real-time audio transport is an imperative.
In order to deliver successful audio over IP broadcasts, engineers need to familiarize
themselves with the nature and inherent characteristics of packetized networks. This
section looks at the choice of networks and transmission protocols as well as the issues
of jitter, latency and packet loss.
Network Selection
Service providers offer broadcasters a variety of different options for IP audio delivery.
These range from dedicated links with a guaranteed Quality of Service to the open
internet or contended ADSL links. We will examine each option in turn and evaluate
its usefulness to the broadcaster.
Dedicated IP Links
Professional studio transmitter links and inter-studio networks require a reliability and
robustness that is just not available on unmanaged or highly contended networks. The
mission-critical nature necessitates a guaranteed service level that will ensure the
uninterrupted flow of packets from the sender to the receiver with minimum delay and
no loss of audio quality.
For these applications, most service providers will offer some form of dedicated IP
connection offering ‘always on’ access and a choice of failsafe options to ensure mission
critical connectivity. In remote or unusual locations, this
may take the form of Wireless IP. This service should be
uncontended with no bandwidth sharing to avoid disruption
of on-air content. If this is not possible, the broadcaster
should request the lowest contention ratio possible and
certainly no greater than 10:1.
IP access service is typically backed by a Service Level
Agreement (SLA) and traffic priority mechanism. Without
such an agreement, a broadcaster will have no control over
IP network conditions and therefore no control over the
quality of the audio emanating from that network. Further
details on SLAs are included in the Service Level Agreements section.
MPLS Links
Offering one of the highest levels of service possible with IP,
Multi-Protocol Label Switching (MPLS) virtual private
networks are increasingly replacing leased lines as the
transport mechanism of choice for STLs and SSLs. The
technology offers many of the benefits of leased lines in that
it is a connection-oriented service and so has the ability to
support bandwidth reservation and service guarantees. In
addition, it is also complementary to IP transfer and
therefore offers the cost, flexibility and efficiency benefits of
IP audio networking.
MPLS assigns each network packet a short (20-bit) label that describes the path which
that packet should take. In comparison to a traditional IP network where individual
routers make independent routing decisions, MPLS traffic is analyzed upon entry to the
MPLS cloud and assigned a 'label' which dictates its path throughout the network.
Without the need for each router to look up the address of the next node, MPLS offers a
faster, more efficient service than a standard IP connection. Additional information for
traffic class of service (priority) can also be included in the MPLS label to ensure
prioritization of critical, time-sensitive content.
Overall, MPLS networks offer an attractive
solution for broadcast networks. They are
typically available at a lower cost than
traditional synchronous leased lines with a
higher performance than conventional IP
links. They enable scalable and flexible
networking, support Quality of Service and
will integrate with many transport methods
including IP, ATM and Frame Relay.
Public Internet
As we have noted, it is not recommended to utilize unmanaged networks such as the
open Internet, contended ADSL links or contended WANs or LANs for professional
broadcasting applications. However, in practice, the Internet is used for remote
broadcasts and it is possible to achieve high quality real time audio transfer using
contended IP links.
Utilizing the public internet means that the broadcaster will be more exposed to the
risks associated with IP links and, therefore, extra care needs to be taken to eliminate
any risks with regard to the codec equipment and technology employed. As a minimum,
the codec should be DSP-based for rock solid reliability and offer remote configuration
and control over IP. In addition, the following should be ensured:
• Auto Re-connecting Codec - The codec used must enable fast re-connection if the
link is dropped. Some manufacturers’ codecs require a manual reboot at both ends to
re-establish the connection.
• Low Delay, ADPCM Coding - Perceptual coding technologies such as MPEG Layer 2,
AAC etc are frame-based and therefore require a minimum of one frame to be buffered
before compression is applied. If the link is dropped due to network outages, this
buffering will introduce additional delay into the audio stream. ADPCM algorithms
encode and decode 'on the fly' enabling instant audio immediately upon reconnection.
They also enable flexible packet sizes which can minimize the effects of dropped packets
on the audio stream.
• IP Packet Resequencer - In contended networks such as the public internet, there is
a higher likelihood that packets will be delivered out of sequence. Codecs that have been
developed for professional use should offer resequencing technology to ensure that all
packets received are played out in order thereby minimising audio glitches. An IP packet
resequencer will work within the receive buffer to re-order the packets according to
their RTP time stamp.
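The re-ordering step described in the last bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular codec: the class name `Resequencer`, the helper `resequence` and the buffer depth of two packets are hypothetical, and RTP timestamp wrap-around is ignored for brevity.

```python
import heapq


class Resequencer:
    """Hold a few packets in a receive buffer and release them in timestamp order."""

    def __init__(self, depth=2):
        self.depth = depth   # packets held back before playout (illustrative value)
        self._heap = []      # min-heap of (rtp_timestamp, payload)

    def push(self, rtp_timestamp, payload):
        """Store one received packet and return any packets now due for playout."""
        heapq.heappush(self._heap, (rtp_timestamp, payload))
        due = []
        while len(self._heap) > self.depth:
            due.append(heapq.heappop(self._heap))
        return due

    def flush(self):
        """Release everything still buffered, oldest timestamp first."""
        due = []
        while self._heap:
            due.append(heapq.heappop(self._heap))
        return due


def resequence(packets, depth=2):
    """Run a list of (rtp_timestamp, payload) tuples through the buffer."""
    buffer = Resequencer(depth)
    played = []
    for rtp_timestamp, payload in packets:
        played.extend(buffer.push(rtp_timestamp, payload))
    played.extend(buffer.flush())
    return played


# Packets with timestamps 320 and 480 arrive swapped.
arrived = [(0, b"a"), (160, b"b"), (480, b"d"), (320, b"c"), (640, b"e")]
print([ts for ts, _ in resequence(arrived)])
```

A packet that arrives more than `depth` positions late would still be played out of order in this sketch, which is why the depth of the receive buffer has to be matched to the network.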

Transmission Protocol Selection
It is important to understand how the audio data is actually carried within an IP
infrastructure. The digitized audio stream is assembled into packets of audio
information for transmission over a network. This is achieved using Real-time Transport
Protocol (RTP) which attaches a timestamp and sequence number to each audio packet.
The receiving end will then use this information to re-assemble the packets in order and
play them out in the original sequence. It will also alert the destination to missing or out
of sequence packets.
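As an illustration of the sequence number and timestamp that RTP attaches, the following Python sketch builds and parses the 12-byte fixed RTP header using the field layout from the RTP specification (RFC 3550). The function names and the example payload type, SSRC and payload size are assumptions chosen for the example only.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_LEN = 12  # fixed header without CSRC entries or extensions


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload):
    """Prefix an audio payload with a minimal RTP header."""
    byte0 = RTP_VERSION << 6        # version 2, no padding, no extension, no CSRCs
    byte1 = payload_type & 0x7F     # marker bit clear, 7-bit payload type
    header = struct.pack(
        "!BBHII",                   # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,               # 16-bit sequence number
        timestamp & 0xFFFFFFFF,     # 32-bit timestamp
        ssrc & 0xFFFFFFFF,          # 32-bit stream identifier
    )
    return header + payload


def parse_rtp_header(packet):
    """Extract the fields a receiver needs to order packets and detect loss."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:RTP_HEADER_LEN])
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


packet = build_rtp_packet(96, seq=7, timestamp=1152, ssrc=0x1234, payload=bytes(128))
print(len(packet), parse_rtp_header(packet))
```

A gap in the received sequence numbers tells the destination that a packet is missing, while the timestamp tells it when each packet should be played out.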
UDP or TCP?
The RTP packet is further enclosed inside a Transmission Control Protocol (TCP) or User
Datagram Protocol (UDP) packet. There is a common mistaken assumption when
broadcasters first broach the subject of audio over IP that TCP will be the most
appropriate protocol. However, as a connection-oriented protocol, TCP dictates that the
receiving end must acknowledge receipt of the data sent. Should a packet be
dropped, the sender will retransmit it until it is acknowledged, producing unwanted data
rate peaks on the link. These peaks will deplete available bandwidth, cause audio
glitches and create unacceptable audio delay, since retransmitted audio arrives too late
for real-time playout.
As a connectionless protocol, UDP employs a "send and forget" strategy, with no
acknowledgement required, making it much more suited to real time audio applications.
As the protocol is fairly lightweight, it also creates less overhead and processing delay
on the link. Although offering little in the way of payload protection or guaranteed
delivery, UDP is generally reliable in practice. Ethernet itself has built-in error
detection (CRC) and it is unusual, though not unheard of, for an Ethernet frame to be
lost or corrupted. UDP is thus the preferred transport protocol for audio over IP, but
care should be taken to ensure that quality of service on the link is maintained at a high
level.
SIP and SDP

Typically IP codecs are connected together over a private LAN or dedicated WAN. When
this is not possible (e.g. when negotiating firewalls), then a signalling protocol must be
employed to make the connection. While several protocols are available, SIP is the most
widely accepted by both end users and standards bodies.
SIP is a signalling protocol for creating, modifying, and terminating sessions with one or
more participants. A lightweight protocol with only six core request methods, SIP minimizes
complexity and is also transport-independent so it can be used with both UDP and TCP.
As SIP is a peer-to-peer protocol, it is possible for clients to connect directly with each
other using the concept of client (audio codec) and server (computer system used to direct
SIP calls). Larger systems will require the use of proxy servers to forward SIP calls
towards the intended destination (see Figure 1) and registrar servers, which are
essentially databases of SIP clients.
Figure 1: A Typical SIP Session for Audio Transfer
SIP acts as a carrier for the Session Description Protocol (SDP), which describes the
media content of the session, e.g. what IP ports to use, the algorithm being used etc.
Once the connections have been made, SIP endpoints simply exchange media streams -
typically using RTP over UDP.
Specifying Network Bandwidth and Packet Size

The choices made with regard to audio settings (e.g. algorithm, mode, sample rate etc.)
will define the data bandwidth required to transport encoded audio over the IP network.
For synchronous connections this actually equals the bandwidth required to transport
the compressed audio, but in IP we must add an overhead that is required to packetize
the audio data.
Figure 2: How an IP packet is constructed

Encapsulation into an IP packet adds Ethernet and IP header bytes containing information
related to the routing of the packet along the IP network, as well as header bytes for
UDP and RTP, and an Ethernet preamble and CRC (12 bytes). The packet information is
examined by the routers and switches which constitute the routing intelligence on the
network and take decisions as to how, when and where to route each packet.
Audio Data   Audio Packet   IP Packet      IP Packets/   Packetization   IP Data
Rate         Size (bytes)   Size (bytes)   sec           Delay (ms)      Rate
64 kbps      128            194            62.5          16              97 kbps
             256            322            31.25         32              80.5 kbps
             512            578            15.625        64              72.3 kbps
             1280           1346           6.25          160             67.3 kbps
128 kbps     128            194            125           8               194 kbps
             256            322            62.5          16              161 kbps
             512            578            31.25         32              144.5 kbps
             1280           1346           12.5          80              134.6 kbps
256 kbps     128            194            250           4               388 kbps
             256            322            125           8               322 kbps
             512            578            62.5          16              289 kbps
             1280           1346           25            40              269.2 kbps
384 kbps     128            194            375           2.7             582 kbps
             256            322            187.5         5.3             483 kbps
             512            578            93.75         10.7            433.5 kbps
             1280           1346           37.5          26.7            403.8 kbps
576 kbps     128            194            562.5         1.8             873 kbps
             256            322            281.25        3.6             724.5 kbps
             512            578            140.625       7.1             650.3 kbps
             1280           1346           56.25         17.8            605.7 kbps
Figure 3: Table showing relationship between IP bandwidth requirements, packetization
delay and data rates


The packet headers containing this data must be included in every packet that originates
from an IP device on an IP network. Hence, there is a correlation between packet size
and bandwidth requirements. The table above (Fig 3) details the IP data rate for audio
transportation against corresponding synchronous data rate.
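The figures in Fig 3 can be reproduced with a short calculation. The Python sketch below assumes a fixed overhead of 66 bytes per packet, which is simply the difference between the IP packet size and the audio packet size in the table (194 - 128). One breakdown that adds up to this figure is 12 bytes RTP, 8 bytes UDP, 20 bytes IPv4, 14 bytes Ethernet header and 12 bytes preamble and CRC, but that split is our reading rather than something the table states. The function name is hypothetical.

```python
OVERHEAD_BYTES = 66  # IP packet size minus audio packet size in Fig 3 (194 - 128)


def ip_stream_figures(audio_kbps, audio_packet_bytes):
    """Return (packets per second, packetization delay in ms, IP data rate in kbps)."""
    packets_per_sec = audio_kbps * 1000 / 8 / audio_packet_bytes
    delay_ms = 1000 / packets_per_sec
    ip_kbps = (audio_packet_bytes + OVERHEAD_BYTES) * 8 * packets_per_sec / 1000
    return packets_per_sec, delay_ms, ip_kbps


# Reproduce the 256 kbps rows of the table.
for size in (128, 256, 512, 1280):
    pps, delay, rate = ip_stream_figures(256, size)
    print(f"{size:5d} bytes: {pps:7.2f} packets/s, {delay:5.1f} ms, {rate:6.1f} kbps")
```

The smallest packet size costs roughly 50% extra bandwidth (388 kbps for 256 kbps of audio), while the largest costs about 5% but adds 40 ms of packetization delay.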
While choosing a larger packet size will reduce the overall bandwidth requirements and
network jitter (see below), it also means that if a packet is dropped, a correspondingly
larger amount of payload i.e. audio is dropped. In addition, some networks are
configured to work only with IP packets below a certain size - the Maximum
Transmission Unit (MTU) - and will fragment larger packets using a process that works
poorly with RTP.
On the other hand, reducing packet size will reduce packetization delay at the cost of
higher bandwidth requirements. Finding the optimum packet size will always be a
balance between bandwidth efficiency, network latency and audio quality.
Dealing with Network Jitter

It is a feature of packet switched networks that any packet can take any route from
source to destination, and jitter is the inevitable side-effect of this feature. Jitter occurs
when packets arrive either side of their predicted arrival time and the receiving codec is
unable to achieve real-time playout. The larger and more complex the network
structure, the more susceptible the data stream will be to jitter.
Figure 4: Network Jitter Effects (diagram showing packet transmission, reception and playout)

The effects of jitter can be overcome by the use of a jitter buffer which stores enough
packets to compensate for the effects of out-of-sequence packets. The above diagram
shows the effect of network jitter on the reception of audio and its subsequent playout
through an audio system. The buffer depth will usually be set in milliseconds but, for the
purposes of this example, it is set to a two packet buffer. Provided the network jitter is
low, the system is unaffected and plays out the packets received in sequence. However,
should jitter increase beyond the pre-determined buffer, packets which arrive after the
determined playout time will be dropped, resulting in corrupted audio.
Again a trade-off is necessary, this time between the size of the jitter buffer and the
delay introduced. Setting a large buffer to minimize the effects of jitter may substantially
increase the overall network delay.
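The trade-off can be explored with a very small simulation. The Python sketch below is a simplified model under our own assumptions: packets are sent at a fixed interval, playout starts one buffer depth after the first packet arrives, and any packet that arrives after its playout slot is dropped. The delay values are invented for illustration and the function name is hypothetical.

```python
def late_packets(delays_ms, interval_ms, buffer_ms):
    """Return the indices of packets that arrive after their playout time."""
    first_arrival = delays_ms[0]  # packet 0 is sent at time 0
    late = []
    for index, delay in enumerate(delays_ms):
        arrival = index * interval_ms + delay
        playout = first_arrival + buffer_ms + index * interval_ms
        if arrival > playout:
            late.append(index)
    return late


# Network delay of six consecutive 16 ms packets (illustrative values).
delays = [20, 22, 19, 65, 21, 48]
print(late_packets(delays, interval_ms=16, buffer_ms=32))  # two packet buffer
print(late_packets(delays, interval_ms=16, buffer_ms=16))  # one packet buffer
```

With the two packet buffer only the packet delayed by 65 ms misses its slot. Halving the buffer halves the added delay but also causes the packet delayed by 48 ms to be dropped.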
Dealing with Delay

All networks have transport latency due to the natural laws of physics. Transporting an
electronic signal through any medium will take a finite amount of time that cannot be
removed.
In an IP network where there is both the standard transmission delay and also the
packetizing delay to contend with, this latency will typically equate to approximately
10-30 ms. As noted above, packet size and jitter buffering will also have an effect on the
delay experienced.
The latency figure quoted represents the inherent latency throughout the network as the
data passes through switches, routers etc and does not include audio compression delay
nor sample frequency effects. Any coding delay resulting from the use of compression
will add directly to the existing latency of the system. The choice of audio compression
algorithm is therefore critical in determining the end-to-end latency of the system and
low delay coding techniques must be selected for real-time audio over IP
applications.
Dealing with Packet Loss

Depending on the quality of an IP link and the bandwidth available, packet-based
systems can be susceptible to dropped packets. The resultant loss in audio is related to
the size of the lost packets, the number of packets lost and the compression ratio used.
Algorithm        Mode            Bit Rate         Packet Size   No. of Audio   Audio Lost
                                                  (Bytes)       Samples
MPEG Layer 2     16 Bit Stereo   256 kbps         768           2304           24 ms
Enhanced apt-X   16 Bit Stereo   256 / 384 kbps   512           64             16 ms / 10.7 ms
Enhanced apt-X   16 Bit Stereo   256 / 384 kbps   64            8              2 ms / 1.3 ms
Figure 5: Table showing how choice of compression algorithm affects packet loss
With frame-based algorithms such as MPEG, the loss of any packet in a frame requires
the frame to be discarded. Therefore, using small packet sizes in conjunction with these
coding technologies will not bring any benefit or lessen the effects of packet loss.
Enhanced apt-X requires no frame buffering and offers greater flexibility in packet size
selection which reduces the susceptibility of an audio stream to the consequences of
packet loss. Packet sizes with durations shorter than the 3 msec psychoacoustic gap
detection threshold are easily achieved with Enhanced apt-X.
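The "Audio Lost" column of Figure 5 follows directly from the packet size and the bit rate. The Python sketch below shows the arithmetic and compares each result with the 3 ms threshold quoted above. The function name is hypothetical.

```python
GAP_THRESHOLD_MS = 3  # psychoacoustic gap detection threshold quoted in the text


def audio_lost_ms(packet_bytes, bit_rate_kbps):
    """Duration of audio carried by (and lost with) one packet."""
    return packet_bytes * 8 / bit_rate_kbps


for label, size, rate in [
    ("MPEG Layer 2", 768, 256),
    ("Enhanced apt-X", 512, 256),
    ("Enhanced apt-X", 64, 256),
    ("Enhanced apt-X", 64, 384),
]:
    lost = audio_lost_ms(size, rate)
    verdict = "below" if lost < GAP_THRESHOLD_MS else "above"
    print(f"{label:15s} {size:4d} bytes @ {rate} kbps: {lost:5.1f} ms ({verdict} threshold)")
```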
Theoretically, if a packet is lost, the receiving codec could request that the sending codec
retransmit the packet in question but this is usually impractical as the delay involved
would be substantial. The other options for dealing with packet loss are concealment,
correction or temporarily abandoning the packetized network in favor of an automated
backup to a synchronous network.
Concealment
Various methods can be used to conceal lost packets in the final reproduction of the
audio. They range from simple repetition of the last good packet received, to
silence/noise injection or interpolation and retransmission. All have an impact on the
reproduced audio.
In listening tests the injection of silence produced unacceptable breaks in the audio that
led to a level of incoherence. The use of white noise improved the intelligibility of the
reproduced audio but was again noticeable. The use of repetition of the last known good
frame produced more favorable results.

The use of interpolation/pattern matching/waveform substitution to conceal the loss of
packets is possible but the benefits versus complexity are governed by a law of
diminishing returns. The results of these techniques are all governed by subjective
improvements in audio quality and are also subject to the amount of audio lost that is
being concealed or repaired.
Figure 6: Packet Loss Concealment
None of these concealment options produce an easily workable solution and it is the
generally accepted view that a better approach is to minimise the packet loss rather
than trying to disguise it.
Forward Error Correction

Forward Error Correction (FEC) is a means by which lost packets can be reconstructed
for playout. The simplest form of FEC adds redundant data based on the XOR of the data
in each packet with at least one or two other packets (Figure 7). The resultant FEC
packet is added to the transmission and used in conjunction with the data received to
correct any errors present and reconstruct the audio stream.
Figure 7: A Basic FEC Scheme
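The XOR principle behind this basic scheme can be demonstrated in a few lines of Python. The sketch below assumes equal-length packets and a single lost packet per protected group. The function names are hypothetical and real FEC schemes carry additional header information that is omitted here.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_fec_packet(packets):
    """Combine a group of equal-length packets into one redundant FEC packet."""
    fec = bytes(len(packets[0]))  # start from all zeros
    for packet in packets:
        fec = xor_bytes(fec, packet)
    return fec


def recover_missing(received, fec):
    """Rebuild the one missing packet from the surviving packets and the FEC packet."""
    missing = fec
    for packet in received:
        missing = xor_bytes(missing, packet)
    return missing


group = [b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"]
fec = make_fec_packet(group)

# The second packet is lost in transit; the other two and the FEC packet arrive.
rebuilt = recover_missing([group[0], group[2]], fec)
print(rebuilt == group[1])
```

If two packets from the same group are lost, the XOR no longer has enough information to separate them, which is why this simple scheme cannot cope with burst loss.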

While this basic form of FEC works well for small amounts of random packet loss, it
cannot deal with the more common occurrence of burst packet loss (i.e. several adjacent
packets lost at once). In order to deal with burst errors, a more complex FEC scheme
such as that shown in Figure 8 is required. This FEC scheme calculates in two
dimensions which provides more data to the recovery engine.
As you can see from both Figures 7 & 8, any form of FEC will add substantial overhead
to the audio stream and, in some cases, the transmission bandwidth is effectively
doubled. In networks where bandwidth limitation or congestion is a problem, using FEC
may only exacerbate the problem.
In addition to the bandwidth issues it introduces, calculating FEC at the encoder is
processor intensive and will introduce latency making it unsuitable for use in real time
applications.
The complexity of the FEC, the packet size and compression ratio used are all factors
which influence the resulting delay. For example a two by two FEC requires the buffering
of four packets. Given our earlier calculations concerning the amount of audio in an
MPEG L2 packet, this equates to 96 ms. A two by two FEC will only protect against a small
burst error and the more realistic five by five FEC (as shown in Figure 8) will require 25
packet buffering which, using the same calculations, is equivalent to 600 ms delay.
Sample Frequency x Compression Ratio x Packet Size (samples) x FEC Width x FEC Depth = Resultant Delay
Figure 8: A Two Dimensional FEC Scheme
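The buffering delay quoted for the two by two and five by five schemes can be checked with the short Python sketch below. It takes the 24 ms of audio per MPEG Layer 2 packet from Figure 5 and assumes the encoder must hold a full width by depth block of packets before the FEC data can be calculated. The function name is hypothetical.

```python
def fec_buffer_delay_ms(fec_width, fec_depth, packet_ms):
    """Delay from buffering one complete FEC block of width x depth packets."""
    return fec_width * fec_depth * packet_ms


MPEG_L2_PACKET_MS = 24  # audio per packet, from Figure 5

print(fec_buffer_delay_ms(2, 2, MPEG_L2_PACKET_MS))
print(fec_buffer_delay_ms(5, 5, MPEG_L2_PACKET_MS))
```

The same function shows why shorter packets help: with the 2 ms packets listed for Enhanced apt-X in Figure 5, a five by five block would represent 50 ms of audio rather than 600 ms.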
Recovery on the decoder-side is also processor intensive. The process of amassing the
required block of packets, determining the location of the lost packets and resolving
them one by one can be a lengthy and complex procedure.
As with concealment, the use of FEC can cause as many, if not more, problems than it
solves. It can go some way to overcome the inadequacies of an IP based transport
mechanism but at a cost of additional delay, complexity, bandwidth and processing
overhead.
For professional STLs and audio backhaul, the emphasis should be on ensuring that the
IP network used is of sufficient quality to guarantee minimal packet loss. Implementing
methods to conceal or correct errors is an unnecessary distraction to the main aim of
ensuring reliable, robust audio delivery over an IP link.

Quality of Service
In order to improve upon the basic transport service offered by typical IP networks,
known as "Best Effort Service", many service providers will offer mechanisms to
guarantee the delivery of time-sensitive content. An audio stream on an STL, for
example, will require minimal interruptions in packet flow to avoid jitter and delay while
data may only require that it reaches its destination within a reasonable timeframe.
Quality of Service (QoS) was designed to provide a mechanism which allocates different
levels of service or priority based on the importance and time-sensitivity of the traffic.
There are two main methods for the improvement of link quality: RSVP and DiffServ.
RSVP (Resource reSerVation Protocol) is more complex and involves the reservation and
relinquishing of required resources throughout the network. DiffServ (Differentiated
Services) on the other hand offers a traffic classification framework that evaluates the
priority of network traffic on a "per hop" basis. Using DiffServ, each packet is classified
and assigned a DSCP (Differentiated Services Code Point) value, which the network
evaluates in order to prioritize the packet accordingly.
With DSCPs 0 through 63 available, a maximum of 64 different aggregates/classes are
theoretically possible. They can be entered as numeric values or as special keyword
names known as Per-Hop Behaviours (PHBs).
There are four main classes of PHBs which are detailed in Figure 9. Because
of the intense effort required to determine the appropriate class of traffic for packets,
it is recommended to minimize the number of classification occurrences within the
network infrastructure - four classes is the typical value.
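The DSCP values listed in Figure 9 follow a regular bit pattern, which the Python sketch below reproduces. It also shows one way a sending application might mark its own packets, using the standard `IP_TOS` socket option, in which the 6-bit DSCP occupies the upper six bits of the byte. The helper names are hypothetical, and whether the marking is honoured depends on the operating system and on every network along the path, so treat this as a sketch rather than a guarantee of priority.

```python
import socket

EXPEDITED_FORWARDING = 46  # EF, binary 101110


def class_selector(precedence):
    """CSn code point: the three IP precedence bits followed by 000."""
    return precedence << 3


def assured_forwarding(forwarding_class, drop_precedence):
    """AFxy code point for class x (1-4) and drop precedence y (1-3)."""
    return (forwarding_class << 3) | (drop_precedence << 1)


def tos_byte(dscp):
    """The DSCP sits in the upper six bits of the IP TOS / Traffic Class byte."""
    return dscp << 2


def mark_socket(sock, dscp):
    """Ask the operating system to mark outgoing packets with the given DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))


for name, value in [
    ("CS5", class_selector(5)),
    ("AF11", assured_forwarding(1, 1)),
    ("AF43", assured_forwarding(4, 3)),
    ("EF", EXPEDITED_FORWARDING),
]:
    print(f"{name:5s} {value:2d} {value:06b}")
```

To mark an audio stream as Expedited Forwarding, a sender would call `mark_socket(sock, EXPEDITED_FORWARDING)` on its UDP socket before transmitting.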
Service Level Agreements

The use of SLAs can provide long-term stability on an IP Broadcast link in guarding
against packet loss. Any Telco or provider will issue an SLA for an IP link typically
guaranteeing uptime in percentage terms. This percentage can be reconciled to criteria
such as lost packets and actual down time on the link.
Typically an SLA will consist of the following parameters:
• The values of QoS metrics which the service provider will guarantee for the
client's traffic. This will usually include the delay across the network,
maximum jitter and packet loss levels.
• The values of non-QoS metrics of the service such as availability which for
broadcast applications should be 99.999% or higher.
• The scope of the service i.e. the specific routers between which the SLA
prevails
• The traffic profile of the stream directed to the service provider. This is
particularly relevant in applications such as HD Radio where the inclusion of
HD data can cause the data rate to exceed the average. The burst data rate
must be considered to avoid an increased level of contention.
• Performance monitoring procedures and expected levels of reporting
• Support and troubleshooting procedures including time-frame for response
and resolution and consequences for non-compliance
• The administrative/legal part defining processes for requesting and
cancelling certain services.
DSCP    Binary   DSCP
Value   Value    Name         Description
0       000000   BE/Default   Best Effort (BE) - With Best Effort delivery, there are no
                              guarantees that data is delivered nor that it will be of a
                              certain quality. BE traffic is the default setting for all IP
                              traffic and indicates that the bit rate and delivery method
                              may vary depending on the current traffic load.
8       001000   CS1          Class Selector (CS) - CS code points enable backward
16      010000   CS2          compatibility with the IP Precedence field - an early attempt
24      011000   CS3          to establish a QoS standard. The Class Selector codepoints
32      100000   CS4          are of the form 'xxx000' with the first three bits composed of
40      101000   CS5          the IP precedence bits. Each IP precedence value is then
48      110000   CS6          mapped into a DiffServ class.
56      111000   CS7
10      001010   AF11         Assured Forwarding (AF) - AF PHB provides an assurance of
12      001100   AF12         delivery as long as the traffic does not exceed a defined rate.
14      001110   AF13         Traffic that exceeds the subscription rate faces a higher
18      010010   AF21         probability of being dropped if congestion occurs. AF provides
20      010100   AF22         four different forwarding classes that you can assign to a
22      010110   AF23         packet. Every forwarding class provides three drop
26      011010   AF31         probabilities, which yields a total of 12 DSCP values from
28      011100   AF32         AF11 to AF43.
30      011110   AF33
34      100010   AF41
36      100100   AF42
38      100110   AF43
46      101110   EF           Expedited Forwarding (EF) - Expedited Forwarding offers low
                              delay, low packet loss and low jitter and is often given
                              priority above all other traffic classes. This makes it highly
                              suitable for real-time services and critical content such as
                              audio delivery. A packet that is marked with 46 receives
                              guaranteed low-drop precedence as the packet traverses
                              DiffServ-aware networks en route to its destination. EF
                              traffic should be limited to a maximum of 30% of the capacity
                              of a link.
Figure 9: Table Outlining the Main Classes of Per-Hop Behaviours in QoS
Of course the actual SLA is a legal document with lots of fine print. For example, the SLA
for AT&T's Virtual Private Network Tunnelling Service is part of a 78 page long
document! Generally the SLA will specify how the customer is to monitor performance
under the SLA, often via an online tool.
If performance fails to meet the figures specified, the SLA also covers the formulas to
determine penalties to the carrier (most often in the form of credits to the customer, not
refunds). These can consist of both "reactive" and "proactive" components. In order for
credits to apply, the customer must follow the rules specified in the SLA, with regards to
reporting reactive SLA problems etc. Proactive components are to be handled by the
provider with credits issued automatically.
An SLA may only be available to the broadcaster with certain revenue commitments
(contract amount) or periods (contract duration).

Synchronous Back-up
Even with all necessary due diligence applied in the selection of the IP Network & Service
Provider, there is still the possibility of a major outage on the network. This can equate to
consecutive dropped packets over a sustained period of time but is more likely to be large
consecutive bursts of dropped packets. In the event of such a failure, the broadcaster will
be off-air unless they have a backup. A primary IP link can be backed up either by a
secondary IP link supplied by a different service provider, or a point-to-point synchronous
link. A many to one backup is also possible but will only be effective if the Primary IP links
are diversely routed.
Professional audio codecs will provide the ability to trigger the backup from the primary
IP link to the secondary synchronous link using a number of different criteria such as
silence on the audio output of a specific audio module or a defined threshold in the
Performance Monitoring log.
Figure 10: Synchronous Back-up on APT’s WorldCast Eclipse
Similarly, the automated restore back to the primary IP link could be defined in the
Performance Monitoring log i.e. number of consecutive packets received without a single
drop would equate to a restoration of the primary link.
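The switch-over and restore behaviour described above can be modelled as a small state machine. The Python sketch below is a generic illustration and does not represent how any specific codec implements it: the class name and both thresholds are hypothetical, and only the packet-loss criterion is modelled (not silence detection).

```python
class BackupSwitch:
    """Choose between the primary IP link and the backup link from packet statistics."""

    def __init__(self, drop_threshold, restore_count):
        self.drop_threshold = drop_threshold  # consecutive drops that trigger the backup
        self.restore_count = restore_count    # consecutive good packets needed to restore
        self.active = "primary"
        self._drops = 0
        self._good = 0

    def observe(self, packet_received):
        """Record whether the next expected primary-link packet arrived."""
        if packet_received:
            self._good += 1
            self._drops = 0
        else:
            self._drops += 1
            self._good = 0
        if self.active == "primary" and self._drops >= self.drop_threshold:
            self.active = "backup"
        elif self.active == "backup" and self._good >= self.restore_count:
            self.active = "primary"
        return self.active


switch = BackupSwitch(drop_threshold=3, restore_count=5)
history = [True, False, False, False, True, True, True, True, True]
print([switch.observe(received) for received in history])
```

Requiring a long unbroken run of good packets before restoring helps to avoid switching back and forth while the primary link is still unstable.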
Network Testing
Once your network has been installed, you should confirm that it meets the criteria of the
SLA you are paying for. In the case of a network with no SLA or QoS, you will also need to
confirm if it will be suitable for your needs. This qualification testing should be done
rigorously over a period of at least a week. This can generally be done rather easily with
software tools on a regular PC (see below). If the network will also handle non-audio data,
pay careful attention to assign codecs to the correct "Class of Service" (COS) using the
DiffServ or other QoS mechanism. The amount of data at a given COS level must not
exceed the amount specified in your contract or severe problems can result.
Your first round of testing can be done using the tools Ping and Traceroute. These are
available on most computers built into the OS. We'll give a quick overview of using them
with Windows™ but you can use other implementations if you wish.
Ping
This is the most basic "hello? are you there?", "yes I am here!" test for any IP network.
When anything IP-based fails to connect, ping should be the first tool out of your IP
toolbox.
Once the codecs and network are hooked up, you can connect a PC at one end and use it
to ping the codec at the far end. Later you should reverse the test.
First, go to the Windows™ ‘Run’ menu and enter "CMD". This should open a black command
prompt window. Now enter the IP address of the far-end codec in the format:
ping 123.456.789.101 <enter>
(where 123.456.789.101 stands for the IP address of the far-end codec). If the ping is
successful, a latency figure will be shown. Next, run a continuous ping for a period of time
and look for the maximum latency variation by entering the following command:
ping 123.456.789.101 -t <enter>
Allow it to run for at least 15 minutes. To stop the ping, press CTRL-C.
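Reading the latency variation off a 15-minute scroll of replies is tedious, so it can help to redirect the output to a file (for example by appending "> ping.log" to the command) and summarize it afterwards. The sketch below assumes the English-language Windows reply format ("time=12ms"); the function name and the file name are ours.

```python
import re
from statistics import mean

# Matches the "time=12ms" or "time<1ms" field of a Windows ping reply line.
_TIME_RE = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def summarize_ping(log_text: str) -> dict:
    """Summarize round-trip times and timeouts from saved ping output."""
    times = [int(m.group(1)) for m in _TIME_RE.finditer(log_text)]
    timeouts = log_text.lower().count("request timed out")
    if not times:
        return {"replies": 0, "timeouts": timeouts}
    return {
        "replies": len(times),
        "timeouts": timeouts,
        "min_ms": min(times),
        "max_ms": max(times),
        "avg_ms": round(mean(times), 1),
        "variation_ms": max(times) - min(times),  # spread between best and worst
    }
```

Typical use would be summarize_ping(open("ping.log").read()). A large "variation_ms" or any timeouts are a hint that the jitter buffer and concealment settings will need attention.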
Traceroute
Traceroute gives the IP router itinerary of your packets. It will list each router your
packets travel through, plus a latency for each "hop". Just like travelling by plane, where
the more stops and plane changes you have, the higher the odds you may be delayed at
some point, the same is true of the number of routers used to relay your packets to their
destination.
To use traceroute, simply open a command window (see above) and type the following
command:
tracert 123.456.789.101
(where 123.456.789.101 stands for the IP address of the far-end codec).
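If you save the traceroute output to a file, a few lines of script can pick out the worst latency at each hop. This is a rough sketch that assumes the usual Windows tracert layout (hop number, three probe times, then the router address); the function name is ours.

```python
import re

# A hop line starts with the hop number, followed by three probe results.
_HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
_PROBE_RE = re.compile(r"(<?\d+)\s*ms|\*")


def parse_tracert(text: str) -> list:
    """Return (hop_number, worst_latency_ms or None) for each hop line."""
    hops = []
    for line in text.splitlines():
        match = _HOP_RE.match(line)
        if not match:
            continue  # header, blank and "Trace complete." lines
        latencies = [
            int(probe.group(1).lstrip("<"))
            for probe in _PROBE_RE.finditer(match.group(2))
            if probe.group(1)  # "*" probes timed out and carry no latency
        ]
        hops.append((int(match.group(1)), max(latencies) if latencies else None))
    return hops
```

A hop reported as None did not answer any probe; that alone does not prove packets are being lost, since some routers simply do not reply to traceroute.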
IP Connection Verifier
It is recommended that second-round testing be performed as well. In this case, the tests
should be configured to emulate the audio data closely. The APT IP Verifier software tool
can be configured with the same QoS and IP ports you will be using for audio. Next, start
adding applications, if any, to the network, starting with those at the lowest COS. Once all
non-audio applications are working successfully, and your testing continues to show that the
top COS is working properly, deploy the audio network. Then use the diagnostics built into
your codecs to continue monitoring for a few more weeks.
APT's IP Connection Verifier tool is available to any interested party. It is a PC application
that runs under Windows™. A copy is run on a PC at each location. Each site can transmit or
receive packets, or can do both. The transmission tab (shown) allows configuration of the
destination IP address, destination port, and packet size. The receive tab permits
configuration of the listen port and provides detailed performance statistics. The IP Verifier
tool is also helpful for pre-testing ad-hoc networks in advance of deploying codecs. Contact
APT for a copy of this useful tool.
Figure 11: Transmission Control on APT’s IP Connection Verifier
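To make the idea of this kind of test concrete, the sketch below shows one generic way a probe stream can be built and scored: each test packet carries a sequence number, and the receiver counts gaps and reordering. It is not a description of how APT's tool works internally; the header layout and all names are assumptions made for illustration.

```python
import struct
import time

_HEADER = struct.Struct("!Id")  # sequence number (uint32) + send time (float64)


def build_probe(seq: int, size: int, now=None) -> bytes:
    """Build a test packet of `size` bytes carrying a sequence number and timestamp."""
    if size < _HEADER.size:
        raise ValueError("packet size is smaller than the probe header")
    stamp = time.time() if now is None else now
    return _HEADER.pack(seq, stamp) + bytes(size - _HEADER.size)  # zero padding


class ReceiveStats:
    """Track loss and reordering from the sequence numbers of received probes."""

    def __init__(self):
        self.received = 0
        self.out_of_order = 0
        self.highest_seq = -1

    def on_packet(self, payload: bytes) -> None:
        seq, _sent = _HEADER.unpack_from(payload)
        self.received += 1
        if seq < self.highest_seq:
            self.out_of_order += 1   # arrived after a later packet
        else:
            self.highest_seq = seq

    @property
    def lost(self) -> int:
        """Packets not seen so far, assuming numbering starts at 0 and no duplicates."""
        return (self.highest_seq + 1) - self.received
```

Sending the probes at the same packet size and rate as the intended audio stream is what makes the result representative of how the audio will fare.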

Equipment Considerations
A typical IP network is made up of much more than just links for audio transfer. A
professional approach to audio transport over IP requires mastery of not just the
network, but also the suite of hardware and software tools which allow control,
supervision and operation of packetized audio delivery.
These tools enable not only network and equipment monitoring, but the implementation
of remedial action, hardware redundancy and error alleviation. Where possible, the
broadcaster should seek to source an integrated solution which delivers all these
services in a single product, specifically the audio codec. This integrated solution allows
the administrator to manage both audio AND data services from a central location either
by a unified control software or on a higher level by SNMP.
Design Philosophy
The design philosophy behind products is a key factor to consider when purchasing
equipment for use in a professional broadcast environment. There are two key
approaches: DSP-based or PC-based product development.
PC architecture uses off-the-shelf motherboards which are generic, low cost platforms not
designed for use with audio or 24/7 operation. Instability and memory leaks within the
operating core can lead the system to "hang" as a PC is prone to do.
DSP-based systems on the other hand are designed from the outset for high quality audio
delivery and signal integrity. They typically offer faster boot-up operation, much greater
stability and a significantly greater operating bit depth (resolution).
While it may be acceptable for a home user to reset their PC, it is definitely not acceptable
for professional broadcast applications and PC-based architecture should be avoided for
"always on" applications.
Hardware Redundancy
For mission-critical STL applications, hardware redundancy is vital to ensure back-up in
the case of network or equipment failure. A broadcaster must consider the importance
of each link and source equipment that conveniently provides the necessary fail-safe
options. Hot-swappable audio modules, redundant power supplies and automatic back-
up functionality are just some of the options that should be considered.
Configurability & Quality of Service

As noted in the previous section, there are many variables in the world of IP networking.
It is therefore vital that the audio codec selected provides the broadcaster with the
flexibility and control to manage anomalies on their IP network and get the best quality
audio performance from the bandwidth available. This will typically include audio setting
configuration, control of packet size, ability to buffer audio to compensate for jitter, and
the ability to set Quality of Service at the transmission point.
The codec should also provide maximum flexibility with regards to network
configuration, allowing the broadcaster to easily implement unicast, multiple unicast
and multicast applications.
Audio Algorithms
Having prepared your IP network for audio transport, the next step is to choose the best
method of sending audio down the link. Restrictions in available bandwidth will often
rule out linear/PCM audio and some form of compression is usually required. There are
two main types of compression techniques: ADPCM and Perceptual algorithms.
Perceptual based algorithms (such as MPEG L2, MPEG L3 (MP3), AAC and their many
derivatives) use psycho-acoustic based principles which analyze audio content and
determine what is audible to the human ear. The algorithm will remove all inaudible
content and is therefore, by definition, "lossy". Using multiple passes of a perceptual
codec (for example, consider the broadcast chain for HD Radio or DAB) will result in
content heavy with artifacts. Ultimately this will cause "listener fatigue," swiftly
followed by tune-out to a station offering higher audio quality.
Additionally, perceptual coding will introduce a delay to the audio delivery which is
generally unacceptable for real-time audio applications. Working on the assumption
that the IP transport stream and packetization will naturally introduce a minimum delay
of 20 milliseconds, it is imperative to minimize the latency of the compression algorithm
employed. In essence, using a perceptual coder, even a low delay variant, will render the
solution unusable for any level of real-time broadcast such as talkback applications and
off-air monitoring.
ADPCM algorithms offer a more attractive alternative given their gentler, non-destructive
approach to coding. ADPCM-based, Enhanced apt-X® technology delivers both exceptional
acoustics and ultra low delay, making it particularly suited for audio over IP applications.
Enhanced apt-X overcomes the problems associated with multiple psycho-acoustic passes of
audio in the broadcast chain as it is extremely resilient to tandem coding, retaining
acoustic integrity up to and beyond 10 encode-decode cycles.
Along with the well-documented features of low latency and audio performance, Enhanced
apt-X also features AutoSync™, an embedded word pattern which aids connection and
synchronization and complements the packetizing nature of UDP/IP. As a non-frame based
algorithm, Enhanced apt-X allows for smaller packets (as small as 64 bytes), contributing
less delay and enabling quicker synchronization. The ability to start synchronization on
receipt of the next valid sample and to achieve full synchronization within 3ms @ Fs=48kHz
ensures faster recovery from packet loss, making dropouts less audible.
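The link between packet size and delay is simple arithmetic: a packet cannot be sent until enough coded audio has accumulated to fill it. The sketch below works this out. As a check, it reproduces the 24 ms figure given in the packet size table earlier in this guide for MPEG Layer 2 at 256 kbps with 768-byte packets. The 384 kbit/s rate used in the second example is our assumption for a 16-bit stereo Enhanced apt-X stream at 48 kHz, and the 64 bytes are treated as audio payload only.

```python
def packetization_delay_ms(payload_bytes: int, bit_rate_bps: int) -> float:
    """Time needed to collect `payload_bytes` of coded audio at `bit_rate_bps`."""
    if bit_rate_bps <= 0:
        raise ValueError("bit rate must be positive")
    return payload_bytes * 8 * 1000 / bit_rate_bps


# MPEG Layer 2 row from the guide's table: 768-byte packets at 256 kbit/s.
mpeg_l2_delay = packetization_delay_ms(768, 256_000)

# Assumed Enhanced apt-X example: 64-byte payload at 384 kbit/s.
aptx_delay = packetization_delay_ms(64, 384_000)
```

Under these assumptions the 64-byte packet accounts for roughly 1.3 ms of delay, against 24 ms for the frame-based example. Smaller packets do carry proportionally more header overhead, so the bandwidth figure has to be rechecked when the packet size is reduced.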

Management & Monitoring
With a wide number of variables and constantly changing network conditions within the
field of IP audio networking, it is vital that broadcasters have access to extensive control
and monitoring capabilities. This can be achieved either by front panel control, SNMP or
a dedicated Management System software package.
Whichever option is selected, the user should ensure that it provides them with the
following capabilities:
• At-a-glance status of all codecs throughout network
• Flexible configuration of audio settings: algorithm, sample rate, data rate, mode etc.
• Ability to define audio profiles for quick and simple configuration
• Flexible configuration of transport link. For IP, this will involve setting up
packet size, jitter buffers and IP unicast and multicast routes. It also
encompasses the configuration of automatic back-up to either a secondary IP
port or synchronous links. For synchronous back-up, transport configuration
will also require extensive functionality in the form of timeslot allocation, drop
and insert configuration, speed-dial set-up etc.
• Performance Monitoring providing statistics on packets transmitted and
received, error counts, sequence errors etc.
• Ability to set critical, major and minor alarm conditions relating to issues such
as silence detection, loss of connection, loss of sync & exceeding jitter buffers
• Ability to set actions which are triggered on alarms, e.g. switch to automatic
backup and revert after n seconds of stable audio stream
• Alarm and Event Logs to enable analysis of recurring errors and conduct accurate
network diagnostics
• Remote Software Update Control
Fig 12: Configuration and Monitoring using the APT Codec Management System
IP Audio Applications
Wireless IP Applications
When wired ISP services to the studio or transmitter sites are unavailable, the other
option is to use IP over a wireless link. RF / Microwave connections can be suitable both
for Studio Transmitter Links and Remote / Outside Broadcast applications.
Studio Transmitter Links

Wireless IP links constitute an alternative STL approach for those who are unable to get
a reliable ADSL or FiOS (Fiber Optic Service) to their premises or those for whom the
MRC (Monthly Recurring Charge) of a managed MPLS-type connection is out of reach.
Both point-to-point and point-to-multipoint microwave solutions are available, operating
in both licensed and unlicensed (typically 5.8 GHz) bands. Popular choices for radio
systems include Proxim Tsunami and the Motorola Canopy.
There are additional considerations for those contemplating the use of RF IP links for
STL applications: extra care must be taken to ensure the path calculations favor
reliability over speed. Typical IP applications allow data to be re-sent, and RF links are
therefore usually optimized for speed. Audio networks require significantly lower error
rates, and this must be taken into account during the design stage or results will suffer.
If a design consultant is used, be sure that s/he has experience designing links for IP
audio.
Remotes / Outside Broadcast
With the ever-increasing coverage of high speed wireless broadband access, the options
for conducting high quality remote broadcasts are greater than ever.
The ever-greater speeds available to broadcasters coupled with the cost-efficiency of
the solution (no expensive ISDN line rental or long-distance charges) mean that it is an
attractive alternative to the traditional RPU approach. Verizon Wireless and Sprint
Nextel both offer 3G EV-DO services which lend themselves well to this application.
Figure 13: A Typical Wireless IP Application for Remote Broadcast
While wireless IP services are widely available and many broadcasters are using them
successfully, the observations made previously with regard to use of the public internet
still apply. As far as possible, every effort should be made to secure some form of
bandwidth guarantee and the network should be tested thoroughly before transmission.
Key elements of a successful wireless set-up are:
• A reliable, DSP-based codec running a low delay algorithm
• A fixed IP address from your DSL or wireless broadband provider
• IP connections from the same provider at both ends

Unicast Applications
Unicast links offer simplex, point-to-point connections. Despite their simplicity, they
still play an important role in the construction of professional audio networks.
In the example below, a stereo codec at the studio site has established a multiple unicast
to a number of transmitter sites. The studio is able to monitor the off-air content by
means of a return feed from Transmitter Site 3.
Multicast Applications
Multicasting is a highly efficient technique used to transmit from a single audio source
to many destinations using the IP infrastructure. The source sends the IP packets to a
multicast router using a Multicast Group address as its IP destination address.
Receivers use the same address to inform the network that they are interested in
receiving packets sent to that group. This is carried out using Internet Group
Management Protocol (IGMP). The nodes in the network take care of copying the IP
packets and routing them to all subscribed destinations.
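On a general-purpose computer, the receiver side of this exchange looks roughly like the sketch below: the application asks the operating system to join the group, and the operating system issues the IGMP membership report on its behalf. The group address and port you pass in are whatever your network plan assigns; the function names are ours.

```python
import ipaddress
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure used to join an IPv4 multicast group."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Group address followed by the local interface ("0.0.0.0" = let the OS choose).
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket and ask the network (via IGMP) for the group's packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # The operating system sends the IGMP membership report when this is set.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

Closing the socket leaves the group, after which the network can stop forwarding the stream towards that site. Whether the join succeeds end to end depends on the routers between source and receiver supporting multicast.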
Multicasting is a useful technique for the reduction of network congestion as the packets
are only replicated when two separate receiving sites can only be reached using separate
paths.
In the diagram, a studio is multicasting to a number of transmitter sites. As can be seen,
a unicast stream can also be used to enable a studio to monitor one of the transmitter
sites.
Summary & Checklist
As we mentioned at the beginning of this guide, the potential benefits of switching from
synchronous to IP audio networks can be substantial. It is possible to achieve greater network
efficiency and considerable cost-savings while still maintaining the quality and reliability required
for professional audio broadcasting. In order to achieve this, the following criteria must be
met:
Network Checklist
• Fully uncontended symmetrical line for duplex connections (1:1 contention)
• Bandwidth must be equal to or higher than audio bit rate plus overhead (see table P5)
• Calculation of optimum packet size and jitter buffer to suit network conditions
• Guaranteed Quality of Service through the use of SLAs (Service Level Agreements)
• Pre-deployment & routine testing of network using Ping, Traceroute and IP Connection
Verifier
Equipment Checklist
• Robust, DSP-based hardware codec with high level of redundancy
• Support for SIP/SDP to ensure interoperability with other manufacturers’ codecs (in
accordance with the EBU Tech 3326 N/ACIP standard)
• Configurable parameters to adjust for network conditions
• Powerful performance monitoring to help enforce SLAs
• Low delay algorithm such as Enhanced apt-X to counter inherent delay associated with IP

APT’s IP Codec Solutions
WorldNet Oslo
The jewel in the crown of APT's broadcast audio codecs, the
WorldNet Oslo offers broadcasters and service providers a
flexible, highly reliable and multi-featured audio multiplexing
solution for Studio Transmitter Links and Inter-studio
networking.
• Modular, Multiple Channel Audio Multiplexer
• Audio, Voice & Data over T1, E1 or IP
• Up to 4 audio channels per card, up to 6 audio cards
• Enhanced apt-X, MPEG L2, J.57, J.41 or Linear audio
• Analog and AES/EBU audio interfaces
• 5.1 Phase-Locking for seamless surround sound
• Redundant Power Supplies
• All cards hot-swappable
• Automatic Back-up and Restore for link & audio
• Powerful Codec Management System
• Supports SIP/SDP for EBU N/ACIP compliance
• In-band Management over E1/T1 link
Designed to transport both compressed and uncompressed audio, voice and data over
various digital networks, the WorldNet Oslo is based around a 19 inch, 3U high standard
rackmount chassis which is card-based expandable. Redundant power supplies, "hot-
swappable" cards and automatic back-up functionality ensure 24/7/365 reliability for
mission-critical applications.
Various network interface modules eliminate the need for external multiplexers or
media converters. Audio can be transported via synchronous or packet-switched
networks with support for T1 (1.5Mbit/s), E1 (2Mbit/s) and Ethernet (IP) interfaces.
A maximum of 24 audio channels in simplex mode and 12 audio channels in duplex mode
are possible in each frame. Plug-in audio modules in over 15 different configurations
offer analog, AES/EBU, simplex, duplex and 5.1 phase-locked options. As well as
uncompressed linear audio, J.57, J.41 and MPEG L2, the WorldNet Oslo also supports 16
or 24-bit Enhanced apt-X offering cascade-resilient, near-lossless audio quality with
under 2ms delay.
Codec and Network Management Systems

Supplied as standard with the WorldNet Oslo and all WorldCast IP audio codecs, APT’s Codec
Management System (CMS) Software provides control of up to 6 units
on a network. The exceptionally intuitive user interface offers at-a-
glance status and alarm monitoring as well as extensive configuration
capabilities for both audio and transport settings. A demo version can
be downloaded from www.aptx.com
For large-scale broadcast networks, APT’s Network Management System offers control of
unlimited codecs, hierarchical views, event logging, extensive alarm configuration and
much more...
WorldCast Codecs
Designed specifically to deliver optimum audio performance and reliability over IP networks,
WorldCast stereo IP Audio Codecs are the choice of professional broadcasters worldwide and
many hundreds of WorldCast units have already been deployed in STLs and studio-studio links
throughout the globe. The core features of all products in the range include:
• Outstanding Audio Quality - All WorldCast codecs offer both linear & Enhanced apt-X coding
• High Compatibility - All WorldCast codecs support the SIP/SDP protocols, enabling
quick and easy connection to all compliant IP codecs (according to the EBU N/ACIP
standard)
• Consistent Reliability - All WorldCast codecs are designed around a DSP-based
architecture to ensure 24/7 reliability.
WorldCast Eclipse
A multi-interface, multi-algorithm codec for the ultimate in
flexibility, the WorldCast Eclipse delivers bidirectional stereo
audio over multiple networks:
• IP (allowing connection to other codecs linked to Wide Area and Local Area Networks);
• X.21/V.35 (allowing connection to high speed fixed synchronous networks);
• ISDN (allowing connection to other codecs over dial up digital ISDN links)
Linear audio, Standard 16-bit apt-X, Enhanced 16 & 24-bit apt-X are supplied as standard and
an optional multi-algorithm suite incorporating MPEG 1/2 Layer II/III, MPEG 4 AAC, G.711 and
G.722 is also available. A rich array of features is provided on the WorldCast Eclipse,
including Automatic Back-up, Silence Detect, Contact Closures and Alarm Ports.
WorldCast Equinox
WorldCast Equinox is a multi-algorithm, fully duplex, stereo
audio codec offering IP, Leased Line (X.21/V.35) and ISDN connectivity. Offering unprecedented
redundancy for a stereo IP audio codec, users have the option of dual IP interfaces, dual ISDN
ports and dual power supplies all on a cost-effective 1RU rackmountable unit.
This reliability coupled with the commitment to delivering broadcast-grade audio using linear
and Enhanced apt-X means that the WorldCast Equinox is the most professional and affordable
choice for broadcasters worldwide. The unit was voted a winner in the Radio World “Cool Stuff”
Awards at NAB 2009 for its innovative approach and appeal.
WorldCast Horizon
The WorldCast Horizon is a fully duplex, two channel stereo codec
designed to enable real-time transport of broadcast quality audio over IP networks.
Both analog and digital (AES/EBU with external reference) units are available.
In addition to linear audio, the WorldCast Horizon incorporates Enhanced apt-X coding
technology which, thanks to its low delay and exceptional acoustic properties, is particularly
suited to the transport of audio over packet-switched networks. Contact closures and
opto-couplers for remote status alarms are also provided.

APT:
729 Springfield Road
Whiterock Business Park
Belfast, BT12 7FP
United Kingdom
T: +44 28 90 677 200

F: +44 28 90 677 201
E: info@aptcodecs.com
WorldCast Systems:
20, av Neil Armstrong,
Parc d'Activités J.F. Kennedy
33700 Bordeaux-Mérignac
FRANCE
T: +33 557 928 928

F: 33 557 928 929
WorldCast Systems Inc:

19595 NE 10th Av, Suite A
Miami, FL 33179
USA
T: 305 249 3110

F: 305 249 3113
www.aptcodecs.com
www.WorldCastSystems.com
© Copyright APT Ltd 2009
