3rd Edition
The practical guide to audio over IP for Broadcast
the essential introduction to audio over IP brought to you by APT
Introduction & Contents
Over the last number of years, APT has gained extensive experience in the field of audio
over IP networking. We have supported many customers through the transition from
synchronous to IP and worked hard in standards bodies within the industry to ensure
interoperability of audio codecs over IP networks. Now, with the help of this booklet, we
would like to share our knowledge and experience with you.
IP Audio Applications
Wireless IP
Unicast & Multicast
Summary and Checklist
APT’s IP Codec Solutions
However, the reign of synchronous links as the preferred choice for STLs is currently
coming under threat from a new challenger, in the form of IP based network technology.
While IP technology does have some disadvantages for audio transport, the benefits
over existing synchronous networks are increasingly proving too persuasive for
broadcasters to ignore:
It is clear to see that the use of IP networks for audio delivery enables broadcasters both
to leverage their existing infrastructure and achieve
greater flexibility in terms of content sharing and
network configuration. It also provides them with a
scalable platform for future development in areas
such as HD-Radio, data services etc.
Network Selection
Service providers offer broadcasters a variety of different options for IP audio delivery.
These range from dedicated links with a guaranteed Quality of Service to the open
internet or contended ADSL links. We will examine each option in turn and evaluate
its usefulness to the broadcaster.
Dedicated IP Links
Professional studio transmitter links and inter-studio networks require a reliability and
robustness that is just not available on unmanaged or highly contended networks. The
mission-critical nature necessitates a guaranteed service level that will ensure the
uninterrupted flow of packets from the sender to the receiver with minimum delay and
no loss of audio quality.
For these applications, most service providers will offer some form of dedicated IP
connection offering ‘always on’ access and a choice of failsafe options to ensure
mission-critical connectivity. In remote or unusual locations, this
may take the form of Wireless IP. This service should be
uncontended with no bandwidth sharing to avoid disruption
of on-air content. If this is not possible, the broadcaster
should request the lowest contention ratio possible and
certainly no greater than 10:1.
MPLS Links
Offering one of the highest levels of service possible with IP,
Multi-Protocol Label Switching (MPLS) virtual private
networks are increasingly replacing leased lines as the
transport mechanism of choice for STLs and SSLs. The
technology offers many of the benefits of leased lines in that
it is a connection-oriented service and so has the ability to
support bandwidth reservation and service guarantees. In
addition, it is also complementary to IP transfer and
therefore offers the cost, flexibility and efficiency benefits of
IP audio networking.
MPLS assigns each network packet a short (20-bit) label that describes the path which
that packet should take. In comparison to a traditional IP network where individual
routers make independent routing decisions, MPLS traffic is analyzed upon entry to the
MPLS cloud and assigned a 'label' which dictates its path throughout the network.
Without the need for each router to look up the address of the next node, MPLS offers a
faster, more efficient service than a standard IP connection. Additional information for
traffic class of service (priority) can also be included in the MPLS label to ensure
prioritization of critical, time-sensitive content.
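To make the label format concrete, the following is a minimal sketch of how one MPLS label stack entry can be packed into 32 bits. The field layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) follows the standard MPLS encoding; the function names and the example values are illustrative assumptions and are not taken from any APT product.

```python
import struct

def pack_mpls_entry(label: int, traffic_class: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Pack one 32-bit MPLS label stack entry in network byte order."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= traffic_class < 8:
        raise ValueError("traffic class must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)

def unpack_mpls_entry(data: bytes):
    """Return (label, traffic_class, bottom_of_stack, ttl) from a 4-byte entry."""
    (word,) = struct.unpack("!I", data)
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF

# Hypothetical values: label 1000, high-priority traffic class 5, TTL 64.
entry = pack_mpls_entry(label=1000, traffic_class=5, bottom_of_stack=True, ttl=64)
print(entry.hex(), unpack_mpls_entry(entry))
```

The three traffic class bits are where the class of service (priority) information mentioned above travels alongside the label.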
Public Internet
As we have noted, it is not recommended to utilize unmanaged networks such as the
open Internet, contended ADSL links or contended WANs or LANs for professional
broadcasting applications. However, in practice, the Internet is used for remote
broadcasts and it is possible to achieve high quality real time audio transfer using
contended IP links.
Utilizing the public internet means that the broadcaster will be more exposed to the
risks associated with IP links and, therefore, extra care needs to be taken to eliminate
any risks with regard to the codec equipment and technology employed. As a minimum,
the codec should be DSP-based for rock solid reliability and offer remote configuration
and control over IP. In addition, the following should be ensured:
• Auto Re-connecting Codec - The codec used must enable fast re-connection if the
link is dropped. Some manufacturers’ codecs require a manual reboot at both ends to
re-establish the connection.
• Low Delay, ADPCM Coding - Perceptual coding technologies such as MPEG Layer 2,
AAC etc are frame-based and therefore require a minimum of one frame to be buffered
before compression is applied. If the link is dropped due to network outages, this
buffering will introduce additional delay into the audio stream. ADPCM algorithms
encode and decode 'on the fly' enabling instant audio immediately upon reconnection.
They also enable flexible packet sizes which can minimize the effects of dropped packets
on the audio stream.
UDP or TCP?
The RTP packet is further enclosed inside a Transmission Control Protocol (TCP) or User
Datagram Protocol (UDP) packet. There is a common mistaken assumption when
broadcasters first broach the subject of audio over IP that TCP will be the most
appropriate protocol. However, as a connection-oriented protocol, TCP dictates that the
receiving end must acknowledge receipt of the data sent. Should a packet be
dropped, the sender will retransmit it until it is acknowledged, holding back the audio
that follows and producing unwanted data rate peaks on the link.
These peaks will deplete available bandwidth, cause audio glitches and create
unacceptable audio delay.
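As an illustration of the encapsulation described above, here is a minimal sketch that builds the 12-byte fixed RTP header in front of an audio payload and hands the result to a UDP socket, where no acknowledgment is awaited. The payload type 96, the SSRC value and the helper names are assumptions chosen for the example only.

```python
import socket
import struct

RTP_VERSION = 2

def build_rtp_packet(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    """Prepend a fixed RTP header (no padding, extension or CSRC entries)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def parse_rtp_header(packet):
    """Decode the fixed header fields of an RTP packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "payload_type": byte1 & 0x7F,
            "sequence": seq, "timestamp": ts, "ssrc": ssrc}

def send_over_udp(packet, host, port):
    """Fire-and-forget delivery: UDP never waits for an acknowledgment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

# Hypothetical stream: dynamic payload type 96, four bytes of audio.
packet = build_rtp_packet(96, sequence=1, timestamp=480, ssrc=0x1234ABCD, payload=b"\x00\x01\x02\x03")
print(len(packet), parse_rtp_header(packet))
```

The sequence number and timestamp in the header are what allow the receiver to detect a missing packet and carry on, rather than stalling the stream while a retransmission is negotiated.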
SIP is a signalling protocol for creating, modifying, and terminating sessions with one or
more participants. A lightweight protocol with only six messages, SIP minimizes
complexity and is also transport-independent so it can be used with both UDP and TCP.
As SIP is a peer-to-peer protocol it is possible for clients to connect directly with each
other using the concept of client (audio codec) and server (computer system used to direct
SIP calls). Larger systems will require the use of proxy servers to forward SIP calls towards
the intended destination (see diagram) and registrar servers which are essentially databases
of SIP clients.
Figure 1: A Typical SIP Session for Audio Transfer
SIP acts as a carrier for the Session Description Protocol (SDP), which describes the
media content of the session, e.g. what IP ports to use, the algorithm being used etc.
Once the connections have been made, SIP endpoints simply exchange media streams -
typically using RTP over UDP.
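For readers who have not seen a session description, the sketch below assembles a minimal SDP body of the kind SIP would carry. The address (from the documentation range 192.0.2.0/24), the port 5004 and the session name are assumptions for illustration; G.722 is used because it has a well-known static RTP payload type (9), not because it is the recommended algorithm.

```python
def build_sdp(origin_ip, rtp_port, payload_type, encoding, clock_rate,
              session_name="Audio contribution"):
    """Return a minimal SDP body describing one RTP audio stream."""
    lines = [
        "v=0",                                         # protocol version
        f"o=- 0 0 IN IP4 {origin_ip}",                 # originator and session id
        f"s={session_name}",                           # session name
        f"c=IN IP4 {origin_ip}",                       # connection address
        "t=0 0",                                       # unbounded session time
        f"m=audio {rtp_port} RTP/AVP {payload_type}",  # media, port, payload type
        f"a=rtpmap:{payload_type} {encoding}/{clock_rate}",  # algorithm in use
    ]
    return "\r\n".join(lines) + "\r\n"

# Hypothetical offer from a codec at 192.0.2.10.
sdp = build_sdp("192.0.2.10", 5004, 9, "G722", 8000)
print(sdp)
```

The "m=" and "a=rtpmap" lines are where the port and algorithm mentioned above are advertised to the far end.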
While choosing a larger packet size will reduce the overall bandwidth requirements and
network jitter (see below), it also means that if a packet is dropped, a correspondingly
larger amount of payload i.e. audio is dropped. In addition, some networks are
configured to work only with IP packets below a certain size - the Maximum
Transmission Unit (MTU) - and will fragment larger packets using a process that works
poorly with RTP.
On the other hand, reducing packet size will reduce packetization delay at the cost of
higher bandwidth requirements. Finding the optimum packet size will always be a
balance between bandwidth efficiency, network latency and audio quality.
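The trade-off can be put into numbers with a short calculation. This sketch assumes 40 bytes of header per packet (IPv4 20 + UDP 8 + RTP 12, with no options and ignoring link-layer framing) and a hypothetical 256 kbit/s audio stream; both figures are assumptions for illustration.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options or extensions

def stream_cost(audio_bitrate_bps, packet_ms):
    """Bandwidth and per-packet payload for a given packet duration."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = audio_bitrate_bps * packet_ms / 8000
    total_bps = audio_bitrate_bps + HEADER_BYTES * 8 * packets_per_second
    return {
        "payload_bytes": payload_bytes,      # audio lost if one packet is dropped
        "packets_per_second": packets_per_second,
        "total_bps": total_bps,              # audio plus header overhead
        "overhead_pct": 100 * (total_bps - audio_bitrate_bps) / total_bps,
    }

for ms in (1, 4, 10, 20):
    print(ms, "ms:", stream_cost(256_000, ms))
```

Shorter packets cut the packetization delay and the audio lost per dropped packet, but the fixed header is sent more often, so the total bit rate on the wire rises.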
[Figure labels: Transmission, Reception, Playout]
Again a trade-off is necessary, this time between the size of the jitter buffer and the
delay introduced. Setting a large buffer to minimize the effects of jitter may substantially
increase the overall network delay.
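A simple way to explore this trade-off is to count how many packets would miss their playout slot for a given buffer depth. The sketch below is a deliberately simplified model that anchors playout to the fastest packet observed; the transit times are invented for illustration.

```python
def late_packets(transit_ms, buffer_ms):
    """Count packets whose extra delay exceeds the jitter buffer depth."""
    baseline = min(transit_ms)  # fastest observed transit time
    return sum(1 for t in transit_ms if t - baseline > buffer_ms)

# Hypothetical one-way transit times in milliseconds.
transit = [20, 22, 21, 35, 20, 48, 23, 21]
for depth in (5, 20, 40):
    print(f"buffer {depth} ms -> {late_packets(transit, depth)} late packet(s)")
```

A deeper buffer rescues more packets, but every millisecond of depth is added to the end-to-end delay of every packet, late or not.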
The latency figure quoted represents the inherent latency throughout the network as the
data passes through switches, routers etc and does not include audio compression delay
nor sample frequency effects. Any coding delay resulting from the use of compression
will add directly to the existing latency of the system. The choice of audio compression
algorithm is therefore critical in determining the end-to-end latency of the system and
low delay coding techniques must be selected for real-time audio over IP
applications.
Figure 5: Table showing how choice of compression algorithm affects packet loss
With frame-based algorithms such as MPEG, the loss of any packet in a frame requires
the frame to be discarded. Therefore, using small packet sizes in conjunction with these
coding technologies will not bring any benefit or lessen the effects of packet loss.
Enhanced apt-X requires no frame buffering and offers greater flexibility in packet size
selection which reduces the susceptibility of an audio stream to the consequences of
packet loss. Packet sizes with durations shorter than the 3 msec psychoacoustic gap
detection threshold are easily achieved with Enhanced apt-X.
Theoretically, if a packet is lost, the receiving codec could request that the sending codec
retransmit the packet in question but this is usually impractical as the delay involved
would be substantial. The other options for dealing with packet loss are concealment,
correction or temporarily abandoning the packetized network in favor of an automated
backup to a synchronous network.
Concealment
Various methods can be used to conceal lost packets in the final reproduction of the
audio. They range from simple repetition of the last good packet received, to
silence/noise injection or interpolation and retransmission. All have an impact on the
reproduced audio.
In listening tests the injection of silence produced unacceptable breaks in the audio that
led to a level of incoherence. The use of white noise improved the intelligibility of the
reproduced audio but was again noticeable. The use of repetition of the last known good
frame produced more favorable results.
None of these concealment options produce an easily workable solution and it is the
generally accepted view that a better approach is to minimize the packet loss rather
than trying to disguise it.
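The two simplest concealment strategies discussed above can be sketched in a few lines. This is an illustration only: lost packets are represented by None, and the "silence" mode assumes a linear PCM payload in which zero bytes are silent, which would not hold for compressed audio.

```python
def conceal(packets, mode="repeat"):
    """Fill gaps (None entries) by repeating the last good packet or inserting silence."""
    out = []
    last_good = None
    for p in packets:
        if p is not None:
            last_good = p
            out.append(p)
        elif mode == "repeat" and last_good is not None:
            out.append(last_good)                # repeat last good packet
        elif last_good is not None:
            out.append(bytes(len(last_good)))    # silence of the same length
        else:
            out.append(b"")                      # nothing received yet to base a fill on
    return out

stream = [b"\x01\x02", None, b"\x03\x04"]
print(conceal(stream, "repeat"))
print(conceal(stream, "silence"))
```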
The complexity of the FEC, the packet size and compression ratio used are all factors
which influence the resulting delay. For example a two by two FEC requires the buffering
of four packets. Given our earlier calculations concerning the amount of audio in an
MPEG L2 packet, this equates to 96ms. A two by two FEC will only protect against a small
burst error and the more realistic five by five FEC (as shown in figure 6) will require 25
packet buffering which, using the same calculations, is equivalent to 600ms delay.
Sample Frequency x Compression Ratio x Packet Size (samples) x FEC Width x FEC Depth = Resultant Delay
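The buffering figures quoted above can be reproduced with a short calculation. The sketch assumes each packet carries one 1152-sample MPEG L2 frame at 48 kHz (24 ms of audio), which is consistent with the 96 ms quoted for four packets; it covers only the packet-buffering component of the delay.

```python
def fec_buffer_delay_ms(samples_per_packet, sample_rate_hz, fec_width, fec_depth):
    """Delay incurred by buffering a full FEC matrix of packets."""
    packet_ms = samples_per_packet * 1000 / sample_rate_hz
    return packet_ms * fec_width * fec_depth

# Assumed: one 1152-sample frame per packet at 48 kHz.
print(fec_buffer_delay_ms(1152, 48000, 2, 2))  # two by two FEC
print(fec_buffer_delay_ms(1152, 48000, 5, 5))  # five by five FEC
```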
Recovery on the decoder-side is also processor intensive. The process of amassing the
required block of packets, determining the location of the lost packets and resolving
them one by one can be a lengthy and complex procedure.
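To give a feel for the recovery step, here is a minimal sketch of the simplest form of FEC: a single XOR parity packet protecting a group of equal-length packets. It is an assumption chosen for illustration and not necessarily the scheme used by any particular codec; it can repair only one loss per group.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Parity packet sent alongside the group."""
    return reduce(xor_bytes, packets)

def recover(received, parity):
    """Return (index, packet) for a single lost packet, or None if not repairable."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        return None  # nothing lost, or more losses than one parity packet can repair
    present = [p for p in received if p is not None]
    return missing[0], reduce(xor_bytes, present, parity)

group = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = make_parity(group)
print(recover([group[0], None, group[2]], parity))
```

Note that the decoder cannot repair anything until the whole group and its parity packet have arrived, which is the source of the buffering delay.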
As with concealment, the use of FEC can cause as many, if not more, problems than it
solves. It can go some way to overcome the inadequacies of an IP based transport
mechanism but at a cost of additional delay, complexity, bandwidth and processing
overhead.
For professional STLs and audio backhaul, the emphasis should be on ensuring that the
IP network used is of sufficient quality to guarantee minimal packet loss. Implementing
methods to conceal or correct errors is an unnecessary distraction to the main aim of
ensuring reliable, robust audio delivery over an IP link.
There are two main methods for the improvement of link quality: RSVP and DiffServ.
RSVP (Resource reSerVation Protocol) is more complex and involves the reservation and
relinquishing of required resources throughout the network. DiffServ (Differentiated
Services) on the other hand offers a traffic classification framework that evaluates the
priority of network traffic on a "per hop" basis. Using DiffServ, each packet is classified
and awarded a DSCP (Differentiated Services Code Point) value, which the network
evaluates in order to prioritize the packet accordingly.
There are four main classes of PHBs (Per-Hop Behaviours) which are detailed in Figure 9. Because
of the intense efforts required to determine the appropriate class of traffic for packets,
it is recommended to minimize the number of classification occurrences within the
network infrastructure - four classes is the typical value.
• The values of QoS metrics which the service provider will guarantee for the
client's traffic. This will usually include the delay across the network,
maximum jitter and packet loss levels.
• The values of non-QoS metrics of the service such as availability which for
broadcast applications should be 99.999% or higher.
• The scope of the service, i.e. the specific routers between which the SLA
applies
• The traffic profile of the stream directed to the service provider. This is
particularly relevant in applications such as HD Radio where the inclusion of
HD data can cause the data rate to exceed the average. The burst data rate
must be considered to avoid an increased level of contention.
• Performance monitoring procedures and expected levels of reporting
• Support and troubleshooting procedures including time-frame for response
and resolution and consequences for non-compliance
• The administrative/legal part defining processes for requesting and
cancelling certain services.
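It can help to translate an availability percentage from the list above into permitted downtime before signing. The following sketch assumes a 365-day year; the function name is illustrative.

```python
def allowed_downtime_minutes_per_year(availability_pct):
    """Minutes of outage per 365-day year permitted by an availability figure."""
    minutes_per_year = 365 * 24 * 60
    return (100 - availability_pct) / 100 * minutes_per_year

for pct in (99.9, 99.99, 99.999):
    print(pct, round(allowed_downtime_minutes_per_year(pct), 2), "minutes/year")
```

At 99.999% the link may be down for only a little over five minutes in a whole year, which is why this figure is usually paired with a backup path.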
DSCP value   Binary value   DSCP name

0            000000         BE/Default
Best Effort (BE) - With Best effort delivery, there are no guarantees that data is delivered
nor that it will be of a certain quality. BE traffic is the default setting for all IP traffic
and indicates that the bit rate and delivery method may vary depending on the current
traffic load.

8            001000         CS1
16           010000         CS2
24           011000         CS3
32           100000         CS4
40           101000         CS5
48           110000         CS6
56           111000         CS7
Class selector (CS) - CS code points enable backward compatibility with the IP Precedence
field - an early attempt to establish a QoS standard. The Class Selector codepoints are of the
form 'xxx000' with the first three bits composed of the IP precedence bits. Each IP
precedence value is then mapped into a DiffServ class.

10           001010         AF11
12           001100         AF12
14           001110         AF13
18           010010         AF21
20           010100         AF22
22           010110         AF23
26           011010         AF31
28           011100         AF32
30           011110         AF33
34           100010         AF41
36           100100         AF42
38           100110         AF43
Assured Forwarding (AF) - AF PHB provides an assurance of delivery as long as the traffic
does not exceed a defined rate. Traffic that exceeds the subscription rate faces a higher
probability of being dropped if congestion occurs. AF provides four different forwarding
classes that you can assign to a packet. Every forwarding class provides three drop
probabilities, which yields a total of 12 DSCP values from AF11 to AF43.

46           101110         EF
Expedited Forwarding (EF) - Expedited forwarding offers low delay, low packet loss and low
jitter and is often given priority above all other traffic classes. This makes it highly
suitable for real-time services and critical content such as audio delivery. A packet that is
marked with 46 receives guaranteed low-drop precedence as the packet traverses
DiffServ-aware networks en route to its destination. EF traffic should be limited to a
maximum of 30% of the capacity of a link.
Figure 9: Table Outlining the Main Classes of Per-Hop Behaviours in QoS
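The values in the table follow a regular bit pattern, which the sketch below reproduces. It also shows one way an application might request a DSCP marking on a UDP socket: the 6-bit DSCP occupies the upper bits of the old IP TOS byte, so it is shifted left by two. The helper names are illustrative, and whether the operating system and the network honor the marking is platform and provider dependent.

```python
import socket

EXPEDITED_FORWARDING = 46

def class_selector(precedence):
    """CSn: three precedence bits followed by 000."""
    return precedence << 3

def assured_forwarding(forwarding_class, drop_probability):
    """AFxy: class x in the top three bits, drop probability y in the next two."""
    return (forwarding_class << 3) | (drop_probability << 1)

def dscp_to_tos(dscp):
    """Place the 6-bit DSCP in the upper bits of the 8-bit TOS/DS byte."""
    return dscp << 2

def open_marked_udp_socket(dscp):
    """Create a UDP socket whose outgoing packets request the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock

print("AF43 =", assured_forwarding(4, 3), format(assured_forwarding(4, 3), "06b"))
print("CS5  =", class_selector(5), format(class_selector(5), "06b"))
print("EF TOS byte =", dscp_to_tos(EXPEDITED_FORWARDING))
```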
Of course the actual SLA is a legal document with lots of fine print. For example, the SLA
for AT&T's Virtual Private Network Tunnelling Service is part of a 78 page long
document! Generally the SLA will specify how the customer is to monitor performance
under the SLA, often via an online tool.
If performance fails to meet the figures specified, the SLA also covers the formulas to
determine penalties to the carrier (most often in the form of credits to the customer, not
refunds). These can consist of both "reactive" and "proactive" components. In order for
credits to apply, the customer must follow the rules specified in the SLA, with regards to
reporting reactive SLA problems etc. Proactive components are to be handled by the
provider with credits issued automatically.
An SLA may only be available to the broadcaster with certain revenue commitments
(contract amount) or periods (contract duration).
Professional audio codecs will provide the ability to trigger the backup from the primary
IP link to the secondary synchronous link using a number of different criteria such as
silence on the audio output of a specific audio module or a defined threshold in the
Performance Monitoring log.
Similarly, the automated restore back to the primary IP link could be defined in the
Performance Monitoring log i.e. number of consecutive packets received without a single
drop would equate to a restoration of the primary link.
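The backup and restore behavior described above amounts to a small state machine. The sketch below is one possible reading, not a description of any product's logic: it assumes the trigger is a run of consecutive lost packets, that the primary link continues to be monitored while the backup is on air, and that both thresholds are hypothetical values set by the user.

```python
class LinkSwitcher:
    """Switch to backup after sustained loss; restore after a clean run."""

    def __init__(self, max_consecutive_losses=5, restore_after_good=1000):
        self.max_consecutive_losses = max_consecutive_losses
        self.restore_after_good = restore_after_good
        self.active = "primary"
        self._lost_run = 0
        self._good_run = 0

    def on_packet(self, received: bool) -> str:
        """Record one primary-link packet result and return the active link."""
        if received:
            self._good_run += 1
            self._lost_run = 0
        else:
            self._lost_run += 1
            self._good_run = 0
        if self.active == "primary" and self._lost_run >= self.max_consecutive_losses:
            self.active = "backup"
        elif self.active == "backup" and self._good_run >= self.restore_after_good:
            self.active = "primary"
        return self.active

switcher = LinkSwitcher(max_consecutive_losses=3, restore_after_good=4)
history = [switcher.on_packet(ok) for ok in [False, False, False, True, True, True, True]]
print(history)
```

Requiring a long clean run before restoring avoids flapping between links when the primary is only intermittently healthy.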
Network Testing
Once your network has been installed, you should confirm that it meets the criteria of the
SLA you are paying for. In the case of a network with no SLA or QoS, you will also need to
confirm if it will be suitable for your needs. This qualification testing should be done
rigorously over a period of at least a week. This can generally be done rather easily with
software tools on a regular PC (see below). If the network will also handle non-audio data,
pay careful attention to assign codecs to the correct "Class of Service" (COS) using the
DiffServ or other QoS mechanism. The amount of data at a given COS level must not
exceed the amount specified in your contract or severe problems can result.
Your first round of testing can be done using the tools Ping and Traceroute. These are
available on most computers built into the OS. We'll give a quick overview of using them
with Windows™ but you can use other implementations if you wish.
Ping
This is the most basic "hello? are you there?", "yes I am here!" test for any IP network.
When anything IP-based fails to connect, ping should be the first tool out of your IP
toolbox.
Once the codecs and network are hooked up, you can connect a PC at one end and use it
to ping the codec at the far end. Later you should reverse the test.
First go to the Windows™ ‘Run’ menu and enter "CMD". This should open a black
command prompt window. Now enter the IP address of the far end codec in the format:
ping 123.456.789.101 <enter>
You will see that if the ping is successful, a latency figure will be shown. Next, enable a
continuous ping for a period of time, and look to see the maximum latency variation by
entering the following command:
ping 123.456.789.101 -t <enter>
Allow it to run for at least 15 minutes. To stop the ping enter CTRL-C.
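Reading the maximum latency variation off a long scrolling window is error-prone, so it can be convenient to save the ping output to a text file and summarize it. The sketch below assumes the reply format printed by English-language Windows ("time=23ms", "Request timed out."); the sample lines and the address are invented for illustration.

```python
import re

REPLY_TIME = re.compile(r"time[=<](\d+)\s*ms")

def summarize_ping(output):
    """Extract round-trip times and timeouts from saved ping output."""
    times = [int(m.group(1)) for m in REPLY_TIME.finditer(output)]
    timeouts = output.count("Request timed out")
    if not times:
        return None
    return {
        "replies": len(times),
        "timeouts": timeouts,
        "min_ms": min(times),
        "max_ms": max(times),
        "variation_ms": max(times) - min(times),
    }

# Illustrative sample, not a real capture.
sample = """Reply from 192.0.2.10: bytes=32 time=23ms TTL=54
Reply from 192.0.2.10: bytes=32 time=25ms TTL=54
Request timed out.
Reply from 192.0.2.10: bytes=32 time=41ms TTL=54
"""
print(summarize_ping(sample))
```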
Traceroute
Traceroute gives the IP router itinerary of your packets. It will list each router your
packets travel through, plus a latency for each "hop". Just like travelling by plane, where
the more stops and plane changes you have, the higher the odds you may be delayed at
some point, the same is true of the number of routers used to relay your packets to their
destination.
To use traceroute, simply open a command window (see above) and type the following
command: tracert 123.456.789.101 (where 123.456.789.101 is the IP address of the far
end codec).
IP Connection Verifier
It is recommended that second-round testing be performed as well. In this case, the tests
should be configured to emulate the audio data closely. The APT IP Verifier software tool
can be configured with the same QoS and IP ports you will be using for audio. Next start
adding applications, if any, to the network, starting with those at the lowest COS. Once all
non-audio applications are successfully working, and your testing continues to show that
the top COS is working properly, deploy the audio network. Then use the diagnostics built
into your codecs to continue monitoring for a few more weeks.
These tools enable not only network and equipment monitoring, but the implementation
of remedial action, hardware redundancy and error alleviation. Where possible, the
broadcaster should seek to source an integrated solution which delivers all these
services in a single product, specifically the audio codec. This integrated solution allows
the administrator to manage both audio AND data services from a central location either
by a unified control software or on a higher level by SNMP.
Design Philosophy
The design philosophy behind products is a key factor to consider when purchasing
equipment for use in a professional broadcast environment. There are two key
approaches: DSP-based or PC-based product development.
Hardware Redundancy
For mission-critical STL applications, hardware redundancy is vital to ensure back-up in
the case of network or equipment failure. A broadcaster must consider the importance
of each link and source equipment that conveniently provides the necessary fail-safe
options. Hot-swappable audio modules, redundant power supplies and automatic
back-up functionality are just some of the options that should be considered.
Audio Algorithms
Having prepared your IP network for audio transport, the next step is to choose the best
method of sending audio down the link. Restrictions in available bandwidth will often
rule out linear/PCM audio and some form of compression is usually required. There are
two main types of compression techniques: ADPCM and Perceptual algorithms.
Perceptual based algorithms (such as MPEG L2, MPEG L3 (MP3), AAC and their many
derivatives) use psycho-acoustic based principles which analyze audio content and
determine what is audible to the human ear. The algorithm will remove all inaudible
content and is therefore, by definition, "lossy". Using multiple passes of a perceptual
codec (for example, consider the broadcast chain for HD Radio or DAB) will result in
content heavy with artifacts. Ultimately this will cause "listener fatigue," swiftly
followed by tune-out to a station offering higher audio quality.
Additionally, perceptual coding will introduce a delay to the audio delivery which is
generally unacceptable for real-time audio applications. Working on the assumption
that the IP transport stream and packetization will naturally introduce a minimum delay
of 20 milliseconds, it is imperative to minimize the latency of the compression algorithm
employed. In essence, using a perceptual coder, even a low delay variant, will render the
solution unusable for any level of real-time broadcast such as talkback applications and
off-air monitoring.
Whichever option is selected, the user should ensure that it provides them with the
following capabilities:
Fig 12: Configuration and Monitoring using the APT Codec Management System
IP Audio Applications
Wireless IP Applications
When wired ISP services to the studio or transmitter sites are unavailable, the other
option is to use IP over a wireless link. RF / Microwave connections can be suitable both
for Studio Transmitter Links and Remote / Outside Broadcast applications.
There are additional considerations for those contemplating the use of RF IP links for
STL applications; extra care must be taken to ensure the path calculations are for
reliability over speed. Typical IP applications allow data to be re-sent, and RF links are
therefore usually optimized for speed. Audio networks require error rates that are
significantly lower and, thus, this must be taken into account during the design stage or
results will suffer. If a design consultant is used, be sure that s/he has experience
designing links for IP audio.
While wireless IP services are widely available and many broadcasters are using them
successfully, the observations made previously with regard to use of the public internet
still apply. As much as possible, every effort should be made to secure some form of
bandwidth guarantee and the network should be tested thoroughly before transmission.
Key elements of a successful wireless set-up are:
In the example below, a stereo codec at the studio site has established a multiple unicast
to a number of transmitter sites. The studio is able to monitor the off-air content by
means of a return feed from Transmitter Site 3.
Multicast Applications
Multicasting is a highly efficient technique used to transmit from a single audio source
to many destinations using the IP infrastructure. The source sends the IP packets to a
multicast router using a Multicast Group address as its IP destination address.
Receivers use the same address to inform the network that they are interested in
receiving packets sent to that group. This is carried out using Internet Group
Management Protocol (IGMP). The nodes in the network take care of copying the IP
packets and routing them to all subscribed destinations.
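On the receiving side, subscribing to a group is a single socket option; the operating system then issues the IGMP membership report on the application's behalf. The sketch below uses the standard sockets API; the group address 239.1.1.1 (from the administratively scoped range) and the port are assumptions for illustration.

```python
import ipaddress
import socket
import struct

def membership_request(group, interface="0.0.0.0"):
    """Build the ip_mreq structure: group address plus local interface."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast group address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))

def join_group(group, port):
    """Open a UDP socket and ask the network for traffic sent to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock

# Hypothetical group used by a studio feed.
request = membership_request("239.1.1.1")
print(request.hex())
```

Calling join_group("239.1.1.1", 5004) on each receiver would be the equivalent of a transmitter-site codec subscribing to the studio's stream; the sender needs no knowledge of how many receivers have joined.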
Network Checklist
Equipment Checklist
Robust, DSP-based hardware codec with high
level of redundancy
WorldNet Oslo
The jewel in the crown of APT's broadcast audio codecs, the
WorldNet Oslo offers broadcasters and service providers a
flexible, highly reliable and multi-featured audio multiplexing
solution for Studio Transmitter Links and Inter-studio
networking.
• Up to 4 audio channels per card, up to 6 audio cards
• Enhanced apt-X, MPEG L2, J.57, J.41 or Linear audio
• Analog and AES/EBU audio interfaces
• 5.1 Phase-Locking for seamless surround sound
• Automatic Back-up and Restore for link & audio
• Powerful Codec Management System
• Supports SIP/SDP for EBU N/ACIP compliance
• In-band Management over E1/T1 link
Designed to transport both compressed and uncompressed audio, voice and data over
various digital networks, the WorldNet Oslo is based around a 19 inch, 3U high standard
rackmount chassis which is card-based expandable. Redundant power supplies,
"hot-swappable" cards and automatic back-up functionality ensure 24/7/365 reliability for
mission-critical applications.
Various network interface modules eliminate the need for external multiplexers or
media converters. Audio can be transported via synchronous or packet-switched
networks with support for T1 (1.5Mbit/s), E1 (2Mbit/s) and Ethernet (IP) interfaces.
A maximum of 24 audio channels in simplex mode and 12 audio channels in duplex mode
are possible in each frame. Plug-in audio modules in over 15 different configurations
offer analog, AES/EBU, simplex, duplex and 5.1 phase-locked options. As well as
uncompressed linear audio, J.57, J.41 and MPEG L2, the WorldNet Oslo also supports 16
or 24-bit Enhanced apt-X offering cascade-resilient, near-lossless audio quality with
under 2ms delay.
• Outstanding Audio Quality - All WorldCast codecs offer both linear & Enhanced apt-X coding
• High Compatibility - All WorldCast codecs support the SIP/SDP protocols enabling
quick and easy connection to all compliant IP codecs (according
to the EBU N/ACIP standard)
WorldCast Eclipse
A multi-interface, multi-algorithm codec for the ultimate in
flexibility, the WorldCast Eclipse delivers bidirectional stereo
audio over multiple networks:
• IP (allowing connection to other codecs linked to Wide Area and Local Area Networks);
• X.21/V.35 (allowing connection to high speed fixed synchronous networks);
• ISDN (allowing connection to other codecs over dial up digital ISDN links)
Linear audio, Standard 16-bit apt-X, Enhanced 16 & 24-bit apt-X are supplied as standard and
an optional multi-algorithm suite incorporating MPEG 1/2 Layer II/III, MPEG 4 AAC, G.711 and
G.722 is also available. A rich array of features are provided on the WorldCast Eclipse
including Automatic Back-up, Silence Detect, Contact Closures and Alarm Ports.
WorldCast Equinox
WorldCast Equinox is a multi-algorithm, fully duplex, stereo
audio codec offering IP, Leased Line (X.21/V.35) and ISDN connectivity. Offering unprecedented
redundancy for a stereo IP audio codec, users have the option of dual IP interfaces, dual ISDN
ports and dual power supplies all on a cost-effective 1RU rackmountable unit.
This reliability coupled with the commitment to delivering broadcast-grade audio using linear
and Enhanced apt-X means that the WorldCast Equinox is the most professional and affordable
choice for broadcasters worldwide. The unit was voted a winner in the Radio World “Cool Stuff”
Awards at NAB 2009 for its innovative approach and appeal.
WorldCast Horizon
The WorldCast Horizon is a fully duplex, two channel stereo codec
designed to enable real-time transport of broadcast quality audio over IP networks.
Both analog and digital (AES/EBU with external reference) units are available.
In addition to linear audio, the WorldCast Horizon incorporates Enhanced apt-X coding
technology which, thanks to its low delay and exceptional acoustic properties, is particularly
suited to the transport of audio over packet-switched networks. Contact closures and
opto-couplers for remote status alarms are also provided.
WorldCast Systems:
20, av Neil Armstrong,
Parc d'Activités J.F. Kennedy
33700 Bordeaux-Mérignac
FRANCE
www.aptcodecs.com
www.WorldCastSystems.com
© Copyright APT Ltd 2009