Adaptive Streaming:
The New Revolution in IP Video Delivery

Online and mobile viewing of widely available, high-quality video content including TV
programming, movies, sports events, and news is now poised to go mainstream. Driven by
the recent availability of low-cost, high-resolution desktop/laptop/tablet PCs, smart phones,
set-top boxes and now Ethernet-enabled TV sets, consumers have rapidly moved through the
‘novelty’ phase of acceptance into expectation that any media should be available essentially
on any device over any network connection. Whether regarded as a disruption for cable TV,
telco or satellite TV providers, or an opportunity for service providers to extend TV services
onto the web for on-demand, time-shifted and place-shifted programming environments –
often referred to as ‘three screen delivery’ or ‘TV Anywhere’ – this new video delivery model
is here to stay.

While tremendous advancements in core and last mile bandwidth have been achieved in the
last decade around the world – primarily driven by web-based data consumption – video
traffic represents a quantum leap in bandwidth requirements. This, coupled with the fact that the
Internet at large is not a managed quality-of-service environment, requires that new methods
of video transport be considered to provide the quality of video experience across any device
and network that we have come to expect from managed TV-delivery networks.

The evolution of video delivery transport has led to a new set of de facto standard adaptive
delivery protocols from Apple, Microsoft, and Adobe that are now positioned for broad adoption.
Consequently, networks must now be equipped with servers that can take high-quality video
content from its source, whether live or file-based, and ‘package’ it for transport to devices ready to
accept these new delivery protocols.

The purpose of this paper is to provide insight into the history of video protocol evolution, the
need for newer delivery protocols, how they work, and how RGB Networks’ TransAct Packager
can be deployed in video delivery network environments to enable large-scale, cost-effective
video delivery over IP networks.

Video Delivery Background

The Era of Stateful Protocols

For many years, stateful protocols including Real Time Streaming Protocol (RTSP), Adobe’s Real Time
Messaging Protocol (RTMP), and RealNetworks' RTSP over Real Data Transport (RDT) protocol were
utilized to stream video content to desktop and mobile clients. Stateful protocols require that from
the time a client connects to a streaming server until the time it disconnects, the server tracks client
state. If the client needs to perform any video session control commands like start, stop, pause or
fast-forward it must do so by communicating state information back to the streaming server. Once a
session between the client and the server has been established, the server sends media as a stream
of small packets typically representing a few milliseconds of video. These packets can be transmitted
over UDP or TCP. TCP overcomes firewall blocking of UDP packets, but may also incur increased
latency as packets are sent, and resent if not acknowledged, until received at the far end.

These protocols served the market well, particularly during the era when desktop and mobile viewing
experiences were limited in frequency, quality, duration, and screen or window size and resolution,
and when mobile devices had constrained processor, memory, and storage capabilities.

However, these experience factors have all changed dramatically in the last few years, which
has exposed a number of weaknesses in stateful protocol implementations:

• Stateful media protocols have difficulty getting through firewalls and routers
• Stateful media protocols require special proxies/caches
• Stateful media protocols cannot react quickly or gracefully to rapidly fluctuating network conditions
• Stateful media client/server implementations are vendor-specific, and thus require the
purchase of vendor-specific servers and licensing arrangements – which are also more
expensive to operate and maintain

The Era of the Stateless Protocol – HTTP Progressive Download

A newer type of media delivery is HTTP progressive download. Progressive download (as opposed to
‘traditional’ file download) pulls a file from a web server using HTTP and allows the video file to start
playing before the entire file has been downloaded. Most media players including Adobe Flash,
Windows Media Player, Apple QuickTime, etc., support progressive download. Further, most video
hosting websites use progressive download extensively, if not exclusively.

HTTP progressive download differs from traditional file download in one important respect.
Traditional files have audio and video data separated in the file. At the end of the file, a record of the
location and structure of the audio and video tracks (track data) is provided. Progressively
downloadable files have track data at the beginning of the file and interleave the audio and video
data. A player downloading a traditional file must wait until the end of the file is reached in order to
understand track data. A player downloading a progressively downloadable file gets track data
immediately and can, therefore, play back audio/video as it is received.
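
As one concrete illustration, the sketch below assumes the file is an MP4/QuickTime container, which the text above does not specify. In that container family the track data lives in a "moov" box and the audio/video data in an "mdat" box, so a file is progressively downloadable when "moov" comes first. The function names are hypothetical.

```python
import struct
from io import BytesIO

def top_level_boxes(stream):
    """Yield (box_type, size) for each top-level box in an MP4-style stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the box type.
            size = struct.unpack(">Q", stream.read(8))[0]
            header_len = 16
        name = box_type.decode("latin-1")
        if size == 0:
            # A size of 0 means the box runs to the end of the file.
            yield name, 0
            return
        if size < header_len:
            return  # malformed box; stop rather than loop forever
        yield name, size
        stream.seek(size - header_len, 1)  # skip the payload

def is_progressively_downloadable(stream):
    """True when track data ("moov") precedes media data ("mdat")."""
    order = [name for name, _ in top_level_boxes(stream)]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")
```

A player can begin playback of the first layout as soon as the "moov" box arrives; with the second layout it must wait for the end of the file.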

Unfortunately, it isn’t possible to efficiently create progressive download files from live
streams. Audio and video track data needs to be 1) computed after the entire file is created and
then 2) written to the front of the file. Thus, it isn’t possible to deliver a live stream using
progressive download, because the track data is not available until after the entire file has been
created.

Even so, HTTP Progressive Download greatly improves upon its stateful protocol predecessors as a
result of the following:

• No issues getting through firewalls and routers, as HTTP traffic is passed through port 80
• Utilizes the same web download infrastructure utilized by CDNs and hosting providers to
provide web data content – making it much easier and less expensive to deliver rich media
• Takes advantage of newer desktop and mobile clients’ formidable processing, memory and
storage capabilities to start video playback quickly, maintain flow, and preserve a high-quality
experience

The Modern Era – Adaptive HTTP Streaming

Adaptive HTTP Streaming takes HTTP video delivery several steps further. In this case, the source
video, whether a file or a live stream, is encoded into segments – sometimes referred to as "chunks"
– using a desired delivery format, which includes a container, video codec, audio codec, encryption
protocol, etc. Segments typically represent two to ten seconds of video. Each segment is sliced at
video Group of Pictures (GOP) boundaries beginning with a key frame, giving the segment complete
independence from previous and successive segments. Encoded segments are subsequently hosted
on a regular HTTP web server.
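
The slicing rule described above can be sketched as follows. This is a minimal illustration rather than any vendor's packaging logic: the Frame type, the function name, and the target duration parameter are assumptions, and the input is assumed to start with a key frame.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    pts: float     # presentation time in seconds
    is_key: bool   # True when the frame is a key frame that starts a GOP

def segment_at_gop_boundaries(frames: List[Frame],
                              target_seconds: float) -> List[List[Frame]]:
    """Split frames into segments that each begin with a key frame."""
    segments: List[List[Frame]] = []
    current: List[Frame] = []
    for frame in frames:
        long_enough = bool(current) and frame.pts - current[0].pts >= target_seconds
        # Cut only at a key frame, and only once the segment is long enough.
        if long_enough and frame.is_key:
            segments.append(current)
            current = []
        current.append(frame)
    if current:
        segments.append(current)
    return segments
```

Because a cut is only ever made at a key frame, each segment can be decoded without reference to the segments before or after it.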

Clients request segments from the web server, downloading them via HTTP. As the segments are
downloaded to the client, the client plays back the segments in the order received. Since the
segments are sliced along GOP boundaries with no gaps between, video playback is seamless – even
though it is actually just a file download via a series of HTTP GET requests.
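
As a rough sketch of the client side, the loop below issues one HTTP GET per segment and hands the segments back in order. The function names are hypothetical, and a real player would decode and render each segment rather than simply returning its bytes.

```python
from typing import Callable, Iterable, Iterator
from urllib.request import urlopen

def http_get(url: str) -> bytes:
    """Download one segment with a plain HTTP GET."""
    with urlopen(url) as response:
        return response.read()

def fetch_in_order(segment_urls: Iterable[str],
                   get: Callable[[str], bytes] = http_get) -> Iterator[bytes]:
    """Yield segment payloads in playback order, one GET per segment."""
    for url in segment_urls:
        yield get(url)
```

The `get` parameter is injectable so the loop can be exercised without a network connection, for example by passing a function that reads from a dictionary of canned segments.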

Adaptive delivery enables a client to ‘adapt’ to fluctuating network conditions by selecting video file
segments encoded to different bit rates. As an example, suppose a video file had been encoded to
11 different bit rates from 500 Kbps to 1 Mbps in 50 Kbps increments, i.e., 500 Kbps, 550 Kbps, 600
Kbps, etc. The client then observes the effective bandwidth throughout the playback period by
evaluating its buffer fill/depletion rate. If a higher quality stream is available, and network
bandwidth appears able to support it, the client will switch to the higher-quality bit rate segment. If
a lower quality stream is available, and network bandwidth appears too limited to support the
currently used bit rate segment flow, the client will switch to the lower quality bit rate segment
flow. The client can choose between segments encoded at different bit rates every few seconds.
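
The bit rate ladder in the example above, and one simple way a client might pick from it, can be sketched as follows. The selection rule and the safety factor are illustrative assumptions; actual players use their own heuristics.

```python
def build_ladder(low_kbps=500, high_kbps=1000, step_kbps=50):
    """Return the encoded bit rates, e.g. 500, 550, ..., 1000 Kbps."""
    return list(range(low_kbps, high_kbps + 1, step_kbps))

def select_bit_rate(ladder, measured_kbps, safety_factor=0.8):
    """Pick the highest rate that fits within a fraction of measured bandwidth.

    safety_factor is a hypothetical margin that leaves headroom for
    bandwidth fluctuations; it is not taken from any specific player.
    """
    budget = measured_kbps * safety_factor
    candidates = [rate for rate in ladder if rate <= budget]
    # Fall back to the lowest rate when nothing fits the budget.
    return max(candidates) if candidates else min(ladder)
```

With the default parameters, the ladder contains the 11 rates from the example, and the client re-runs the selection before each segment request.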

This delivery model works for both live- and file-based content. In either case, a manifest file is
provided to the client, which defines the parameters of each segment. In the case of an on-demand
file request, the manifest is sent at the beginning of the session. In the case of a live feed, updated
‘rolling window’ manifest files are sent as new segments are created.
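
A ‘rolling window’ manifest for a live feed can be sketched as below. The output follows the general shape of an Apple HLS media playlist; the class name, segment file names, window size, and segment duration are assumptions made for illustration.

```python
from collections import deque

class RollingManifest:
    """Keep only the most recent segments, as a live manifest would."""

    def __init__(self, window_size=3, segment_seconds=10):
        self.segments = deque(maxlen=window_size)  # oldest entries drop off
        self.segment_seconds = segment_seconds
        self.next_sequence = 0

    def add_segment(self):
        """Register a newly created segment (hypothetical naming scheme)."""
        name = f"segment{self.next_sequence}.ts"
        self.segments.append((self.next_sequence, name))
        self.next_sequence += 1

    def render(self):
        """Return the manifest text for the current window."""
        first = self.segments[0][0] if self.segments else 0
        lines = [
            "#EXTM3U",
            f"#EXT-X-TARGETDURATION:{self.segment_seconds}",
            f"#EXT-X-MEDIA-SEQUENCE:{first}",
        ]
        for _, name in self.segments:
            lines.append(f"#EXTINF:{self.segment_seconds},")
            lines.append(name)
        return "\n".join(lines)
```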

Since the web server can typically send data as fast as its network connection will allow, the client
can evaluate its buffer conditions and make forward-looking decisions on whether future segment
requests should be at a higher or lower bit rate to avoid buffer overrun or starvation. Each client will
make this decision based on trying to select the highest possible bit rate for maximum quality of
playback experience, but not so great that it starves its own buffer of the next needed segments.
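
The buffer-driven decision described above can be modeled with a small simulation. This is a simplified sketch under stated assumptions: the low and high water marks, the maximum buffer size, and the one-step-at-a-time switching rule are hypothetical and do not describe any particular client.

```python
def next_rate_index(index, buffer_seconds, ladder_len,
                    low_water=5.0, high_water=15.0):
    """Step down when the buffer is nearly starved, up when it is comfortable."""
    if buffer_seconds < low_water and index > 0:
        return index - 1
    if buffer_seconds > high_water and index < ladder_len - 1:
        return index + 1
    return index

def simulate(ladder_kbps, bandwidth_trace_kbps, segment_seconds=4.0,
             start_buffer=8.0, max_buffer=30.0):
    """Return the bit rate chosen for each segment of the bandwidth trace."""
    index, buffer_seconds, chosen = 0, start_buffer, []
    for bandwidth in bandwidth_trace_kbps:
        rate = ladder_kbps[index]
        chosen.append(rate)
        # Playback drains the buffer while the segment downloads ...
        download_time = segment_seconds * rate / bandwidth
        buffer_seconds = max(0.0, buffer_seconds - download_time)
        # ... then the finished segment adds its duration, capped to avoid overrun.
        buffer_seconds = min(max_buffer, buffer_seconds + segment_seconds)
        index = next_rate_index(index, buffer_seconds, len(ladder_kbps))
    return chosen
```

Starting at the lowest rate and ratcheting up as the buffer fills mirrors the fast start-up behavior adaptive clients aim for.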

A number of advantages accrue with this delivery protocol approach:

• Lower infrastructure costs for content providers by replacing specialty streaming servers with
generic HTTP caches/proxies already in place for HTTP data serving
• Content delivery is dynamically adapted to the weakest link in the end-to-end delivery chain,
including highly varying last mile conditions
• Subscribers no longer need to statically select a bit rate on their own, as the client can now
perform that function dynamically and automatically
• Subscribers enjoy fast start-up and seek times as playback control functions can be initiated
via the lowest bit rate and subsequently ratcheted up to a higher bit rate
• Annoying user experience shortcomings including long initial buffer time, disconnects, and
playback start/stop are virtually eliminated
• Client can control bit rate switching – with no intelligence in the server – taking into account
CPU load, available bandwidth, resolution, codec, and other local conditions
• Simplified ad insertion accomplished by file substitution

Content Protection

Segments can also be encrypted as they are encoded, enabling content rights to be managed on an
individual session basis. If so, the manifest file tells the client where to find the decryption keys.
An authorized client is subsequently responsible for retrieving the decryption keys, authenticating
the user or presenting a user interface to allow authentication, and
decrypting media files as required.
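
How a client might locate the key in a manifest can be sketched as follows, assuming an Apple HLS-style manifest in which an EXT-X-KEY line carries the encryption method and key location. The function name and the example URL are hypothetical, and key retrieval, authentication, and decryption are left out.

```python
import re

# Matches NAME=value pairs where the value is either quoted or runs to the next comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

def find_key_info(manifest_text):
    """Return the attributes of the first EXT-X-KEY line, or None if unencrypted."""
    prefix = "#EXT-X-KEY:"
    for line in manifest_text.splitlines():
        if line.startswith(prefix):
            attributes = {}
            for name, value in ATTRIBUTE.findall(line[len(prefix):]):
                attributes[name] = value.strip('"')
            return attributes
    return None
```

For a manifest containing the line `#EXT-X-KEY:METHOD=AES-128,URI="https://keys.example.com/k1"`, the function returns the method and the key location, which the client would then fetch once authorized.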

RGB Networks’ TransAct Packager

RGB’s TransAct Packager ingests H.264 video carried in MPEG-2 transport streams and
outputs Apple HTTP Live Streaming (HLS), Microsoft Smooth Streaming, and soon, Adobe Zeri
segmented streams for consumption by Ethernet-enabled TVs, set-top boxes, desktop/laptop
computers and mobile devices.

Additionally, the TransAct Packager can encrypt traffic using AES-128 for Apple HTTP Live Streaming
(HLS) and PlayReady for Microsoft Silverlight Smooth Streaming, integrating key exchange with
leading Digital Rights Management (DRM) servers.

Deployment Scenarios

The TransAct Packager works with RGB’s Video Multiprocessing Gateway (VMG™) to provide a
complete transcoding and packaging solution. By separating transcoding and packaging functionality,
content delivery providers can deploy in a centralized or distributed fashion.

The VMG is a high-density, carrier-class hardware platform that provides advanced video preparation
and delivery services including MPEG-2 and MPEG-4/H.264 high definition (HD) and standard
definition (SD) video transcoding.

The VMG can transcode input video streams into multi-bitrate, multi-resolution streams suitable for
Apple-, Microsoft-, and Adobe-enabled client devices including desktops, laptops, tablet PCs, smart
phones, set-top boxes, and Ethernet-enabled TV sets.

These streams are then sent directly to a TransAct Packager (co-located or distributed at the network
edge) for packaging into Apple HLS, Microsoft Smooth Streaming and Adobe Zeri. Packaged streams
are delivered directly to origin web servers or to a content delivery network (CDN) for wider
distribution to end devices.

Scenario 1 – Centralized Delivery

In this scenario, content is ingested by the VMG, transcoded into multiple bit rates, packaged into
segments, and provided directly to Apple HLS clients, or to a Microsoft IIS server for Smooth
Streaming delivery to Silverlight clients from a centralized point within a single content delivery
provider network.

Scenario 2 – Distributed Delivery

In this scenario, content is ingested by the VMG, transcoded into multiple bit rates, packaged into
segments, and provided to a TransAct Packager for subsequent delivery to Apple HLS clients, or to a
Microsoft IIS server for Smooth Streaming delivery to Silverlight clients. The Packager can either be
located at edge delivery points within the content provider’s network or located within a CDN
partner’s distribution infrastructure.


Online and mobile viewing of premium video content is rapidly complementing the traditional TV
experience. But delivery over the Internet requires new protocols to produce a high quality of
experience based on device type and network congestion. Apple HLS, Microsoft Smooth Streaming
and Adobe Zeri represent adaptive delivery protocols that enable high-quality video consumption
experiences over the Internet. Content providers must now equip network delivery infrastructure
with products capable of receiving standard video containers, slicing them into segments and
delivering those segments along with encryption where required. RGB Networks’ TransAct Packager
accomplishes this goal in a manner that supports large-scale centralized or decentralized

390 West Java Drive

Sunnyvale, CA 94089 USA

© 2010, RGB Networks, Inc. All rights reserved.

RGB Networks and VMG are trademarks of RGB Networks, Inc.
All other brand and product names are trademarks or registered trademarks of their respective holders.