Media Services
NOTE
No new features are being added to Media Services v2.
Check out the latest version, Media Services v3. Also, see the migration guidance from v2 to v3.
Microsoft Azure Media Services (AMS) is an extensible cloud-based platform that enables developers to build
scalable media management and delivery applications. Media Services is based on REST APIs that enable you to
securely upload, store, encode, and package video or audio content for both on-demand and live streaming
delivery to various clients (for example, TV, PC, and mobile devices).
You can build end-to-end workflows entirely with Media Services, or use third-party components for parts of your workflow. For example, encode with a third-party encoder, then upload, protect, package, and deliver with Media Services. You can stream your content live or deliver it on demand.
Prerequisites
To start using Azure Media Services, you should have the following:
An Azure account. If you don't have an account, you can create a free trial account in just a couple of
minutes. For details, see Azure Free Trial.
An Azure Media Services account. For more information, see Create Account.
(Optional) Set up development environment. Choose .NET or REST API for your development environment.
For more information, see Set up environment.
Also, learn how to connect programmatically to AMS API.
A standard or premium streaming endpoint in the started state. For more information, see Managing streaming endpoints.
NOTE
To get the latest version of Java SDK and get started developing with Java, see Get started with the Java client SDK for Media
Services.
To download the latest PHP SDK for Media Services, look for version 0.5.7 of the Microsoft/WindowsAzure package in the
Packagist repository.
Code samples
Find multiple code samples in the Azure Code Samples gallery: Azure Media Services code samples.
Concepts
For Azure Media Services concepts, see Concepts.
Support
Azure Support provides support options for Azure, including Media Services.
Provide feedback
Use the User Voice forum to provide feedback and make suggestions on how to improve Azure Media Services.
You also can go directly to one of the following categories:
Azure Media Player
Client SDK libraries
Encoding and processing
Live streaming
Media Analytics
Azure portal
REST API and platform
Video-on-demand streaming
Signaling Timed Metadata in Live Streaming
9/22/2020 • 35 minutes to read
1. Introduction
In order to signal the insertion of advertisements or custom metadata events on a client player, broadcasters often
make use of timed metadata embedded within the video. To enable these scenarios, Media Services provides
support for the transport of timed metadata from the ingest point of the live streaming channel to the client
application. This specification outlines the modes that Media Services supports for signaling timed metadata within a live stream:
1. [SCTE-35] signaling that complies with the standards outlined by [SCTE-35], [SCTE-214-1], [SCTE-214-3] and
[RFC8216]
2. [SCTE-35] signaling that complies with the legacy [Adobe-Primetime] specification for RTMP ad signaling.
3. A generic timed metadata signaling mode, for messages that are NOT [SCTE-35] and could carry [ID3v2] or
other custom schemas defined by the application developer.
Ad Decision Service: An external service that decides which ad(s) and durations will be shown to the user. The service is typically provided by a partner and is out of scope for this document.

Presentation Time: The time that an event is presented to a viewer. The time represents the moment on the media timeline that a viewer would see the event. For example, the presentation time of a [SCTE-35] splice_info() command message is the splice_time().

Arrival Time: The time that an event message arrives. This time is typically distinct from the presentation time of the event, since event messages are sent ahead of the presentation time of the event.

Sparse track: A media track that is not continuous and is time synchronized with a parent or control track.
[SCTE-214-1]: SCTE 214-1 2016, MPEG DASH for IP-Based Cable Services Part 1: MPD Constraints and Extensions

[SCTE-214-3]: SCTE 214-3 2015, MPEG DASH for IP-Based Cable Services Part 3: DASH/FF Profile
Example XML Event Stream with ID3 schema ID and base64-encoded data payload.
Example Event Stream with custom schema ID and base64-encoded binary data
<?xml version="1.0" encoding="UTF-8"?>
<EventStream schemeIdUri="urn:example.org:custom:binary">
<Event contentEncoding="Base64">
-- base64 encoded custom binary data message --
</Event>
</EventStream>
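A client application receiving such a manifest can parse the EventStream and decode each Event payload. The following Python sketch is illustrative only; the sample manifest and the decode_events helper are not part of the specification, and the base64 payload is a placeholder:

```python
import base64
import xml.etree.ElementTree as ET

# Sample manifest mirroring the shape of the example above (payload is illustrative).
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<EventStream schemeIdUri="urn:example.org:custom:binary">
  <Event contentEncoding="Base64">aGVsbG8sIHdvcmxk</Event>
</EventStream>"""

def decode_events(xml_text):
    """Return the decoded binary payload of each Event in the stream."""
    root = ET.fromstring(xml_text)
    payloads = []
    for event in root.findall("Event"):
        text = (event.text or "").strip()
        if event.get("contentEncoding") == "Base64":
            payloads.append(base64.b64decode(text))
        else:
            payloads.append(text.encode("utf-8"))
    return payloads

payloads = decode_events(MANIFEST)
```

The application layer is responsible for interpreting the decoded bytes according to the custom schema identified by the schemeIdUri.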
Example MPEG DASH manifest output when using Adobe RTMP simple mode
See example 3.3.2.1 MPEG DASH .mpd EventStream using Adobe simple mode
See example 3.3.3.1 DASH manifest with single period and Adobe simple mode
Example HLS manifest output when using Adobe RTMP simple mode
See example 3.2.2 HLS manifest using Adobe simple mode and EXT-X-CUE tag
time: The time, in seconds, at which the cue point occurs on the video timeline.
When this mode of ad marker is used, the HLS manifest output is similar to Adobe "Simple" mode.
Example MPEG DASH MPD, single period, Adobe Simple mode signals
<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" profiles="urn:mpeg:dash:profile:isoff-live:2011"
type="dynamic" publishTime="2020-01-07T18:58:03Z" minimumUpdatePeriod="PT0S" timeShiftBufferDepth="PT58M56S"
availabilityStartTime="2020-01-07T17:44:47Z" minBufferTime="PT7S">
<Period start="PT0S">
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35" timescale="10000000">
<Event presentationTime="1583497601000000" duration="300000000" id="1085900"/>
<Event presentationTime="1583500901666666" duration="300000000" id="1415966"/>
<Event presentationTime="1583504202333333" duration="300000000" id="1746033"/>
<Event presentationTime="1583507502666666" duration="300000000" id="2076066"/>
<Event presentationTime="1583510803333333" duration="300000000" id="2406133"/>
<Event presentationTime="1583514104000000" duration="300000000" id="2736200"/>
<Event presentationTime="1583517404666666" duration="300000000" id="3066266"/>
<Event presentationTime="1583520705333333" duration="300000000" id="3396333"/>
<Event presentationTime="1583524006000000" duration="300000000" id="3726400"/>
<Event presentationTime="1583527306666666" duration="300000000" id="4056466"/>
<Event presentationTime="1583530607333333" duration="300000000" id="4386533"/>
</EventStream>
<AdaptationSet id="1" group="1" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="video" mimeType="video/mp4" codecs="avc1.4D400C" maxWidth="256" maxHeight="144" startWithSAP="1">
<InbandEventStream schemeIdUri="urn:mpeg:dash:event:2012" value="1"/>
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35"/>
<SegmentTemplate timescale="10000000" presentationTimeOffset="1583486678426666"
media="QualityLevels($Bandwidth$)/Fragments(video=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(video=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="1583495318000000" d="64000000" r="34"/>
<S d="43000000"/>
<S d="21000000"/>
<!-- ... Truncated for brevity of sample-->
</SegmentTimeline>
</SegmentTemplate>
<ProducerReferenceTime id="1583495318000000" type="0" wallClockTime="2020-01-07T17:59:10.957Z"
presentationTime="1583495318000000"/>
<Representation id="1_V_video_3750956353252827751" bandwidth="149952" width="256" height="144"/>
</AdaptationSet>
<AdaptationSet id="2" group="5" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="audio" mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
<InbandEventStream schemeIdUri="urn:mpeg:dash:event:2012" value="1"/>
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35"/>
<Label>ambient</Label>
<SegmentTemplate timescale="10000000" presentationTimeOffset="1583486678426666"
media="QualityLevels($Bandwidth$)/Fragments(ambient=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(ambient=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="1583495254426666" d="64000000" r="35"/>
<S d="43093334"/>
<S d="20906666"/>
<!-- ... Truncated for brevity of sample-->
</SegmentTimeline>
</SegmentTemplate>
<ProducerReferenceTime id="1583495254426666" type="0" wallClockTime="2020-01-07T17:59:04.600Z"
presentationTime="1583495254426666"/>
<Representation id="5_A_ambient_9125670592623055209" bandwidth="96000" audioSamplingRate="48000"/>
</AdaptationSet>
</Period>
</MPD>
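In this manifest, the Event presentation times use the same 10,000,000-tick timescale and timeline as the segment times, so an event's offset into the presentation can be computed by subtracting the SegmentTemplate@presentationTimeOffset. A hedged Python sketch (the helper name is illustrative, and the assumption that events and segments share a timeline holds for this particular output):

```python
TIMESCALE = 10_000_000           # EventStream@timescale in the manifest above
PTO = 1_583_486_678_426_666      # SegmentTemplate@presentationTimeOffset

def seconds_into_presentation(event_presentation_time):
    """Offset of an event from the start of the presentation, in seconds."""
    return (event_presentation_time - PTO) / TIMESCALE

offset = seconds_into_presentation(1_583_497_601_000_000)  # first <Event> above
ad_break = 300_000_000 / TIMESCALE                         # Event@duration: a 30-second break
```

Each Event in the stream above carries the same 300,000,000-tick duration, i.e. a 30-second ad break.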
Example HLS playlist, Adobe Simple mode signals using the EXT-X-CUE tag (truncated with "..." for brevity)
The following example shows the output from the Media Services dynamic packager for an RTMP ingest stream
using Adobe "simple" mode signals and the legacy [Adobe-Primetime] EXT-X-CUE tag.
#EXTM3U
#EXT-X-VERSION:8
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:7
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PROGRAM-DATE-TIME:2020-01-07T17:44:47Z
#EXTINF:6.400000,no-desc
Fragments(video=1583486742000000,format=m3u8-aapl-v8)
#EXTINF:6.400000,no-desc
Fragments(video=1583486806000000,format=m3u8-aapl-v8)
...
#EXTINF:6.166667,no-desc
Fragments(video=1583487638000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667
#EXTINF:0.233333,no-desc
Fragments(video=1583487699666666,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=0.233333
#EXTINF:6.400000,no-desc
Fragments(video=1583487702000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=6.633333
#EXTINF:6.400000,no-desc
Fragments(video=1583487766000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=13.033333
#EXTINF:6.400000,no-desc
Fragments(video=1583487830000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=19.433333
#EXTINF:6.400000,no-desc
Fragments(video=1583487894000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=25.833333
#EXTINF:4.166667,no-desc
Fragments(video=1583487958000000,format=m3u8-aapl-v8)
#EXTINF:2.233333,no-desc
Fragments(video=1583487999666666,format=m3u8-aapl-v8)
#EXTINF:6.400000,no-desc
Fragments(video=1583488022000000,format=m3u8-aapl-v8)
...
The 'moov' box SHOULD contain a HandlerBox ('hdlr') as defined in [ISO-14496-12] with the following
constraints:
The 'stsd' box SHOULD contain a MetaDataSampleEntry box with a coding name as defined in [ISO-14496-12]. For
example, for SCTE-35 messages the coding name SHOULD be 'scte'.
2.2.3 Movie Fragment Box and Media Data Box
Sparse track fragments consist of a Movie Fragment Box ('moof') and a Media Data Box ('mdat').
NOTE
In order to achieve frame-accurate insertion of ads, the encoder MUST split the fragment at the presentation time where the
cue is required to be inserted. A new fragment MUST be created that begins with a newly created IDR frame, or Stream
Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I.
#EXTM3U
#EXT-X-VERSION:8
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:2
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PROGRAM-DATE-TIME:2020-01-07T19:40:50Z
#EXTINF:1.501500,no-desc
Fragments(video=22567545,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22702680,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22837815,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22972950,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=23108085,format=m3u8-aapl-v8)
#EXTINF:1.234567,no-desc
Fragments(video=23243220,format=m3u8-aapl-v8)
#EXTINF:0.016689,no-desc
Fragments(video=23354331,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=0.000022
#EXTINF:0.250244,no-desc
Fragments(video=23355833,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=0.250267
#EXTINF:0.850856,no-desc
Fragments(video=23378355,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.101122
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=0.000000,TIME=260.610344,CUE="/DAgAAAAAAXdAP/wDwUAAAPqf0/+AWXk0wABAQEAAGB8
6Fo="
#EXTINF:0.650644,no-desc
Fragments(video=23454932,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.751767
#EXTINF:0.050044,no-desc
Fragments(video=23513490,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.801811
#EXTINF:1.451456,no-desc
Fragments(video=23517994,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=3.253267
#EXTINF:1.501500,no-desc
Fragments(video=23648625,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=4.754767
#EXTINF:1.501500,no-desc
Fragments(video=23783760,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=6.256267
#EXTINF:1.501500,no-desc
Fragments(video=23918895,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=7.757767
#EXTINF:1.501500,no-desc
Fragments(video=24054030,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=9.259267
#EXTINF:1.501500,no-desc
Fragments(video=24189165,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=10.760767
#EXTINF:1.501500,no-desc
Fragments(video=24324300,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=12.262267
#EXTINF:1.501500,no-desc
Fragments(video=24459435,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=13.763767
#EXTINF:1.501500,no-desc
Fragments(video=24594570,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=15.265267
#EXTINF:1.501500,no-desc
Fragments(video=24729705,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=16.766767
#EXTINF:1.501500,no-desc
Fragments(video=24864840,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=18.268267
#EXTINF:1.501500,no-desc
Fragments(video=24999975,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=19.769767
#EXTINF:1.501500,no-desc
Fragments(video=25135110,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=21.271267
#EXTINF:1.501500,no-desc
Fragments(video=25270245,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=22.772767
#EXTINF:1.501500,no-desc
Fragments(video=25405380,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=24.274267
#EXTINF:1.501500,no-desc
Fragments(video=25540515,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=25.775767
#EXTINF:1.501500,no-desc
Fragments(video=25675650,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=27.277267
#EXTINF:1.501500,no-desc
Fragments(video=25810785,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=28.778767
#EXTINF:1.501500,no-desc
Fragments(video=25945920,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=30.280267
#EXTINF:1.501500,no-desc
Fragments(video=26081055,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=31.781767
#EXTINF:1.501500,no-desc
Fragments(video=26216190,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=33.283267
#EXTINF:1.501500,no-desc
Fragments(video=26351325,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=34.784767
#EXTINF:1.501500,no-desc
Fragments(video=26486460,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=36.286267
#EXTINF:1.501500,no-desc
Fragments(video=26621595,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=37.787767
#EXTINF:1.501500,no-desc
Fragments(video=26756730,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=39.289267
#EXTINF:1.501500,no-desc
Fragments(video=26891865,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=40.790767
#EXTINF:1.501500,no-desc
Fragments(video=27027000,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=42.292267
#EXTINF:1.501500,no-desc
Fragments(video=27162135,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=43.793767
#EXTINF:1.501500,no-desc
Fragments(video=27297270,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=45.295267
#EXTINF:1.501500,no-desc
Fragments(video=27432405,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=46.796767
#EXTINF:1.501500,no-desc
Fragments(video=27567540,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=48.298267
#EXTINF:1.501500,no-desc
Fragments(video=27702675,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=49.799767
#EXTINF:1.501500,no-desc
Fragments(video=27837810,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=51.301267
#EXTINF:1.501500,no-desc
Fragments(video=27972945,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=52.802767
#EXTINF:1.501500,no-desc
Fragments(video=28108080,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=54.304267
#EXTINF:1.501500,no-desc
Fragments(video=28243215,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=55.805767
#EXTINF:1.501500,no-desc
Fragments(video=28378350,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=57.307267
#EXTINF:1.501500,no-desc
Fragments(video=28513485,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=58.808767
#EXTINF:1.501500,no-desc
Fragments(video=28648620,format=m3u8-aapl-v8)
ELAPSED: Decimal floating-point number. Optional, but required for a sliding presentation window. When the signal is being repeated to support a sliding presentation window, this field MUST be the amount of presentation time, in fractional seconds, that has elapsed since the event began. This value may exceed the original specified duration of the splice or segment.
The HLS player application layer will use the TYPE to identify the format of the message, decode the message, apply
the necessary time conversions, and process the event. The events are time synchronized in the segment playlist of
the parent track, according to the event timestamp. They are inserted before the nearest segment (#EXTINF tag).
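For example, the attribute list of an EXT-X-CUE tag can be split into key/value pairs with a small parser. The following Python sketch is illustrative (the parse_cue helper is hypothetical and covers only the attribute shapes shown in this document, with quoted and unquoted values):

```python
import re

def parse_cue(line):
    """Parse an #EXT-X-CUE tag line into a dict of its attributes."""
    body = line.split(":", 1)[1]
    attrs = {}
    # Match KEY=value pairs; values may be quoted (TYPE, CUE) or bare (numbers).
    for key, quoted, bare in re.findall(r'([A-Z-]+)=(?:"([^"]*)"|([^,]*))', body):
        attrs[key] = quoted if quoted else bare
    return attrs

cue = parse_cue('#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,'
                'TIME=158348769.966667,ELAPSED=6.633333')
remaining = float(cue["DURATION"]) - float(cue["ELAPSED"])  # time left in the break
```

A player joining mid-break can use DURATION minus ELAPSED to determine how much of the splice remains.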
3.2.3 HLS .m3u8 manifest example using Adobe Primetime EXT-X-CUE
The following example shows HLS manifest decoration using the Adobe Primetime EXT-X-CUE tag. Each tag carries only the TYPE, DURATION, and TIME properties (no CUE payload), which indicates that the source was an RTMP stream using Adobe "simple" mode signaling.
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-ALLOW-CACHE:NO
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:11
#EXT-X-PROGRAM-DATE-TIME:2019-12-10T09:18:14Z
#EXTINF:10.010000,no-desc
Fragments(video=4011540820,format=m3u8-aapl)
#EXTINF:10.010000,no-desc
Fragments(video=4011550830,format=m3u8-aapl)
#EXTINF:10.010000,no-desc
Fragments(video=4011560840,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000
#EXTINF:8.008000,no-desc
Fragments(video=4011570850,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=0.593000
#EXTINF:4.170000,no-desc
Fragments(video=4011578858,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=4.763000
#EXTINF:9.844000,no-desc
Fragments(video=4011583028,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=14.607000
#EXTINF:10.010000,no-desc
Fragments(video=4011592872,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=24.617000
#EXTINF:10.010000,no-desc
Fragments(video=4011602882,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=34.627000
#EXTINF:10.010000,no-desc
Fragments(video=4011612892,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=44.637000
#EXTINF:10.010000,no-desc
Fragments(video=4011622902,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=54.647000
#EXTINF:10.010000,no-desc
Fragments(video=4011632912,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=64.657000
#EXTINF:10.010000,no-desc
Fragments(video=4011642922,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=74.667000
#EXTINF:10.010000,no-desc
Fragments(video=4011652932,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=84.677000
#EXTINF:10.010000,no-desc
Fragments(video=4011662942,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=94.687000
#EXTINF:10.010000,no-desc
Fragments(video=4011672952,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=104.697000
#EXTINF:10.010000,no-desc
Fragments(video=4011682962,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=114.707000
#EXTINF:10.010000,no-desc
Fragments(video=4011692972,format=m3u8-aapl)
#EXTINF:8.008000,no-desc
Fragments(video=4011702982,format=m3u8-aapl)
NOTE
For brevity, [SCTE-35] allows the base64-encoded section to be carried in the Signal.Binary element (rather than the Signal.SpliceInfoSection element) as an alternative to carrying a completely parsed cue message. Azure Media Services uses this 'xml+bin' approach to signaling in the MPD manifest. This is also the method recommended in [DASH-IF-IOP]; see the section titled 'Ad insertion event streams' of the DASH-IF IOP guideline.
<!-- Example EventStream element using "urn:com:adobe:dpi:simple:2015" Adobe simple signaling per [Adobe-Primetime] -->
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal" timescale="10000000">
<Event presentationTime="1583497601000000" duration="300000000" id="1085900"/>
<Event presentationTime="1583500901666666" duration="300000000" id="1415966"/>
<Event presentationTime="1583504202333333" duration="300000000" id="1746033"/>
<Event presentationTime="1583507502666666" duration="300000000" id="2076066"/>
<Event presentationTime="1583510803333333" duration="300000000" id="2406133"/>
<Event presentationTime="1583514104000000" duration="300000000" id="2736200"/>
<Event presentationTime="1583517404666666" duration="300000000" id="3066266"/>
<Event presentationTime="1583520705333333" duration="300000000" id="3396333"/>
<Event presentationTime="1583524006000000" duration="300000000" id="3726400"/>
<Event presentationTime="1583527306666666" duration="300000000" id="4056466"/>
<Event presentationTime="1583530607333333" duration="300000000" id="4386533"/>
</EventStream>
3.3.2.2 Example MPEG DASH .mpd manifest signaling of an RTMP stream using Adobe SCTE-35 mode
The following example shows an excerpt EventStream from the Media Services dynamic packager for an RTMP
stream using Adobe SCTE-35 mode signaling.
Example EventStream element using xml+bin style signaling per [SCTE-214-1]
<EventStream schemeIdUri="urn:scte:scte35:2014:xml+bin" value="scte35" timescale="10000000">
<Event presentationTime="2595092444" duration="11011000" id="1002">
<Signal xmlns="http://www.scte.org/schemas/35/2016">
<Binary>/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAEBAQAA8g1eNw==</Binary>
</Signal>
</Event>
<Event presentationTime="2606103444" id="1002">
<Signal xmlns="http://www.scte.org/schemas/35/2016">
<Binary>/DAgAAAAAAXdAP/wDwUAAAPqf0/+AWXk0wABAQEAAGB86Fo=</Binary>
</Signal>
</Event>
</EventStream>
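The Binary element carries a complete base64-encoded [SCTE-35] splice_info_section. As an illustration, the fixed header bytes can be checked in a few lines of Python (full [SCTE-35] parsing is out of scope here; only the table_id and section_length fields are inspected):

```python
import base64

# Splice-out cue from the first Event in the example above:
CUE = "/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAEBAQAA8g1eNw=="

data = base64.b64decode(CUE)
table_id = data[0]                                   # 0xFC marks a splice_info_section
section_length = ((data[1] & 0x0F) << 8) | data[2]   # low 12 bits of bytes 1-2
total = section_length + 3                           # 3 header bytes + section body
```

For this cue, the decoded length matches section_length plus the 3-byte header, a quick sanity check before handing the bytes to a full [SCTE-35] parser.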
IMPORTANT
Note that presentationTime is the presentation time of the [SCTE-35] event translated to be relative to the Period Start time,
not the arrival time of the message. [MPEGDASH] defines the Event@presentationTime as "Specifies the presentation time of
the event relative to the start of the Period. The value of the presentation time in seconds is the division of the value of this
attribute and the value of the EventStream@timescale attribute. If not present, the value of the presentation time is 0."
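Applying that rule to the xml+bin example above (EventStream@timescale of 10000000), the conversion is a simple division; a minimal Python sketch:

```python
def event_time_seconds(presentation_time, timescale):
    """Period-relative presentation time of a DASH Event, in seconds."""
    return presentation_time / timescale

# First SCTE-35 Event in the xml+bin example: presentationTime="2595092444".
t = event_time_seconds(2595092444, 10000000)  # ~259.509 seconds into the Period
```

This matches the TIME=259.509244 value carried in the EXT-X-CUE tags of the HLS playlist example earlier in this document for the same event.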
3.3.3.1 Example MPEG DASH manifest (MPD) with single-period, EventStream, using Adobe simple mode signals
The following example shows the output from the Media Services dynamic packager for a source RTMP stream
using the Adobe "simple" mode ad signal method. The output is a single period manifest showing an EventStream
using the schemeIdUri set to "urn:com:adobe:dpi:simple:2015" and the value property set to "simplesignal". Each
simple signal is provided in an Event element with the @presentationTime, @duration, and @id properties
populated based on the incoming simple signals.
<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" profiles="urn:mpeg:dash:profile:isoff-live:2011"
type="static" mediaPresentationDuration="PT28M1.680S" minBufferTime="PT3S">
<Period>
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal" timescale="1000">
<Event presentationTime="4011578265" duration="119987" id="4011578265"/>
</EventStream>
<AdaptationSet id="1" group="1" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="video" mimeType="video/mp4" codecs="avc1.4D4028" maxWidth="1920" maxHeight="1080"
startWithSAP="1">
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal"/>
<ProducerReferenceTime id="4011460740" type="0" wallClockTime="2020-01-25T19:35:54.740Z"
presentationTime="4011460740"/>
<SegmentTemplate timescale="1000" presentationTimeOffset="4011460740"
media="QualityLevels($Bandwidth$)/Fragments(video=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(video=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="4011460740" d="2002" r="57"/>
<S d="1401"/>
<S d="601"/>
<S d="2002"/>
</SegmentTimeline>
</SegmentTemplate>
<Representation id="1_V_video_14759481473095519504" bandwidth="6000000" width="1920"
height="1080"/>
<Representation id="1_V_video_1516803357996956148" bandwidth="4000000" codecs="avc1.4D401F"
width="1280" height="720"/>
<Representation id="1_V_video_5430608182379669372" bandwidth="2600000" codecs="avc1.4D401F"
width="960" height="540"/>
<Representation id="1_V_video_3780180650986497347" bandwidth="1000000" codecs="avc1.4D401E"
width="640" height="360"/>
<Representation id="1_V_video_13759117363700265707" bandwidth="699000" codecs="avc1.4D4015"
width="480" height="270"/>
<Representation id="1_V_video_6140004908920393176" bandwidth="400000" codecs="avc1.4D4015"
width="480" height="270"/>
<Representation id="1_V_video_10673801877453424365" bandwidth="200000" codecs="avc1.4D400D"
width="320" height="180"/>
</AdaptationSet>
<AdaptationSet id="2" group="5" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="audio" mimeType="audio/mp4" codecs="mp4a.40.2">
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal"/>
<ProducerReferenceTime id="4011460761" type="0" wallClockTime="2020-01-25T19:35:54.761Z"
presentationTime="4011460761"/>
<Label>audio</Label>
<SegmentTemplate timescale="1000" presentationTimeOffset="4011460740"
media="QualityLevels($Bandwidth$)/Fragments(audio=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(audio=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="4011460761" d="1984"/>
<S d="2005" r="1"/>
<S d="2006"/>
</SegmentTimeline>
</SegmentTemplate>
<Representation id="5_A_audio_17504386117102112482" bandwidth="128000" audioSamplingRate="48000"/>
</AdaptationSet>
</Period>
</MPD>
3.3.4 MPEG DASH In-band Event Message Box Signaling
An in-band event stream requires the MPD to have an InbandEventStream element at the Adaptation Set level. This
element has a mandatory schemeIdUri attribute and an optional timescale attribute, which also appear in the Event
Message Box ('emsg'). Event message boxes with scheme identifiers that are not defined in the MPD SHOULD NOT
be present.
For in-band [SCTE-35] carriage, signals MUST use the schemeIdUri "urn:scte:scte35:2013:bin". Normative definitions
of the carriage of [SCTE-35] in-band messages are given in [SCTE-214-3] sec 7.3.2 (Carriage of SCTE 35 cue
messages).
The following details outline the specific values the client should expect in the 'emsg' in compliance with [SCTE-
214-3]:
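As an informal illustration only (not part of any Azure SDK), the following Python sketch parses a version-0 Event Message Box ('emsg') and exposes the fields a client would match against the MPD's InbandEventStream entries, including the SCTE-35 scheme identifier named above. The box layout follows the MPEG-DASH version-0 'emsg' definition; the helper names are illustrative.

```python
import struct

def parse_emsg_v0(box: bytes) -> dict:
    """Parse a version-0 DASH Event Message Box ('emsg').

    `box` is the full box, including the 8-byte size/type header.
    message_data is returned as raw bytes; for SCTE-35 carriage it
    would hold a binary splice_info_section.
    """
    size, box_type = struct.unpack_from(">I4s", box, 0)
    assert box_type == b"emsg", "not an emsg box"
    assert box[8] == 0, "this sketch handles version 0 only"
    pos = 12  # skip size, type, version, flags

    def cstring(data: bytes, start: int):
        # Null-terminated UTF-8 string, as used for the URI fields.
        end = data.index(b"\x00", start)
        return data[start:end].decode("utf-8"), end + 1

    scheme_id_uri, pos = cstring(box, pos)
    value, pos = cstring(box, pos)
    timescale, pt_delta, duration, event_id = struct.unpack_from(">IIII", box, pos)
    return {
        "scheme_id_uri": scheme_id_uri,
        "value": value,
        "timescale": timescale,
        "presentation_time_delta": pt_delta,
        "event_duration": duration,
        "id": event_id,
        "message_data": box[pos + 16:size],
    }
```

A client would drop any parsed box whose scheme_id_uri is not declared in an InbandEventStream element of the MPD.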
Change History
Next steps
View Media Services learning paths.
Media Services v3 (latest)
Check out the latest version of Azure Media Services!
Overview
Concepts
Start developing
Migration guidance from v2 to v3
Media Services v2 (legacy)
Overview
Create account
Deliver on-demand
Deliver live
Provide feedback
Use the User Voice forum to provide feedback and make suggestions on how to improve Azure Media Services. You
also can go directly to one of the following categories:
Azure Media Player
Client SDK libraries
Encoding and processing
Live streaming
Media Analytics
Azure portal
REST API and platform
Video-on-demand streaming
Smooth Streaming Protocol (MS-SSTR) Amendment
for HEVC
9/22/2020
1 Introduction
This article provides detailed amendments to be applied to the Smooth Streaming Protocol specification [MS-SSTR]
to enable Smooth Streaming of HEVC-encoded video. Only the changes required to deliver the HEVC video codec
are outlined here. The article follows the same numbering scheme as the [MS-SSTR] specification; the empty
headings presented throughout the article are retained to orient the reader within the [MS-SSTR] specification,
and "(No Change)" indicates text copied for clarification purposes only.
The article provides technical implementation requirements for signaling the HEVC video codec (using either
'hev1' or 'hvc1' format tracks) in a Smooth Streaming manifest. Normative references are updated to the current
MPEG standards that include HEVC and Common Encryption of HEVC, and box names for the ISO Base Media
File Format are updated to be consistent with the latest specifications.
The referenced Smooth Streaming Protocol specification [MS-SSTR] describes the wire format used to deliver live
and on-demand digital media, such as audio and video, in the following manners: from an encoder to a web server,
from a server to another server, and from a server to an HTTP client. The use of an MPEG-4 ([MPEG4-RA])-based
data structure delivery over HTTP allows seamless switching in near real time between different quality levels of
compressed media content. The result is a constant playback experience for the HTTP client end user, even if
network and video rendering conditions change for the client computer or device.
1.1 Glossary
The following terms are defined in [MS-GLOS]:
composition time: The time a sample is presented at the client, as defined in [ISO/IEC-14496-12].
CENC: Common Encryption, as defined in [ISO/IEC 23001-7] Second Edition.
decode time: The time a sample is required to be decoded on the client, as defined in [ISO/IEC 14496-12:2008].
fragment: An independently downloadable unit of media that comprises one or more samples.
1.2 References
References to Microsoft Open Specifications documentation do not include a publishing year because links are
to the latest version of the documents, which are updated frequently. References to other documents include a
publishing year when one is available.
1.3 Overview
Only changes to the Smooth Streaming specification required for the delivery of HEVC are specified below.
Unchanged section headers are listed to maintain location in the referenced Smooth Streaming specification
[MS-SSTR].
1.4 Relationship to Other Protocols
1.5 Prerequisites/Preconditions
1.6 Applicability Statement
1.7 Versioning and Capability Negotiation
1.8 Vendor-Extensible Fields
The following method SHALL be used to identify streams using the HEVC video format:
Custom Descriptive Codes for Media Formats: This capability is provided by the FourCC field, as
specified in section 2.2.2.5. Implementers can ensure that extensions do not conflict by registering extension
codes with the MPEG4-RA, as specified in [ISO/IEC-14496-12].
MinorVersion (variable): The minor version of the Manifest Response message. MUST be set to 2. (No
Change)
TimeScale (variable): The time scale of the Duration attribute, specified as the number of increments in one
second. The default value is 1. (No Change)
The recommended value is 90000 for representing the exact duration of video frames and fragments
containing fractional framerate video (for example, 30/1.001 Hz).
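To see why 90000 is recommended, the arithmetic can be checked directly. The following sketch (illustrative only) computes the per-frame duration in timescale ticks; a 90 kHz timescale represents 30/1.001 Hz frames exactly, while the 10 MHz default does not.

```python
from fractions import Fraction

def frame_duration_ticks(timescale: int, frame_rate: Fraction) -> Fraction:
    """Duration of one video frame expressed in timescale ticks."""
    return Fraction(timescale, 1) / frame_rate

# 30/1.001 Hz (29.97 fps) in a 90000 timescale: exactly 3003 ticks per frame.
```

The same computation with the 10,000,000 default timescale yields a non-integer tick count, which is why fractional-framerate durations accumulate rounding error there.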
2.2.2.2 ProtectionElement
The ProtectionElement SHALL be present when Common Encryption (CENC) has been applied to video or audio
streams. HEVC encrypted streams SHALL conform to Common Encryption 2nd Edition [ISO/IEC 23001-7]. Only
slice data in VCL NAL Units SHALL be encrypted.
2.2.2.3 StreamElement
StreamTimeScale (variable): The time scale for duration and time values in this stream, specified as the
number of increments in one second. A value of 90000 is recommended for HEVC streams. A value matching
the waveform sample frequency (for example, 48000 or 44100) is recommended for audio streams.
2.2.2.3.1 StreamProtectionElement
2.2.2.4 UrlPattern
2.2.2.5 TrackElement
FourCC (variable): A four-character code that identifies which media format is used for each sample. The
following range of values is reserved with the following semantic meanings:
"hev1": Video samples for this track use HEVC video, using the 'hev1' sample description format
specified in [ISO/IEC-14496-15].
"hvc1": Video samples for this track use HEVC video, using the 'hvc1' sample description format
specified in [ISO/IEC-14496-15].
CodecPrivateData (variable): Data that specifies parameters specific to the media format and
common to all samples in the track, represented as a string of hex-coded bytes. The format and semantic
meaning of byte sequence varies with the value of the FourCC field as follows:
When a TrackElement describes HEVC video, the FourCC field SHALL equal "hev1" or "hvc1".
The CodecPrivateData field SHALL contain a hex-coded string representation of the following byte
sequence, specified in ABNF [RFC5234]: (no change from MS-SSTR)
%x00 %x00 %x00 %x01 SPSField %x00 %x00 %x00 %x01 PPSField
SPSField contains the Sequence Parameter Set (SPS).
PPSField contains the Picture Parameter Set (PPS).
Note: The Video Parameter Set (VPS) is not contained in CodecPrivateData, but should be contained in
the file header of stored files in the ‘hvcC’ box. Systems using Smooth Streaming Protocol must signal
additional decoding parameters (for example, HEVC Tier) using the Custom Attribute “codecs.”
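The byte sequence above is simple enough to construct mechanically. The following helper is an illustrative sketch (not an official encoder API) that builds the hex-coded CodecPrivateData string from raw SPS and PPS NAL units, deliberately omitting the VPS as described in the note:

```python
def make_codec_private_data(sps: bytes, pps: bytes) -> str:
    """Build the CodecPrivateData manifest string for an HEVC track:
    a 00 00 00 01 start code before each of the SPS and PPS NAL units,
    hex-encoded. The VPS is omitted on purpose; per the note above it
    travels in the 'hvcC' box of the stored file instead."""
    start_code = bytes([0x00, 0x00, 0x00, 0x01])
    return (start_code + sps + start_code + pps).hex().upper()
```

The SPS and PPS bytes here are placeholders; a real encoder would emit full parameter-set NAL units.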
2.2.2.5.1 CustomAttributesElement
2.2.2.6 StreamFragmentElement
The SmoothStreamingMedia’s MajorVersion field MUST be set to 2, and MinorVersion field MUST be set
to 2. (No Change)
2.2.2.6.1 TrackFragmentElement
2.2.4.4 TfxdBox
The TfxdBox is deprecated, and its function is replaced by the Track Fragment Decode Time Box ('tfdt') specified in
[ISO/IEC 14496-12] section 8.8.12.
Note: A client may calculate the duration of a fragment by summing the sample durations listed in the Track
Run Box ('trun') or by multiplying the number of samples by the default sample duration. The
baseMediaDecodeTime in 'tfdt' plus the fragment duration equals the URL time parameter for the next fragment.
A Producer Reference Time Box (‘prft’) SHOULD be inserted prior to a Movie Fragment Box (‘moof’) as needed,
to indicate the UTC time corresponding to the Track Fragment Decode Time of the first sample referenced by
the Movie Fragment Box, as specified in [ISO/IEC 14496-12] section 8.16.5.
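The note above is plain arithmetic, sketched here as an illustrative helper (parameter names are ours, not from the specification):

```python
def next_fragment_url_time(base_media_decode_time, sample_durations=None,
                           sample_count=None, default_sample_duration=None):
    """URL time parameter of the next fragment: the 'tfdt'
    baseMediaDecodeTime plus the fragment duration, computed either
    from the per-sample durations in 'trun' or from the sample count
    times the default sample duration."""
    if sample_durations is not None:
        duration = sum(sample_durations)
    else:
        duration = sample_count * default_sample_duration
    return base_media_decode_time + duration
```

Either form of the duration may be used, depending on whether the 'trun' carries explicit per-sample durations.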
2.2.4.5 TfrfBox
The TfrfBox is deprecated, and its function is replaced by the Track Fragment Decode Time Box ('tfdt') specified in
[ISO/IEC 14496-12] section 8.8.12.
Note: A client may calculate the duration of a fragment by summing the sample durations listed in the Track
Run Box ('trun') or by multiplying the number of samples by the default sample duration. The
baseMediaDecodeTime in 'tfdt' plus the fragment duration equals the URL time parameter for the next fragment.
Look-ahead addresses are deprecated because they delay live streaming.
2.2.4.6 TfhdBox
The TfhdBox and related fields encapsulate defaults for per sample metadata in the fragment. The syntax of the
TfhdBox field is a strict subset of the syntax of the Track Fragment Header Box defined in [ISO/IEC-14496-12]
section 8.8.7.
BaseDataOffset (8 bytes): The offset, in bytes, from the beginning of the MdatBox field to the sample field
in the MdatBox field. To signal this restriction, the default-base-is-moof flag (0x020000) MUST be set.
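Checking that flag is a one-line bit test. This sketch is illustrative only; the constant value is the default-base-is-moof bit defined for tf_flags in [ISO/IEC-14496-12]:

```python
DEFAULT_BASE_IS_MOOF = 0x020000  # tf_flags bit from ISO/IEC 14496-12

def base_is_moof(tf_flags: int) -> bool:
    """True when default-base-is-moof is set in the tfhd tf_flags,
    meaning data offsets are relative to the enclosing 'moof' box."""
    return bool(tf_flags & DEFAULT_BASE_IS_MOOF)
```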
2.2.4.7 TrunBox
The TrunBox and related fields encapsulate per sample metadata for the requested fragment. The syntax of
TrunBox is a strict subset of the Version 1 Track Fragment Run Box defined in [ISO/IEC-14496-12] section 8.8.8.
SampleCompositionTimeOffset (4 bytes): The Sample Composition Time offset of each sample adjusted
so that the presentation time of the first presented sample in the fragment is equal to the decode time of the
first decoded sample. Negative video sample composition offsets SHALL be used,
as defined in [ISO/IEC-14496-12].
Note: This avoids a video synchronization error caused by video lagging audio equal to the largest decoded
picture buffer removal delay, and maintains presentation timing between alternative fragments that may have
different removal delays.
The syntax of the fields defined in this section, specified in ABNF [RFC5234], remains the same, except as
follows:
SampleCompositionTimeOffset = SIGNED_INT32
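The adjustment described above can be sketched as follows. This is an illustrative computation (input lists and names are ours, not from the specification): given per-sample decode and presentation times in decode order, it shifts the signed offsets so the first presented sample's presentation time equals the first sample's decode time, producing the negative offsets required for reordered (B-frame) content.

```python
def composition_offsets(decode_times, presentation_times):
    """Signed sample_composition_time_offset values for a version 1
    'trun', shifted so the earliest presentation time equals the decode
    time of the first decoded sample. Reordered content yields negative
    offsets, avoiding the A/V lag described in the note above."""
    shift = min(presentation_times) - decode_times[0]
    return [p - d - shift for d, p in zip(decode_times, presentation_times)]
```

Applying the returned offsets to the decode times reproduces the presentation order with no leading gap.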
2.2.4.8 MdatBox
2.2.4.9 Fragment Response Common Fields
FileType (variable): specifies the subtype and intended use of the MPEG-4 ([MPEG4-RA]) file, and high-level
attributes.
MajorBrand (variable): The major brand of the media file. MUST be set to "isml".
MinorVersion (variable): The minor version of the media file. MUST be set to 1.
CompatibleBrands (variable): Specifies the supported brands of MPEG-4. MUST include "ccff" and "iso8".
The syntax of the fields defined in this section, specified in ABNF [RFC5234], is as follows:
FileType = MajorBrand MinorVersion CompatibleBrands
MajorBrand = STRING_UINT32
MinorVersion = STRING_UINT32
CompatibleBrands = "ccff" "iso8" 0*(STRING_UINT32)
Note: The compatibility brands 'ccff' and 'iso8' indicate that fragments conform to "Common Container File
Format" and Common Encryption [ISO/IEC 23001-7] and ISO Base Media File Format Edition 4 [ISO/IEC 14496-12].
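Putting the FileType fields together, the serialized box is straightforward. This is an illustrative sketch of the box layout under the constraints above, not an official packager implementation:

```python
import struct

def build_file_type_box(major=b"isml", minor=1, compatible=(b"ccff", b"iso8")):
    """Serialize a FileType ('ftyp') box: major brand "isml",
    minor version 1, and compatible brands including "ccff" and
    "iso8", per the requirements above."""
    payload = major + struct.pack(">I", minor) + b"".join(compatible)
    # Box header: 32-bit size (header + payload) followed by the type.
    return struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload
```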
2.2.7.2 StreamManifestBox
2.2.7.2.1 StreamSMIL
2.2.7.3 LiveServerManifestBox
2.2.7.3.1 LiveSMIL
2.2.7.4 MoovBox
2.2.7.5 Fragment
2.2.7.5.1 TrackFragmentExtendedHeader
3 Protocol Details
3.1 Client Details
3.1.1 Abstract Data Model
3.1.1.1 Presentation Description
The Presentation Description data element encapsulates all metadata for the presentation.
Presentation Metadata: A set of metadata that is common to all streams in the presentation. Presentation
Metadata comprises the following fields, specified in section 2.2.2.1:
MajorVersion
MinorVersion
TimeScale
Duration
IsLive
LookaheadCount
DVRWindowLength
Presentations containing HEVC Streams SHALL set:
MajorVersion = 2
MinorVersion = 2
TimeScale = 90000
Stream Collection: A collection of Stream Description data elements, as specified in section 3.1.1.1.2.
Protection Description: A collection of Protection System Metadata Description data elements, as specified in
section 3.1.1.1.1.
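The HEVC constraints above are easy to check mechanically. The following sketch is illustrative only (the attribute dict and its integer values are our assumption about a parsed manifest, not an Azure API); it returns the names of any SmoothStreamingMedia attributes that violate the required values:

```python
def check_hevc_presentation(attrs: dict) -> list:
    """Return the presentation-metadata fields that violate the
    constraints for presentations containing HEVC streams:
    MajorVersion = 2, MinorVersion = 2, TimeScale = 90000."""
    required = {"MajorVersion": 2, "MinorVersion": 2, "TimeScale": 90000}
    return [name for name, value in required.items()
            if attrs.get(name) != value]
```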
3.1.1.1.1 Protection System Metadata Description
The Protection System Metadata Description data element encapsulates metadata specific to a single Content
Protection System. (No Change)
Protection Header Description: Content protection metadata that pertains to a single Content Protection
System. Protection Header Description comprises the following fields, specified in section 2.2.2.2:
SystemID
ProtectionHeaderContent
3.1.1.1.2 Stream Description
3.1.1.1.2.1 Track Description
3.1.1.1.2.1.1 Custom Attribute Description
3.1.1.3 Fragment Reference Description
3.1.1.3.1 Track-Specific Fragment Reference Description
3.1.2 Timers
3.1.3 Initialization
3.1.4 Higher-Layer Triggered Events
3.1.4.1 Open Presentation
3.1.4.2 Get Fragment
3.1.4.3 Close Presentation
ProtectionElement (section 2.2.2.2)
Azure Media Services fragmented MP4 live ingest
specification
9/22/2020
This specification describes the protocol and format for fragmented MP4-based live streaming ingestion for Azure
Media Services. Media Services provides a live streaming service that customers can use to stream live events and
broadcast content in real time by using Azure as the cloud platform. This document also discusses best practices
for building highly redundant and robust live ingest mechanisms.
1. Conformance notation
The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," "SHOULD," "SHOULD NOT,"
"RECOMMENDED," "MAY," and "OPTIONAL" in this document are to be interpreted as they are described in RFC
2119.
2. Service diagram
The following diagram shows the high-level architecture of the live streaming service in Media Services:
1. A live encoder pushes live feeds to channels that are created and provisioned via the Azure Media Services SDK.
2. Channels, programs, and streaming endpoints in Media Services handle all the live streaming functionalities,
including ingest, formatting, cloud DVR, security, scalability, and redundancy.
3. Optionally, customers can choose to deploy an Azure Content Delivery Network layer between the streaming
endpoint and the client endpoints.
4. Client endpoints stream from the streaming endpoint by using HTTP Adaptive Streaming protocols. Examples
include Microsoft Smooth Streaming, Dynamic Adaptive Streaming over HTTP (DASH, or MPEG-DASH), and
Apple HTTP Live Streaming (HLS).
Requirements
Here are the detailed requirements:
1. The encoder SHOULD start the broadcast by sending an HTTP POST request with an empty “body” (zero content
length) by using the same ingestion URL. This can help the encoder quickly detect whether the live ingestion
endpoint is valid, and if there are any authentication or other conditions required. Per HTTP protocol, the server
can't send back an HTTP response until the entire request, including the POST body, is received. Given the long-
running nature of a live event, without this step, the encoder might not be able to detect any error until it
finishes sending all the data.
2. The encoder MUST handle any errors or authentication challenges because of (1). If (1) succeeds with a 200
response, continue.
3. The encoder MUST start a new HTTP POST request with the fragmented MP4 stream. The payload MUST start
with the header boxes, followed by fragments. Note that the ftyp, Live Server Manifest Box, and moov
boxes (in this order) MUST be sent with each request, even if the encoder must reconnect because the previous
request was terminated prior to the end of the stream.
4. The encoder MUST use chunked transfer encoding for uploading, because it’s impossible to predict the entire
content length of the live event.
5. When the event is over, after sending the last fragment, the encoder MUST gracefully end the chunked transfer
encoding message sequence (most HTTP client stacks handle it automatically). The encoder MUST wait for the
service to return the final response code, and then terminate the connection.
6. The encoder MUST NOT use the Events() noun as described in 9.2 in [1] for live ingestion into Media Services.
7. If the HTTP POST request terminates or times out with a TCP error prior to the end of the stream, the encoder
MUST issue a new POST request by using a new connection, and follow the preceding requirements.
Additionally, the encoder MUST resend the previous two MP4 fragments for each track in the stream, and
resume without introducing a discontinuity in the media timeline. For example, if a stream contains both an
audio and a video track and the current POST request fails, the encoder must reconnect and resend the last
two successfully sent fragments for each of the two tracks. Resending the last two MP4 fragments for each
track ensures that there is no data loss. The encoder MUST maintain a "forward" buffer of media fragments,
which it resends when it reconnects.
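The "forward" buffer from requirement 7 can be sketched as a small bookkeeping class. This is illustrative only (class and method names are ours; fragment payloads are treated as opaque); it remembers the last two completely sent fragments per track so they can be resent, oldest first, on a new POST after a reconnect:

```python
from collections import defaultdict, deque

class ForwardBuffer:
    """Track the last N completely-sent fragments per track, so a
    reconnecting encoder can resend them without introducing a
    discontinuity in the media timeline."""

    def __init__(self, depth: int = 2):
        self._sent = defaultdict(lambda: deque(maxlen=depth))

    def mark_sent(self, track_id, fragment):
        """Record a fragment as successfully and completely sent."""
        self._sent[track_id].append(fragment)

    def fragments_to_resend(self):
        """Fragments to resend on a new POST, oldest first per track."""
        return [(track, frag)
                for track in sorted(self._sent)
                for frag in self._sent[track]]
```

On reconnect the encoder would send the header boxes first, then everything returned by fragments_to_resend(), then resume the live stream.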
5. Timescale
[MS-SSTR] describes the usage of timescale for SmoothStreamingMedia (Section 2.2.2.1), StreamElement
(Section 2.2.2.3), StreamFragmentElement (Section 2.2.2.6), and LiveSMIL (Section 2.2.7.3.1). If the timescale
value is not present, the default value used is 10,000,000 (10 MHz). Although the Smooth Streaming format
specification doesn’t block usage of other timescale values, most encoder implementations use this default value
(10 MHz) to generate Smooth Streaming ingest data. Because of the Azure Media Services dynamic packaging
feature, we recommend a 90 kHz timescale for video streams and a timescale matching the audio sampling rate
(for example, 44.1 kHz or 48 kHz) for audio streams. If different timescale values are used for different
streams, the stream-level timescale MUST be sent. For more
information, see [MS-SSTR].
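When streams use different timescales, timestamps have to be rescaled exactly. The following sketch (illustrative only, not an Azure API) converts a tick count between timescales using rational arithmetic and refuses inexact conversions rather than rounding silently:

```python
from fractions import Fraction

def rescale(ticks: int, from_timescale: int, to_timescale: int) -> int:
    """Convert a timestamp between timescales without floating point;
    raise if the value is not exactly representable in the target scale."""
    value = Fraction(ticks * to_timescale, from_timescale)
    if value.denominator != 1:
        raise ValueError("timestamp not exactly representable")
    return int(value)
```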
6. Definition of “stream”
A stream is the basic unit of operation in live ingestion for composing live presentations and handling streaming
failover and redundancy scenarios. A stream is defined as one unique, fragmented MP4 bitstream that might contain a single
track or multiple tracks. A full live presentation might contain one or more streams, depending on the
configuration of the live encoders. The following examples illustrate various options of using streams to compose a
full live presentation.
Example:
A customer wants to create a live streaming presentation that includes the following audio/video bitrates:
Video – 3000 kbps, 1500 kbps, 750 kbps
Audio – 128 kbps
Option 1: All tracks in one stream
In this option, a single encoder generates all audio/video tracks, and then bundles them into one fragmented MP4
bitstream. The fragmented MP4 bitstream is then sent via a single HTTP POST connection. In this example, there is
only one stream for this live presentation.
Summary
This is not an exhaustive list of all possible ingestion options for this example. In fact, any grouping of
tracks into streams is supported by live ingestion. Customers and encoder vendors can choose their own
implementations based on engineering complexity, encoder capacity, and redundancy and failover considerations.
However, in most cases, there is only one audio track for the entire live presentation, so it's important to ensure
the health of the ingest stream that contains the audio track. This consideration often results in putting the
audio track in its own stream (as in Option 2) or bundling it with the lowest-bitrate video track (as in Option 3).
Also, for better redundancy and fault tolerance, sending the same audio track in two different streams (Option 2
with redundant audio tracks) or bundling the audio track with at least two of the lowest-bitrate video tracks (Option
3 with audio bundled in at least two video streams) is highly recommended for live ingest into Media Services.
7. Service failover
Given the nature of live streaming, good failover support is critical for ensuring the availability of the service.
Media Services is designed to handle various types of failures, including network errors, server errors, and storage
issues. Combined with proper failover logic on the live encoder side, this allows customers to achieve a
highly reliable live streaming service from the cloud.
In this section, we discuss service failover scenarios. In this case, the failure happens somewhere within the service,
and it manifests itself as a network error. Here are some recommendations for the encoder implementation for
handling service failover:
1. Use a 10-second timeout for establishing the TCP connection. If an attempt to establish the connection takes
longer than 10 seconds, abort the operation and try again.
2. Use a short timeout for sending the HTTP request message chunks. If the target MP4 fragment duration is N
seconds, use a send timeout between N and 2N seconds; for example, if the MP4 fragment duration is 6
seconds, use a timeout of 6 to 12 seconds. If a timeout occurs, reset the connection, open a new connection,
and resume stream ingest on the new connection.
3. Maintain a rolling buffer that has the last two fragments for each track that were successfully and
completely sent to the service. If the HTTP POST request for a stream is terminated or times out prior to the
end of the stream, open a new connection and begin another HTTP POST request, resend the stream
headers, resend the last two fragments for each track, and resume the stream without introducing a
discontinuity in the media timeline. This reduces the chance of data loss.
4. We recommend that the encoder does NOT limit the number of retries to establish a connection or resume
streaming after a TCP error occurs.
5. After a TCP error:
a. The current connection MUST be closed, and a new connection MUST be created for a new HTTP POST
request.
b. The new HTTP POST URL MUST be the same as the initial POST URL.
c. The new HTTP POST MUST include stream headers (ftyp, Live Server Manifest Box, and moov boxes)
that are identical to the stream headers in the initial POST.
d. The last two fragments sent for each track must be resent, and streaming must resume without
introducing a discontinuity in the media timeline. The MP4 fragment timestamps must increase
continuously, even across HTTP POST requests.
6. The encoder SHOULD terminate the HTTP POST request if data is not being sent at a rate commensurate
with the MP4 fragment duration. An HTTP POST request that does not send data can prevent Media Services
from quickly disconnecting from the encoder in the event of a service update. For this reason, the HTTP
POST for sparse (ad signal) tracks SHOULD be short-lived, terminating as soon as the sparse fragment is
sent.
8. Encoder failover
Encoder failover is the second type of failover scenario that needs to be addressed for end-to-end live streaming
delivery. In this scenario, the error condition occurs on the encoder side.
The following expectations apply from the live ingestion endpoint when encoder failover happens:
1. A new encoder instance SHOULD be created to continue streaming, as illustrated in the diagram (Stream for
3000k video, with dashed line).
2. The new encoder MUST use the same URL for HTTP POST requests as the failed instance.
3. The new encoder’s POST request MUST include the same fragmented MP4 header boxes as the failed instance.
4. The new encoder MUST be properly synced with all other running encoders for the same live presentation to
generate synced audio/video samples with aligned fragment boundaries.
5. The new stream MUST be semantically equivalent to the previous stream, and interchangeable at the header
and fragment levels.
6. The new encoder SHOULD try to minimize data loss. The fragment_absolute_time and fragment_index of media
fragments SHOULD increase from the point where the encoder last stopped. The fragment_absolute_time and
fragment_index SHOULD increase in a continuous manner, but it is permissible to introduce a discontinuity, if
necessary. Media Services ignores fragments that it has already received and processed, so it's better to err on
the side of resending fragments than to introduce discontinuities in the media timeline.
9. Encoder redundancy
For certain critical live events that demand even higher availability and quality of experience, we recommend
that you use active-active redundant encoders to achieve seamless failover with no data loss.
As illustrated in this diagram, two groups of encoders push two copies of each stream simultaneously into the live
service. This setup is supported because Media Services can filter out duplicate fragments based on stream ID and
fragment timestamp. The resulting live stream and archive are a single copy of all the streams: the best
possible aggregation from the two sources. For example, in a hypothetical extreme case, as long as there is one
encoder (it doesn’t have to be the same one) running at any given point in time for each stream, the resulting live
stream from the service is continuous without data loss.
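The duplicate-fragment filtering described above amounts to keeping one fragment per (stream ID, timestamp) key. This sketch is illustrative of the idea only (the tuple shape and names are ours, not the service implementation):

```python
def merge_redundant_fragments(fragments):
    """Filter duplicates from active-active redundant encoders: keep
    one fragment per (stream_id, timestamp) key, whichever copy arrives
    first, regardless of which encoder produced it."""
    kept = {}
    for stream_id, timestamp, payload in fragments:
        kept.setdefault((stream_id, timestamp), payload)
    return kept
```

As long as at least one encoder delivers each (stream, timestamp) pair, the merged result is gap-free, which mirrors the aggregation behavior described above.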
The requirements for this scenario are almost the same as the requirements in the "Encoder failover" case, except
that the second set of encoders runs at the same time as the primary encoders.