Azure Media Services overview

NOTE
No new features are being added to Media Services v2.
Check out the latest version, Media Services v3. Also, see the migration guidance from v2 to v3.

Microsoft Azure Media Services (AMS) is an extensible cloud-based platform that enables developers to build
scalable media management and delivery applications. Media Services is based on REST APIs that enable you to
securely upload, store, encode, and package video or audio content for both on-demand and live streaming
delivery to various clients (for example, TV, PC, and mobile devices).
You can build end-to-end workflows entirely with Media Services. You can also choose to use third-party
components for some parts of your workflow. For example, you might encode with a third-party encoder and then
upload, protect, package, and deliver with Media Services. You can choose to stream your content live or deliver
content on-demand.

Compliance, Privacy and Security


As an important reminder, you must comply with all applicable laws in your use of Azure Media Services, and you
may not use Media Services or any Azure service in a manner that violates the rights of others, or that may be
harmful to others.
Before uploading any video/image to Media Services, You must have all the proper rights to use the video/image,
including, where required by law, all the necessary consents from individuals (if any) in the video/image, for the
use, processing, and storage of their data in Media Services and Azure. Some jurisdictions may impose special legal
requirements for the collection, online processing and storage of certain categories of data, such as biometric data.
Before using Media Services and Azure for the processing and storage of any data subject to special legal
requirements, You must ensure compliance with any such legal requirements that may apply to You.
To learn about compliance, privacy, and security in Media Services, please visit the Microsoft Trust Center. For
Microsoft’s privacy obligations, data handling, and retention practices, including how to delete your data, please
review Microsoft’s Privacy Statement, the Online Services Terms (“OST”), and the Data Processing Addendum (“DPA”).
By using Media Services, you agree to be bound by the OST, the DPA, and the Privacy Statement.

Prerequisites
To start using Azure Media Services, you should have the following:
An Azure account. If you don't have an account, you can create a free trial account in just a couple of
minutes. For details, see Azure Free Trial.
An Azure Media Services account. For more information, see Create Account.
(Optional) A development environment. Choose .NET or the REST API for your development environment.
For more information, see Set up environment.
Also, learn how to connect programmatically to the AMS API.
A standard or premium streaming endpoint in the started state. For more information, see Managing streaming
endpoints.

SDKs and tools


To build Media Services solutions, you can use:
Media Services REST API
One of the available client SDKs:
Azure Media Services SDK for .NET
NuGet package
GitHub source code
Azure SDK for Java,
Azure PHP SDK,
Azure Media Services for Node.js (This is a non-Microsoft version of a Node.js SDK. It is maintained
by the community and currently does not have 100% coverage of the AMS APIs).
Existing tools:
Azure portal
Azure-Media-Services-Explorer (Azure Media Services Explorer (AMSE) is a Winforms/C# application for
Windows)

NOTE
To get the latest version of Java SDK and get started developing with Java, see Get started with the Java client SDK for Media
Services.
To download the latest PHP SDK for Media Services, look for version 0.5.7 of the Microsoft/WindowsAzure package in
the Packagist repository.

Code samples
Find multiple code samples in the Azure Code Samples gallery: Azure Media Services code samples.

Concepts
For Azure Media Services concepts, see Concepts.

Supported scenarios and availability of Media Services across data centers
For detailed information, see AMS scenarios and availability of features and services across data centers.

Service Level Agreement (SLA)


For more information, see Microsoft Azure SLA.
For information about availability in datacenters, see the Availability section.

Support
Azure Support provides support options for Azure, including Media Services.

Provide feedback
Use the User Voice forum to provide feedback and make suggestions on how to improve Azure Media Services.
You also can go directly to one of the following categories:
Azure Media Player
Client SDK libraries
Encoding and processing
Live streaming
Media Analytics
Azure portal
REST API and platform
Video-on-demand streaming
Signaling Timed Metadata in Live Streaming

Last Updated: 2019-08-22


Conformance Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119

1. Introduction
In order to signal the insertion of advertisements or custom metadata events on a client player, broadcasters often
make use of timed metadata embedded within the video. To enable these scenarios, Media Services provides
support for the transport of timed metadata from the ingest point of the live streaming channel to the client
application. This specification outlines several modes that are supported by Media Services for timed metadata
within live streaming signals:
1. [SCTE-35] signaling that complies with the standards outlined by [SCTE-35], [SCTE-214-1], [SCTE-214-3], and
[RFC8216].
2. [SCTE-35] signaling that complies with the legacy [Adobe-Primetime] specification for RTMP ad signaling.
3. A generic timed metadata signaling mode, for messages that are NOT [SCTE-35] and could carry [ID3v2] or
other custom schemas defined by the application developer.

1.1 Terms Used


| Term | Definition |
| --- | --- |
| Ad Break | A location or point in time where one or more ads may be scheduled for delivery; same as avail and placement opportunity. |
| Ad Decision Service | An external service that decides which ad(s) and durations will be shown to the user. The service is typically provided by a partner and is out of scope for this document. |
| Cue | Indication of time and parameters of the upcoming ad break. Note that cues can indicate a pending switch to an ad break, a pending switch to the next ad within an ad break, and a pending switch from an ad break to the main content. |
| Packager | The Azure Media Services "Streaming Endpoint" provides dynamic packaging capabilities for DASH and HLS and is referred to as a "Packager" in the media industry. |
| Presentation Time | The time that an event is presented to a viewer. The time represents the moment on the media timeline that a viewer would see the event. For example, the presentation time of a SCTE-35 splice_info() command message is the splice_time(). |
| Arrival Time | The time that an event message arrives. The time is typically distinct from the presentation time of the event, since event messages are sent ahead of the presentation time of the event. |
| Sparse track | A media track that is not continuous, and is time synchronized with a parent or control track. |
| Origin | The Azure Media Streaming Service. |
| Channel Sink | The Azure Media Live Streaming Service. |
| HLS | Apple HTTP Live Streaming protocol. |
| DASH | Dynamic Adaptive Streaming over HTTP. |
| Smooth | Smooth Streaming protocol. |
| MPEG2-TS | MPEG-2 Transport Stream. |
| RTMP | Real-Time Messaging Protocol. |
| uimsbf | Unsigned integer, most significant bit first. |

1.2 Normative References


The following documents contain provisions, which, through reference in this text, constitute provisions of this
document. All documents are subject to revision by the standards bodies, and readers are encouraged to
investigate the possibility of applying the most recent editions of the documents listed below. Readers are also
reminded that newer editions of the referenced documents might not be compatible with this version of the timed
metadata specification for Azure Media Services.

| Standard | Definition |
| --- | --- |
| [Adobe-Primetime] | Primetime Digital Program Insertion Signaling Specification 1.2 |
| [Adobe-Flash-AS] | FLASH ActionScript Language Reference |
| [AMF0] | "Action Message Format AMF0" |
| [DASH-IF-IOP] | DASH Industry Forum Interop Guidance v4.2, https://dashif-documents.azurewebsites.net/DASH-IF-IOP/master/DASH-IF-IOP.html |
| [HLS-TMD] | Timed Metadata for HTTP Live Streaming, https://developer.apple.com/streaming |
| [CMAF-ID3] | Timed Metadata in the Common Media Application Format (CMAF) |
| [ID3v2] | ID3 Tag version 2.4.0, http://id3.org/id3v2.4.0-structure |
| [ISO-14496-12] | ISO/IEC 14496-12: Part 12, ISO base media file format, Fourth Edition 2012-07-15 |
| [MPEGDASH] | Information technology -- Dynamic adaptive streaming over HTTP (DASH) -- Part 1: Media presentation description and segment formats. May 2014. Published. URL: https://www.iso.org/standard/65274.html |
| [MPEGCMAF] | Information technology -- Multimedia application format (MPEG-A) -- Part 19: Common media application format (CMAF) for segmented media. January 2018. Published. URL: https://www.iso.org/standard/71975.html |
| [MPEGCENC] | Information technology -- MPEG systems technologies -- Part 7: Common encryption in ISO base media file format files. February 2016. Published. URL: https://www.iso.org/standard/68042.html |
| [MS-SSTR] | "Microsoft Smooth Streaming Protocol", May 15, 2014 |
| [MS-SSTR-Ingest] | Azure Media Services Fragmented MP4 Live Ingest Specification |
| [RFC8216] | R. Pantos, Ed.; W. May. HTTP Live Streaming. August 2017. Informational. https://tools.ietf.org/html/rfc8216 |
| [RFC4648] | The Base16, Base32, and Base64 Data Encodings, https://tools.ietf.org/html/rfc4648 |
| [RTMP] | "Adobe's Real-Time Messaging Protocol", December 21, 2012 |
| [SCTE-35-2019] | SCTE 35: 2019 - Digital Program Insertion Cueing Message for Cable, https://www.scte.org/SCTEDocs/Standards/ANSI_SCTE%2035%202019r1.pdf |
| [SCTE-214-1] | SCTE 214-1 2016, MPEG DASH for IP-Based Cable Services Part 1: MPD Constraints and Extensions |
| [SCTE-214-3] | SCTE 214-3 2015, MPEG DASH for IP-Based Cable Services Part 3: DASH/FF Profile |
| [SCTE-224] | SCTE 224 2018r1, Event Scheduling and Notification Interface |
| [SCTE-250] | Event and Signaling Management API (ESAM) |

2. Timed Metadata Ingest


Azure Media Services supports real-time in-band metadata for both [RTMP] and Smooth Streaming [MS-SSTR-
Ingest] protocols. Real-time metadata can be used to define custom events, with your own unique custom schemas
(JSON, Binary, XML), as well as industry defined formats like ID3, or SCTE-35 for ad signaling in a broadcast stream.
This article provides the details for how to send custom timed metadata signals using the supported ingest
protocols of Azure Media Services. The article also explains how the manifests for HLS, DASH, and Smooth
Streaming are decorated with the timed metadata signals, as well as how it is carried in-band when the content is
delivered using CMAF (MP4 fragments) or Transport Stream (TS) segments for HLS.
Common use case scenarios for timed metadata include:
SCTE-35 ad signals to trigger ad breaks in a live event or linear broadcast
Custom ID3 metadata that can trigger events at a client application (browser, iOS, or Android)
Custom defined JSON, Binary, or XML metadata to trigger events at a client application
Telemetry from a live encoder, IP Camera or Drone
Events from an IP Camera like Motion, face detection, etc.
Geographic position information from an action camera, drone, or moving device
Song lyrics
Program boundaries on a linear live feed
Images or augmented metadata to be displayed on a live feed
Sports scores or game-clock information
Interactive advertising packages to be displayed alongside the video in the browser
Quizzes or polls
Azure Media Services Live Events and Packager are capable of receiving these timed metadata signals and
converting them into a stream of metadata that can reach client applications using standards-based protocols like
HLS and DASH.

2.1 RTMP Timed Metadata


The [RTMP] protocol allows for timed metadata signals to be sent for various scenarios including custom metadata,
and SCTE-35 ad signals.
Advertising signals (cue messages) are sent as [AMF0] cue messages embedded within the [RTMP] stream. The cue
messages may be sent sometime before the actual event or [SCTE35] ad splice signal needs to occur. To support
this scenario, the actual presentation timestamp of the event is sent within the cue message. For more information,
see [AMF0].
The following [AMF0] commands are supported by Azure Media Services for RTMP ingest:
onUserDataEvent - used for custom metadata or [ID3v2] timed metadata
onAdCue - used primarily for signaling an advertisement placement opportunity in the live stream. Two forms
of the cue are supported, a simple mode and a "SCTE-35" mode.
onCuePoint - supported by certain on-premises hardware encoders, like the Elemental Live encoder, to signal
[SCTE35] messages.
The following table describes the format of the AMF message payload that Media Services will ingest for both
"simple" and [SCTE35] message modes.
The name of the [AMF0] message can be used to differentiate multiple event streams of the same type. For both
[SCTE-35] messages and "simple" mode, the name of the AMF message MUST be "onAdCue" as required in the
[Adobe-Primetime] specification. Any fields not listed below SHALL be ignored by Azure Media Services at ingest.

2.1.1 RTMP with custom metadata using "onUserDataEvent"


If you want to provide custom metadata feeds from your upstream encoder, IP Camera, Drone, or device using the
RTMP protocol, use the "onUserDataEvent" [AMF0] data message command type.
The "onUserDataEvent" data message command MUST carry a message payload with the following definition to
be captured by Media Services and packaged into the in-band file format as well as the manifests for HLS, DASH
and Smooth Streaming. It is recommended that you send timed-metadata messages no more frequently than once
every 0.5 seconds (500 ms); otherwise, stability issues with the live stream may occur. Each message can aggregate
metadata from multiple frames if you need to provide frame-level metadata. If you are sending multi-bitrate
streams, it is recommended that you provide the metadata on only a single bitrate to reduce bandwidth and avoid
interference with video/audio processing.
The payload for the "onUserDataEvent" should be an [MPEGDASH] EventStream XML format message. This
makes it easy to pass in custom defined schemas that can be carried in 'emsg' payloads in-band for CMAF
[MPEGCMAF] content that is delivered over HLS or DASH protocols. Each DASH Event Stream message contains a
schemeIdUri that functions as a URN message scheme identifier and defines the payload of the message. Some
schemes such as "https://aomedia.org/emsg/ID3" for [ID3v2], or urn:scte:scte35:2013:bin for [SCTE-35] are
standardized by industry consortia for interoperability. Any application provider can define their own custom
scheme using a URL that they control (owned domain) and may provide a specification at that URL if they choose. If
a player has a handler for the defined scheme, then that is the only component that needs to understand the
payload and protocol.
The schema for the [MPEGDASH] EventStream XML payload is defined below (excerpt from DASH ISO/IEC 23009-1,
3rd edition). Note that only one "EventType" per "EventStream" is supported at this time. Only the first Event
element will be processed if multiple events are provided in the EventStream.

<!-- Event Stream -->
<xs:complexType name="EventStreamType">
<xs:sequence>
<xs:element name="Event" type="EventType" minOccurs="0" maxOccurs="unbounded"/>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute ref="xlink:href"/>
<xs:attribute ref="xlink:actuate" default="onRequest"/>
<xs:attribute name="schemeIdUri" type="xs:anyURI" use="required"/>
<xs:attribute name="value" type="xs:string"/>
<xs:attribute name="timescale" type="xs:unsignedInt"/>
</xs:complexType>
<!-- Event -->
<xs:complexType name="EventType">
<xs:sequence>
<xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="presentationTime" type="xs:unsignedLong" default="0"/>
<xs:attribute name="duration" type="xs:unsignedLong"/>
<xs:attribute name="id" type="xs:unsignedInt"/>
<xs:attribute name="contentEncoding" type="ContentEncodingType"/>
<xs:attribute name="messageData" type="xs:string"/>
<xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:complexType>

Example XML Event Stream with ID3 schema ID and base64-encoded data payload.

<?xml version="1.0" encoding="UTF-8"?>
<EventStream schemeIdUri="https://aomedia.org/emsg/ID3">
  <Event contentEncoding="Base64">
    -- base64-encoded ID3v2 full payload here per [CMAF-ID3] --
  </Event>
</EventStream>

Example Event Stream with custom schema ID and base64-encoded binary data

<?xml version="1.0" encoding="UTF-8"?>
<EventStream schemeIdUri="urn:example.org:custom:binary">
  <Event contentEncoding="Base64">
    -- base64-encoded custom binary data message --
  </Event>
</EventStream>

Example Event Stream with custom schema ID and custom JSON

<?xml version="1.0" encoding="UTF-8"?>
<EventStream schemeIdUri="urn:example.org:custom:JSON">
  <Event>
    [
      {"key1" : "value1"},
      {"key2" : "value2"}
    ]
  </Event>
</EventStream>
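For illustration only, the following Python sketch (a hypothetical helper, not part of any Azure SDK) shows one way to build the EventStream XML body that an "onUserDataEvent" message would carry for an ID3 payload. AMF0 serialization and RTMP delivery are assumed to be handled by your encoder or an RTMP library of your choice.

import base64
import xml.etree.ElementTree as ET

def build_id3_event_stream(id3_payload: bytes) -> str:
    # Build an EventStream element with the ID3 scheme and a single
    # base64-encoded Event, matching the examples above.
    stream = ET.Element("EventStream", schemeIdUri="https://aomedia.org/emsg/ID3")
    event = ET.SubElement(stream, "Event", contentEncoding="Base64")
    event.text = base64.b64encode(id3_payload).decode("ascii")
    return ET.tostring(stream, encoding="unicode")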

Built-in supported Scheme ID URIs


| Scheme ID URI | Description |
| --- | --- |
| https://aomedia.org/emsg/ID3 | Describes how [ID3v2] metadata can be carried as timed metadata in a CMAF-compatible [MPEGCMAF] fragmented MP4. For more information, see Timed Metadata in the Common Media Application Format (CMAF) [CMAF-ID3]. |

Event processing and manifest signaling


On receipt of a valid "onUserDataEvent" event, Azure Media Services will look for a valid XML payload that
matches the EventStreamType (defined in [MPEGDASH] ), parse the XML payload and convert it into an
[MPEGCMAF] MP4 fragment 'emsg' version 1 box for storage in the live archive and transmission to the Media
Services Packager. The Packager will detect the 'emsg' box in the live stream and:
(a) "dynamically package" it into TS segments for delivery to HLS clients in compliance with the HLS timed
metadata specification [HLS-TMD], or
(b) pass it through for delivery in CMAF fragments via HLS or DASH, or
(c) convert it into a sparse track signal for delivery via Smooth Streaming [MS-SSTR].
In addition to the in-band 'emsg' boxes in CMAF fragments, or TS PES packets for HLS, the manifests for DASH
(MPD) and Smooth Streaming will contain a reference to the in-band event streams (also known as a sparse stream
track in Smooth Streaming).
Individual events or their data payloads are NOT output directly in the HLS, DASH, or Smooth manifests.
Additional informational constraints and defaults for onUserDataEvent events
If the timescale is not set in the EventStream element, the RTMP 1 kHz timescale is used by default
Delivery of an onUserDataEvent message is limited to once every 500ms max. If you send events more
frequently, it can impact the bandwidth and the stability of the live feed

2.1.2 RTMP ad cue signaling with "onAdCue"


Azure Media Services can listen and respond to several [AMF0] message types which can be used to signal various
real time synchronized metadata in the live stream. The [Adobe-Primetime] specification defines two cue types
called "simple" and "SCTE-35" mode. For "simple" mode, Media Services supports a single AMF cue message called
"onAdCue" using a payload that matches the table below defined for the "Simple Mode" signal.
The following section shows the RTMP "simple" mode payload, which can be used to signal a basic "spliceOut" ad
signal that will be carried through to the client manifest for HLS, DASH, and Microsoft Smooth Streaming. This is
very useful for scenarios where the customer does not have a complex SCTE-35-based ad signaling deployment or
insertion system, and is using a basic on-premises encoder to send in the cue message via an API. Typically, the on-
premises encoder will support a REST-based API to trigger this signal, which will also "splice-condition" the video
stream by inserting an IDR frame into the video and starting a new GOP.

2.1.3 RTMP ad cue signaling with "onAdCue" - Simple Mode


| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| type | String | Required | The event message. Shall be "SpliceOut" to designate a simple mode splice. |
| id | String | Required | A unique identifier describing the splice or segment. Identifies this instance of the message. |
| duration | Number | Required | The duration of the splice. Units are fractional seconds. |
| elapsed | Number | Optional | When the signal is being repeated in order to support tune in, this field shall be the amount of presentation time that has elapsed since the splice began. Units are fractional seconds. When using simple mode, this value should not exceed the original duration of the splice. |
| time | Number | Required | Shall be the time of the splice, in presentation time. Units are fractional seconds. |
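As a non-normative illustration (the dictionary below is a hypothetical representation with values borrowed from the manifest examples later in this document, not an SDK object), a "simple" mode onAdCue payload carries the fields above; your encoder or an AMF0 library would serialize it into the RTMP stream:

simple_splice_out = {
    "type": "SpliceOut",     # designates a simple mode splice
    "id": "4011578265",      # unique identifier for this splice instance
    "duration": 119.987,     # splice duration, fractional seconds
    "elapsed": 0.0,          # optional; presentation time elapsed since the splice began
    "time": 4011578.265,     # presentation time of the splice, fractional seconds
}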

Example MPEG DASH manifest output when using Adobe RTMP simple mode
See example 3.3.2.1 MPEG DASH .mpd EventStream using Adobe simple mode
See example 3.3.3.1 DASH manifest with single period and Adobe simple mode
Example HLS manifest output when using Adobe RTMP simple mode
See example 3.2.2 HLS manifest using Adobe simple mode and EXT-X-CUE tag

2.1.4 RTMP ad cue signaling with "onAdCue" - SCTE-35 Mode


When you are working with a more advanced broadcast production workflow that requires the full SCTE-35
payload message to be carried through to the HLS or DASH manifest, it is best to use the "SCTE-35 Mode" of the
[Adobe-Primetime] specification. This mode supports in-band SCTE-35 signals being sent directly into an on-
premises live encoder, which then encodes the signals out into the RTMP stream using the "SCTE-35 Mode"
specified in the [Adobe-Primetime] specification.
Typically SCTE-35 messages can appear only in MPEG-2 transport stream (TS) inputs on an on-premises encoder.
Check with your encoder manufacturer for details on how to configure a transport stream ingest that contains
SCTE-35 and enable it for pass-through to RTMP in Adobe SCTE-35 mode.
In this scenario, the following payload MUST be sent from the on-premises encoder using the "onAdCue" [AMF0]
message type.

| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| cue | String | Required | The event message. For [SCTE-35] messages, this MUST be the base64-encoded [RFC4648] binary splice_info_section() in order for messages to be sent to HLS, Smooth, and DASH clients. |
| type | String | Required | A URN or URL identifying the message scheme. For [SCTE-35] messages, this SHOULD be "scte35" in order for messages to be sent to HLS, Smooth, and DASH clients, in compliance with [Adobe-Primetime]. Optionally, the URN "urn:scte:scte35:2013:bin" may also be used to signal a [SCTE-35] message. |
| id | String | Required | A unique identifier describing the splice or segment. Identifies this instance of the message. Messages with equivalent semantics shall have the same value. |
| duration | Number | Required | The duration of the event or ad splice-segment, if known. If unknown, the value SHOULD be 0. |
| elapsed | Number | Optional | When the [SCTE-35] ad signal is being repeated in order to support tune in, this field shall be the amount of presentation time that has elapsed since the splice began. Units are fractional seconds. In [SCTE-35] mode, this value may exceed the original specified duration of the splice or segment. |
| time | Number | Required | The presentation time of the event or ad splice. The presentation time and duration SHOULD align with Stream Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I. For HLS egress, time and duration SHOULD align with segment boundaries. The presentation time and duration of different event messages within the same event stream MUST not overlap. Units are fractional seconds. |
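For illustration only (hypothetical values; the splice_info_section() bytes would come from your SCTE-35 source), a "SCTE-35" mode onAdCue payload base64-encodes the binary section per [RFC4648] before it is serialized to AMF0 by your encoder or library:

import base64

splice_info_section = b"..."  # placeholder: raw binary splice_info_section() from your SCTE-35 source

scte35_cue = {
    "cue": base64.b64encode(splice_info_section).decode("ascii"),  # base64 splice_info_section()
    "type": "scte35",        # or "urn:scte:scte35:2013:bin"
    "id": "1002",            # unique identifier for this splice
    "duration": 59.993278,   # fractional seconds; SHOULD be 0 if unknown
    "time": 259.509244,      # presentation time, fractional seconds
}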

Example HLS manifest .m3u8 with SCTE-35 mode signal


See Section 3.2.1.1 example HLS manifest with SCTE-35

2.1.5 RTMP Ad signaling with "onCuePoint" for Elemental Live


The Elemental Live on-premises encoder supports ad markers in the RTMP signal. Azure Media Services currently
only supports the "onCuePoint" ad marker type for RTMP. This can be enabled in the Adobe RTMP Group Settings
in the Elemental Live encoder settings or API by setting "ad_markers" to "onCuePoint". Please refer to
the Elemental Live documentation for details. Enabling this feature in the RTMP Group will pass SCTE-35 signals to
the Adobe RTMP outputs to be processed by Azure Media Services.
The "onCuePoint" message type is defined in [Adobe-Flash-AS] and has the following payload structure when sent
from the Elemental Live RTMP output.

| Property | Description |
| --- | --- |
| name | The name SHOULD be set to 'scte35' by Elemental Live. |
| time | The time in seconds at which the cue point occurred in the video timeline. |
| type | The type of cue point SHOULD be set to "event". |
| parameters | An associative array of name/value pair strings containing the information from the SCTE-35 message, including id and duration. These values are parsed out by Azure Media Services and included in the manifest decoration tag. |

When this mode of ad marker is used, the HLS manifest output is similar to Adobe "Simple" mode.
Example MPEG DASH MPD, single period, Adobe Simple mode signals
<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" profiles="urn:mpeg:dash:profile:isoff-live:2011"
type="dynamic" publishTime="2020-01-07T18:58:03Z" minimumUpdatePeriod="PT0S" timeShiftBufferDepth="PT58M56S"
availabilityStartTime="2020-01-07T17:44:47Z" minBufferTime="PT7S">
<Period start="PT0S">
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35" timescale="10000000">
<Event presentationTime="1583497601000000" duration="300000000" id="1085900"/>
<Event presentationTime="1583500901666666" duration="300000000" id="1415966"/>
<Event presentationTime="1583504202333333" duration="300000000" id="1746033"/>
<Event presentationTime="1583507502666666" duration="300000000" id="2076066"/>
<Event presentationTime="1583510803333333" duration="300000000" id="2406133"/>
<Event presentationTime="1583514104000000" duration="300000000" id="2736200"/>
<Event presentationTime="1583517404666666" duration="300000000" id="3066266"/>
<Event presentationTime="1583520705333333" duration="300000000" id="3396333"/>
<Event presentationTime="1583524006000000" duration="300000000" id="3726400"/>
<Event presentationTime="1583527306666666" duration="300000000" id="4056466"/>
<Event presentationTime="1583530607333333" duration="300000000" id="4386533"/>
</EventStream>
<AdaptationSet id="1" group="1" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="video" mimeType="video/mp4" codecs="avc1.4D400C" maxWidth="256" maxHeight="144" startWithSAP="1">
<InbandEventStream schemeIdUri="urn:mpeg:dash:event:2012" value="1"/>
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35"/>
<SegmentTemplate timescale="10000000" presentationTimeOffset="1583486678426666"
media="QualityLevels($Bandwidth$)/Fragments(video=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(video=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="1583495318000000" d="64000000" r="34"/>
<S d="43000000"/>
<S d="21000000"/>
<!-- ... Truncated for brevity of sample-->

</SegmentTimeline>
</SegmentTemplate>
<ProducerReferenceTime id="1583495318000000" type="0" wallClockTime="2020-01-07T17:59:10.957Z"
presentationTime="1583495318000000"/>
<Representation id="1_V_video_3750956353252827751" bandwidth="149952" width="256" height="144"/>
</AdaptationSet>
<AdaptationSet id="2" group="5" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="audio" mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
<InbandEventStream schemeIdUri="urn:mpeg:dash:event:2012" value="1"/>
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="scte35"/>
<Label>ambient</Label>
<SegmentTemplate timescale="10000000" presentationTimeOffset="1583486678426666"
media="QualityLevels($Bandwidth$)/Fragments(ambient=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(ambient=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="1583495254426666" d="64000000" r="35"/>
<S d="43093334"/>
<S d="20906666"/>
<!-- ... Truncated for brevity of sample-->

</SegmentTimeline>
</SegmentTemplate>
<ProducerReferenceTime id="1583495254426666" type="0" wallClockTime="2020-01-07T17:59:04.600Z"
presentationTime="1583495254426666"/>
<Representation id="5_A_ambient_9125670592623055209" bandwidth="96000" audioSamplingRate="48000"/>
</AdaptationSet>
</Period>
</MPD>

Example HLS playlist, Adobe Simple mode signals using EXT-X-CUE tag (truncated "..." for brevity)
The following example shows the output from the Media Services dynamic packager for an RTMP ingest stream
using Adobe "simple" mode signals and the legacy [Adobe-Primetime] EXT-X-CUE tag.
#EXTM3U
#EXT-X-VERSION:8
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:7
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PROGRAM-DATE-TIME:2020-01-07T17:44:47Z
#EXTINF:6.400000,no-desc
Fragments(video=1583486742000000,format=m3u8-aapl-v8)
#EXTINF:6.400000,no-desc
Fragments(video=1583486806000000,format=m3u8-aapl-v8)
...
#EXTINF:6.166667,no-desc
Fragments(video=1583487638000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667
#EXTINF:0.233333,no-desc
Fragments(video=1583487699666666,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=0.233333
#EXTINF:6.400000,no-desc
Fragments(video=1583487702000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=6.633333
#EXTINF:6.400000,no-desc
Fragments(video=1583487766000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=13.033333
#EXTINF:6.400000,no-desc
Fragments(video=1583487830000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=19.433333
#EXTINF:6.400000,no-desc
Fragments(video=1583487894000000,format=m3u8-aapl-v8)
#EXT-X-CUE:ID=95766,TYPE="SpliceOut",DURATION=30.000000,TIME=158348769.966667,ELAPSED=25.833333
#EXTINF:4.166667,no-desc
Fragments(video=1583487958000000,format=m3u8-aapl-v8)
#EXTINF:2.233333,no-desc
Fragments(video=1583487999666666,format=m3u8-aapl-v8)
#EXTINF:6.400000,no-desc
Fragments(video=1583488022000000,format=m3u8-aapl-v8)
...

2.1.6 Cancellation and Updates


Messages can be canceled or updated by sending multiple messages with the same presentation time and ID. The
presentation time and ID uniquely identify the event, and the last message received for a specific presentation time
that meets pre-roll constraints is the message that is acted upon. The updated event replaces any previously
received messages. The pre-roll constraint is four seconds. Messages received at least four seconds prior to the
presentation time will be acted upon.

2.2 Fragmented MP4 Ingest (Smooth Streaming)


Refer to [MS-SSTR-Ingest] for requirements on live stream ingest. The following sections provide details regarding
ingest of timed presentation metadata. Timed presentation metadata is ingested as a sparse track, which is defined
in both the Live Server Manifest Box (see MS-SSTR) and the Movie Box ('moov').
Each sparse fragment consists of a Movie Fragment Box ('moof') and Media Data Box ('mdat'), where the 'mdat' box
is the binary message.
In order to achieve frame-accurate insertion of ads, the encoder MUST split the fragment at the presentation time
where the cue is required to be inserted. A new fragment MUST be created that begins with a newly created IDR
frame, or Stream Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I.
2.2.1 Live Server Manifest Box
The sparse track MUST be declared in the Live Server Manifest box with a <textstream> entry and it MUST have
the following attributes set:
| Attribute Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| systemBitrate | Number | Required | MUST be "0", indicating a track with unknown, variable bitrate. |
| parentTrackName | String | Required | MUST be the name of the parent track, to which the sparse track time codes are timescale aligned. The parent track cannot be a sparse track. |
| manifestOutput | Boolean | Required | MUST be "true", to indicate that the sparse track will be embedded in the Smooth client manifest. |
| Subtype | String | Required | MUST be the four character code "DATA". |
| Scheme | String | Required | MUST be a URN or URL identifying the message scheme. For [SCTE-35] messages, this MUST be "urn:scte:scte35:2013:bin" in order for messages to be sent to HLS, Smooth, and DASH clients in compliance with [SCTE-35]. |
| trackName | String | Required | MUST be the name of the sparse track. The trackName can be used to differentiate multiple event streams with the same scheme. Each unique event stream MUST have a unique track name. |
| timescale | Number | Optional | MUST be the timescale of the parent track. |
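As a non-normative illustration, the attribute values for a SCTE-35 sparse track declaration might look like the following (the parent track name and timescale are assumptions; the track name matches the Smooth client manifest example later in this document):

scte35_textstream_attributes = {
    "systemBitrate": "0",                    # unknown, variable bitrate
    "parentTrackName": "video",              # assumed name of the parent track
    "manifestOutput": "true",                # embed the sparse track in the Smooth client manifest
    "Subtype": "DATA",                       # four character code
    "Scheme": "urn:scte:scte35:2013:bin",    # SCTE-35 message scheme
    "trackName": "scte35-sparse-stream",     # unique per event stream
    "timescale": "10000000",                 # assumed to match the parent track timescale
}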

2.2.2 Movie Box


The Movie Box ('moov') follows the Live Server Manifest Box as part of the stream header for a sparse track.
The 'moov' box SHOULD contain a TrackHeaderBox ('tkhd') box as defined in [ISO-14496-12] with the
following constraints:

| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| duration | 64-bit unsigned integer | Required | SHOULD be 0, since the track box has zero samples and the total duration of the samples in the track box is 0. |

The 'moov' box SHOULD contain a HandlerBox ('hdlr') as defined in [ISO-14496-12] with the following
constraints:
| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| handler_type | 32-bit unsigned integer | Required | SHOULD be 'meta'. |

The 'stsd' box SHOULD contain a MetaDataSampleEntry box with a coding name as defined in [ISO-14496-12]. For
example, for SCTE-35 messages the coding name SHOULD be 'scte'.
2.2.3 Movie Fragment Box and Media Data Box
Sparse track fragments consist of a Movie Fragment Box ('moof') and a Media Data Box ('mdat').

NOTE
In order to achieve frame-accurate insertion of ads, the encoder MUST split the fragment at the presentation time where the
cue is required to be inserted. A new fragment MUST be created that begins with a newly created IDR frame, or Stream
Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I

The MovieFragmentBox ('moof') box MUST contain a TrackFragmentExtendedHeaderBox ('uuid') box as defined in
[MS-SSTR] with the following fields:

| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| fragment_absolute_time | 64-bit unsigned integer | Required | MUST be the arrival time of the event. This value aligns the message with the parent track. |
| fragment_duration | 64-bit unsigned integer | Required | MUST be the duration of the event. The duration can be zero to indicate that the duration is unknown. |

The MediaDataBox ('mdat') box MUST have the following format:

| Field Name | Field Type | Required? | Description |
| --- | --- | --- | --- |
| version | 32-bit unsigned integer (uimsbf) | Required | Determines the format of the contents of the 'mdat' box. Unrecognized versions will be ignored. Currently the only supported version is 1. |
| id | 32-bit unsigned integer (uimsbf) | Required | Identifies this instance of the message. Messages with equivalent semantics shall have the same value; that is, processing any one event message box with the same id is sufficient. |
| presentation_time_delta | 32-bit unsigned integer (uimsbf) | Required | The sum of the fragment_absolute_time, specified in the TrackFragmentExtendedHeaderBox, and the presentation_time_delta MUST be the presentation time of the event. The presentation time and duration SHOULD align with Stream Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I. For HLS egress, time and duration SHOULD align with segment boundaries. The presentation time and duration of different event messages within the same event stream MUST not overlap. |
| message | byte array | Required | The event message. For [SCTE-35] messages, the message is the binary splice_info_section(). For [SCTE-35] messages, this MUST be the splice_info_section() in order for messages to be sent to HLS, Smooth, and DASH clients in compliance with [SCTE-35]. For [SCTE-35] messages, the binary splice_info_section() is the payload of the 'mdat' box, and it is NOT base64 encoded. |
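The 'mdat' payload layout above can be illustrated with a short Python sketch (a hypothetical helper, assuming big-endian packing for the uimsbf fields; 'moof'/'mdat' box wrapping and the Smooth ingest connection itself are out of scope):

import struct

def build_sparse_mdat_payload(event_id: int, presentation_time_delta: int,
                              message: bytes, version: int = 1) -> bytes:
    # Pack version, id, and presentation_time_delta as 32-bit unsigned integers
    # (most significant bit first), followed by the raw event message bytes.
    # For [SCTE-35], message is the binary splice_info_section(), not base64 encoded.
    header = struct.pack(">III", version, event_id, presentation_time_delta)
    return header + message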

2.2.4 Cancellation and Updates


Messages can be canceled or updated by sending multiple messages with the same presentation time and ID. The
presentation time and ID uniquely identify the event. The last message received for a specific presentation time, that
meets pre-roll constraints, is the message that is acted upon. The updated message replaces any previously
received messages. The pre-roll constraint is four seconds. Messages received at least four seconds prior to the
presentation time will be acted upon.

3 Timed Metadata Delivery


Event stream data is opaque to Media Services. Media Services merely passes the information between the ingest
endpoint and the client endpoint. The following properties are delivered to the client, in compliance with
[SCTE-35] and/or the client's streaming protocol:
1. Scheme – a URN or URL identifying the scheme of the message.
2. Presentation Time – the presentation time of the event on the media timeline.
3. Duration – the duration of the event.
4. ID – an optional unique identifier for the event.
5. Message – the event data.

3.1 Microsoft Smooth Streaming Manifest


Refer to sparse track handling in [MS-SSTR] for details on how to format a sparse message track. For [SCTE35]
messages, Smooth Streaming will output the base64-encoded splice_info_section() into a sparse fragment. The
StreamIndex MUST have a Subtype of "DATA", and the CustomAttributes MUST contain an Attribute with
Name="Scheme" and Value="urn:scte:scte35:2013:bin".
Smooth Client Manifest Example showing base64-encoded [SCTE35] splice_info_section ()

<?xml version="1.0" encoding="utf-8"?>


<SmoothStreamingMedia MajorVersion="2" MinorVersion="0" TimeScale="10000000" IsLive="true" Duration="0"
LookAheadFragmentCount="2" DVRWindowLength="6000000000">

<StreamIndex Type="video" Name="video" Subtype="" Chunks="0" TimeScale="10000000"


Url="QualityLevels({bitrate})/Fragments(video={start time})">
<QualityLevel Index="0" Bitrate="230000"
CodecPrivateData="250000010FC3460B50878A0B5821FF878780490800704704DC0000010E5A67F840" FourCC="WVC1"
MaxWidth="364" MaxHeight="272"/>
<QualityLevel Index="1" Bitrate="305000"
CodecPrivateData="250000010FC3480B50878A0B5821FF87878049080894E4A7640000010E5A67F840" FourCC="WVC1"
MaxWidth="364" MaxHeight="272"/>
<c t="0" d="20000000" r="300" />
</StreamIndex>
<StreamIndex Type="audio" Name="audio" Subtype="" Chunks="0" TimeScale="10000000"
Url="QualityLevels({bitrate})/Fragments(audio={start time})">
<QualityLevel Index="0" Bitrate="96000" CodecPrivateData="1000030000000000000000000000E00042C0"
FourCC="WMAP" AudioTag="354" Channels="2" SamplingRate="44100" BitsPerSample="16" PacketSize="4459"/>
<c t="0" d="20000000" r="300" />
</StreamIndex>
<StreamIndex Type="text" Name="scte35-sparse-stream" Subtype="DATA" Chunks="0" TimeScale="10000000"
ParentStreamIndex="video" ManifestOutput="true"
Url="QualityLevels({bitrate})/Fragments(captions={start time})">
<QualityLevel Index="0" Bitrate="0" CodecPrivateData="" FourCC="">
<CustomAttributes>
<Attribute Name="Scheme" Value="urn:scte:scte35:2013:bin"/>
</CustomAttributes>
</QualityLevel>
<!-- The following <c> and <f> fragments contain the base64-encoded [SCTE35] splice_info_section() message
-->
<c t="600000000" d="300000000">
<f>PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48QWNxdWlyZWRTaWduYWwgeG1sbnM9InVybjpjYWJsZWxhYnM6bWQ6eHNk
OnNpZ25hbGluZzozLjAiIGFjcXVpc2l0aW9uUG9pbnRJZGVudGl0eT0iRVNQTl9FYXN0X0FjcXVpc2l0aW9uX1BvaW50XzEiIGFjcXVpc2l0aW9
uU2lnbmFsSUQ9IjRBNkE5NEVFLTYyRkExMUUxQjFDQTg4MkY0ODI0MDE5QiIgYWNxdWlzaXRpb25UaW1lPSIyMDEyLTA5LTE4VDEwOjE0OjI2Wi
I+PFVUQ1BvaW50IHV0Y1BvaW50PSIyMDEyLTA5LTE4VDEwOjE0OjM0WiIvPjxTQ1RFMzVQb2ludERlc2NyaXB0b3Igc3BsaWNlQ29tbWFuZFR5c
GU9IjUiPjxTcGxpY2VJbnNlcnQgc3BsaWNlRXZlbnRJRD0iMzQ0NTY4NjkxIiBvdXRPZk5ldHdvcmtJbmRpY2F0b3I9InRydWUiIHVuaXF1ZVBy
b2dyYW1JRD0iNTUzNTUiIGR1cmF0aW9uPSJQVDFNMFMiIGF2YWlsTnVtPSIxIiBhdmFpbHNFeHBlY3RlZD0iMTAiLz48L1NDVEUzNVBvaW50RGV
zY3JpcHRvcj48U3RyZWFtVGltZXM+PFN0cmVhbVRpbWUgdGltZVR5cGU9IkhTUyIgdGltZVZhbHVlPSI1MTUwMDAwMDAwMDAiLz48L1N0cmVhbV
RpbWVzPjwvQWNxdWlyZWRTaWduYWw+</f>
</c>
<c t="1200000000" d="400000000">
<f>PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz48QWNxdWlyZWRTaWduYWwgeG1sbnM9InVybjpjYWJsZWxhYnM6bWQ6eHNk
OnNpZ25hbGluZzozLjAiIGFjcXVpc2l0aW9uUG9pbnRJZGVudGl0eT0iRVNQTl9FYXN0X0FjcXVpc2l0aW9uX1BvaW50XzEiIGFjcXVpc2l0aW9
uU2lnbmFsSUQ9IjRBNkE5NEVFLTYyRkExMUUxQjFDQTg4MkY0ODI0MDE5QiIgYWNxdWlzaXRpb25UaW1lPSIyMDEyLTA5LTE4VDEwOjE0OjI2Wi
I+PFVUQ1BvaW50IHV0Y1BvaW50PSIyMDEyLTA5LTE4VDEwOjE0OjM0WiIvPjxTQ1RFMzVQb2ludERlc2NyaXB0b3Igc3BsaWNlQ29tbWFuZFR5c
GU9IjUiPjxTcGxpY2VJbnNlcnQgc3BsaWNlRXZlbnRJRD0iMzQ0NTY4NjkxIiBvdXRPZk5ldHdvcmtJbmRpY2F0b3I9InRydWUiIHVuaXF1ZVBy
b2dyYW1JRD0iNTUzNTUiIGR1cmF0aW9uPSJQVDFNMFMiIGF2YWlsTnVtPSIxIiBhdmFpbHNFeHBlY3RlZD0iMTAiLz48L1NDVEUzNVBvaW50RGV
zY3JpcHRvcj48U3RyZWFtVGltZXM+PFN0cmVhbVRpbWUgdGltZVR5cGU9IkhTUyIgdGltZVZhbHVlPSI1MTYyMDAwMDAwMDAiLz48L1N0cmVhbV
RpbWVzPjwvQWNxdWlyZWRTaWduYWw+</f>
</c>
</StreamIndex>
</SmoothStreamingMedia>
3.2 Apple HLS Manifest Decoration
Azure Media Services supports the following HLS manifest tags for signaling ad avail information during a live or
on-demand event.
EXT-X-CUE as defined in [Adobe-Primetime]
The data output to each tag will vary based on the ingest signal mode used. For example, RTMP ingest with Adobe
Simple mode does not contain the full SCTE-35 base64-encoded payload.

3.2.1.1 Example HLS manifest .m3u8 showing EXT-X-CUE signaling of SCTE-35
The following example HLS manifest output from the Media Services dynamic packager shows EXT-X-CUE tag for
[Adobe-Primetime] in SCTE35 mode.

#EXTM3U
#EXT-X-VERSION:8
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:2
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-PROGRAM-DATE-TIME:2020-01-07T19:40:50Z
#EXTINF:1.501500,no-desc
Fragments(video=22567545,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22702680,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22837815,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=22972950,format=m3u8-aapl-v8)
#EXTINF:1.501500,no-desc
Fragments(video=23108085,format=m3u8-aapl-v8)
#EXTINF:1.234567,no-desc
Fragments(video=23243220,format=m3u8-aapl-v8)
#EXTINF:0.016689,no-desc
Fragments(video=23354331,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=0.000022
#EXTINF:0.250244,no-desc
Fragments(video=23355833,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=0.250267
#EXTINF:0.850856,no-desc
Fragments(video=23378355,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.101122
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=0.000000,TIME=260.610344,CUE="/DAgAAAAAAXdAP/wDwUAAAPqf0/+AWXk0wABAQEAAGB8
6Fo="
#EXTINF:0.650644,no-desc
Fragments(video=23454932,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.751767
#EXTINF:0.050044,no-desc
Fragments(video=23513490,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=1.801811
#EXTINF:1.451456,no-desc
Fragments(video=23517994,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=3.253267
#EXTINF:1.501500,no-desc
Fragments(video=23648625,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=4.754767
#EXTINF:1.501500,no-desc
Fragments(video=23783760,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=6.256267
#EXTINF:1.501500,no-desc
Fragments(video=23918895,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=7.757767
#EXTINF:1.501500,no-desc
Fragments(video=24054030,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=9.259267
#EXTINF:1.501500,no-desc
Fragments(video=24189165,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=10.760767
#EXTINF:1.501500,no-desc
Fragments(video=24324300,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=12.262267
#EXTINF:1.501500,no-desc
Fragments(video=24459435,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=13.763767
#EXTINF:1.501500,no-desc
Fragments(video=24594570,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=15.265267
#EXTINF:1.501500,no-desc
Fragments(video=24729705,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=16.766767
#EXTINF:1.501500,no-desc
Fragments(video=24864840,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=18.268267
#EXTINF:1.501500,no-desc
Fragments(video=24999975,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=19.769767
#EXTINF:1.501500,no-desc
Fragments(video=25135110,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=21.271267
#EXTINF:1.501500,no-desc
Fragments(video=25270245,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=22.772767
#EXTINF:1.501500,no-desc
Fragments(video=25405380,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=24.274267
#EXTINF:1.501500,no-desc
Fragments(video=25540515,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=25.775767
#EXTINF:1.501500,no-desc
Fragments(video=25675650,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=27.277267
#EXTINF:1.501500,no-desc
Fragments(video=25810785,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=28.778767
#EXTINF:1.501500,no-desc
Fragments(video=25945920,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=30.280267
#EXTINF:1.501500,no-desc
Fragments(video=26081055,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=31.781767
#EXTINF:1.501500,no-desc
Fragments(video=26216190,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=33.283267
#EXTINF:1.501500,no-desc
Fragments(video=26351325,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=34.784767
#EXTINF:1.501500,no-desc
Fragments(video=26486460,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=36.286267
#EXTINF:1.501500,no-desc
Fragments(video=26621595,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=37.787767
#EXTINF:1.501500,no-desc
Fragments(video=26756730,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=39.289267
#EXTINF:1.501500,no-desc
Fragments(video=26891865,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=40.790767
#EXTINF:1.501500,no-desc
Fragments(video=27027000,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=42.292267
#EXTINF:1.501500,no-desc
Fragments(video=27162135,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=43.793767
#EXTINF:1.501500,no-desc
Fragments(video=27297270,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=45.295267
#EXTINF:1.501500,no-desc
Fragments(video=27432405,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=46.796767
#EXTINF:1.501500,no-desc
Fragments(video=27567540,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=48.298267
#EXTINF:1.501500,no-desc
Fragments(video=27702675,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=49.799767
#EXTINF:1.501500,no-desc
Fragments(video=27837810,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=51.301267
#EXTINF:1.501500,no-desc
Fragments(video=27972945,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=52.802767
#EXTINF:1.501500,no-desc
Fragments(video=28108080,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=54.304267
#EXTINF:1.501500,no-desc
Fragments(video=28243215,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=55.805767
#EXTINF:1.501500,no-desc
Fragments(video=28378350,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=57.307267
#EXTINF:1.501500,no-desc
Fragments(video=28513485,format=m3u8-aapl-v8)
#EXT-X-
CUE:ID="1002",TYPE="scte35",DURATION=59.993278,TIME=259.509244,CUE="/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAE
BAQAA8g1eNw==",ELAPSED=58.808767
#EXTINF:1.501500,no-desc
Fragments(video=28648620,format=m3u8-aapl-v8)

3.2.2 Apple HLS with Adobe Primetime EXT-X-CUE


Media Services (version 2 and 3 API) supports the output of the EXT-X-CUE tag as defined in [Adobe-Primetime]
"SCTE-35 Mode". In this mode, Azure Media Services will embed base64-encoded [SCTE-35] splice_info_section() in
the EXT-X-CUE tag.
The "legacy" EXT-X-CUE tag is defined as below and also can be normative referenced in the [Adobe-Primetime]
specification. This should only be used for legacy SCTE35 signaling where needed, otherwise the recommended tag
is defined in [RFC8216] as EXT-X-DATERANGE.
| Attribute Name | Type | Required? | Description |
| --- | --- | --- | --- |
| CUE | quoted string | Required | The message encoded as a base64-encoded string as described in [RFC4648]. For [SCTE-35] messages, this is the base64-encoded splice_info_section(). |
| TYPE | quoted string | Required | A URN or URL identifying the message scheme. For [SCTE-35] messages, the type takes the special value "scte35". |
| ID | quoted string | Required | A unique identifier for the event. If the ID is not specified when the message is ingested, Azure Media Services will generate a unique id. |
| DURATION | decimal floating point number | Required | The duration of the event. If unknown, the value SHOULD be 0. Units are fractional seconds. |
| ELAPSED | decimal floating point number | Optional, but Required for sliding window | When the signal is being repeated to support a sliding presentation window, this field MUST be the amount of presentation time that has elapsed since the event began. Units are fractional seconds. This value may exceed the original specified duration of the splice or segment. |
| TIME | decimal floating point number | Required | The presentation time of the event. Units are fractional seconds. |

The HLS player application layer will use the TYPE to identify the format of the message, decode the message, apply
the necessary time conversions, and process the event. The events are time synchronized in the segment playlist of
the parent track, according to the event timestamp. They are inserted before the nearest segment (#EXTINF tag).
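For illustration, a client-side handler could extract the tag attributes with a few lines of Python (a minimal sketch, not a complete HLS parser; attribute unquoting is simplified, and the function name is hypothetical):

import re

_CUE_ATTR = re.compile(r'([A-Z]+)=("[^"]*"|[^,]*)')

def parse_ext_x_cue(playlist_text: str):
    # Yield a dict of attributes for every #EXT-X-CUE tag in a media playlist.
    for line in playlist_text.splitlines():
        if not line.startswith("#EXT-X-CUE:"):
            continue
        yield {key: value.strip('"')
               for key, value in _CUE_ATTR.findall(line[len("#EXT-X-CUE:"):])}

# For SCTE-35 mode tags, the "CUE" attribute holds the base64 splice_info_section(),
# which the application layer can decode and hand to its SCTE-35 parser.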
3.2.3 HLS .m3u8 manifest example using Adobe Primetime EXT-X-CUE
The following example shows HLS manifest decoration using the Adobe Primetime EXT-X-CUE tag. The tags carry
only the ID, TYPE, DURATION, TIME, and ELAPSED attributes, without a base64-encoded CUE payload, which means
that this was an RTMP source using Adobe "simple" mode signaling.
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-ALLOW-CACHE:NO
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:11
#EXT-X-PROGRAM-DATE-TIME:2019-12-10T09:18:14Z
#EXTINF:10.010000,no-desc
Fragments(video=4011540820,format=m3u8-aapl)
#EXTINF:10.010000,no-desc
Fragments(video=4011550830,format=m3u8-aapl)
#EXTINF:10.010000,no-desc
Fragments(video=4011560840,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000
#EXTINF:8.008000,no-desc
Fragments(video=4011570850,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=0.593000
#EXTINF:4.170000,no-desc
Fragments(video=4011578858,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=4.763000
#EXTINF:9.844000,no-desc
Fragments(video=4011583028,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=14.607000
#EXTINF:10.010000,no-desc
Fragments(video=4011592872,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=24.617000
#EXTINF:10.010000,no-desc
Fragments(video=4011602882,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=34.627000
#EXTINF:10.010000,no-desc
Fragments(video=4011612892,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=44.637000
#EXTINF:10.010000,no-desc
Fragments(video=4011622902,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=54.647000
#EXTINF:10.010000,no-desc
Fragments(video=4011632912,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=64.657000
#EXTINF:10.010000,no-desc
Fragments(video=4011642922,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=74.667000
#EXTINF:10.010000,no-desc
Fragments(video=4011652932,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=84.677000
#EXTINF:10.010000,no-desc
Fragments(video=4011662942,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=94.687000
#EXTINF:10.010000,no-desc
Fragments(video=4011672952,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=104.697000
#EXTINF:10.010000,no-desc
Fragments(video=4011682962,format=m3u8-aapl)
#EXT-X-CUE:ID=4011578265,TYPE="SpliceOut",DURATION=119.987000,TIME=4011578.265000,ELAPSED=114.707000
#EXTINF:10.010000,no-desc
Fragments(video=4011692972,format=m3u8-aapl)
#EXTINF:8.008000,no-desc
Fragments(video=4011702982,format=m3u8-aapl)

3.2.4 HLS Message Handling for Adobe Primetime EXT-X-CUE


Events are signaled in the segment playlist of each video and audio track. The position of the EXT-X-CUE tag MUST
always be either immediately before the first HLS segment (for splice out or segment start) or immediately after
the last HLS segment (for splice in or segment end) to which its TIME and DURATION attributes refer, as required by
[Adobe-Primetime].
When a sliding presentation window is enabled, the EXT-X-CUE tag MUST be repeated often enough that the splice
or segment is always fully described in the segment playlist, and the ELAPSED attribute MUST be used to indicate
the amount of time the splice or segment has been active, as required by [Adobe-Primetime].
When a sliding presentation window is enabled, the EXT-X-CUE tags are removed from the segment playlist when
the media time that they refer to has rolled out of the sliding presentation window.
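A minimal sketch of the repeat and removal rules above, assuming a packager tracks the sliding window start and the presentation time of each segment (illustrative only, not the service's actual implementation):

def repeated_cue(cue, segment_start, window_start):
    """Return the EXT-X-CUE attributes to emit before the segment starting at
    segment_start, or None once the splice has rolled out of the sliding window.
    cue holds TIME and DURATION; all values are fractional seconds."""
    if cue["TIME"] + cue["DURATION"] < window_start:
        return None   # the media time the tag refers to left the sliding window
    repeated = dict(cue)
    if segment_start > cue["TIME"]:
        # ELAPSED: presentation time since the event began (may exceed DURATION).
        repeated["ELAPSED"] = round(segment_start - cue["TIME"], 3)
    return repeated

# Example: the cue from the manifest above, repeated before a later segment.
print(repeated_cue({"TIME": 4011578.265, "DURATION": 119.987}, 4011582.435, 4011460.740))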

3.3 DASH Manifest Decoration (MPD)


[MPEGDASH] provides three ways to signal events:
1. Events signaled in the MPD EventStream
2. Events signaled in-band using the Event Message Box ('emsg')
3. A combination of both 1 and 2
Events signaled in the MPD EventStream are useful for VOD streaming because clients have access to all the events,
immediately when the MPD is downloaded. It is also useful for SSAI signaling, where the downstream SSAI vendor
needs to parse the signals from the MPD manifest, and insert ad content dynamically. The in-band ('emsg') solution
is useful for live streaming where clients do not need to download the MPD again, or there is no SSAI manifest
manipulation happening between the client and the origin.
Azure Media Services default behavior for DASH is to signal both in the MPD EventStream and in-band using the
Event Message Box ('emsg').
Cue messages ingested over [RTMP] or [MS-SSTR-Ingest] are mapped into DASH events, using in-band 'emsg'
boxes and/or in-MPD EventStreams.
In-band SCTE-35 signaling for DASH follows the definition and requirements defined in [SCTE-214-3] and also in
[DASH-IF-IOP] section 13.12.2 ('SCTE35 Events').
For in-band [SCTE-35] carriage, the Event Message box ('emsg') uses the schemeId = "urn:scte:scte35:2013:bin". For
MPD manifest decoration the EventStream schemeId uses "urn:scte:scte35:2014:xml+bin". This format is an XML
representation of the event which includes a binary base64-encoded output of the complete SCTE-35 message that
arrived at ingest.
Normative reference definitions of carriage of [SCTE-35] cue messages in DASH are available in [SCTE-214-1] sec
6.7.4 (MPD) and [SCTE-214-3] sec 7.3.2 (Carriage of SCTE 35 cue messages).
3.3.1 MPEG DASH (MPD) EventStream Signaling
Manifest (MPD) decoration of events will be signaled in the MPD using the EventStream element, which appears
within the Period element. The schemeId used is "urn:scte:scte35:2014:xml+bin".

NOTE
For brevity purposes [SCTE-35] allows use of the base64-encoded section in Signal.Binary element (rather than the
Signal.SpliceInfoSection element) as an alternative to carriage of a completely parsed cue message. Azure Media Services uses
this 'xml+bin' approach to signaling in the MPD manifest. This is also the recommended method used in the [DASH-IF-IOP] -
see section titled 'Ad insertion event streams' of the DASH IF IOP guideline

The EventStream element has the following attributes:

scheme_id_uri (string, Required): Identifies the scheme of the message. The scheme is set to the value of the Scheme attribute in the Live Server Manifest box. The value SHOULD be a URN or URL identifying the message scheme. The supported output schemeId is "urn:scte:scte35:2014:xml+bin" per [SCTE-214-1] sec 6.7.4 (MPD), as the service supports only "xml+bin" at this time for brevity in the MPD.

value (string, Optional): An additional string value used by the owners of the scheme to customize the semantics of the message. In order to differentiate multiple event streams with the same scheme, the value MUST be set to the name of the event stream (trackName for [MS-SSTR-Ingest] or AMF message name for [RTMP] ingest).

timescale (32-bit unsigned integer, Required): The timescale, in ticks per second.

3.3.2 Example Event Streams for MPEG DASH


3.3.2.1 Example MPEG DASH .mpd manifest signaling of RTMP streaming using Adobe simple mode
The following example shows an excerpt EventStream from the Media Services dynamic packager for an RTMP
stream using Adobe "simple" mode signaling.

<!-- Example EventStream element using "urn:com:adobe:dpi:simple:2015" Adobe simple signaling per [Adobe-
Primetime] -->
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal" timescale="10000000">
<Event presentationTime="1583497601000000" duration="300000000" id="1085900"/>
<Event presentationTime="1583500901666666" duration="300000000" id="1415966"/>
<Event presentationTime="1583504202333333" duration="300000000" id="1746033"/>
<Event presentationTime="1583507502666666" duration="300000000" id="2076066"/>
<Event presentationTime="1583510803333333" duration="300000000" id="2406133"/>
<Event presentationTime="1583514104000000" duration="300000000" id="2736200"/>
<Event presentationTime="1583517404666666" duration="300000000" id="3066266"/>
<Event presentationTime="1583520705333333" duration="300000000" id="3396333"/>
<Event presentationTime="1583524006000000" duration="300000000" id="3726400"/>
<Event presentationTime="1583527306666666" duration="300000000" id="4056466"/>
<Event presentationTime="1583530607333333" duration="300000000" id="4386533"/>
</EventStream>

3.3.2.2 Example MPEG DASH .mpd manifest signaling of an RTMP stream using Adobe SCTE-35 mode
The following example shows an excerpt EventStream from the Media Services dynamic packager for an RTMP
stream using Adobe SCTE-35 mode signaling.
Example EventStream element using xml+bin style signaling per [SCTE-214-1]
<EventStream schemeIdUri="urn:scte:scte35:2014:xml+bin" value="scte35" timescale="10000000">
<Event presentationTime="2595092444" duration="11011000" id="1002">
<Signal xmlns="http://www.scte.org/schemas/35/2016">
<Binary>/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAEBAQAA8g1eNw==</Binary>
</Signal>
</Event>
<Event presentationTime="2606103444" id="1002">
<Signal xmlns="http://www.scte.org/schemas/35/2016">
<Binary>/DAgAAAAAAXdAP/wDwUAAAPqf0/+AWXk0wABAQEAAGB86Fo=</Binary>
</Signal>
</Event>
</EventStream>

IMPORTANT
Note that presentationTime is the presentation time of the [SCTE-35] event translated to be relative to the Period Start time,
not the arrival time of the message. [MPEGDASH] defines the Event@presentationTime as "Specifies the presentation time of
the event relative to the start of the Period. The value of the presentation time in seconds is the division of the value of this
attribute and the value of the EventStream@timescale attribute. If not present, the value of the presentation time is 0."
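For illustration (an assumption about client-side handling, not a normative requirement), the following sketch parses the xml+bin EventStream excerpt shown above, converts each Event@presentationTime to seconds using EventStream@timescale, and base64-decodes Signal.Binary back to the splice_info_section() bytes:

import base64
import xml.etree.ElementTree as ET

# Excerpt matching the xml+bin example above.
EVENT_STREAM_XML = '''<EventStream schemeIdUri="urn:scte:scte35:2014:xml+bin" value="scte35" timescale="10000000">
  <Event presentationTime="2595092444" duration="11011000" id="1002">
    <Signal xmlns="http://www.scte.org/schemas/35/2016">
      <Binary>/DAlAAAAAAXdAP/wFAUAAAPqf+/+AWRhuP4AUmNjAAEBAQAA8g1eNw==</Binary>
    </Signal>
  </Event>
</EventStream>'''

SCTE35_NS = "{http://www.scte.org/schemas/35/2016}"

event_stream = ET.fromstring(EVENT_STREAM_XML)
timescale = int(event_stream.get("timescale", "1"))

for event in event_stream.findall("Event"):
    # presentationTime is relative to the Period start, expressed in timescale ticks.
    start_sec = int(event.get("presentationTime", "0")) / timescale
    duration_sec = int(event.get("duration", "0")) / timescale
    binary = event.find(SCTE35_NS + "Signal/" + SCTE35_NS + "Binary")
    splice_info_section = base64.b64decode(binary.text) if binary is not None else b""
    print(event.get("id"), round(start_sec, 3), round(duration_sec, 3), len(splice_info_section), "bytes")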

3.3.3.1 Example MPEG DASH manifest (MPD) with single-period, EventStream, using Adobe simple mode signals
The following example shows the output from the Media Services dynamic packager for a source RTMP stream
using the Adobe "simple" mode ad signal method. The output is a single period manifest showing an EventStream
using the schemeId Uri set to "urn:com:adobe:dpi:simple:2015" and value property set to "simplesignal". Each
simple signal is provided in an Event element with the @presentationTime, @duration, and @id properties
populated based on the incoming simple signals.
<?xml version="1.0" encoding="utf-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" profiles="urn:mpeg:dash:profile:isoff-live:2011"
type="static" mediaPresentationDuration="PT28M1.680S" minBufferTime="PT3S">
<Period>
<EventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal" timescale="1000">
<Event presentationTime="4011578265" duration="119987" id="4011578265"/>
</EventStream>
<AdaptationSet id="1" group="1" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="video" mimeType="video/mp4" codecs="avc1.4D4028" maxWidth="1920" maxHeight="1080"
startWithSAP="1">
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal"/>
<ProducerReferenceTime id="4011460740" type="0" wallClockTime="2020-01-25T19:35:54.740Z"
presentationTime="4011460740"/>
<SegmentTemplate timescale="1000" presentationTimeOffset="4011460740"
media="QualityLevels($Bandwidth$)/Fragments(video=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(video=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="4011460740" d="2002" r="57"/>
<S d="1401"/>
<S d="601"/>
<S d="2002"/>

<!-- ... video segments truncated for sample brevity -->

</SegmentTimeline>
</SegmentTemplate>
<Representation id="1_V_video_14759481473095519504" bandwidth="6000000" width="1920"
height="1080"/>
<Representation id="1_V_video_1516803357996956148" bandwidth="4000000" codecs="avc1.4D401F"
width="1280" height="720"/>
<Representation id="1_V_video_5430608182379669372" bandwidth="2600000" codecs="avc1.4D401F"
width="960" height="540"/>
<Representation id="1_V_video_3780180650986497347" bandwidth="1000000" codecs="avc1.4D401E"
width="640" height="360"/>
<Representation id="1_V_video_13759117363700265707" bandwidth="699000" codecs="avc1.4D4015"
width="480" height="270"/>
<Representation id="1_V_video_6140004908920393176" bandwidth="400000" codecs="avc1.4D4015"
width="480" height="270"/>
<Representation id="1_V_video_10673801877453424365" bandwidth="200000" codecs="avc1.4D400D"
width="320" height="180"/>
</AdaptationSet>
<AdaptationSet id="2" group="5" profiles="ccff" bitstreamSwitching="false" segmentAlignment="true"
contentType="audio" mimeType="audio/mp4" codecs="mp4a.40.2">
<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="simplesignal"/>
<ProducerReferenceTime id="4011460761" type="0" wallClockTime="2020-01-25T19:35:54.761Z"
presentationTime="4011460761"/>
<Label>audio</Label>
<SegmentTemplate timescale="1000" presentationTimeOffset="4011460740"
media="QualityLevels($Bandwidth$)/Fragments(audio=$Time$,format=mpd-time-csf)"
initialization="QualityLevels($Bandwidth$)/Fragments(audio=i,format=mpd-time-csf)">
<SegmentTimeline>
<S t="4011460761" d="1984"/>
<S d="2005" r="1"/>
<S d="2006"/>

<!-- ... audio segments truncated for example brevity -->

</SegmentTimeline>
</SegmentTemplate>
<Representation id="5_A_audio_17504386117102112482" bandwidth="128000" audioSamplingRate="48000"/>
</AdaptationSet>
</Period>
</MPD>
3.3.4 MPEG DASH In-band Event Message Box Signaling
An in-band event stream requires the MPD to have an InbandEventStream element at the Adaptation Set level. This
element has a mandatory schemeIdUri attribute and optional timescale attribute, which also appear in the Event
Message Box ('emsg'). Event message boxes with scheme identifiers that are not defined in the MPD SHOULD NOT
be present.
For in-band [SCTE-35] carriage, signals MUST use the schemeId = "urn:scte:scte35:2013:bin". Normative definitions
of carriage of [SCTE-35] in-band messages are defined in [SCTE-214-3] sec 7.3.2 (Carriage of SCTE 35 cue
messages).
The following details outline the specific values the client should expect in the 'emsg' in compliance with [SCTE-
214-3]:

scheme_id_uri (string, Required): Identifies the scheme of the message. The scheme is set to the value of the Scheme attribute in the Live Server Manifest box. The value MUST be a URN identifying the message scheme. For [SCTE-35] messages, this MUST be "urn:scte:scte35:2013:bin" in compliance with [SCTE-214-3].

value (string, Required): An additional string value used by the owners of the scheme to customize the semantics of the message. In order to differentiate multiple event streams with the same scheme, the value will be set to the name of the event stream (trackName for Smooth ingest or AMF message name for RTMP ingest).

timescale (32-bit unsigned integer, Required): The timescale, in ticks per second, of the times and duration fields within the 'emsg' box.

presentation_time_delta (32-bit unsigned integer, Required): The media presentation time delta between the presentation time of the event and the earliest presentation time in this segment. The presentation time and duration SHOULD align with Stream Access Points (SAP) of type 1 or 2, as defined in [ISO-14496-12] Annex I.

event_duration (32-bit unsigned integer, Required): The duration of the event, or 0xFFFFFFFF to indicate an unknown duration.

id (32-bit unsigned integer, Required): Identifies this instance of the message. Messages with equivalent semantics shall have the same value. If the ID is not specified when the message is ingested, Azure Media Services will generate a unique ID.

message_data (byte array, Required): The event message. For [SCTE-35] messages, the message data is the binary splice_info_section() in compliance with [SCTE-214-3].

Example InbandEventStream element for Adobe simple mode

<InbandEventStream schemeIdUri="urn:com:adobe:dpi:simple:2015" value="amssignal"/>

3.3.5 DASH Message Handling


Events are signaled in-band, within the 'emsg' box, for both video and audio tracks. The signaling occurs for all
segment requests for which the presentation_time_delta is 15 seconds or less.
When a sliding presentation window is enabled, event messages are removed from the MPD when the sum of the
time and duration of the event message is less than the time of the media data in the manifest. In other words, the
event messages are removed from the manifest when the media time to which they refer has rolled out of the
sliding presentation window.
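For illustration only (an assumption about how a downstream component could read the box, not a requirement of this specification), a version 0 'emsg' box payload with the fields listed above could be parsed as follows:

import struct

def parse_emsg_v0(box_payload):
    """Parse the payload of a version 0 'emsg' box (the bytes after the box size and type)."""
    version = box_payload[0]
    if version != 0:
        raise ValueError("this sketch handles only version 0 'emsg' boxes")
    pos = 4  # skip version (1 byte) and flags (3 bytes)

    def read_cstring(data, start):
        end = data.index(b"\x00", start)
        return data[start:end].decode("utf-8"), end + 1

    scheme_id_uri, pos = read_cstring(box_payload, pos)   # "urn:scte:scte35:2013:bin" for SCTE-35
    value, pos = read_cstring(box_payload, pos)            # track name or AMF message name
    timescale, presentation_time_delta, event_duration, event_id = struct.unpack_from(">IIII", box_payload, pos)
    message_data = box_payload[pos + 16:]                   # binary splice_info_section() for SCTE-35
    return {
        "scheme_id_uri": scheme_id_uri,
        "value": value,
        "timescale": timescale,
        "presentation_time_delta": presentation_time_delta,
        "event_duration": event_duration,   # 0xFFFFFFFF means unknown
        "id": event_id,
        "message_data": message_data,
    }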

4. SCTE-35 Ingest Implementation Guidance for Encoder Vendors


The following guidelines address common issues that can impact an encoder vendor's implementation of this
specification. They have been collected from real-world partner feedback to make it easier for others to implement
this specification.
[SCTE-35] messages are ingested in binary format using the Scheme "urn:scte:scte35:2013:bin" for [MS-SSTR-
Ingest] and the type "scte35" for [RTMP] ingest. To facilitate the conversion of [SCTE-35] timing, which is based on
MPEG-2 transport stream presentation time stamps (PTS), a mapping between PTS (pts_time + pts_adjustment of
the splice_time()) and the media timeline is provided by the event presentation time (the fragment_absolute_time
field for Smooth ingest and the time field for RTMP ingest). The mapping is necessary because the 33-bit PTS value
rolls over approximately every 26.5 hours.
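As a non-normative sketch of that timing conversion, assuming the encoder keeps one anchor pair (a SCTE-35 PTS value and the media presentation time observed for it at ingest), later PTS values can be mapped onto the media timeline while compensating for the 33-bit rollover:

PTS_CLOCK = 90_000      # MPEG-2 TS PTS ticks per second
PTS_MODULO = 1 << 33    # 33-bit PTS rolls over roughly every 26.5 hours

def pts_to_media_time(pts, anchor_pts, anchor_media_time_sec):
    """Map a SCTE-35 PTS (pts_time + pts_adjustment) to media-timeline seconds.
    Assumes the event lies less than one rollover period (~26.5 hours) after the anchor."""
    delta_ticks = (pts - anchor_pts) % PTS_MODULO
    return anchor_media_time_sec + delta_ticks / PTS_CLOCK

# Example: an anchor PTS observed at media time 4011578.265 s; a splice PTS that
# wrapped past 2**33 still maps 180 seconds forward on the media timeline.
anchor_pts = 8_589_000_000
splice_pts = (anchor_pts + 180 * PTS_CLOCK) % PTS_MODULO
print(pts_to_media_time(splice_pts, anchor_pts, 4011578.265))   # 4011758.265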
Smooth Streaming ingest [MS-SSTR-Ingest] requires that the Media Data Box ('mdat') MUST contain the
splice_info_section() defined in [SCTE-35].
For RTMP ingest, the cue attribute of the AMF message is set to the base64-encoded splice_info_section() defined
in [SCTE-35].
When the messages have the format described above, they are sent to HLS, Smooth, and DASH clients as defined
above.
When testing your implementation with the Azure Media Services platform, please start testing with a "pass-
through" LiveEvent first, before moving to testing on an encoding LiveEvent.

Change History
07/2/19: Revised RTMP ingest support, added RTMP "onCuePoint" for Elemental Live.

08/22/19: Updated to add OnUserDataEvent to RTMP for custom metadata.

1/08/20: Fixed error on RTMP Simple and RTMP SCTE35 mode. Changed from "onCuePoint" to "onAdCue". Updated Simple mode table.

08/4/20: Removed support for DATERANGE tag to match the implementation in the production service.

Next steps
View Media Services learning paths.
Media Services v3 (latest)
Check out the latest version of Azure Media Services!
Overview
Concepts
Start developing
Migration guidance from v2 to v3
Media Services v2 (legacy)
Overview
Create account
Deliver on-demand
Deliver live

Provide feedback
Use the User Voice forum to provide feedback and make suggestions on how to improve Azure Media Services. You
also can go directly to one of the following categories:
Azure Media Player
Client SDK libraries
Encoding and processing
Live streaming
Media Analytics
Azure portal
REST API and platform
Video-on-demand streaming
Smooth Streaming Protocol (MS-SSTR) Amendment
for HEVC
9/22/2020 • 11 minutes to read • Edit Online

1 Introduction
This article provides detailed amendments to be applied to the Smooth Streaming Protocol specification [MS-SSTR]
to enable Smooth Streaming of HEVC encoded video. In this specification, we outline only the changes required to
deliver the HEVC video codec. The article follows the same numbering schema as the [MS-SSTR] specification. The
empty headings presented throughout the article are provided to orient the reader to their position in the [MS-
SSTR] specification. "(No Change)" indicates text is copied for clarification purposes only.
The article provides technical implementation requirements for signaling the HEVC video codec (using either
'hev1' or 'hvc1' format tracks) in a Smooth Streaming manifest. Normative references are updated to reference
the current MPEG standards that include HEVC and Common Encryption of HEVC, and box names for the ISO Base
Media File Format have been updated to be consistent with the latest specifications.
The referenced Smooth Streaming Protocol specification [MS-SSTR] describes the wire format used to deliver live
and on-demand digital media, such as audio and video, in the following manners: from an encoder to a web server,
from a server to another server, and from a server to an HTTP client. The use of an MPEG-4 ([MPEG4-RA])-based
data structure delivery over HTTP allows seamless switching in near real time between different quality levels of
compressed media content. The result is a constant playback experience for the HTTP client end user, even if
network and video rendering conditions change for the client computer or device.

1.1 Glossary
The following terms are defined in [MS-GLOS]:

globally unique identifier (GUID)
universally unique identifier (UUID)

The following terms are specific to this document:

composition time: The time a sample is presented at the client, as defined in [ISO/IEC-14496-12].
CENC: Common Encryption, as defined in [ISO/IEC 23001-7] Second Edition.
decode time: The time a sample is required to be decoded on the client, as defined in [ISO/IEC 14496-12:2008].
fragment: An independently downloadable unit of media that comprises one or more samples.
HEVC: High Efficiency Video Coding, as defined in [ISO/IEC 23008-2].
manifest: Metadata about the presentation that allows a client to make requests for media.
media: Compressed audio, video, and text data used by the client to play a presentation.
media format: A well-defined format for representing audio or video as a compressed sample.
presentation: The set of all streams and related metadata needed to play a single movie.
request: An HTTP message sent from the client to the server, as defined in [RFC2616].
response: An HTTP message sent from the server to the client, as defined in [RFC2616].
sample: The smallest fundamental unit (such as a frame) in which media is stored and processed.
MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as described in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.

1.2 References
References to Microsoft Open Specifications documentation do not include a publishing year because links are
to the latest version of the documents, which are updated frequently. References to other documents include a
publishing year when one is available.

1.2.1 Normative References


[MS-SSTR] Smooth Streaming Protocol v20140502 https://msdn.microsoft.com/library/ff469518.aspx
[ISO/IEC 14496-12] International Organization for Standardization, "Information technology -- Coding of
audio-visual objects -- Part 12: ISO Base Media File Format", ISO/IEC 14496-12:2014, Edition 4, Plus
Corrigendum 1, Amendments 1 & 2.
https://standards.iso.org/ittf/PubliclyAvailableStandards/c061988_ISO_IEC_14496-12_2012.zip
[ISO/IEC 14496-15] International Organization for Standardization, "Information technology -- Coding of
audio-visual objects -- Part 15: Carriage of NAL unit structured video in the ISO Base Media File Format", ISO
14496-15:2015, Edition 3. https://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?
csnumber=65216
[ISO/IEC 23008-2] Information technology -- High efficiency coding and media delivery in heterogeneous
environments -- Part 2: High efficiency video coding: 2013 or newest edition
https://standards.iso.org/ittf/PubliclyAvailableStandards/c035424_ISO_IEC_23008-2_2013.zip
[ISO/IEC 23001-7] Information technology — MPEG systems technologies — Part 7: Common encryption in
ISO base media file format files, CENC Edition 2:2015 https://www.iso.org/iso/catalogue_detail.htm?
csnumber=65271
[RFC-6381] IETF RFC-6381, “The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types”
https://tools.ietf.org/html/rfc6381
[MPEG4-RA] The MP4 Registration Authority, "MP4REG", http://www.mp4ra.org
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March
1997, https://www.rfc-editor.org/rfc/rfc2119.txt

1.2.2 Informative References


[MS-GLOS] Microsoft Corporation, "Windows Protocols Master Glossary."
[RFC3548] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data Encodings", RFC 3548, July 2003,
https://www.ietf.org/rfc/rfc3548.txt
[RFC5234] Crocker, D., Ed., and Overell, P., "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234,
January 2008, https://www.rfc-editor.org/rfc/rfc5234.txt

1.3 Overview
Only changes to the Smooth Streaming specification required for the delivery of HEVC are specified below.
Unchanged section headers are listed to maintain location in the referenced Smooth Streaming specification
[MS-SSTR].
1.4 Relationship to Other Protocols
1.5 Prerequisites/Preconditions
1.6 Applicability Statement
1.7 Versioning and Capability Negotiation
1.8 Vendor-Extensible Fields
The following method SHALL be used to identify streams using the HEVC video format:
Custom Descriptive Codes for Media Formats: This capability is provided by the FourCC field, as
specified in section 2.2.2.5. Implementers can ensure that extensions do not conflict by registering extension
codes with the MPEG4-RA, as specified in [ISO/IEC-14496-12]

1.9 Standards Assignments


2 Messages
2.1 Transport
2.2 Message Syntax
2.2.1 Manifest Request
2.2.2 Manifest Response
2.2.2.1 SmoothStreamingMedia

MinorVersion (variable): The minor version of the Manifest Response message. MUST be set to 2. (No
Change)
TimeScale (variable): The time scale of the Duration attribute, specified as the number of increments in one
second. The default value is 10000000. (No Change)
The recommended value is 90000 for representing the exact duration of video frames and fragments
containing fractional framerate video (for example, 30/1.001 Hz).

2.2.2.2 ProtectionElement
The ProtectionElement SHALL be present when Common Encryption (CENC) has been applied to video or audio
streams. HEVC encrypted streams SHALL conform to Common Encryption 2nd Edition [ISO/IEC 23001-7]. Only
slice data in VCL NAL Units SHALL be encrypted.
2.2.2.3 StreamElement

StreamTimeScale (variable): The time scale for duration and time values in this stream, specified as the
number of increments in one second. A value of 90000 is recommended for HEVC streams. A value matching
the waveform sample frequency (for example, 48000 or 44100) is recommended for audio streams.

2.2.2.3.1 StreamProtectionElement
2.2.2.4 UrlPattern
2.2.2.5 TrackElement

FourCC (variable): A four-character code that identifies which media format is used for each sample. The
following range of values is reserved with the following semantic meanings:
"hev1”: Video samples for this track use HEVC video, using the ‘hev1’ sample description format
specified in [ISO/IEC-14496-15].
"hvc1”: Video samples for this track use HEVC video, using the ‘hvc1’ sample description format
specified in [ISO/IEC-14496-15].
CodecPrivateData (variable): Data that specifies parameters specific to the media format and
common to all samples in the track, represented as a string of hex-coded bytes. The format and semantic
meaning of byte sequence varies with the value of the FourCC field as follows:
When a TrackElement describes HEVC video, the FourCC field SHALL equal "hev1" or "hvc1"
The CodecPrivateData field SHALL contain a hex-coded string representation of the following byte
sequence, specified in ABNF [RFC5234]: (no change from MS-SSTR)
%x00 %x00 %x00 %x01 SPSField %x00 %x00 %x00 %x01 PPSField
SPSField contains the Sequence Parameter Set (SPS).
PPSField contains the Picture Parameter Set (PPS).
Note: The Video Parameter Set (VPS) is not contained in CodecPrivateData, but should be contained in
the file header of stored files in the ‘hvcC’ box. Systems using Smooth Streaming Protocol must signal
additional decoding parameters (for example, HEVC Tier) using the Custom Attribute “codecs.”
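A minimal sketch of building that hex-coded string (illustrative only; the SPS and PPS byte strings below are hypothetical placeholders, not valid HEVC parameter sets):

def build_codec_private_data(sps_nal: bytes, pps_nal: bytes) -> str:
    """Hex-coded string per the ABNF above: 00 00 00 01 SPSField 00 00 00 01 PPSField.
    The VPS is not included here; for stored files it is carried in the 'hvcC' box."""
    start_code = b"\x00\x00\x00\x01"
    return (start_code + sps_nal + start_code + pps_nal).hex().upper()

# Hypothetical NAL unit payloads (placeholders only):
sps = bytes.fromhex("420101016000000300900000030000030078")
pps = bytes.fromhex("4401C172B46240")
print(build_codec_private_data(sps, pps))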

2.2.2.5.1 CustomAttributesElement
2.2.2.6 StreamFragmentElement

The SmoothStreamingMedia’s MajorVersion field MUST be set to 2, and MinorVersion field MUST be set
to 2. (No Change)

2.2.2.6.1 TrackFragmentElement

2.2.3 Fragment Request


Note : The default media format requested for MinorVersion 2 and ‘hev1’ or 'hvc1' is ‘iso8’ brand ISO Base
Media File Format specified in [ISO/IEC 14496-12] ISO Base Media File Format Fourth Edition, and [ISO/IEC
23001-7] Common Encryption Second Edition.

2.2.4 Fragment Response


2.2.4.1 MoofBox
2.2.4.2 MfhdBox
2.2.4.3 TrafBox
2.2.4.4 TfxdBox

The TfxdBox is deprecated, and its function replaced by the Track Fragment Decode Time Box (‘tfdt’) specified in
[ISO/IEC 14496-12] section 8.8.12.
Note : A client may calculate the duration of a fragment by summing the sample durations listed in the Track
Run Box (‘trun’) or multiplying the number of samples times the default sample duration. The
baseMediaDecodeTime in ‘tfdt’ plus fragment duration equals the URL time parameter for the next fragment.
A Producer Reference Time Box (‘prft’) SHOULD be inserted prior to a Movie Fragment Box (‘moof’) as needed,
to indicate the UTC time corresponding to the Track Fragment Decode Time of the first sample referenced by
the Movie Fragment Box, as specified in [ISO/IEC 14496-12] section 8.16.5.
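For illustration (an assumption about how a client might apply the note above, not additional protocol), the URL time parameter of the next fragment can be computed from the 'tfdt' baseMediaDecodeTime plus the fragment duration derived from the 'trun' sample durations:

def next_fragment_time(base_media_decode_time, sample_durations=None,
                       default_sample_duration=None, sample_count=None):
    """Return the URL time parameter of the next fragment, in track timescale ticks.
    Pass either the explicit per-sample durations from 'trun', or the default
    sample duration together with the sample count."""
    if sample_durations is not None:
        fragment_duration = sum(sample_durations)
    else:
        fragment_duration = default_sample_duration * sample_count
    return base_media_decode_time + fragment_duration

# Example with a 90 kHz track timescale: 60 samples of 3003 ticks each (29.97 fps).
print(next_fragment_time(360_360_000, default_sample_duration=3003, sample_count=60))  # 360540180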
2.2.4.5 TfrfBox

The TfrfBox is deprecated, and its function replaced by the Track Fragment Decode Time Box (‘tfdt’) specified in
[ISO/IEC 14496-12] section 8.8.12.
Note : A client may calculate the duration of a fragment by summing the sample durations listed in the Track
Run Box (‘trun’) or multiplying the number of samples times the default sample duration. The
baseMediaDecodeTime in ‘tfdt’ plus fragment duration equals the URL time parameter for the next fragment.
Look ahead addresses are deprecated because they delay live streaming.

2.2.4.6 TfhdBox

The TfhdBox and related fields encapsulate defaults for per sample metadata in the fragment. The syntax of the
TfhdBox field is a strict subset of the syntax of the Track Fragment Header Box defined in [ISO/IEC-14496-12]
section 8.8.7.
BaseDataOffset (8 bytes): The offset, in bytes, from the beginning of the MdatBox field to the sample field
in the MdatBox field. To signal this restriction, the default-base-is-moof flag (0x020000) must be set.

2.2.4.7 TrunBox

The TrunBox and related fields encapsulate per sample metadata for the requested fragment. The syntax of
TrunBox is a strict subset of the Version 1 Track Fragment Run Box defined in [ISO/IEC-14496-12] section 8.8.8.
SampleCompositionTimeOffset (4 bytes): The Sample Composition Time offset of each sample adjusted
so that the presentation time of the first presented sample in the fragment is equal to the decode time of the
first decoded sample. Negative video sample composition offsets SHALL be used,
as defined in [ISO/IEC-14496-12].
Note: This avoids a video synchronization error caused by video lagging audio equal to the largest decoded
picture buffer removal delay, and maintains presentation timing between alternative fragments that may have
different removal delays.
The syntax of the fields defined in this section, specified in ABNF [RFC5234], remains the same, except as
follows:
SampleCompositionTimeOffset = SIGNED_INT32

2.2.4.8 MdatBox
2.2.4.9 Fragment Response Common Fields

2.2.5 Sparse Stream Pointer


2.2.6 Fragment Not Yet Available
2.2.7 Live Ingest
2.2.7.1 FileType

FileType (variable): specifies the subtype and intended use of the MPEG-4 ([MPEG4-RA]) file, and high-level
attributes.
MajorBrand (variable): The major brand of the media file. MUST be set to "isml."
MinorVersion (variable): The minor version of the media file. MUST be set to 1.
CompatibleBrands (variable): Specifies the supported brands of MPEG-4. MUST include "ccff" and "iso8."
The syntax of the fields defined in this section, specified in ABNF [RFC5234], is as follows:
FileType = MajorBrand MinorVersion CompatibleBrands
MajorBrand = STRING_UINT32
MinorVersion = STRING_UINT32
CompatibleBrands = "ccff" "iso8" 0*(STRING_UINT32)

Note : The compatibility brands ‘ccff’ and ‘iso8’ indicate that fragments conform to “Common Container File
Format” and Common Encryption [ISO/IEC 23001-7] and ISO Base Media File Format Edition 4 [ISO/IEC 14496-12].
2.2.7.2 StreamManifestBox
2.2.7.2.1 StreamSMIL
2.2.7.3 LiveServerManifestBox
2.2.7.3.1 LiveSMIL
2.2.7.4 MoovBox
2.2.7.5 Fragment
2.2.7.5.1 TrackFragmentExtendedHeader

2.2.8 Server-to-Server Ingest

3 Protocol Details
3.1 Client Details
3.1.1 Abstract Data Model
3.1.1.1 Presentation Description

The Presentation Description data element encapsulates all metadata for the presentation.
Presentation Metadata: A set of metadata that is common to all streams in the presentation. Presentation
Metadata comprises the following fields, specified in section 2.2.2.1:
MajorVersion
MinorVersion
TimeScale
Duration
IsLive
LookaheadCount
DVRWindowLength
Presentations containing HEVC Streams SHALL set:

MajorVersion = 2
MinorVersion = 2

LookaheadCount = 0 (Note: Boxes deprecated)


Presentations SHOULD also set:

TimeScale = 90000

Stream Collection: A collection of Stream Description data elements, as specified in section 3.1.1.1.2.
Protection Description: A collection of Protection System Metadata Description data elements, as specified in
section 3.1.1.1.1.

3.1.1.1.1 Protection System Metadata Description

The Protection System Metadata Description data element encapsulates metadata specific to a single Content
Protection System. (No Change)
Protection Header Description: Content protection metadata that pertains to a single Content Protection
System. Protection Header Description comprises the following fields, specified in section 2.2.2.2:
SystemID
ProtectionHeaderContent

3.1.1.1.2 Stream Description
3.1.1.1.2.1 Track Description
3.1.1.1.2.1.1 Custom Attribute Description

3.1.1.3 Fragment Reference Description
3.1.1.3.1 Track-Specific Fragment Reference Description

3.1.1.2 Fragment Description


3.1.1.2.1 Sample Description

3.1.2 Timers
3.1.3 Initialization
3.1.4 Higher-Layer Triggered Events
3.1.4.1 Open Presentation
3.1.4.2 Get Fragment
3.1.4.3 Close Presentation

3.1.5 Processing Events and Sequencing Rules


3.1.5.1 Manifest Request and Manifest Response
3.1.5.2 Fragment Request and Fragment Response

3.2 Server Details


3.3 Live Encoder Details
4 Protocol Examples
5 Security
5.1 Security Considerations for Implementers
If the content transported using this protocol has high commercial value, a Content Protection System should
be used to prevent unauthorized use of the content. The ProtectionElement can be used to carry metadata
related to the use of a Content Protection System. Protected audio and video content SHALL be encrypted as
specified by MPEG Common Encryption Second Edition: 2015 [ISO/IEC 23001-7].
Note : For HEVC video, only slice data in VCL NALs is encrypted. Slice headers and other NALs are accessible to
presentation applications prior to decryption. In a secure video path, encrypted information is not available to
presentation applications.

5.2 Index of Security Parameters


ProtectionElement: section 2.2.2.2

Common Encryption Boxes: [ISO/IEC 23001-7]

5.3 Common Encryption Boxes


The following boxes may be present in fragment responses when Common Encryption is applied, and are specified
in [ISO/IEC 23001-7] or [ISO/IEC 14496-12]:
1. Protection System Specific Header Box (‘pssh’)
2. Sample Encryption Box (‘senc’)
3. Sample Auxiliary Information Offset Box (‘saio’)
4. Sample Auxiliary Information Size Box (‘saiz’)
5. Sample Group Description Box (‘sgpd’)
6. Sample to Group Box (‘sbgp’)

Azure Media Services fragmented MP4 live ingest
specification
9/22/2020 • 17 minutes to read • Edit Online

This specification describes the protocol and format for fragmented MP4-based live streaming ingestion for Azure
Media Services. Media Services provides a live streaming service that customers can use to stream live events and
broadcast content in real time by using Azure as the cloud platform. This document also discusses best practices
for building highly redundant and robust live ingest mechanisms.

1. Conformance notation
The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT," "SHOULD," "SHOULD NOT,"
"RECOMMENDED," "MAY," and "OPTIONAL" in this document are to be interpreted as they are described in RFC
2119.

2. Service diagram
The following diagram shows the high-level architecture of the live streaming service in Media Services:
1. A live encoder pushes live feeds to channels that are created and provisioned via the Azure Media Services SDK.
2. Channels, programs, and streaming endpoints in Media Services handle all the live streaming functionalities,
including ingest, formatting, cloud DVR, security, scalability, and redundancy.
3. Optionally, customers can choose to deploy an Azure Content Delivery Network layer between the streaming
endpoint and the client endpoints.
4. Client endpoints stream from the streaming endpoint by using HTTP Adaptive Streaming protocols. Examples
include Microsoft Smooth Streaming, Dynamic Adaptive Streaming over HTTP (DASH, or MPEG-DASH), and
Apple HTTP Live Streaming (HLS).

3. Bitstream format – ISO 14496-12 fragmented MP4


The wire format for live streaming ingest discussed in this document is based on [ISO-14496-12]. For a detailed
explanation of fragmented MP4 format and extensions both for video-on-demand files and live streaming
ingestion, see [MS-SSTR].
Live ingest format definitions
The following list describes special format definitions that apply to live ingest into Azure Media Services:
1. The ftyp, Live Server Manifest Box, and moov boxes MUST be sent with each request (HTTP POST). These
boxes MUST be sent at the beginning of the stream and any time the encoder must reconnect to resume stream
ingest. For more information, see Section 6 in [1].
2. Section 3.3.2 in [1] defines an optional box called StreamManifestBox for live ingest. Due to the routing logic
of the Azure load balancer, using this box is deprecated. The box SHOULD NOT be present when ingesting into
Media Services. If this box is present, Media Services silently ignores it.
3. The TrackFragmentExtendedHeaderBox box defined in 3.2.3.2 in [1] MUST be present for each fragment.
4. Version 2 of the TrackFragmentExtendedHeaderBox box SHOULD be used to generate media segments that
have identical URLs in multiple datacenters. The fragment index field is REQUIRED for cross-datacenter failover
of index-based streaming formats such as Apple HLS and index-based MPEG-DASH. To enable cross-datacenter
failover, the fragment index MUST be synced across multiple encoders and be increased by 1 for each
successive media fragment, even across encoder restarts or failures.
5. Section 3.3.6 in [1] defines a box called MovieFragmentRandomAccessBox (mfra) that MAY be sent at the
end of live ingestion to indicate end-of-stream (EOS) to the channel. Due to the ingest logic of Media Services,
using EOS is deprecated, and the mfra box for live ingestion SHOULD NOT be sent. If sent, Media Services
silently ignores it. To reset the state of the ingest point, we recommend that you use Channel Reset. We also
recommend that you use Program Stop to end a presentation and stream.
6. The MP4 fragment duration SHOULD be constant, to reduce the size of the client manifests. A constant MP4
fragment duration also improves client download heuristics through the use of repeat tags. The duration MAY
fluctuate to compensate for non-integer frame rates.
7. The MP4 fragment duration SHOULD be between approximately 2 and 6 seconds.
8. MP4 fragment timestamps and indexes (TrackFragmentExtendedHeaderBox fragment_absolute_time and
fragment_index) SHOULD arrive in increasing order. Although Media Services is resilient to duplicate
fragments, it has limited ability to reorder fragments according to the media timeline.

4. Protocol format – HTTP


ISO fragmented MP4-based live ingest for Media Services uses a standard long-running HTTP POST request to
transmit encoded media data that is packaged in fragmented MP4 format to the service. Each HTTP POST sends a
complete fragmented MP4 bitstream ("stream"), starting from the beginning with header boxes (ftyp, Live Server
Manifest Box, and moov boxes), and continuing with a sequence of fragments (moof and mdat boxes). For URL
syntax for the HTTP POST request, see section 9.2 in [1]. An example of the POST URL is:
http://customer.channel.mediaservices.windows.net/ingest.isml/streams(720p)
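As a non-normative sketch of this request pattern (the ingest URL is the hypothetical example above, and the header boxes and fragment source are assumed to be produced elsewhere by the encoder), the preflight empty-body POST from requirement 1 below followed by the long-running chunked POST could look like this:

import requests

INGEST_URL = "http://customer.channel.mediaservices.windows.net/ingest.isml/streams(720p)"

def stream_boxes(header_boxes, fragment_source):
    """Generator body: header boxes first, then each moof/mdat pair as it is produced.
    Passing a generator to requests makes it send the body with chunked transfer encoding."""
    yield header_boxes            # ftyp + Live Server Manifest Box + moov, as raw bytes
    for moof_mdat in fragment_source:
        yield moof_mdat

def ingest(header_boxes, fragment_source):
    # Step 1: empty-body POST to validate the endpoint and surface auth errors early.
    preflight = requests.post(INGEST_URL, data=b"")
    preflight.raise_for_status()
    # Step 2: long-running chunked POST carrying the complete fragmented MP4 bitstream.
    response = requests.post(INGEST_URL, data=stream_boxes(header_boxes, fragment_source))
    response.raise_for_status()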

Requirements
Here are the detailed requirements:
1. The encoder SHOULD start the broadcast by sending an HTTP POST request with an empty “body” (zero content
length) by using the same ingestion URL. This can help the encoder quickly detect whether the live ingestion
endpoint is valid, and if there are any authentication or other conditions required. Per HTTP protocol, the server
can't send back an HTTP response until the entire request, including the POST body, is received. Given the long-
running nature of a live event, without this step, the encoder might not be able to detect any error until it
finishes sending all the data.
2. The encoder MUST handle any errors or authentication challenges because of (1). If (1) succeeds with a 200
response, continue.
3. The encoder MUST start a new HTTP POST request with the fragmented MP4 stream. The payload MUST start
with the header boxes, followed by fragments. Note that the ftyp, Live Server Manifest Box, and moov
boxes (in this order) MUST be sent with each request, even if the encoder must reconnect because the previous
request was terminated prior to the end of the stream.
4. The encoder MUST use chunked transfer encoding for uploading, because it’s impossible to predict the entire
content length of the live event.
5. When the event is over, after sending the last fragment, the encoder MUST gracefully end the chunked transfer
encoding message sequence (most HTTP client stacks handle it automatically). The encoder MUST wait for the
service to return the final response code, and then terminate the connection.
6. The encoder MUST NOT use the Events() noun as described in 9.2 in [1] for live ingestion into Media Services.
7. If the HTTP POST request terminates or times out with a TCP error prior to the end of the stream, the encoder
MUST issue a new POST request by using a new connection, and follow the preceding requirements.
Additionally, the encoder MUST resend the previous two MP4 fragments for each track in the stream, and
resume without introducing a discontinuity in the media timeline. Resending the last two MP4 fragments for
each track ensures that there is no data loss. In other words, if a stream contains both an audio and a video
track, and the current POST request fails, the encoder must reconnect and resend the last two fragments for the
audio track, which were previously successfully sent, and the last two fragments for the video track, which were
previously successfully sent, to ensure that there is no data loss. The encoder MUST maintain a “forward” buffer
of media fragments, which it resends when it reconnects.

5. Timescale
[MS-SSTR] describes the usage of timescale for SmoothStreamingMedia (Section 2.2.2.1), StreamElement
(Section 2.2.2.3), StreamFragmentElement (Section 2.2.2.6), and LiveSMIL (Section 2.2.7.3.1). If the timescale
value is not present, the default value used is 10,000,000 (10 MHz). Although the Smooth Streaming format
specification doesn’t block usage of other timescale values, most encoder implementations use this default value
(10 MHz) to generate Smooth Streaming ingest data. Due to the Azure Media Dynamic Packaging feature, we
recommend that you use a 90-kHz timescale for video streams and a 44.1-kHz or 48-kHz timescale for audio streams. If
different timescale values are used for different streams, the stream-level timescale MUST be sent. For more
information, see [MS-SSTR].
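For illustration only, converting between seconds and timescale ticks for the default and recommended values:

DEFAULT_TIMESCALE = 10_000_000   # 10 MHz Smooth Streaming default
VIDEO_TIMESCALE = 90_000         # recommended for video with dynamic packaging
AUDIO_TIMESCALE = 48_000         # match the audio waveform sample frequency

def seconds_to_ticks(seconds, timescale):
    return round(seconds * timescale)

def ticks_to_seconds(ticks, timescale):
    return ticks / timescale

# The same 2.002-second fragment duration expressed in each timescale:
for timescale in (DEFAULT_TIMESCALE, VIDEO_TIMESCALE, AUDIO_TIMESCALE):
    print(timescale, seconds_to_ticks(2.002, timescale))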

6. Definition of “stream”
Stream is the basic unit of operation in live ingestion for composing live presentations, handling streaming failover,
and redundancy scenarios. Stream is defined as one unique, fragmented MP4 bitstream that might contain a single
track or multiple tracks. A full live presentation might contain one or more streams, depending on the
configuration of the live encoders. The following examples illustrate various options of using streams to compose a
full live presentation.
Example:
A customer wants to create a live streaming presentation that includes the following audio/video bitrates:
Video – 3000 kbps, 1500 kbps, 750 kbps
Audio – 128 kbps
Option 1: All tracks in one stream
In this option, a single encoder generates all audio/video tracks, and then bundles them into one fragmented MP4
bitstream. The fragmented MP4 bitstream is then sent via a single HTTP POST connection. In this example, there is
only one stream for this live presentation.

Option 2: Each track in a separate stream


In this option, the encoder puts one track into each fragment MP4 bitstream, and then posts all of the streams over
separate HTTP connections. This can be done with one encoder or with multiple encoders. The live ingestion sees
this live presentation as composed of four streams.
Option 3: Bundle audio track with the lowest bitrate video track into one stream
In this option, the customer chooses to bundle the audio track with the lowest-bitrate video track in one fragment
MP4 bitstream, and leave the other two video tracks as separate streams.

Summary
This is not an exhaustive list of all possible ingestion options for this example. As a matter of fact, any grouping of
tracks into streams is supported by live ingestion. Customers and encoder vendors can choose their own
implementations based on engineering complexity, encoder capacity, and redundancy and failover considerations.
However, in most cases, there is only one audio track for the entire live presentation. So, it’s important to ensure
the healthiness of the ingest stream that contains the audio track. This consideration often results in putting the
audio track in its own stream (as in Option 2) or bundling it with the lowest-bitrate video track (as in Option 3).
Also, for better redundancy and fault tolerance, sending the same audio track in two different streams (Option 2
with redundant audio tracks) or bundling the audio track with at least two of the lowest-bitrate video tracks (Option
3 with audio bundled in at least two video streams) is highly recommended for live ingest into Media Services.

7. Service failover
Given the nature of live streaming, good failover support is critical for ensuring the availability of the service.
Media Services is designed to handle various types of failures, including network errors, server errors, and storage
issues. When used in conjunction with proper failover logic from the live encoder side, customers can achieve a
highly reliable live streaming service from the cloud.
In this section, we discuss service failover scenarios. In this case, the failure happens somewhere within the service,
and it manifests itself as a network error. Here are some recommendations for the encoder implementation for
handling service failover:
1. Use a 10-second timeout for establishing the TCP connection. If an attempt to establish the connection takes
longer than 10 seconds, abort the operation and try again.
2. Use a short timeout for sending the HTTP request message chunks. If the target MP4 fragment duration is N
seconds, use a send timeout between N and 2 N seconds; for example, if the MP4 fragment duration is 6
seconds, use a timeout of 6 to 12 seconds. If a timeout occurs, reset the connection, open a new connection,
and resume stream ingest on the new connection.
3. Maintain a rolling buffer that has the last two fragments for each track that were successfully and
completely sent to the service. If the HTTP POST request for a stream is terminated or times out prior to the
end of the stream, open a new connection and begin another HTTP POST request, resend the stream
headers, resend the last two fragments for each track, and resume the stream without introducing a
discontinuity in the media timeline. This reduces the chance of data loss. A minimal sketch of such a
buffer follows this list.
4. We recommend that the encoder does NOT limit the number of retries to establish a connection or resume
streaming after a TCP error occurs.
5. After a TCP error:
a. The current connection MUST be closed, and a new connection MUST be created for a new HTTP POST
request.
b. The new HTTP POST URL MUST be the same as the initial POST URL.
c. The new HTTP POST MUST include stream headers (ftyp, Live Server Manifest Box, and moov boxes)
that are identical to the stream headers in the initial POST.
d. The last two fragments sent for each track must be resent, and streaming must resume without
introducing a discontinuity in the media timeline. The MP4 fragment timestamps must increase
continuously, even across HTTP POST requests.
6. The encoder SHOULD terminate the HTTP POST request if data is not being sent at a rate commensurate
with the MP4 fragment duration. An HTTP POST request that does not send data can prevent Media Services
from quickly disconnecting from the encoder in the event of a service update. For this reason, the HTTP
POST for sparse (ad signal) tracks SHOULD be short-lived, terminating as soon as the sparse fragment is
sent.
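A minimal sketch of the rolling buffer from item 3 above (an assumption about one encoder-side data structure, not a requirement of the service):

from collections import defaultdict, deque

class FragmentResendBuffer:
    """Keep the last N fragments that were completely sent for each track so they
    can be resent after a reconnect without a media-timeline discontinuity."""
    def __init__(self, depth=2):
        self.sent = defaultdict(lambda: deque(maxlen=depth))

    def record_sent(self, track_name, moof_mdat):
        # Call after a moof/mdat pair for this track has been fully written to the POST body.
        self.sent[track_name].append(moof_mdat)

    def boxes_to_resend(self, header_boxes):
        """On reconnect: stream headers first, then the buffered fragments for every track."""
        yield header_boxes   # ftyp + Live Server Manifest Box + moov
        for fragments in self.sent.values():
            yield from fragments

# Usage: on a TCP error, open a new connection to the same POST URL, send
# boxes_to_resend(header_boxes) first, then continue with new fragments at
# strictly increasing timestamps.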

8. Encoder failover
Encoder failover is the second type of failover scenario that needs to be addressed for end-to-end live streaming
delivery. In this scenario, the error condition occurs on the encoder side.

The following expectations apply from the live ingestion endpoint when encoder failover happens:
1. A new encoder instance SHOULD be created to continue streaming, as illustrated in the diagram (Stream for
3000k video, with dashed line).
2. The new encoder MUST use the same URL for HTTP POST requests as the failed instance.
3. The new encoder’s POST request MUST include the same fragmented MP4 header boxes as the failed instance.
4. The new encoder MUST be properly synced with all other running encoders for the same live presentation to
generate synced audio/video samples with aligned fragment boundaries.
5. The new stream MUST be semantically equivalent with the previous stream, and interchangeable at the header
and fragment levels.
6. The new encoder SHOULD try to minimize data loss. The fragment_absolute_time and fragment_index of media
fragments SHOULD increase from the point where the encoder last stopped. The fragment_absolute_time and
fragment_index SHOULD increase in a continuous manner, but it is permissible to introduce a discontinuity, if
necessary. Media Services ignores fragments that it has already received and processed, so it's better to err on
the side of resending fragments than to introduce discontinuities in the media timeline.

9. Encoder redundancy
For certain critical live events that demand even higher availability and quality of experience, we recommended
that you use active-active redundant encoders to achieve seamless failover with no data loss.

As illustrated in this diagram, two groups of encoders push two copies of each stream simultaneously into the live
service. This setup is supported because Media Services can filter out duplicate fragments based on stream ID and
fragment timestamp. The resulting live stream and archive is a single copy of all the streams that is the best
possible aggregation from the two sources. For example, in a hypothetical extreme case, as long as there is one
encoder (it doesn’t have to be the same one) running at any given point in time for each stream, the resulting live
stream from the service is continuous without data loss.
The requirements for this scenario are almost the same as the requirements in the "Encoder failover" case, with the
exception that the second set of encoders are running at the same time as the primary encoders.

10. Service redundancy


For highly redundant global distribution, sometimes you must have cross-region backup to handle regional
disasters. Expanding on the “Encoder redundancy” topology, customers can choose to have a redundant service
deployment in a different region that's connected with the second set of encoders. Customers also can work with a
Content Delivery Network provider to deploy a Global Traffic Manager in front of the two service deployments to
seamlessly route client traffic. The requirements for the encoders are the same as the “Encoder redundancy” case.
The only exception is that the second set of encoders needs to be pointed to a different live ingest endpoint. The
following diagram shows this setup:

11. Special types of ingestion formats


This section discusses special types of live ingestion formats that are designed to handle specific scenarios.
Sparse track
When delivering a live streaming presentation with a rich client experience, often it's necessary to transmit time-
synced events or signals in-band with the main media data. An example of this is dynamic live ad insertion. This
type of event signaling is different from regular audio/video streaming because of its sparse nature. In other words,
the signaling data usually does not happen continuously, and the interval can be hard to predict. The concept of
sparse track was designed to ingest and broadcast in-band signaling data.
The following steps are a recommended implementation for ingesting sparse track:
1. Create a separate fragmented MP4 bitstream that contains only sparse tracks, without audio/video tracks.
2. In the Live Server Manifest Box as defined in Section 6 in [1], use the parentTrackName parameter to
specify the name of the parent track. For more information, see section 4.2.1.2.1.2 in [1].
3. In the Live Server Manifest Box, manifestOutput MUST be set to true.
4. Given the sparse nature of the signaling event, we recommended the following:
a. At the beginning of the live event, the encoder sends the initial header boxes to the service, which allows
the service to register the sparse track in the client manifest.
b. The encoder SHOULD terminate the HTTP POST request when data is not being sent. A long-running
HTTP POST that does not send data can prevent Media Services from quickly disconnecting from the
encoder in the event of a service update or server reboot. In these cases, the media server is temporarily
blocked in a receive operation on the socket.
c. During the time when signaling data is not available, the encoder SHOULD close the HTTP POST request.
While the POST request is active, the encoder SHOULD send data.
d. When sending sparse fragments, the encoder can set an explicit content-length header, if it’s available.
e. When sending sparse fragments with a new connection, the encoder SHOULD start sending from the
header boxes, followed by the new fragments. This is for cases in which failover happens in-between, and
the new sparse connection is being established to a new server that has not seen the sparse track before.
f. The sparse track fragment becomes available to the client when the corresponding parent track fragment
that has an equal or larger timestamp value is made available to the client. For example, if the sparse
fragment has a timestamp of t=1000, it is expected that after the client sees "video" (assuming the parent
track name is "video") fragment timestamp 1000 or beyond, it can download the sparse fragment t=1000.
Note that the actual signal could be used for a different position in the presentation timeline for its
designated purpose. In this example, it’s possible that the sparse fragment of t=1000 has an XML payload,
which is for inserting an ad in a position that’s a few seconds later.
g. The payload of sparse track fragments can be in different formats (such as XML, text, or binary),
depending on the scenario.
Redundant audio track
In a typical HTTP adaptive streaming scenario (for example, Smooth Streaming or DASH), often, there's only one
audio track in the entire presentation. Unlike video tracks, which have multiple quality levels for the client to choose
from in error conditions, the audio track can be a single point of failure if the ingestion of the stream that contains
the audio track is broken.
To solve this problem, Media Services supports live ingestion of redundant audio tracks. The idea is that the same
audio track can be sent multiple times in different streams. Although the service only registers the audio track once
in the client manifest, it can use redundant audio tracks as backups for retrieving audio fragments if the primary
audio track has issues. To ingest redundant audio tracks, the encoder needs to:
1. Create the same audio track in multiple fragment MP4 bitstreams. The redundant audio tracks MUST be
semantically equivalent, with the same fragment timestamps, and be interchangeable at the header and
fragment levels.
2. Ensure that the “audio” entry in the Live Server Manifest (Section 6 in [1]) is the same for all redundant audio
tracks.
The following implementation is recommended for redundant audio tracks:
1. Send each unique audio track in a stream by itself. Also, send a redundant stream for each of these audio track
streams, where the second stream differs from the first only by the identifier in the HTTP POST URL:
{protocol}://{server address}/{publishing point path}/Streams({identifier}).
2. Use separate streams to send the two lowest video bitrates. Each of these streams SHOULD also contain a copy
of each unique audio track. For example, when multiple languages are supported, these streams SHOULD
contain audio tracks for each language.
3. Use separate server (encoder) instances to encode and send the redundant streams mentioned in (1) and (2).
