When They Go High, We Go Low:
Low-Latency Live Streaming in dash.js with LoL
May Lim★, Mehmet N. Akcay+, Abdelhak Bentaleb★, Ali C. Begen+ and Roger Zimmermann★
★National University of Singapore, +Ozyegin University

{maylim,bentaleb,rogerz}@comp.nus.edu.sg, necmettin.akcay@ozu.edu.tr, ali.begen@ozyegin.edu.tr

ABSTRACT

Live streaming remains a challenge in the adaptive streaming space due to the stringent requirements for not just quality and rebuffering, but also latency. Many solutions have been proposed to tackle streaming in general, but only a few have looked into better catering to the more challenging low-latency live streaming scenarios. In this paper, we re-visit and extend several important components (collectively called Low-on-Latency, LoL) in adaptive streaming systems to enhance the low-latency performance. LoL includes bitrate adaptation (both heuristic and learning-based), playback control and throughput measurement modules.

CCS CONCEPTS

• Multimedia information systems → Multimedia streaming.

KEYWORDS

HAS; ABR; DASH; CMAF; low-latency; HTTP chunked transfer encoding; adaptive playout; SOM; learning.

ACM Reference Format:
May Lim, Mehmet N. Akcay, Abdelhak Bentaleb, Ali C. Begen and Roger Zimmermann. 2020. When They Go High, We Go Low: Low-Latency Live Streaming in dash.js with LoL. In 11th ACM Multimedia Systems Conference (MMSys'20), June 8–11, 2020, Istanbul, Turkey. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3339825.3397043

1 INTRODUCTION

Currently, one popular area of development in our field is low-latency live (LLL) streaming, where the goal is to achieve near one-second latency when delivering live content. Making latency, i.e., the time difference between video capture and playback, short is not easy because in that period, the video has to be captured, encoded, packaged and transferred from a server to a client, which may be far away from each other and subject to unpredictable network conditions. What makes this even more difficult is that latency is oftentimes not the only consideration in users' desired quality of experience (QoE) — other conflicting factors such as high video quality, low rebuffering rate and fewer quality switches also need to be considered.

To satisfy the low-latency requirements, many platforms adopted variants of HTTP adaptive streaming (HAS) using the Common Media Application Format (CMAF) standard [2] and HTTP chunked transfer encoding (CTE) [12]. The idea behind CTE, which is a standard feature of HTTP/1.1 (RFC 7230), is to divide a segment into a number of chunks and then deliver them in near real time to the client, even before the segment is fully available.

CMAF and CTE have shown good potential, but they also have their own limitations and challenges. In the case of CTE, throughput measurement becomes a non-trivial task since the basic formula of dividing the segment size by the download duration does not work anymore. This is because the communication is now based on smaller chunks within a segment and there may be idle times between consecutive chunks' transfers due to the discrete times at which the chunks are produced. If the download duration also includes the idle times, the throughput measurements become inaccurate.
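As a simple illustration of this pitfall (the numbers below are ours, chosen only for illustration): consider a one-second segment encoded at 1000 Kbps that is delivered chunk by chunk over a 5 Mbps link. Each chunk arrives in a short burst, but new chunks are only produced as the encoder finishes them, so most of the segment's wall-clock download window is idle time.

// Hypothetical numbers, for illustration only.
const segmentBits = 1000 * 1000;                   // 1 s of video at 1000 Kbps
const linkBps = 5 * 1e6;                           // true network capacity: 5 Mbps
const transferTime = segmentBits / linkBps;        // ~0.2 s actually spent on the wire
const idleTime = 0.8;                              // ~0.8 s waiting for chunks to be produced

const naiveMbps = segmentBits / (transferTime + idleTime) / 1e6; // ~1 Mbps (misleading)
const trueMbps = segmentBits / transferTime / 1e6;               // 5 Mbps (actual capacity)
console.log(naiveMbps.toFixed(2), trueMbps.toFixed(2));

The naive estimate converges to the encoding bitrate rather than the available bandwidth, which is the behavior the throughput measurement module in Section 3.4 is designed to avoid.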
For HAS, many existing Adaptive Bitrate (ABR) algorithms ranging from heuristic ([4] and [19]) to learning-based ([6] and [13]) have been widely studied. However, due to the multitude of factors that affect users' streaming experience across different streaming scenarios, it is clear that there is no one-size-fits-all algorithm that would cater well to all scenarios. In this paper, we tackle the aforementioned challenges by developing an ABR selection module and a playback control module that optimize users' QoE for LLL streaming, as well as an accurate throughput measurement module for CTE. These modules are collectively called Low-on-Latency (LoL) and are available to download from [3]. While we propose LoL as a solution in response to the challenge offered by Twitch [17] and it is integrated with the dash.js client, its concepts can be easily applied to other streaming clients using CTE [15].

The rest of the paper is organized as follows. Section 2 covers the related work. Section 3 describes the details and rationale of the three main modules of our solution — the ABR selection, playback control and throughput measurement modules. Section 4 evaluates our solution and Section 5 concludes the paper.

2 RELATED WORK

2.1 Heuristic-Based ABR Schemes

One class of ABR schemes [4] employs heuristics (e.g., based on the available bandwidth and/or buffer occupancy) to find a good bitrate level at each ABR decision step, often using some sort of mathematical rules and/or approximations. The general approach is to make some measurements and react in a way that increases the positive QoE metrics (e.g., selected bitrate level) and reduces the negative ones (e.g., startup delay, the number of bitrate switches and rebuffering events). For example, if the current playback buffer is getting emptier, the ABR logic selects a lower bitrate representation
to replenish the buffer sooner so as to avoid a rebuffer. Heuristic-based schemes tend to be further categorized into buffer-based, rate-based or hybrid methods, of which the latter ones are increasingly adopted due to the advancements in client-side technologies that enable more complex and faster computing. An example is the popular reference client dash.js [1]. In LLL streaming, a heuristic-based scheme requires many optimizations to fit the low-latency requirements and this is one of the primary goals for our paper.

2.2 Learning-Based ABR Schemes

Continually selecting the right bitrate in LLL streaming is a complex task due to dynamic factors such as the network and client conditions. Hence, learning-based schemes may be suitable for LLL streaming. Rather than relying on fixed heuristics, learning-based schemes try to learn from the system environment and create policies based on past data, and thus, adapt to the system dynamics. Different machine learning techniques [16] can be applied to improve LLL streaming. Reinforcement learning (RL) is one of the popular techniques, relying on a reward mechanism [7], [6], [8] and [13]. RL is used to create an adaptive learning algorithm. Other supervised and unsupervised learning techniques have been proposed as well. In this paper, we use Self Organizing Maps (SOM) [11], which were designed for unsupervised classification problems. The selection of the right bitrate in LLL streaming can be modelled as a classification problem, as described in detail later.

2.3 Throughput Measurement

There exist a number of ABR schemes [4] at various levels of sophistication that work well for video-on-demand (VoD) services, where the content is encoded/packaged in advance and the playback buffer can accommodate several tens of seconds of content. However, these schemes do not work well in LLL streaming, where the client fetches the live-edge segment that is still being encoded/packaged at the time of the request. Upon a request, the HTTP server bursts the available chunks and the client receives them at the network speed, whereas the remaining chunks (yet to be produced) are sent to the client as they become available, resulting in idle periods between these chunks. These idle periods are not easy to detect and they confuse the client when computing the throughput it can achieve. Refer to [5] for details.

3 SOLUTION DETAILS

We now describe the details of our solution starting with our selected QoE model and followed by the ABR selection, playback control and throughput measurement modules.

3.1 QoE Model Module

Our QoE model is inspired by the Yin et al. [19] and Yi et al. [18] models. At each segment download, the model considers five essential metrics: bitrate selected, bitrate switches, rebuffering time, live latency and playback speed. The QoE model is expressed as:

\mathrm{QoE} = \sum_{s=1}^{S} \left( \alpha R_s - \beta E_s - \gamma L_s - \sigma |1 - P_s| \right) - \sum_{s=1}^{S-1} \mu |R_{s+1} - R_s|,   (1)

where the list of notations is given in Table 1.

Table 1: Notations of the QoE model.

Notation  Meaning
s         Segment index
R         Bitrate selected (Kbps)
E         Rebuffering time (seconds)
L         Live latency (seconds)
P         Playback speed (e.g., 1, 0.95, 1.05)
S         Total number of segments
α         Bitrate reward factor
β         Rebuffering penalty factor
γ         Live latency penalty factor
σ         Playback speed penalty factor
μ         Bitrate switch penalty factor

We note that the playback speed is usually: (i) normal: 1×, (ii) fast: e.g., 1.5×, or (iii) slow: e.g., 0.95×. Thus, if the playback speed is normal then the playback speed penalty will be zero. Otherwise, the penalty should be set to the minimum encoding bitrate level. We fixed the factors of each metric as: α = segment duration; β = maximum encoding bitrate; γ = 0.05 × minimum encoding bitrate if L ≤ 1.1 seconds, otherwise 0.05 × maximum encoding bitrate; σ = minimum encoding bitrate; and μ = 1 second.
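As a concrete illustration, Eq. (1) with the factor settings above can be evaluated per session as in the following sketch (our own simplified JavaScript, not the actual LoL_qoeEvaluator.js code; the per-segment metrics are assumed to be collected by the client):

// Sketch of the QoE model in Eq. (1); variable and parameter names are ours.
// metrics: array of per-segment { bitrateKbps, rebufferSec, latencySec, playbackSpeed }
function computeSessionQoE(metrics, segDurationSec, minBitrateKbps, maxBitrateKbps) {
  const alpha = segDurationSec;       // bitrate reward factor
  const beta = maxBitrateKbps;        // rebuffering penalty factor
  const sigma = minBitrateKbps;       // playback speed penalty factor
  const mu = 1;                       // bitrate switch penalty factor
  let qoe = 0;
  metrics.forEach((m, s) => {
    const gamma = m.latencySec <= 1.1 ? 0.05 * minBitrateKbps : 0.05 * maxBitrateKbps;
    qoe += alpha * m.bitrateKbps
         - beta * m.rebufferSec
         - gamma * m.latencySec
         - sigma * Math.abs(1 - m.playbackSpeed);
    if (s < metrics.length - 1) {     // switch penalty over the S-1 transitions
      qoe -= mu * Math.abs(metrics[s + 1].bitrateKbps - m.bitrateKbps);
    }
  });
  return qoe;
}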
3.2 ABR Selection Module

This module is invoked at each segment download boundary and its goal is to select the best bitrate for the next segment that maximizes the streaming session's QoE for the duration of the optimization horizon. Below, we detail both our heuristic and learning-based algorithms.

3.2.1 Heuristic-Based Algorithm. For this method, we build upon the contributions of FastMPC [19] and adapt its approach to fit our LLL streaming scenario. FastMPC is an ideal starting point due to its extensible and practical nature, which allows us to update the heuristic factors, and it is also lightweight enough to run smoothly in browser-based clients. We first present an overview of how our algorithm works and then discuss our improvements over FastMPC.

The general idea of this algorithm is to calculate the potential QoE across the available options and then select the one with the highest value. The advantage is that it is sensitive to the QoE model, and hence, can ensure a high QoE value for the session. Some additional improvements include using a look-ahead window of N future segments (we use N = 5) when considering our options, as well as a harmonic mean of the past M throughput measurements (we use M = 5) to approximate the future throughput [9].

There are three main steps in our heuristic-based algorithm, which are repeated at each segment download decision epoch:

(1) Permute Options: Based on the look-ahead window size N, we permute all possible options of bitrate selection. For example, with N = 5 and three bitrate representations {200, 600, 1000} Kbps, the options include [200, 200, 200, 200, 200], [200, 200, 200, 200, 600] and so on (until [1000, 1000, 1000, 1000, 1000]).

(2) Calculate the Potential QoE: Based on the QoE model, we calculate the estimated QoE value for each option. Note that some QoE factors are easy to calculate (e.g., bitrate factor) while others are more complex, such as the rebuffering factor,
which requires estimation of the future download time. Hence, accurate throughput measurement becomes critical.

(3) Evaluate Options: Based on the QoE estimations, we select the option with the highest QoE as the next bitrate.
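The three steps can be sketched compactly as follows (our own simplification in JavaScript, not the LoL_heuristicRule.js source; the QoE estimate inside is a rough stand-in for the model of Section 3.1 and omits the playback speed term):

// bitrates: e.g., [200, 600, 1000] Kbps; past: recent throughput samples in Kbps.
function harmonicMean(xs) {
  return xs.length / xs.reduce((acc, x) => acc + 1 / x, 0);
}

function selectNextBitrate(bitrates, past, bufferSec, latencySec, segDur = 0.5, N = 5) {
  const tput = harmonicMean(past.slice(-5));        // predicted throughput (M = 5)
  const minB = Math.min(...bitrates), maxB = Math.max(...bitrates);
  let best = { qoe: -Infinity, bitrate: bitrates[0] };

  const estimateQoE = (option) => {                 // step (2): rough per-option QoE
    let buf = bufferSec, lat = latencySec, qoe = 0, prev = null;
    for (const b of option) {
      const dl = (b * segDur) / tput;               // CBR assumption: size = bitrate * duration
      const rebuf = Math.max(0, dl - buf);
      buf = Math.max(0, buf - dl) + segDur;
      lat += rebuf;
      const gamma = lat <= 1.1 ? 0.05 * minB : 0.05 * maxB;
      qoe += segDur * b - maxB * rebuf - gamma * lat;
      if (prev !== null) qoe -= Math.abs(b - prev); // switch penalty (mu = 1)
      prev = b;
    }
    return qoe;
  };

  const permute = (option) => {                     // step (1): all |bitrates|^N options
    if (option.length === N) {
      const q = estimateQoE(option);
      if (q > best.qoe) best = { qoe: q, bitrate: option[0] };  // step (3): keep the best
      return;
    }
    for (const b of bitrates) permute(option.concat(b));
  };
  permute([]);
  return best.bitrate;   // bitrate for the next segment; re-run at the next boundary
}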
FastMPC, in its original form, does not perform well for the LLL streaming scenario as it makes certain assumptions and design choices that we seek to improve upon, which are elaborated below.

First, we adapt the QoE objective to consider more important factors in the LLL streaming scenario. Details of our QoE model have been covered in Section 3.1, and as a summary, here are the key improvements in our QoE model:

• Replacing startup delay with (live) latency: Startup delay is not as critical in live streaming as it naturally tends to be short. As the name suggests, live streaming aims to stream as close to real time as possible and the target latency is in the range of a few seconds.

• Introducing playback speed: Due to the strong preference for low latency, some clients may adopt adaptive playback strategies to further reduce latency. For example, the client could speed up the playback (when the buffer is full) to catch up with the stream's real-time edge, and hence, reduce the latency. However, extreme playback speed changes could also lead to a poorer experience. Hence, we include the playback speed (more specifically, deviation from playback speed = 1) as an additional penalty factor to balance it.

Next, in terms of the implementation approach, we omit the lookup table in FastMPC's offline enumeration step to prioritize accuracy and performance by computing the options online at each segment download boundary instead. Note that the decision space for this approach is much smaller than the lookup table. For example, with a bitrate list of three representations and a look-ahead window size of five segments, the lookup table would have 243 thousand entries (assuming 100 bins for throughput values and 10 bins for buffer values in line with [19]), while the decision space for our approach is merely 3^5 = 243. Although this number increases significantly with a larger representation count and look-ahead window size, we believe that further improvements can be made to manage this computation load, should it become unmanageable. For example, one can consider certain search heuristics instead of permuting every option.

This approach promotes accuracy and performance for several reasons. (i) In live streaming mode, we do not know future segment sizes; hence, it is prudent to approximate them dynamically instead of pre-computing them. We use a simple approximation under a Constant Bitrate (CBR) encoding assumption, i.e., we take the segment size to be the bitrate level multiplied by the segment duration. For other scenarios, where appropriate, we could consider using other approximations such as Kaufman's Adaptive Moving Average (KAMA) [14] or an arithmetic average of the past few segment sizes. (ii) As we saw earlier, this algorithm calculates the potential QoE of N future segments at each segment download boundary. With this approach, we could dynamically update the QoE model based on the ongoing changes in the video and/or environment, such as potentially increasing the weight for video quality for video segments that are deemed more important. (iii) Pre-computing the lookup table incurs high cost in terms of computation and transfer time (leading to startup delay) as well as memory space (which is an expensive resource for clients).

3.2.2 Learning-Based Algorithm. The Self Organizing Map (SOM) proposed by Kohonen [11] is one of the widely used techniques for unsupervised classification problems. This technique has even been applied to some NP-hard problems like the travelling salesman [10]. ABR decisions for LLL streaming can also be thought of as a similar classification problem given that we do not know the data points (i.e., variables of the system environment) to select a suitable bitrate for the next segment to download. Based on these similarities, it is possible to initiate an SOM as shown in Figure 1. For each of the encoded video streams, a corresponding SOM neuron is created and each neuron is initialized with the bitrate of the corresponding video stream. The algorithm will evaluate the current state and consider that as the data point that has to be classified. Each time the algorithm is called, it will gather the current state and classify that point to one of the neurons to find the best matching unit (BMU) in the SOM algorithm.

The features of the data points in our LLL streaming context are a quadruplet set of measured throughput, live latency, buffer occupancy and QoE. Initially, at the beginning of a live session, the throughput feature is initialized with the minimum encoding bitrate level. For more accurate ABR decisions, we normalize all the data points to between zero and one. Every state (white circles in Figure 1) comprises a quadruplet set of features that are calculated after each segment download. In our SOM algorithm optimization, we also include the targets as target latency (≤ 1 second), target buffer occupancy (≤ 1 second) and target QoE (highest possible value 1, after normalization in the range of 0–1). Then, we use the distance function d() in (2) to find the best match; it represents the weighted Euclidean distance over the four features used in each neuron. The distance function d() is defined as follows:

d(a, b) = \sqrt{\sum_{i=1}^{4} w_i \times (a[i] - b[i])^2}.   (2)

The weight matrix w = [w_{throughput}, w_{latency}, w_{buffer}, w_{qoe}] for the four features is defined as follows:

w_{throughput} = \begin{cases} 4, & \text{if neuron.bitrate > measured throughput} \\ 0.4, & \text{otherwise,} \end{cases}

w_{latency} = 0.4,

w_{buffer} = 0.1,

w_{qoe} = \begin{cases} 1, & \text{if neuron.QoE < minimum allowed QoE} \\ 0.4, & \text{otherwise.} \end{cases}

Usually, after finding the BMU, the corresponding neuron and its neighbours should be updated. In our context, the update function is slightly different since we only have two known SOM neurons. The first one is the current neuron (that represents the currently selected bitrate level) and the second neuron is the next best one to transit to. That is, we update the current SOM neuron with the newly reported values of the features (i.e., the client reports its status at every segment download), and then we update its neighbours as well using the same values. Next, the BMU neuron is calculated using (2) with target latency = 1, target buffer occupancy = 1
and target QoE = 1. The best matching neuron that achieves the target values wins. After finding the winner neuron, it becomes the current one and will be updated with the reported values of the features, and so on. The process is repeated for each segment download until the live streaming session ends.

[Figure 1: An example of an SOM feature map for three different bitrate levels. The figure shows a state with the features Measured Throughput, Live Latency, Buffer Occupancy and QoE, connected through the weight matrix to the bitrate levels of the feature map.]

Algorithm 1 SOM Bitrate Selection (Pseudocode)

UPDATE(Current Neuron, Measured Throughput, Live Latency, Buffer Occupancy, QoE)
Minimum Distance ← ∞
Minimum Allowed QoE ← 50   // heuristically determined; non-normalized
BMU ← ∅
Target Latency ← 1
Target Buffer Occupancy ← 1
Target QoE ← 1
for all Neuron ∈ SOM Neurons do
    d ← GETDISTANCE(Neuron, Measured Throughput, Target Latency, Target Buffer Occupancy, Target QoE)
    if d < Minimum Distance then
        Minimum Distance ← d
        BMU ← Neuron
    end if
end for
UPDATE(BMU, Measured Throughput, Target Latency, Target Buffer Occupancy, Target QoE)

Algorithm 1 shows a pseudocode of our bitrate selection technique. It performs two main updates. The first one updates the SOM neurons from the fresh status reported by the client, and the second update is to move the BMU neuron to the optimum solution. So, there is a balance between those two updates, where one of them moves out of the optimum space and the other is trying to move to the optimum point. At each iteration the algorithm selects the one closer to the optimal point and updates all neighbours accordingly. The neighbourhood function used in the algorithm is the Gaussian distribution function (3). The learning rate is fixed to 0.01 as in the original paper [11].

h(a, b) = e^{-d(a, b)^2 / (2\sigma^2)}   (3)
3.2.3 Choice of Algorithm. Both heuristic and learning-based algorithms have their strengths and limitations. Choosing which algorithm to use depends on several factors:

• Predictability of media or environment factors: In systems where these factors, e.g., segment sizes and throughput measurements, are difficult to predict, choosing the learning-based algorithm may be a better choice as the heuristic-based algorithm would be sensitive to prediction errors.

• Significance of the QoE model: In systems where the desired QoE model is not clear or available, choosing the learning-based algorithm may be a better choice as the heuristic-based algorithm is highly dependent on the QoE model.

• Comparing the performance of both algorithms: At times, the choice (and performance) of the algorithms may not be clear. Hence, if both algorithms could be evaluated extensively based on the desired setup, then this would be the ideal way of understanding their performances and determining which algorithm to use. One such example is shown in Section 4.

• Possibility of combining both rules: Another approach is to use both rules and switch between them dynamically depending on which algorithm performs better under the given conditions. This possibility remains to be further explored in the future.

3.3 Playback Control Module

The main goal of this module is to determine a desired playback rendering speed that could further reduce the latency while taking a calculated risk of rebuffering. We adopt a heuristic-based logic to determine the desired playback speed in this module. Specifically, the module considers both the target and current latencies, and the current buffer level. If the latency has increased beyond the target and the buffer level is sufficiently large, then the playback is sped up to reduce the buffer and bring down the latency. Conversely, if the latency is very low but there is a risk of stalling because of a critically low buffer, playback is slowed to reduce data drain. Note that LoL can be configured with several parameters such as the range of the allowable catch-up playback speed (e.g., 0.5–1.5×), target latency and min/max buffer levels.
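The logic can be summarized with the following sketch (a simplified stand-in for the modified playbackController.js; the threshold values shown are illustrative, not dash.js defaults):

// Returns the playback rate to apply until the next check.
// cfg example: { targetLatency: 1.5, minBuffer: 0.5, maxRate: 1.5, minRate: 0.5 }
function choosePlaybackRate(currentLatency, bufferLevel, cfg) {
  if (bufferLevel <= cfg.minBuffer) {
    return cfg.minRate;  // close to stalling: slow down to reduce the buffer drain
  }
  if (currentLatency > cfg.targetLatency) {
    return cfg.maxRate;  // behind the live edge with enough buffer: catch up
  }
  return 1;              // otherwise play at normal speed
}

// In a browser player the returned value would be assigned to videoElement.playbackRate.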
3.4 Throughput Measurement Module

As explained by Bentaleb et al. in [5], the basic equation of (segment size / segment download time) always produces a throughput value equal to (or slightly smaller than) the segment encoding bitrate (because of inter-chunk idle periods), which prevents the client from switching to higher bitrate levels. To address this issue, [5] developed a solution, called ACTE, which shows good performance, but sometimes suffers from throughput overestimation when the duration of the inter-chunk idle periods increases. Here, we address this matter and propose a throughput measurement algorithm that works for both short and long inter-chunk idle periods. Our algorithm consists of two stages:

(1) Chunk Boundary Identification: The algorithm identifies the start and end time of each chunk download by capturing the 'moof' box (or atom) in the fragmented MP4 data. Our algorithm leverages the fact that the chunks are transmitted as separate HTTP chunks through CTE. Thanks to the Fetch API, our algorithm uses the Streaming Response Body (https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams), which allows tracking the progress of the chunk downloads and parsing the chunk payloads in real time. When a 'moof' box of a chunk is captured, the algorithm stores the time as the start time of the chunk download using new Date(). Then, it stores the end time of the chunk and its size (in bytes) when the chunk is fully downloaded.

(2) Idle Period Filtering: After receiving the chunks of a segment, the filtering process is triggered to remove the idle periods and noise. Our algorithm uses the maximum HTTP chunk size, in contrast to ACTE [5], which uses the chunk download times. The filtering works as follows: the first and the last chunks are ignored in order to avoid any transient outliers, and the data received in between is taken into account for the filtering calculation.
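A browser-side sketch of the two stages could look as follows (our illustration; the actual logic lives in LoL_tputMeasurement.js and differs in detail, e.g., in how the filtering threshold is derived). It uses fetch() and a ReadableStream reader to timestamp 'moof' boxes as the payload arrives; the box scan is simplified and does not handle a 'moof' signature split across two reads.

// Downloads one segment over CTE and returns a filtered throughput estimate in Kbps.
async function measureSegmentThroughput(url) {
  const response = await fetch(url);
  const reader = response.body.getReader();   // Streaming Response Body (Fetch API)
  const MOOF = [0x6d, 0x6f, 0x6f, 0x66];      // ASCII for 'moof'
  const chunks = [];                          // { start, end, bytes } per CMAF chunk
  let current = null;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Stage 1: a 'moof' box marks the start of a new chunk.
    for (let i = 0; i + 3 < value.length; i++) {
      if (MOOF.every((b, k) => value[i + k] === b)) {
        if (current) chunks.push(current);
        current = { start: Date.now(), end: Date.now(), bytes: 0 };
      }
    }
    if (current) {
      current.bytes += value.length;
      current.end = Date.now();
    }
  }
  if (current) chunks.push(current);

  // Stage 2: ignore the first and last chunks to avoid transient outliers,
  // then estimate throughput from the remaining per-chunk transfer windows.
  const kept = chunks.slice(1, -1);
  if (kept.length === 0) return NaN;          // not enough chunks to filter
  const bytes = kept.reduce((sum, c) => sum + c.bytes, 0);
  const seconds = kept.reduce((sum, c) => sum + Math.max(c.end - c.start, 1) / 1000, 0);
  return (bytes * 8) / seconds / 1000;        // Kbps
}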
4 EVALUATION

We present our evaluation approach and results, which seek to answer two questions: (1) How do the heuristic and learning-based algorithms perform based on a particular LLL streaming setup? (2) Does the throughput measurement module perform well?

4.1 Setup

In our setup, we follow the guidelines from [17]. For the video, we use the "Big Buck Bunny" video encoded at bitrates of {200, 600, 1000} Kbps. For the network traces, we use five different profiles that simulate different scenarios of challenging network conditions, namely {PROFILE_CASCADE, PROFILE_INTRA_CASCADE, PROFILE_SPIKE, PROFILE_SLOW_JITTERS and PROFILE_FAST_JITTERS}. We execute five runs for each network profile and take their average for the results shown.

4.2 Implementation

We implemented LoL in dash.js (v3.0.1) and our code is available at [3]. The implementation details can be found in the following files:

• QoE model: LoL_qoeEvaluator.js (new)
• ABR selection:
  – Heuristic-based: LoL_heuristicRule.js (new)
  – Learning-based: LoL_learningRule.js (new)
• Throughput measurement: LoL_tputMeasurement.js (new)
• Playback control: playbackController.js (modified)

4.3 QoE Performance

We evaluate the performance of both heuristic and learning-based ABR algorithms in the setup above. Table 2 provides the average values of total QoE as well as individual QoE factors for each of the five network profiles. Table 3 provides some of the key metrics that contributed to these QoE values. In both tables, L indicates the learning-based and H the heuristic-based algorithm.

Table 2: QoE values for each algorithm/network profile.

Profile                 Alg.  QoE_total    QoE_bitrate   QoE_rebuf    QoE_latency  QoE_switch   QoE_playback
PROFILE_CASCADE         L     28,446.59    101,142.86    -32,948.45   -25,400.38   -3,942.86    -10,404.59
                        H     18,878.55    100,040.00    -31,871.59   -27,869.65   -11,200.00   -10,220.20
PROFILE_INTRA_CASCADE   L     -53,595.28   70,040.00     -52,679.92   -51,703.80   -1,600.00    -17,651.56
                        H     -74,035.15   72,080.00     -51,786.94   -69,810.60   -6,720.00    -17,797.60
PROFILE_SPIKE           L     -4,139.75    19,114.29     -9,714.00    -9,098.45    -1,200.00    -3,241.59
                        H     -7,952.59    18,920.00     -9,733.26    -12,525.02   -1,520.00    -3,094.30
PROFILE_SLOW_JITTERS    L     -17,425.02   17,942.86     -14,737.62   -12,119.00   -4,057.14    -4,454.11
                        H     -25,797.99   21,000.00     -16,332.64   -16,743.03   -8,640.00    -5,082.32
PROFILE_FAST_JITTERS    L     11,337.56    11,600.00     0.00         -231.85      0.00         -30.59
                        H     11,241.86    11,500.00     0.00         -230.09      0.00         -28.05

Table 3: QoE metrics for each algorithm/network profile.

Profile (duration)             Alg.  Avg. Bitrate (Kbps)  Total Rebuffer (s)  Avg. Latency (s)  # of Switches
PROFILE_CASCADE (150 s)        L     672.53               33.23               1.25              10.86
                               H     661.96               32.12               1.33              28.80
PROFILE_INTRA_CASCADE (135 s)  L     521.58               53.24               2.04              5.00
                               H     534.76               52.39               2.75              16.80
PROFILE_SPIKE (30 s)           L     646.97               10.45               1.77              4.00
                               H     667.17               10.47               2.42              4.40
PROFILE_SLOW_JITTERS (30 s)    L     642.82               15.44               2.13              10.71
                               H     780.50               17.34               2.97              19.80
PROFILE_FAST_JITTERS (11.6 s)  L     1,000.00             0.00                1.00              1.00
                               H     1,000.00             0.00                1.00              1.00

From these evaluations, we see that the learning-based algorithm performs better than the heuristic-based one across all network profiles, with some profiles achieving significantly better QoE results. For example, the learning-based algorithm achieved a total QoE value that is 50.7% higher under PROFILE_CASCADE. To understand the difference in performance better, we refer to Table 3, which suggests that under cascade-like network behavior (i.e., PROFILE_CASCADE and PROFILE_INTRA_CASCADE), the learning-based algorithm is able to provide a more stable streaming session with far fewer bitrate switches than the heuristic-based one. Intuitively, this is likely because of the continual measurement of the distances to the optimum states with four dimensions and four different weights, which is more sensitive to changes. It is also important that the QoE weight in the learning-based algorithm is increased when the QoE is lower than the minimum allowed QoE value. Meanwhile, because our newly proposed module provides accurate throughput measurements, we can confidently increase the throughput weight in (2) for video streams that have higher bitrates than the available bandwidth, and hence, avoid their selection by the learning algorithm.

As for the other network profiles, we see that the learning-based algorithm is generally able to achieve lower rebuffering duration and latency values, which are critical in LLL streaming.

4.4 Throughput Measurement Accuracy

Next, we conduct experiments to evaluate the accuracy of the throughput measurement module by calculating the measurement error (|measured_value - true_value| / true_value) for each of the five network profiles and then averaging them to get an overall measurement error value. We compare the accuracy of our module against the default throughput measurement provided by dash.js. Results show that our module is able to reduce the overall measurement error to 0.21, as compared to the overall measurement error of 2.53 when using the one in dash.js. Figure 2 shows how the true and measured values vary across time using one sample run for each network profile.
ABR algorithms in the setup above. Table 2 provides the average As we see from Figure 2, the dash.js throughput measurement
values of total QoE as well as individual QoE factors for each of tends to err at high bandwidth regions, which is where source-
the five network profiles. Table 3 provides some of the key metrics limited and CTE-enabled transfers are more likely to occur. In
[Figure 2 (plots omitted): five panels of throughput (Mbps) versus time (s), one per network profile.]

Figure 2: Throughput measurements given by dash.js (dotted red line) and our implementation (dashed green line) compared against the true values (solid blue line) for each network profile. Note that measured values above 5 Mbps (7 Mbps for PROFILE_FAST_JITTERS) have been omitted in the graphs for better visualisation.

In such scenarios, the idle periods between chunks of a segment increase and, when these periods are included in the calculation of the segment download time, this leads to significant throughput measurement errors. While earlier versions of dash.js (i.e., v2.9.x) did not perform any chunk filtering and thus often underestimated the measured throughput, the more recent versions (i.e., v3.x) include chunk filtering, but they now seem to err in the opposite direction and sometimes overestimate the measured throughput.

5 CONCLUSIONS

We presented the LoL solution that performs well in LLL streaming. The LoL solution enhances several important modules in streaming systems, namely the QoE model, ABR algorithms (both heuristic and learning-based), playback control and throughput measurement modules. Our evaluations show that the learning-based algorithm currently performs better than the heuristic-based one in this particular setup and that our throughput measurement module performs better than the default implementation provided by dash.js. In the future, we seek to continue to explore the use of heuristic versus learning-based algorithms in LLL streaming.

Acknowledgments

This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 under MOE's official grant number MOE2018-T2-1-103.

REFERENCES

[1] DASH Reference Player. [Online] Available: https://reference.dashif.org/dash.js/.
[2] ISO/IEC 23000-19:2020 Information technology – Multimedia application format (MPEG-A) – Part 19: Common media application format (CMAF) for segmented media. [Online] Available: https://www.iso.org/standard/79106.html.
[3] Low-on-Latency (LoL). [Online] Available: https://github.com/NUStreaming/acm-mmsys-2020-grand-challenge.
[4] A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann. A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Communications Surveys Tutorials, 21(1):562–585, 2019.
[5] A. Bentaleb, C. Timmerer, A. C. Begen, and R. Zimmermann. Bandwidth prediction in low-latency chunked streaming. In ACM NOSSDAV, 2019.
[6] M. Gadaleta, F. Chiariotti, M. Rossi, and A. Zanella. D-DASH: A deep Q-learning framework for DASH video streaming. IEEE Trans. Cognitive Communications and Networking, 3(4):703–718, 2017.
[7] R. Hong, Q. Shen, L. Zhang, and J. Wang. Continuous bitrate & latency control with deep reinforcement learning for live video streaming. In ACM Multimedia, 2019.
[8] T. Huang, R.-X. Zhang, C. Zhou, and L. Sun. QARC: Video quality aware rate control for real-time video streaming based on deep reinforcement learning. In ACM Multimedia, 2018.
[9] J. Jiang, V. Sekar, and H. Zhang. Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE. In 8th Int. Conf. Emerging Networking Experiments and Technologies, 2012.
[10] H. Jin, K. Leung, and M. L. Wong. An integrated self-organizing map for the traveling salesman problem. Advances in Neural Networks and Appl., 2001.
[11] T. Kohonen, M. R. Schroeder, and T. S. Huang. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, 3rd edition, 2001.
[12] W. L. Ultra-Low-Latency Streaming Using Chunked-Encoded and Chunked-Transferred CMAF. Akamai white paper. Online; accessed 10 January 2019.
[13] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In ACM SIGCOMM, 2017.
[14] H. Peng, Y. Zhang, Y. Yang, and J. Yan. A hybrid control scheme for adaptive live streaming. In ACM Multimedia, 2019.
[15] Periscope Code. Introducing LHLS Media Streaming. [Online] Available: https://medium.com/@periscopecode/introducing-lhls-media-streaming-eb6212948bef. Online; accessed 5 April 2020.
[16] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[17] Twitch. Grand Challenge on Adaptation Algorithms for Near-Second Latency. In ACM MMSys, 2020.
[18] G. Yi, D. Yang, A. Bentaleb, W. Li, Y. Li, K. Zheng, J. Liu, W. T. Ooi, and Y. Cui. The ACM Multimedia 2019 Live Video Streaming Grand Challenge. In ACM Multimedia, 2019.
[19] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over HTTP. In ACM SIGCOMM, 2015.
