When They Go High, We Go Low: Low-Latency Live Streaming in dash.js with LoL
to replenish the buffer sooner so as to avoid a rebuffer. Heuristic-based schemes tend to be further categorized into buffer-based, rate-based or hybrid methods, of which the latter ones are increasingly adopted due to the advancements in client-side technologies that enable more complex and faster computing. An example is the popular reference client dash.js [1]. In LLL streaming, a heuristic-based scheme requires many optimizations to fit the low-latency requirements, and this is one of the primary goals of our paper.

2.2 Learning-Based ABR Schemes
Continually selecting the right bitrate in LLL streaming is a complex task due to dynamic factors such as the network and client conditions. Hence, learning-based schemes may be suitable for LLL streaming. Rather than relying on fixed heuristics, learning-based schemes try to learn from the system environment and create policies based on past data, and thus adapt to the system dynamics. Different machine learning techniques [16] can be applied to improve LLL streaming. Reinforcement learning (RL) is one of the popular techniques, relying on a reward mechanism [7], [6], [8] and [13]; RL is used to create an adaptive learning algorithm. Other supervised and unsupervised learning techniques have been proposed as well. In this paper, we use Self-Organizing Maps (SOM) [11], which were designed for unsupervised classification problems. The selection of the right bitrate in LLL streaming can be modelled as a classification problem, as described in detail later.

2.3 Throughput Measurement
There exist a number of ABR schemes [4] at various levels of sophistication that work well for video-on-demand (VoD) services, where the content is encoded/packaged in advance and the playback buffer can accommodate several tens of seconds of content. However, these schemes do not work well in LLL streaming, where the client fetches the live-edge segment that is still being encoded/packaged at the time of the request. Upon a request, the HTTP server bursts the available chunks and the client receives them at the network speed, whereas the remaining chunks (yet to be produced) are sent to the client as they become available, resulting in idle periods between these chunks. These idle periods are not easy to detect and they confuse the client when computing the throughput it can achieve. Refer to [5] for details.

3 SOLUTION DETAILS
We now describe the details of our solution, starting with our selected QoE model and followed by the ABR selection, playback control and throughput measurement modules.

3.1 QoE Model Module
Our QoE model is inspired by the Yin et al. [19] and Yi et al. [18] models. At each segment download, the model considers five essential metrics: bitrate selected, bitrate switches, rebuffering time, live latency and playback speed. The QoE model is expressed as:

    QoE = Σ_{s=1}^{S} (α·R_s − β·E_s − γ·L_s − σ·|1 − P_s|) − Σ_{s=1}^{S−1} μ·|R_{s+1} − R_s|,    (1)

where the list of notations is given in Table 1.

Table 1: Notations of the QoE model.

    Notation  Meaning
    s         Segment index
    R         Bitrate selected (Kbps)
    E         Rebuffering time (seconds)
    L         Live latency (seconds)
    P         Playback speed (e.g., 1, 0.95, 1.05)
    S         Total number of segments
    α         Bitrate reward factor
    β         Rebuffering penalty factor
    γ         Live latency penalty factor
    σ         Playback speed penalty factor
    μ         Bitrate switch penalty factor

We note that the playback speed is usually: (i) normal: 1×, (ii) fast: e.g., 1.5×, or (iii) slow: e.g., 0.95×. Thus, if the playback speed is normal, then the playback speed penalty is zero; otherwise, the penalty is set to the minimum encoding bitrate level. We fixed the factors of each metric as: α = segment duration; β = maximum encoding bitrate; γ = 0.05 × minimum encoding bitrate if L ≤ 1.1 seconds, otherwise 0.05 × maximum encoding bitrate; σ = minimum encoding bitrate; and μ = 1 second.

3.2 ABR Selection Module
This module is invoked at each segment download boundary and its goal is to select the best bitrate for the next segment that maximizes the streaming session's QoE for the duration of the optimization horizon. Below, we detail both our heuristic and learning-based ABR algorithms.

3.2.1 Heuristic-Based Algorithm. For this method, we build upon the contributions of FastMPC [19] and adapt its approach to fit our LLL streaming scenario. FastMPC is an ideal starting point: its extensible and practical nature allows us to update the heuristic factors, and it is lightweight enough to run smoothly in browser-based clients. We first present an overview of how our algorithm works and then discuss our improvements over FastMPC.

The general idea of this algorithm is to calculate the potential QoE across the available options and then select the one with the highest value. The advantage is that it is sensitive to the QoE model, and hence, can ensure a high QoE value for the session. Some additional improvements include using a look-ahead window of N future segments (we use N = 5) when considering our options, as well as a harmonic mean of the past M throughput measurements (we use M = 5) to approximate the future throughput [9].

There are three main steps in our heuristic-based algorithm, which are repeated at each segment download decision epoch:
(1) Permute Options: Based on the look-ahead window size N, we permute all possible options of bitrate selection. For example, with N = 5 and three bitrate representations {200, 600, 1000} Kbps, the options include [200, 200, 200, 200, 200], [200, 200, 200, 200, 600] and so on (until [1000, 1000, 1000, 1000, 1000]).
(2) Calculate the Potential QoE: Based on the QoE model, we calculate the estimated QoE value for each option. Note that some QoE factors are easy to calculate (e.g., the bitrate factor) while others are more complex, such as the rebuffering factor,
When They Go High, We Go Low: Low-Latency Live Streaming in dash.js with LoL    MMSys'20, June 8-11, 2020, Istanbul, Turkey

which requires estimation of the future download time. Hence, accurate throughput measurement becomes critical.
(3) Evaluate Options: Based on the QoE estimations, we select the option with the highest QoE as the next bitrate.

FastMPC, in its original form, does not perform well for the LLL streaming scenario as it makes certain assumptions and design choices that we seek to improve upon, which are elaborated below.

First, we adapt the QoE objective to consider the factors that matter most in the LLL streaming scenario. Details of our QoE model have been covered in Section 3.1; in summary, the key improvements in our QoE model are:
• Replacing startup delay with (live) latency: Startup delay is not as critical in live streaming as it naturally tends to be short. As the name suggests, live streaming aims to stream as close to real time as possible, and the target latency is in the range of a few seconds.
• Introducing playback speed: Due to the strong preference for low latency, some clients may adopt adaptive playback strategies to further reduce latency. For example, the client could speed up the playback (when the buffer is full) to catch up with the stream's real-time edge, and hence, reduce the latency. However, extreme playback speed changes could also lead to a poorer experience. Hence, we include the playback speed (more specifically, the deviation from playback speed = 1) as an additional penalty factor to balance it.

Next, in terms of the implementation approach, we omit the lookup table in FastMPC's offline enumeration step to prioritize accuracy and performance, computing the options online at each segment download boundary instead. Note that the decision space for this approach is much smaller than the lookup table. For example, with a bitrate list of three representations and a look-ahead window size of five segments, the lookup table would have 243 thousand entries (assuming 100 bins for throughput values and 10 bins for buffer values, in line with [19]), while the decision space for our approach is merely 3^5 = 243. Although this number increases significantly with a larger representation count and look-ahead window size, we believe that further improvements can be made to manage this computation load, should it become unmanageable. For example, one can consider certain search heuristics instead of permuting every option.

This approach promotes accuracy and performance for several reasons. (i) In live streaming mode, we do not know future segment sizes; hence, it is prudent to approximate them dynamically instead of pre-computing them. We use a simple approximation under a Constant Bitrate (CBR) encoding assumption, i.e., we take the segment size to be the bitrate level multiplied by the segment duration. For other scenarios, where appropriate, we could consider other approximations such as Kaufman's Adaptive Moving Average (KAMA) [14] or an arithmetic average of the past few segment sizes. (ii) As we saw earlier, this algorithm calculates the potential QoE of N future segments at each segment download boundary. With this approach, we could dynamically update the QoE model based on ongoing changes in the video and/or environment, such as potentially increasing the weight for video quality for video segments that are deemed more important. (iii) Pre-computing the lookup table incurs high cost in terms of computation and transfer time (leading to startup delay) as well as memory space (which is an expensive resource for clients).

3.2.2 Learning-Based Algorithm. The Self-Organizing Map (SOM) proposed by Kohonen [11] is one of the widely used techniques for unsupervised classification problems. This technique has even been applied to some NP-hard problems like the travelling salesman problem [10]. ABR decisions for LLL streaming can also be thought of as a similar classification problem, given that we do not know the data points (i.e., variables of the system environment) needed to select a suitable bitrate for the next segment to download. Based on these similarities, it is possible to initiate an SOM as shown in Figure 1. For each of the encoded video streams, a corresponding SOM neuron is created and each neuron is initialized with the bitrate of the corresponding video stream. The algorithm evaluates the current state and considers it the data point that has to be classified. Each time the algorithm is called, it gathers the current state and classifies that point to one of the neurons to find the best matching unit (BMU) in the SOM algorithm.

The features of the data points in our LLL streaming context are a quadruplet of measured throughput, live latency, buffer occupancy and QoE. Initially, at the beginning of a live session, the throughput feature is initialized with the minimum encoding bitrate level. For more accurate ABR decisions, we normalize all the data points to between zero and one. Every state (white circles in Figure 1) comprises a quadruplet of features that are calculated after each segment download. In our SOM algorithm optimization, we also include the targets: target latency (≤ 1 second), target buffer occupancy (≤ 1 second) and target QoE (highest possible value, 1, after normalization to the range 0-1). Then, we use a distance function d() to find the best matching unit, where d() is the weighted Euclidean distance over the four features of each neuron. The distance function d() is defined as follows:

    d(a, b) = √( Σ_{i=1}^{4} w_i × (a[i] − b[i])² ).    (2)

The weight matrix w = [w_throughput, w_latency, w_buffer, w_qoe] for the four features is defined as follows:

    w_throughput = 4,   if neuron.bitrate > measured throughput
                   0.4, otherwise,
    w_latency    = 0.4,
    w_buffer     = 0.1,
    w_qoe        = 1,   if neuron.QoE < minimum allowed QoE
                   0.4, otherwise.

Usually, after finding the BMU, the corresponding neuron and its neighbours should be updated. In our context, the update function is slightly different since we only have two known SOM neurons: the first is the current neuron (representing the currently selected bitrate level) and the second is the next best one to transit to. That is, we update the current SOM neuron with the newly reported values of the features (i.e., the client reports its status at every segment download), and then we update its neighbours as well using the same values. Next, the BMU neuron is calculated using (2) with target latency = 1, target buffer occupancy = 1
MMSys'20, June 8-11, 2020, Istanbul, Turkey    May Lim, Mehmet N. Akcay, Abdelhak Bentaleb, Ali C. Begen and Roger Zimmermann

and target QoE = 1. The best matching neuron that achieves the target values wins. After finding the winner neuron, it becomes the current one and is updated with the reported values of the features, and so on. The process is repeated for each segment download until the live streaming session ends.

[Figure 1 diagram: each state, a quadruplet of measured throughput, live latency, buffer occupancy and QoE, is mapped through the weight matrix onto the SOM feature map of bitrate levels.]
Figure 1: An example of an SOM feature map for three different bitrate levels.

Algorithm 1 SOM Bitrate Selection (Pseudocode)
  UPDATE(Current Neuron, Measured Throughput, Live Latency, Buffer Occupancy, QoE)
  Minimum Distance ← ∞
  Minimum Allowed QoE ← 50 // heuristically determined; non-normalized
  BMU ← ∅
  Target Latency ← 1
  Target Buffer Occupancy ← 1
  Target QoE ← 1
  for all Neuron ∈ SOM Neurons do
      d ← GETDISTANCE(Neuron, Measured Throughput, Target Latency, Target Buffer Occupancy, Target QoE)
      if d < Minimum Distance then
          Minimum Distance ← d
          BMU ← Neuron
      end if
  end for
  UPDATE(BMU, Measured Throughput, Target Latency, Target Buffer Occupancy, Target QoE)

Algorithm 1 shows a pseudocode of our bitrate selection technique. It performs two main updates: the first updates the SOM neurons with the fresh status reported by the client, and the second moves the BMU neuron towards the optimum solution. So, there is a balance between these two updates, where one of them moves out of the optimum space and the other tries to move towards the optimum point. At each iteration, the algorithm selects the one closer to the optimal point and updates all neighbours accordingly. The neighbourhood function used in the algorithm is the Gaussian function given in (3). The learning rate is fixed to 0.01 as in the original paper [11].

    h(a, b) = e^(−d(a,b)² / (2σ²)).    (3)

3.2.3 Choice of Algorithm. Both heuristic and learning-based algorithms have their strengths and limitations. Choosing which algorithm to use depends on several factors:
• Predictability of media or environment factors: In systems where these factors, e.g., segment sizes and throughput measurements, are difficult to predict, the learning-based algorithm may be a better choice as the heuristic-based algorithm would be sensitive to prediction errors.
• Significance of the QoE model: In systems where the desired QoE model is not clear or available, the learning-based algorithm may be a better choice as the heuristic-based algorithm is highly dependent on the QoE model.
• Comparing the performance of both algorithms: At times, the choice (and performance) of the algorithms may not be clear. Hence, if both algorithms could be evaluated extensively on the desired setup, this would be the ideal way of understanding their performance and determining which algorithm to use. One such example is shown in Section 4.
• Possibility of combining both rules: Another approach is to use both rules and switch amongst them dynamically depending on which algorithm performs better under the given conditions. This possibility remains to be explored in the future.

3.3 Playback Control Module
The main goal of this module is to determine a desired playback rendering speed that could further reduce the latency while taking a calculated risk of rebuffering. We adopt a heuristic-based logic to determine the desired playback speed in this module. Specifically, the module considers both the target and current latencies, and the current buffer level. If the latency has increased beyond the target and the buffer level is sufficiently large, then playback is sped up to reduce the buffer and bring down the latency. Conversely, if the latency is very low but there is a risk of stalling because of a critically low buffer, playback is slowed to reduce the data drain. Note that LoL can be configured with several parameters, such as the range of the allowable catch-up playback speed (e.g., 0.5-1.5×), the target latency and the min/max buffer levels.

3.4 Throughput Measurement Module
As explained by Bentaleb et al. in [5], the basic equation of (segment size / segment download time) always produces a throughput value equal to (or slightly smaller than) the segment encoding bitrate (because of inter-chunk idle periods), which prevents the client from switching to higher bitrate levels. To address this issue, [5] developed a solution, called ACTE, which shows good performance, but sometimes suffers from throughput overestimation when the duration of the inter-chunk idle periods increases. Here, we address this matter and propose a throughput measurement algorithm that works for both short and long inter-chunk idle periods. Our algorithm consists of two stages:

(1) Chunk Boundary Identification: The algorithm identifies the start and end time of each chunk download by capturing the 'moof' box (or atom) in fragmented MP4 data. Our algorithm leverages the fact that the chunks are transmitted as separate HTTP chunks through CTE. Thanks to the Fetch API, our algorithm uses Streaming Response Body¹, which allows tracking the progress of the chunk downloads and parsing the chunk payloads in real time. When the 'moof' box of a chunk is captured, the algorithm stores the time as the start time of the chunk download using new Date(). Then, it stores the end time of the chunk and its size (in bytes) when the chunk is fully downloaded.

¹ https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams
Table 2: QoE values for each algorithm/network profile.

    Alg.  QoE_total    QoE_bitrate  QoE_rebuf    QoE_latency  QoE_switch   QoE_playback
    PROFILE_CASCADE
    L     28,446.59    101,142.86   -32,948.45   -25,400.38   -3,942.86    -10,404.59
    H     18,878.55    100,040.00   -31,871.59   -27,869.65   -11,200.00   -10,220.20
    PROFILE_INTRA_CASCADE
    L     -53,595.28   70,040.00    -52,679.92   -51,703.80   -1,600.00    -17,651.56
    H     -74,035.15   72,080.00    -51,786.94   -69,810.60   -6,720.00    -17,797.60
    PROFILE_SPIKE
    L     -4,139.75    19,114.29    -9,714.00    -9,098.45    -1,200.00    -3,241.59
    H     -7,952.59    18,920.00    -9,733.26    -12,525.02   -1,520.00    -3,094.30
    PROFILE_SLOW_JITTERS
    L     -17,425.02   17,942.86    -14,737.62   -12,119.00   -4,057.14    -4,454.11
    H     -25,797.99   21,000.00    -16,332.64   -16,743.03   -8,640.00    -5,082.32
    PROFILE_FAST_JITTERS
    L     11,337.56    11,600.00    0.00         -231.85      0.00         -30.59
    H     11,241.86    11,500.00    0.00         -230.09      0.00         -28.05

Table 3: QoE metrics for each algorithm/network profile.

    Alg.  Avg. Bitrate (Kbps)  Total Rebuffer (s)  Avg. Latency (s)  # of Switches
    PROFILE_CASCADE (150s)
    L     672.53               33.23               1.25              10.86
    H     661.96               32.12               1.33              28.80
    PROFILE_INTRA_CASCADE (135s)
    L     521.58               53.24               2.04              5.00
    H     534.76               52.39               2.75              16.80
    PROFILE_SPIKE (30s)
    L     646.97               10.45               1.77              4.00
    H     667.17               10.47               2.42              4.40
    PROFILE_SLOW_JITTERS (30s)
    L     642.82               15.44               2.13              10.71
    H     780.50               17.34               2.97              19.80
    PROFILE_FAST_JITTERS (11.6s)
    L     1,000.00             0.00                1.00              1.00
    H     1,000.00             0.00                1.00              1.00
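The per-component totals reported in Table 2 follow directly from the QoE model in (1): one reward term per segment and four penalty terms, with the switch penalty summed over consecutive segment pairs. The sketch below shows that bookkeeping; the segment trace and factor values in the usage example are illustrative only, not from our testbed:

```javascript
// Sketch: per-component QoE bookkeeping for the model in Eq. (1).
// Each segment carries: bitrate R (Kbps), rebuffering time E (s),
// live latency L (s) and playback speed P.
function computeQoE(segments, f) {
  const q = { bitrate: 0, rebuf: 0, latency: 0, playback: 0, switches: 0 };
  segments.forEach((s, i) => {
    q.bitrate  += f.alpha * s.R;             // bitrate reward
    q.rebuf    -= f.beta  * s.E;             // penalties accumulate as negatives
    q.latency  -= f.gamma * s.L;
    q.playback -= f.sigma * Math.abs(1 - s.P);
    if (i > 0) q.switches -= f.mu * Math.abs(s.R - segments[i - 1].R);
  });
  q.total = q.bitrate + q.rebuf + q.latency + q.playback + q.switches;
  return q;
}
```

For example, two segments at 200 and 600 Kbps with a 0.5 s stall on the second, under factors {alpha: 0.5, beta: 1000, gamma: 10, sigma: 200, mu: 1}, yield QoE_bitrate = 400 and QoE_switch = -400, mirroring the sign convention of Table 2.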
[Figure 2 panels: measured throughput (Mbps) over time (s) for each of the five network profiles; axis-tick residue omitted.]
Figure 2: Throughput measurements given by dash.js (dotted red line) and our implementation (dashed green line) compared against the true values (solid blue line) for each network profile. Note that measured values above 5 Mbps (7 Mbps for PROFILE_FAST_JITTERS) have been omitted in the graphs for better visualisation.
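The gap between the naive and chunk-aware measurements in Figure 2 can be reproduced with a toy calculation: dividing total bytes by wall-clock time counts the inter-chunk idle periods, whereas summing only the per-chunk active download times does not. The record format and function below are our own illustration, not the dash.js code:

```javascript
// Sketch: throughput with vs. without inter-chunk idle periods.
// Each record holds one chunk's download start/end (ms) and size (bytes).
function throughputKbps(chunks) {
  const bits = chunks.reduce((sum, c) => sum + c.bytes * 8, 0);
  // Naive: total bits over wall-clock time, idle periods included.
  const wallMs = chunks[chunks.length - 1].end - chunks[0].start;
  // Chunk-based: only the time actually spent receiving bytes.
  const activeMs = chunks.reduce((sum, c) => sum + (c.end - c.start), 0);
  return { naive: bits / wallMs, chunkBased: bits / activeMs };  // bits/ms = Kbps
}
```

With two 125 KB chunks downloaded in 100 ms each but separated by a 400 ms idle gap, the naive estimate comes out around 3.3 Mbps while the chunk-based one recovers the 10 Mbps delivery rate.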
such scenarios, the idle periods between the chunks of a segment increase, and when these periods are included in the calculation of the segment download time, this leads to significant throughput measurement errors. While earlier versions of dash.js (i.e., v2.9.x) did not perform any chunk filtering and thus often underestimated the measured throughput, the more recent versions (i.e., v3.x) include chunk filtering, but they now seem to err in the opposite direction and sometimes overestimate the measured throughput.

5 CONCLUSIONS
We presented the LoL solution, which performs well in LLL streaming. LoL enhances several important modules in streaming systems, namely the QoE model, the ABR algorithms (both heuristic and learning-based), and the playback control and throughput measurement modules. Our evaluations show that the learning-based algorithm currently performs better than the heuristic-based one in this particular setup, and that our throughput measurement module performs better than the default implementation provided by dash.js. In the future, we seek to continue exploring the use of heuristic versus learning-based algorithms in LLL streaming.

Acknowledgments
This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 2 under MOE's official grant number MOE2018-T2-1-103.

REFERENCES
[1] DASH Reference Player. [Online] Available: https://reference.dashif.org/dash.js/.
[2] ISO/IEC 23000-19:2020 Information technology – Multimedia application format (MPEG-A) – Part 19: Common media application format (CMAF) for segmented media. [Online] Available: https://www.iso.org/standard/79106.html.
[3] Low-on-Latency (LoL). [Online] Available: https://github.com/NUStreaming/acm-mmsys-2020-grand-challenge.
[4] A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann. A survey on bitrate adaptation schemes for streaming media over HTTP. IEEE Communications Surveys & Tutorials, 21(1):562–585, 2019.
[5] A. Bentaleb, C. Timmerer, A. C. Begen, and R. Zimmermann. Bandwidth prediction in low-latency chunked streaming. In ACM NOSSDAV, 2019.
[6] M. Gadaleta, F. Chiariotti, M. Rossi, and A. Zanella. D-DASH: A deep Q-learning framework for DASH video streaming. IEEE Trans. Cognitive Communications and Networking, 3(4):703–718, 2017.
[7] R. Hong, Q. Shen, L. Zhang, and J. Wang. Continuous bitrate & latency control with deep reinforcement learning for live video streaming. In ACM Multimedia, 2019.
[8] T. Huang, R.-X. Zhang, C. Zhou, and L. Sun. QARC: Video quality aware rate control for real-time video streaming based on deep reinforcement learning. In ACM Multimedia, 2018.
[9] J. Jiang, V. Sekar, and H. Zhang. Improving fairness, efficiency, and stability in HTTP-based adaptive video streaming with FESTIVE. In 8th Int. Conf. on Emerging Networking Experiments and Technologies, 2012.
[10] H. Jin, K. Leung, and M. L. Wong. An integrated self-organizing map for the traveling salesman problem. Advances in Neural Networks and Applications, 2001.
[11] T. Kohonen. Self-Organizing Maps. Springer-Verlag, Berlin, Heidelberg, 3rd edition, 2001.
[12] W. L. Ultra-Low-Latency Streaming Using Chunked-Encoded and Chunked-Transferred CMAF. Akamai White Paper. Online; accessed 10 January 2019.
[13] H. Mao, R. Netravali, and M. Alizadeh. Neural adaptive video streaming with Pensieve. In ACM SIGCOMM, 2017.
[14] H. Peng, Y. Zhang, Y. Yang, and J. Yan. A hybrid control scheme for adaptive live streaming. In ACM Multimedia, 2019.
[15] Periscope Code. Introducing LHLS Media Streaming. [Online] Available: https://medium.com/@periscopecode/introducing-lhls-media-streaming-eb6212948bef. Online; accessed 5 April 2020.
[16] S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[17] Twitch. Grand Challenge on Adaptation Algorithms for Near-Second Latency. In ACM MMSys, 2020.
[18] G. Yi, D. Yang, A. Bentaleb, W. Li, Y. Li, K. Zheng, J. Liu, W. T. Ooi, and Y. Cui. The ACM Multimedia 2019 Live Video Streaming Grand Challenge. In ACM Multimedia, 2019.
[19] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli. A control-theoretic approach for dynamic adaptive video streaming over HTTP. In ACM SIGCOMM, 2015.