
Rate-Distortion Optimized Video

Peer-to-Peer Multicast Streaming



Invited Paper
Eric Setton Jeonghun Noh Bernd Girod
esetton@stanford.edu jhnoh@stanford.edu bgirod@stanford.edu
Information Systems Laboratory, Department of Electrical Engineering
Stanford University, Stanford, CA 94305-9510, USA

ABSTRACT
We study peer-to-peer multicast streaming, where a source distributes real-time video to a large population of hosts by making use of their forwarding capacity rather than relying on dedicated media servers. Hosts may disconnect at any time; therefore, a robust control protocol is needed to maintain connectivity among peers. This work presents a new peer-to-peer multicast protocol and analyzes the gains that video coding and prioritized packet scheduling at the application layer can bring to the overall streaming performance. A rate-distortion model which predicts end-to-end video quality in throughput-limited environments is presented and used to determine the over-provisioning necessary to avoid self-inflicted congestion. The video stream transmitted by the source contains H.264 SP and SI frames, which are used to adaptively stop error propagation due to packet loss. Distortion-optimized retransmission requests are issued by receiving hosts to recover the most important missing packets while limiting the induced congestion. Experiments for several hundred hosts simulated in NS-2 illustrate the benefits of our system. We achieve typical end-to-end delays of 1 sec, and a stable video quality with less than 2.5% of frames lost to playout interruptions.

Categories and Subject Descriptors
C.2 [Computer-communication networks]: Distributed systems.

General Terms
Design, performance.

Keywords
Peer-to-peer, video streaming, multicast.

∗This work was supported, in part, by a gift from Hewlett-Packard Laboratories, Palo Alto, CA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
P2PMMS’05, November 11, 2005, Singapore.
Copyright 2005 ACM 1-59593-248-8/05/0011 ...$5.00.

1. INTRODUCTION
As IP multicast is not universally supported, distribution of media streams in the public Internet to a large audience (“multicasting”) is typically realized by a large number of unicast connections. If the maximum number of streams of an individual media server (typically between a few hundred and a few thousand) is exceeded, additional server capacity must be provided by a suitable content-delivery infrastructure, e.g., in the form of a network of replication servers.
Peer-to-peer (P2P) multicasting is an elegant alternative in which each end-host may act as a potential server for other clients. This avoids dedicated replication servers altogether. The approach is self-scaling, as the number of peer “servers” and peer clients increases at the same rate; hence it avoids the bottleneck of a central server (or dedicated replication server). The approach, in principle, would allow a highly dynamic support of changing multicast demand at very low cost. The major challenge, however, is the complete lack of performance guarantees in the P2P network. Peer nodes might be turned off or disconnected at any time without prior notice, while other nodes join or re-join. Such a highly unreliable network fabric poses a major difficulty for media streaming.
A recent study based on statistics collected over the Internet reveals that although the uplink throughput of peers is a limiting factor, there is enough bandwidth for P2P streaming, even on a large scale [21]. The stability of the system, which is necessary to provide a satisfactory user experience, largely depends on the design of the protocol. To gain robustness and possibly aggregate data rate, path diversity should be attained by distributing streams across a sufficiently large number of complementary multicast trees. Individual nodes are important nodes near the top in some multicast tree, but near the bottom and less important in others, thus avoiding single points of failure in the node dependency graph.
Although the control protocol is essential to provide efficient means of building and maintaining multiple multicast trees, it needs to be combined with efficient multimedia coding and streaming solutions at the application layer. State-of-the-art compression which achieves better rate-distortion performance alleviates the bandwidth requirements, while error-resilient streaming techniques may improve the received media quality.
In this paper, we present a P2P multicast protocol and show the benefits of video coding, adaptive streaming and
optimized packet scheduling for such a system. The purpose of this work is to make the following contributions:

• the design and analysis of a new distributed P2P multicast protocol, targeted for low-latency streaming,

• a video distortion model analyzing the streaming performance in a throughput-limited P2P network,

• an adaptive video streaming technique, suitable for a P2P network, to mitigate error propagation through the use of H.264 SP and SI picture types,

• a distortion-optimized retransmission scheduler which maximizes video quality while limiting the impact on congestion.

Although the gains reported in this paper for streaming with SP and SI frames and for the retransmission scheduler are shown for our implementation of a P2P multicast control protocol, we believe the results are more general and could be applied to most implementations of video P2P streaming.
In the next section, we describe the control protocol which builds and maintains multiple multicast trees to broadcast a video stream from a source to a set of peers. The performance of the control protocol is assessed through experiments carried out over a simulated network in NS-2 [6]. In Sec. 4, we make use of a video distortion model, previously proposed in [20], to predict the received video quality when a video stream is sent from multiple throughput-limited senders to a receiver at different rates. The model is used to determine the amount of over-provisioning of network resources required to limit the congestion created between peers. The video stream multicast over the P2P network contains the new picture types “switching-P” (SP) and “switching-I” (SI), introduced in the latest video coding standard H.264 [12]. These switching picture types can be sent adaptively to stop error propagation in the case of transmission errors. In Sec. 5, we characterize the bit rate savings and performance improvement achieved by using SP and SI frames compared to traditional video transmission based on I and P frames. In the last part of the paper, we focus on retransmission scheduling from a receiving host to its parents. Different from network-level multicast, the incorporation of application-level retransmission requests into P2P multicast is possible without feedback implosion, since the fan-out of each individual node is small. We describe in Sec. 6 how to schedule retransmissions, from the receiver, to maximize its decoded video quality while limiting the incurred network congestion.

2. RELATED WORK
We address the problem of reducing the cost of large-scale live media delivery and focus on IP networks such as the Internet. Commercial solutions deployed today mostly rely on server overlays which act as Content Delivery Networks (CDN). When a user wishes to access multimedia content, he is re-directed towards one of the servers of the CDN. This server is usually located closer to the user and has enough available bandwidth to support the media streaming session. Such networks have been deployed, e.g., by Akamai or Cisco Systems. The design of these large-scale systems is a problem in itself which has been studied, notably in [13] and [7].
Live media multicast over P2P networks proposes to shift the burden of the media delivery from a dedicated infrastructure to the users. As P2P networks do not require any special servers or routers, the cost of such solutions is appealing. However, they only offer an attractive alternative if their robustness can overcome the dynamic behavior of the peers used for forwarding. Some functioning solutions based on P2P multicast are already available. Coolstreaming, which offers Chinese and American cable channels, reports an ever-growing user base [1, 23]. Other implementations exist such as PPLive [5], which multicasts mostly sports channels, and ESM [3], which has been used to broadcast conferences in the scientific community. Although these implementations are very exciting advances, they all suffer from long startup delays, usually on the order of 3-5 minutes, and from instability, and their streaming quality is not yet a threat to standard definition TV. This motivates our work, which places an emphasis on the role of congestion in P2P video streaming and strives for stability and low startup latencies of a few seconds, comparable to cable or satellite television systems.
In most P2P multicast protocols proposed so far, peers build and maintain one or several multicast trees along which video content is distributed [8, 10, 17, 22]. The construction of the trees usually proceeds in a distributed fashion, which allows the protocol to scale without overwhelming the source. The control protocol we propose is similar to these previous systems. Our protocol establishes multiple multicast trees as proposed in [11, 17]. In these systems, multiple trees are used to mitigate the impact of peer disconnection. Through the use of multiple description coding or of forward error correction (FEC) they also maintain acceptable video quality even when a node is not connected to all the trees. In our system, we also make use of optimized encoding and application-layer solutions. The overall performance of P2P video multicast systems can be greatly improved by using efficient single-description coding techniques and achieving the required robustness through over-provisioning and distortion-optimized scheduling of transmissions and retransmissions.

3. P2P MULTICAST PROTOCOL
The control protocol enables a source to distribute a video stream to a population of peers via P2P multicast. The video source peer and other peers are connected via multiple trees which are constructed dynamically by the protocol. The source is the root of all trees and the trees are built and maintained mostly independently. The branches of each tree connect a host to its descendants. These links are virtual tunnels which hide the underlying physical network topology. The video stream is distributed evenly over the different trees. Hence, peers need to join each of the multiple trees to decode and play out the video successfully.
Our simulations are based on a moderate-size network of a few hundred nodes, which resembles a large private intranet or a campus network. An example is shown in Fig. 1. We make the following simplifying assumptions: the control and transmission protocol is implemented over the UDP/IP protocol stack and we ignore any Network Address Translator or firewall issue which may limit connectivity; peers are synchronized and have heterogeneous but fixed upload bandwidth which they have measured and know accurately. Although these problems need to be addressed for a real In-
ternet implementation (see e.g. [10]), they are not directly relevant to the scope of this paper.

Figure 1: Example of network topology used for simulation. The setup used in the experiments is an extended version of this example.

Our control protocol is completely distributed, except for an approximate list of connected peers maintained by the source. Besides accepting new peers, the protocol maintains the multicast trees when peers leave or are disconnected during the session. A peer may leave ungracefully due to network shutdown or system failure. To stay connected, peers need to monitor the state of their ancestors and may decide to reconnect if they detect traffic interruptions.

3.1 Joining
Each joining host discovers the address of the source of the video stream through a directory service such as a website and contacts this peer to obtain a list of hosts randomly chosen among connected members. The joining peer contacts all the members of the list and waits for replies. It then determines a candidate parent for each tree. If the latter has enough available bandwidth at the time of the request, it accepts the host and starts forwarding video packets. This process is called “6-way handshaking” and consists of one message exchange with the source and 2 message exchanges with the candidate parent.

3.1.1 Partial list of connected peers
To reduce control overhead, the source adjusts the size of the list sent to joining peers according to the current group size and the number of multicast trees. At the beginning of the session, the group size is small and the list size increases linearly. When the group size reaches a certain point, the list size increases logarithmically. For multiple trees, the source sends a larger list to allow a joining peer to exploit path diversity.

3.1.2 Parent selection algorithm
Parents which have enough throughput to support an additional peer are selected based on their proximity to the source. Several other criteria can also be considered in this process, for example the amount of available throughput, the round-trip time (RTT) or geographical proximity. Similarly to the results reported in [21], we found that the proximity to the source gave good performance, as will be explained in more detail below. To make use of path diversity and reduce multiple tree failures due to a common parent, different parents are chosen as often as possible. Once the selection of candidates for each tree is done, attachment requests are sent out and each tree will operate mostly independently.

3.1.3 Periodic message exchange
Once a peer is connected, it will inform its parents of its presence by transmitting periodic “hello” messages. These messages are also used to propagate topology information such as the subtree size of a peer. Reception of a “hello” message generates an immediate response intended to confirm the parent’s presence.

3.2 Node disconnection
Ungraceful leaves occur when a peer leaves the group without notice, which may cause disconnection of the peer’s descendants from the group. When a host leaves, it stops forwarding video packets and is unresponsive to probing. Once a peer detects a parent disconnection, it will try to reconnect to the tree by choosing one of its other parents. If this fast recovery mechanism fails, the peer has to contact the source to get a new list of candidate parents. While the host reconnects, retransmission requests are issued over the other multicast trees to recover missing video packets. We will explain this retransmission procedure in detail in Sec. 6.
A list of extra links to other potential parents can also be maintained by each peer. This list is initialized during the joining process. After a host joins the different multicast trees, it keeps the list of remaining available hosts to construct a pool of extra links. If the number of hosts in the pool falls below a certain threshold, the list is updated using a gossip algorithm. Extra links are not reserved resources, but they do indicate which hosts are available for potential reconnection. The performance of this alternative rejoin procedure is not presented in this paper, but it indicates that the procedure should be lightweight in order to avoid creating additional control overhead.
We observed that the time threshold necessary to detect parent disconnections may be large (approximately 1-2 seconds), as it is important to avoid false detections, and since losses will be compensated by retransmissions requested during reconnection. Detecting the disconnection of a child is less influential to the overall performance as it only results in a temporary waste of the parent’s network resources. Since the penalty of a false child leave detection is high, a longer time interval is used. When a child leave is detected, parents will remove it from their forwarding table.

3.3 Loop avoidance
Once a peer loses its connection to one of the trees, it will try to rejoin this tree and will send out new attachment requests. During this process it may try to attach to one of its descendants. This will create a loop which will eventually starve the whole subtree. To prevent this kind of event, each peer keeps the list of its ancestors, which are peers in the path from the peer to the source. Attachment requests issued by an ancestor are rejected. Note that this happens independently over each tree and that the hierarchy between peers can and should be different over different trees. This loop avoidance mechanism does not require global knowledge of the tree but only of a small subset of its nodes; hence the additional memory and processing power it requires are negligible.
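The loop-avoidance rule of Sec. 3.3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Peer` class, its `attach` and `accepts` methods and the string peer identifiers are hypothetical names chosen for illustration, and the sketch assumes that a peer learns its ancestor path from its parent at attachment time. How ancestor lists are refreshed for descendants after a re-attachment is not modeled here.

```python
from typing import List


class Peer:
    """A peer that tracks, per tree, the IDs of its ancestors (source first)."""

    def __init__(self, peer_id: str, num_trees: int) -> None:
        self.peer_id = peer_id
        # ancestors[t] is the path from the source down to this peer's parent on tree t
        self.ancestors: List[List[str]] = [[] for _ in range(num_trees)]

    def attach(self, tree: int, parent: "Peer") -> None:
        """Record `parent` as the new parent on `tree` and inherit its ancestor path."""
        self.ancestors[tree] = parent.ancestors[tree] + [parent.peer_id]

    def accepts(self, tree: int, requester_id: str) -> bool:
        """Reject attachment requests issued by this peer's own ancestors on `tree`."""
        if requester_id == self.peer_id:
            return False
        return requester_id not in self.ancestors[tree]


# Two trees with different hierarchies: S -> A -> B on tree 0, S -> B -> A on tree 1.
source, a, b = Peer("S", 2), Peer("A", 2), Peer("B", 2)
a.attach(0, source)
b.attach(0, a)
b.attach(1, source)
a.attach(1, b)

print(b.accepts(0, "A"))  # A is B's ancestor on tree 0, so the request is rejected
print(a.accepts(0, "B"))  # B is not an ancestor of A on tree 0, so it may attach
```

Each check only consults the ancestor path of the peer that receives the request, which is consistent with the observation that no global knowledge of the tree is needed. The check is made per tree, so the same pair of peers can be parent and child in opposite directions on different trees.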

3.4 Simulation setup
To evaluate the performance of the protocol we carry out experiments over a network simulated in NS-2 [6]. Figure 1 illustrates the star-shape topology of the network supporting the hosts. Simulations are run over a similar topology with 1000 nodes, 750 of which are placed at the extremity of the network. The actual number of peers participating in each simulation is 300, placed evenly among the edge-nodes. The backbone links are sufficiently provisioned so that congestion only occurs on the links connecting the peers to the network. The latency of each link is 5 ms, and the diameter of the network is 10 hops. Losses are only due to congestion and overflowing queues, and transmission errors due to the presence of ISP boundaries or potential wireless last-hop links are ignored.
The bandwidth of the peers reflects today’s available ADSL network access technology. The bandwidth distribution is given in Tab. 1. The degree indicates the number of children a parent transmitting a video stream encoded at 220 kbps can potentially support. This calculation includes a bit-rate reserve of 33% to account for instantaneous rate fluctuations, retransmissions and control overhead. We refer to this bit-rate reserve as “overprovisioning” throughout this paper. The overprovisioning factor will be justified in more detail in Sec. 4. The distribution is similar to the findings reported in [21]. Note that more than half of the peers do not have enough bandwidth to forward the video stream. This makes them free-riders in a system with only 1 distribution tree to forward data. When more trees are used, the video stream is divided into several smaller sub-streams and these peers may forward data to one or several peers.

Table 1: Distribution of peer bandwidth. The degree is computed according to a video stream encoded at 220 kbps with an over-provisioning factor of 33%.

Downlink    Uplink      Degree    Percentage
512 kbps    256 kbps    0         56%
3 Mbps      384 kbps    1         21%
1.5 Mbps    896 kbps    3         9%
20 Mbps     2 Mbps      6         3%
20 Mbps     5 Mbps      17        11%

In the experiments, the dynamic behavior of peers is modelled as follows. A flash crowd is simulated by letting all the peers request the video during the first minute of the video session. During the remaining 14 minutes, peers join and leave the session ungracefully, following a random Poisson process. Peers remain “on” for 4.5 minutes and “off” for 30 seconds, on average.
The 10 s video sequence Mother and Daughter is transmitted from the source to the peers. It is looped enough times to cover the whole session. The video stream is encoded with H.264 [12] at a constant quality and the encoding rate is approximately 220 kbps. We use the freely available version of the codec JM 9.2 maintained by the Video Coding Expert Group (VCEG) [4]. The encoding structure will be described in detail in Sec. 5. Each video frame is packetized into UDP packets. Frames exceeding the maximum transmission unit size are fragmented before packetization. When the simulation begins, the source peer multicasts the video by sending UDP packets over the different multicast trees in round-robin fashion.

3.5 Protocol evaluation
We begin by analyzing the performance of the protocol connection and re-connection process, then compare the results obtained for control overhead and video quality over different numbers of multicast trees.

[Figure 2 plots: cumulative distribution function (0 to 1) versus join time (s) on the left and rejoin time (s) on the right; curves: minimum hop, minimum rtt.]

Figure 2: Distribution of the time needed to join the multicast trees (left) and to rejoin a tree after a parent disconnection (right).

Figure 2 shows the cumulative distribution function (cdf) of the join time, i.e. the time necessary for a node to join a multicast tree; the cdf of the rejoin time is also plotted. This is the time a node takes to rejoin a tree once the disconnection of its parent has been detected. Results are shown for two different parent selection algorithms: the first one is based on the proximity to the source and denoted by “minimum hop”, the second is based on RTT. The joining procedure takes an average of 0.63 seconds, parent leaves are detected in less than 2 seconds and it takes an additional 0.70 seconds to rejoin, on average. In terms of latency, both algorithms show little difference. However, this does not translate to equivalent overall performance. The “minimum hop” parent selection leads to more stable trees, which translates into higher video quality over time. In our experiments, 80% more rejoin events were observed when the minimum RTT method was chosen. This justifies the use of “minimum hop” as our parent selection algorithm.
In Fig. 3, the control traffic overhead and the average video quality are shown when video is multicast over different numbers of trees. The percentage of overhead obviously depends on the encoding rate of the video. The video quality is measured by collecting the mean square error (MSE) between the luminance of the decoded video signal and the original video signal. The MSE is translated into Peak Signal to Noise Ratio (PSNR), and represented in dB. This metric will be used throughout the paper. A difference in PSNR exceeding 1 dB is considered visually significant. In Fig. 3, PSNR is averaged both over time and over the different peers.
As illustrated, as long as 4 or fewer trees are maintained, the overhead stays between 2 and 4.5% of the total
traffic observed on the downlink of the peers. When more trees are used, the control traffic increases; this is due to a higher frequency of parent leaves generating more traffic exchange between the peers. When only 1 tree is maintained, the control traffic is higher than expected. In this case, a parent leave cannot be accommodated by one of the other remaining parents, as happens when more trees are used. This creates additional control traffic.
When video is distributed over a larger number of trees, the effect of an ungraceful leave is less important as children have several parents which they can use for rejoining. On the other hand, maintaining more trees increases the probability of a parent disconnection and requires more control traffic. In Fig. 3, this tradeoff is shown in terms of the average video quality, as a function of the number of multicast trees maintained by the protocol. In this environment, the optimal tradeoff between robustness and congestion is obtained when 2 trees are used, and the performance observed for 3 and 4 trees is close to the optimum. Depending on the rate of the video, on whether or not retransmissions are requested and on the dynamic peer behavior, this optimum could occur for a different number of trees. However, we believe a small number of distribution trees is enough to guarantee good performance. In most of the experiments presented in the following, video will be transmitted over 4 trees.

[Figure 3 plot: percentage of control overhead (left axis, 0% to 8%) and PSNR in dB (right axis, 34 to 40) versus number of trees (1 to 8); curves: average quality, control overhead.]

Figure 3: Percentage of control protocol overhead and average video quality for different numbers of multicast trees.

4. VIDEO DISTORTION MODEL
For live video streaming applications, video packets are transmitted over the network and need to meet a playout deadline. Decoded video quality at the receiver is therefore affected by two factors: distortion introduced by the encoder, denoted by $D_{enc}$, and distortion due to packet loss or late arrivals, denoted by $D_{loss}$. Assuming an additive relation of these two independent factors, a video distortion model was derived in [20]. The decoded video distortion, $D_{dec}$, is given by:

$$D_{dec} = D_{enc} + D_{loss}, \tag{1}$$
$$D_{enc} = D_0 + \frac{\theta}{R - R_0}, \tag{2}$$
$$D_{loss} = \kappa \left( P_r + (1 - P_r)\, e^{-(C - R) T / L} \right), \tag{3}$$
$$C = \sum_{i=1}^{n} C_i, \tag{4}$$
$$R = \sum_{i=1}^{n} R_i. \tag{5}$$

In (2), $R$ is the total rate of the video stream, and the parameters $D_0$, $\theta$ and $R_0$ are estimated from empirical rate-distortion curves via regression techniques.
The second distortion term, $D_{loss}$, depends linearly on the packet loss rate. The scaling factor $\kappa$ indicates the sensitivity of the stream to losses. It depends on the encoding structure and on the amount of motion present in the sequence. The other factor reflects the combined rate of random losses and late arrivals. $P_r$ is the random packet loss rate and $T$ is the time within which each packet should reach the receiver (typically a few hundred milliseconds). $C$ is the aggregate available throughput of the network paths over which the video is transmitted and $L$ is the average packet size.
In (4), the throughput, $C$, is expressed as the sum of the available throughput between the receiver and its parents on the $n$ different multicast trees. Likewise, in (5), $R_i$ is the fraction of the video stream transmitted over the $i$th multicast tree. Typically, the same throughput will be reserved for a receiver on each tree and the video will be divided equally among the trees: $R_i = R/n$ and $C_i = C/n$.
This model reflects the impact of the rate on video distortion. At lower rates, reconstructed video quality is limited by coarse quantization, whereas at high rates, more packets are delayed beyond their playout deadline due to network congestion. For live video streaming in a bandwidth-limited environment, we therefore expect to achieve maximum decoded quality for some intermediate rate. As an illustration, we collect the decoded video quality of a sequence transmitted to a peer using different numbers of multicast trees. The parents of the peer each have a total available uplink throughput of 660 kbps. A fourth of this capacity (165 kbps) is reserved by each parent to the peer to carry the video traffic, the rest being used to serve other peers which share the same parent. Figure 4 illustrates the decoded video quality for the sequence Mother and Daughter encoded at different rates. The fitted model is shown along with experimental points. In this experiment the playout deadline is 500 ms. The decrease in quality due to congestion is illustrated by the bell shape of the curve representing decoded video quality. In this experiment the decrease occurs when the rate of the video reaches approximately 85% of the available throughput.
For P2P streaming, it is essential to limit the amount of self-inflicted congestion created by the media streams. Indeed, as there might be a large number of intermediate end-hosts separating a peer from the source, any increase in network congestion may be reflected multiplicatively in the total end-to-end delay. Combined with physical link latency, this delay may cause some packets to miss their playout deadline, resulting in decoding errors and a decrease in decoded quality. Hence, for fine tuning, congestion may be reduced by the following methods: increasing the amount of
43

42

41

40
PSNR (dB)

39

38
Figure 5: SI frames share the instant refresh prop-
37 1 tree erties of I frames but are only sent after a frame is
2 trees lost.
36 3 trees

35
150 200 250 300 350 400 450 500
Rate kbps

Figure 4: Streaming performance for the sequence


Mother and Daughter transmitted to a receiver over
a different number of multicast trees. The experi- Figure 6: GOP structures used for streaming with
mental results are shown together with the model, SP and SI frames and for periodic I frame insertion.
represented by a solid line. The parameters of the
model are D0 = 0.49, θ = 1222, R0 = 10.39kbps, κ = 185,
diction chain, as depicted in Fig. 5. In such a scheme, a
T = 0.5s and L = 1.5kbit.
pair of SP and SI pictures are periodically generated dur-
ing the encoding of a sequence. For streaming, the sender
over-provisioning by reserving a larger capacity Ci on each may choose which picture of the pair to transmit. Because
of the multicast trees between a parent and a child; de- of their larger size, SI frames should, ideally, only be sent
creasing the encoding rate of the video R. When video is when their instant refresh properties are needed. Therefore,
pre-encoded, as is often the case, it is not always possible SI frames are transmitted instead of SP frames only when
to reduce the streaming rate. Thus, it may be necessary to a decoding error at the receiver is detected, to stop error
employ more over-provisioning, this in turn may decrease propagation. This differs from traditional video streaming
the number of hosts supported by the P2P system. The model can be used to determine the right amount of over-provisioning. For example, given the rate of the video, the number of trees, and the playout deadline, we can determine the throughput necessary on each tree to keep Dloss under a given threshold. For video encoding rates of 100 kbps or more, and playout deadlines of no less than 300 ms, Ci = 1.33 ∗ Ri is enough to avoid any noticeable congestion.

5. ERROR-RESILIENT STREAMING
To avoid interruptions in a streaming application, decoding errors due to packet loss or to late arrivals are concealed by freezing the previous frame. Because of the predictive nature of compressed video, this decoding error propagates to subsequent frames until an independently encoded frame (e.g. an I frame) is received. In this section, we describe how error propagation can be stopped adaptively between peers by using SP and SI pictures.

5.1 SP and SI frames
SP and SI picture types are part of the latest video coding standard H.264 [12]. These new types of predictively/intra coded pictures were initially proposed in 2001 by Karczewicz and Kurceren, as a solution for error resilience, bitstream switching and random access [15, 16]. The main advantage of SP and SI pictures is that they can be used interchangeably in a video stream without creating any decoding error or drift. This leads to interesting applications, such as stopping error propagation adaptively by refreshing a pre-
where large I frames are transmitted periodically whether decoding errors occur or not.

5.2 Compression efficiency
The video encoding structure shown at the top of Fig. 6 was chosen for the streaming experiments presented in this paper. The number of frames in a Group of Pictures (GOP) is 16, with one SP frame (and its corresponding SI frame) per GOP and 3 B frames between P frames. This ensures good error resilience properties and makes it easy to scale down the frame rate by a factor of 2 or even 4 if needed. Links to video sequences compressed with this coding structure at different rates, as well as rate-distortion preambles characterizing the size and quality of the frames, are made publicly available [2] (footnote 1). We also evaluated the performance of traditional streaming with periodic I frames. The structure of the GOP used in this case is similar and is illustrated at the bottom of Fig. 6. As SP frames are smaller than I frames, for limited loss rates, the bit-rate savings obtained by not transmitting unnecessary I frames may reach 25%. This is illustrated in Fig. 7, which shows the compression efficiency of GOPs containing SP or SI frames compared to GOPs with periodic I frames. When loss rates are high, the rate-distortion performance decreases and may be worse than streaming with I frames, as the size of SI frames exceeds that of I frames.

Footnote 1: The preambles also indicate the distortion values obtained by concealing a frame with any other frame of the stream, which allows video streaming to be simulated realistically without the overhead of encoding and decoding.
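
The footnote above describes how decoded quality can be simulated from pre-computed concealment distortions rather than by running a codec. A minimal Python sketch of that idea follows. It assumes a hypothetical table `distortion[s][c]` holding the mean squared error of displaying frame `c` in place of frame `s` (the actual preamble file format of [2] is not reproduced here), previous-frame concealment as described in Sec. 5, 8-bit video, and a caller-supplied list stating which frames are decodable. Averaging the MSE before converting to PSNR is also an assumption of this sketch.

```python
import math


def simulated_psnr(distortion, decodable, peak=255.0):
    """Return the average PSNR (dB) of a displayed sequence.

    distortion[s][c] is the (assumed) mean squared error of showing frame c
    in place of frame s; distortion[s][s] is the coding distortion of s.
    decodable[s] is True when frame s was received and decoded correctly.
    """
    shown = None          # index of the frame currently on screen
    total_mse = 0.0
    displayed = 0
    for s, ok in enumerate(decodable):
        if ok:
            shown = s     # a correctly decoded frame is displayed as-is
        if shown is None:
            continue      # nothing decoded yet: left out of the average here
        total_mse += distortion[s][shown]  # frame freeze when s is missing
        displayed += 1
    if displayed == 0:
        raise ValueError("no frame could be displayed")
    mean_mse = total_mse / displayed
    return 10.0 * math.log10(peak ** 2 / mean_mse)


# Toy 3-frame table with made-up MSE values (not taken from [2]).
table = [
    [4.0, 100.0, 200.0],
    [50.0, 4.0, 120.0],
    [90.0, 60.0, 4.0],
]
print(round(simulated_psnr(table, [True, True, True]), 2))   # no loss
print(round(simulated_psnr(table, [True, False, True]), 2))  # frame 1 frozen
```

In a fuller simulation the `decodable` flags would themselves be derived from the packet loss pattern and from the prediction dependencies of the GOP, which this sketch leaves to the caller.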

Figure 7: Rate-distortion performance with periodic I frame, SP frame or SI frame insertion, for Mother and Daughter. [Plot: PSNR (dB) versus rate (Kbps); curves: periodic I frames, periodic SP frames, periodic SI frames.]

5.3 Extension to the P2P case
In Sec. 5.1, we explained how a sender possessing a pair of corresponding SP and SI pictures could achieve better rate-distortion performance by choosing which one to transmit adaptively. We propose to extend this technique to P2P streaming, where a sender has not necessarily received both the SP and the SI pictures from its ancestors.

The complete details of the encoding/decoding process of SP and SI frames are beyond the scope of this paper. They can be found in [16]. One interesting aspect of the decoding of SP frames is that it allows the corresponding SI picture to be regenerated with very limited additional complexity (footnote 2). As a consequence, in the P2P streaming scenario, each peer which receives and decodes an SP frame correctly may also create the corresponding SI frame. It can subsequently use the resulting SP and SI picture pair adaptively, either to reduce the bit-rate or to stop the error propagation its descendants might be experiencing. This technique allows adaptive streaming to take place not only between the source and its direct descendants but also further down in the tree, as long as SP frames are received and decoded correctly by peers.

As larger video frames usually span several transport layer packets, different parts of SP and/or SI frames will usually be transmitted by different parents when different multicast trees are maintained. As all the packets of one of the pictures are needed to continue decoding, it is necessary for these different peers to coordinate and choose to transmit the same picture to their child. This is done by having the child signal to its parents the periods during which error propagation occurs. During these periods SI frames will be transmitted.

Footnote 2: This requires the use of an additional entropy encoder. Note, however, that the reverse is not true: the SP frame cannot be recovered from the SI frame without a complete video encoder.

5.4 Experimental results
We analyze the benefits of SP and SI frames for video streaming with low latency over the network described in Sec. 3. The video stream is encoded at approximately 220 kbps and the maximum tolerated delay between the time a frame is available at the source and the time it is played by the peers is 2 seconds. For comparison, results are also collected, for the same sequence of events, for a video stream with periodic I frames encoded at the same rate. For the bandwidth distribution indicated in Tab. 1, the dynamic behavior of the hosts and this tight delay constraint, the two video streams represent the highest supportable video quality for the two different encoding structures considered.

Figure 8: Video quality performance for the 300 peers for different encoding structures. [Plot: cumulative distribution function versus PSNR (dB); curves: periodic I frames, SP/SI frames.]

Figure 8 shows the CDF of the peers' decoded video quality. The average quality for the stream with SP and SI frames is 40.3 dB, 0.8 dB higher than for streaming with periodic I frames. The distributions show that 99% of the hosts benefit from this novel encoding structure. Furthermore, for more than 25% of the hosts the performance gain exceeds 1 dB. This gain is due to the better rate-distortion-congestion performance of the video stream containing SP/SI frames. As the number of node disconnections is limited, most of the time SP frames are transmitted in lieu of SI frames. This reduces the overall bit-rate and allows for a higher sustainable quality. Moreover, instantaneous bit-rate fluctuations are reduced when SP frames are used instead of I frames. This results in less congestion and better end-to-end delay performance. More detailed experimental results describing similar effects were reported in our prior work [19].

6. OPTIMIZED RETRANSMISSIONS
The dynamic behavior of peers is one of the most challenging aspects of peer-to-peer live streaming. In particular, a peer must continuously monitor the state of its parent(s) and immediately regain connectivity via another peer when a leave is detected. As analyzed in Sec. 3, this process can take a few seconds, a time during which none of the packets transmitted over the affected tree will reach the peer. The corresponding loss statistics are obviously time-variant as there might be some long periods during which a host
is fully connected and experiences no loss, and other times when a large portion of packets are missing. Hence, FEC or multiple description coding, which seek to protect a user for a specific loss level, may not be appropriate, as they will create superfluous redundancy most of the time and might be overwhelmed when losses occur. However, as reconnections are not instantaneous, in order to maintain high decoded quality, missing portions of the video need to be recovered while the peer is rejoining. In this section, we describe how a peer can mitigate the video quality degradation during the rejoining process by requesting retransmissions from its other parents.

Retransmission requests will place an additional burden on the uplink of peers already forwarding a portion of the video to one or possibly many children. This increase in congestion is more significant when there are few multicast trees, as a larger portion of the video will be requested from fewer parents. Regardless of the number of multicast trees, optimal quality will be achieved when enough packets are requested to maintain high video quality while avoiding too severe an increase in end-to-end delay. Our goal is to determine an optimal retransmission schedule for missing packets of a video stream. This schedule indicates which packets of the stream will be requested to maximize the decoded video quality at the receiver while limiting the congestion created on the network.

6.1 Congestion-distortion optimized scheduler
In [18], we presented and analyzed the performance of a sender-driven congestion-distortion-optimized scheduler (CoDiO) which determines how to minimize video distortion for a given level of congestion. The benefits of using congestion and distortion as metrics rather than rate and distortion, as for example in [9], are two-fold. End-to-end delay (i.e. congestion) is inherently adaptive to time-varying network conditions. In addition, it better reflects the impact of a user operating on a bandwidth-limited network. To find an optimized schedule, CoDiO selects the most important packets in terms of video distortion reduction, and requests them in an order which minimizes the congestion created on the network. For example, I or SI frames are requested in priority whereas B frames might not be retransmitted at all. In addition, CoDiO avoids requesting packets in large bursts, as this has the worst effect on the queuing delay.

In the following, we describe how to extend CoDiO to the P2P receiver-driven scenario. We present a low-complexity CoDiO scheduler which performs retransmission scheduling. We show how to select the most important packets for retransmission and how to limit congestion, and we analyze the performance of the CoDiO retransmission scheduler.

6.2 Computing the importance of packets
After detecting a parent disconnection, a peer can determine a list of missing packets and iteratively select the most important ones to request. This choice should depend on the time at which packets are due, and on the contribution of each packet to the overall video quality. CoDiO proceeds by discarding packets already past their due time and by computing an approximate sensitivity of the video distortion to the remaining packets.

Given a set of received frames, an approximate video distortion may be computed in the following manner. Assuming copy error concealment is used, where an undecodable frame is replaced with the nearest correctly decoded frame, and given the encoding structure of the video, we can find which frames will be shown as a function of time. Let D(s, c(s)) denote the approximate distortion resulting from showing frame c(s) instead of frame s. We assume the table D, pre-computed offline for a generic video sequence, is available at each peer. Each display outcome is associated with the appropriate pre-computed distortion value and the resulting approximate video quality is computed over several frames:

D = \sum_{s} D(s, c(s))    (6)

It is then straightforward to determine the sensitivity of the average distortion to a single frame, and this sensitivity is extended to each of the packets composing the frame. Retransmission requests are sent out following the order of importance.

The difficulty resides in determining which packets will or will not be received before their playout deadline. Ideally, probabilities for these events should be computed, based on delay distributions, and the resulting expected distortion could be computed, as described in [14], by combining these different probabilities. In our scheduler, we use a much simpler technique and only distinguish lost packets and received packets. We assume that all the parents of a peer will successfully forward the packets that are transmitted on their respective multicast trees. The packets sent on the tree(s) of the disconnected parent(s) are considered missing unless the peer has requested them from another parent within a short time interval (we arbitrarily chose 200 ms in our experiments). This classification reduces the number of possible decoding states to one, thus we know which frames will be displayed over the horizon considered.

6.3 Limiting congestion
In the scenario considered, the available transmission rate between a sending peer and a receiving peer depends on several factors, such as the uplink throughput of the sending peer, the number of other hosts served by this peer and the rate of the video stream. Given these parameters, the available throughput Ci may be computed and the average end-to-end delay over a certain time horizon can be estimated for a given retransmission schedule. However, precise estimation would require modelling the delay distribution of the path between sender and receiver.

As a low-complexity alternative, we limit the number of unacknowledged retransmission requests from a peer to each of its parents. As an unacknowledged retransmission request represents a packet being transmitted or processed between the two peers, these packets contribute to end-to-end delay and hence to congestion. The tradeoff between the rate of the retransmissions and the amount of congestion created on the bottleneck links can be set by determining the optimal number of unacknowledged packets tolerated between a sender and a receiver. This optimization is carried out experimentally in the following.

6.4 Experimental results

6.4.1 Influence on video quality

Figure 9: Video quality (solid line) and percentage of fully connected peers (dashed line) as a function of time for 299 peers when a host close to the source leaves. Results are shown in the absence of retransmission (top) and when the maximum number of unacknowledged retransmissions on each tree is 4 (bottom). [Two panels: PSNR (dB) and percentage of connected hosts versus time (seconds).]

We first analyze the influence of retransmissions on the video quality. Specifically, we would like to know to what extent retransmissions mitigate the quality degradation which
occurs when the parent of a peer leaves. To maximize the effect of the disconnection, in the network scenario specified in Sec. 3, we let the 300 peers join and disable disconnections. When a steady state is reached, we disconnect a host close to the source in one of the trees. In this experiment, 4 multicast trees are used to transmit the video and the maximum number of unacknowledged retransmission requests on each tree is 2. The performance is shown in Fig. 9, as a function of time, in terms of the video quality. The average video quality is taken over all 299 peers. Performance is also shown when no retransmissions are allowed. The host disconnection occurs at time 18 s, and takes about 1.5 s to be detected. This can be deduced from the fact that connectivity is reported directly by the peers. As illustrated in Fig. 9, about 30% of the hosts are affected by the disconnection. At time 21.5 s, all the affected peers have recovered. Similar behavior is observed with and without retransmissions. However, the video quality during the rejoin time is very different in the two cases. When the CoDiO retransmission scheduler is used, the video quality remains almost constant over time, as a large majority of the missing frames are recovered. In the other case, quality drops by approximately 2-3 dB during the reconnection time.

6.4.2 Influence on congestion
We analyze the impact of retransmissions on congestion and show how to determine the optimal tradeoff between congestion and the number of retransmission requests.

Table 2: Average decoded video quality for different numbers of retransmissions.

Maximum unacknowledged retransmissions per tree    PSNR
0                                                  37.0 dB
1                                                  37.5 dB
2                                                  37.6 dB
3                                                  37.3 dB
8                                                  36.3 dB

Tab. 2 indicates the average decoded video quality for 300 peers operating following the scenario described in Sec. 3. Video is transmitted over four multicast trees. The numbers reported in the table show the gain in total average quality which can be achieved through the use of retransmissions for this scenario. As illustrated, two simultaneous retransmission requests per tree yield a gain of approximately 0.6 dB. Although this gain is modest, we stress that the impact on visual quality discussed in the previous section might be much larger. The small performance gap is due to the fact that during the 15-minute event, the number of disconnections affecting each peer might not be very large. When more than two simultaneous retransmissions are allowed, the performance degrades slightly for 3 retransmissions and more severely for 8 retransmissions. This indicates that congestion is disrupting the video transmissions. Please note that the optimal number of simultaneous retransmissions increases with the number of multicast trees.

7. CONCLUSIONS
Live P2P multicast streaming is constrained by the dynamic behavior of hosts and by limited uplink throughput. Dynamic control protocols are needed to support heterogeneous peers and react rapidly to node disconnections.
We present a new multicast control protocol which builds and maintains multiple trees to transmit live video to a large population of peers, and demonstrate how video encoding, streaming and scheduling techniques developed recently further enhance the performance of the system. A rate-distortion model is proposed to analyze the tradeoff between self-inflicted congestion and video quality and is used to determine the amount of over-provisioning necessary when low latency is required. H.264 SP and SI frames are incorporated into the video stream to provide adaptive error-resiliency capability and achieve bit-rate savings of up to 25% compared to traditional video streams. Last, retransmissions of missing packets are requested in a congestion-distortion-optimized fashion which selects the most important packets in terms of video quality while limiting the effect on end-to-end delay. For simulations with up to 300 peers, we achieve typical end-to-end delays of 1 sec, and a stable video quality with less than 2.5% of frames lost to playout interruptions.

8. REFERENCES
[1] Coolstreaming. http://www.coolstreaming.org, seen on Aug. 28 2005.
[2] Encoded sequences with SP/SI frames. http://ivms.stanford.edu/~esetton/sequences.htm, seen on Aug. 28 2005.
[3] ESM. http://esm.cs.cmu.edu/, seen on Aug. 28 2005.
[4] H.264/AVC Reference Software. http://iphome.hhi.de/suehring/tml/download/, seen on Aug. 28 2005.
[5] PPLive. http://www.PPLive.com, seen on Aug. 28 2005.
[6] The Network Simulator - ns-2. www.isi.edu/nsnam/ns/, seen on Aug. 28 2005.
[7] J. Apostolopoulos, T. Wong, W. Tan, and S. Wee. On multiple description streaming with content delivery networks. Proceedings Infocom, New York, USA, 3:1736–1745, June 2002.
[8] S. Banerjee, B. Bhattacharjee, and C. Kommareddy. Scalable application layer multicast. Proceedings ACM Sigcomm, Pittsburgh, USA, pages 205–217, Aug. 2002.
[9] P. Chou and Z. Miao. Rate-distortion optimized streaming of packetized media. Microsoft Research Technical Report MSR-TR-2001-35, Feb. 2001.
[10] Y. Chu, A. Ganjam, T. Ng, S. Rao, K. Sripanidkulchai, J. Zhan, and H. Zhang. Early experience with an internet broadcast system based on overlay multicast. Proceedings of USENIX'04, pages 155–170, June 2004.
[11] Y. Chu, S. Gao, and H. Zhang. A case for end system multicast. Proceedings ACM Sigmetrics, Santa Clara, USA, June 2000.
[12] ITU-T and ISO/IEC JTC 1. Advanced Video Coding for Generic Audiovisual services, ITU-T Recommendation H.264 - ISO/IEC 14496-10(AVC), 2003.
[13] J. Jannotti, D. Gifford, K. Johnson, M. Kaashoek, and J. O. Jr. Overcast: Reliable multicasting with an overlay network. USENIX Symposium on Operating Systems Design and Implementation, San Diego, USA, Oct. 2000.
[14] M. Kalman, P. Ramanathan, and B. Girod. Rate-distortion optimized streaming with multiple deadlines. Proc. International Conference on Image Processing, Barcelona, Spain, Sept. 2003.
[15] M. Karczewicz and R. Kurceren. A Proposal for SP-Frames. Video Coding Experts Group Meeting, Doc. VCEG-L-27, Eibsee, Germany, Jan. 2001.
[16] M. Karczewicz and R. Kurceren. The SP- and SI-frames design for H.264/AVC. IEEE Trans. CSVT, 13(7):637–644, July 2003.
[17] V. N. Padmanabhan, H. J. Wang, and P. A. Chou. Resilient peer-to-peer streaming. IEEE International Conference on Network Protocols, Atlanta, USA, Nov. 2003.
[18] E. Setton and B. Girod. Congestion-Distortion Optimized Scheduling of Video. Multimedia Signal Processing Workshop (MMSP), Siena, Italy, pages 99–102, Oct. 2004.
[19] E. Setton and B. Girod. Video streaming with SP and SI frames. Proc. Visual Communications and Image Processing, Beijing, China, July 2005.
[20] E. Setton, X. Zhu, and B. Girod. Minimizing distortion for multipath video streaming over ad hoc networks. International Conference on Image Processing, Singapore, pages 1751–1754, Oct. 2004.
[21] K. Sripanidkulchai, A. Ganjam, B. Maggs, and H. Zhang. The feasibility of supporting large-scale live streaming applications with dynamic application endpoints. Proceedings SIGCOMM'04, Portland, USA, Aug. 2004.
[22] D. Tran, K. Hua, and T. Do. Zigzag: An efficient peer-to-peer scheme for media streaming. Proceedings Infocom, San Francisco, USA, 2:1283–1292, Mar. 2003.
[23] X. Zhang, J. Liu, B. Li, and T.-S. P. Yum. Donet/coolstreaming: A data-driven overlay network for live media streaming. Proceedings IEEE Infocom, Miami, USA, Feb. 2005.
