
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts

for publication in the IEEE ICC 2011 proceedings

QoE-based cross-layer optimization of wireless video with unperceivable temporal video quality fluctuation
Srisakul Thakolsri, Wolfgang Kellerer
DOCOMO EURO-Labs, Ubiquitous Networking Research Group, Munich, Germany Email: thakolsri,

Eckehard Steinbach
Institute for Media Technology, Technische Universität München (TUM), Munich, Germany Email:

Abstract: This paper proposes a novel approach for Quality of Experience (QoE) driven cross-layer optimization for wireless video transmission. We formulate the cross-layer optimization problem with a constraint on the temporal fluctuation of the video quality. Our objective is to minimize the temporal change of the video quality, as perceivable quality fluctuations negatively affect the overall quality of experience. The proposed scheme jointly optimizes the application layer and the lower layers of a wireless protocol stack. It allocates network resources and performs rate adaptation such that the fluctuations lie within the range of unperceivable changes. We determine corresponding perception thresholds via extensive subjective tests and evaluate the proposed scheme using an OPNET High Speed Downlink Packet Access (HSDPA) emulator. Our simulation results show that the proposed approach leads to a noticeable improvement of overall user satisfaction for the provided video delivery service when compared to state-of-the-art approaches.

I. INTRODUCTION

Network resource management and resource allocation across multiple users in wireless networks have become a priority for mobile network operators as a result of the continuously increasing demand for video applications. In order to make efficient use of network resources and to keep high user satisfaction while providing video delivery services, the network operator needs to understand the application-layer requirements and characteristics. Through signaling from a video streaming server and deep packet inspection in the core network, it is possible to extract the application-layer information such as Type of Service (ToS), application type, or even the details of the video application. The latter information may include the encoding scheme of the video and the sensitivity of the video content, which describes the video quality perceived by the user as a function of network parameters such as, for example, the data rate and the packet loss rate. For instance, a video that contains a dynamic scene (e.g. sport) requires a higher data rate and is more sensitive to packet loss [1]. The encoding scheme describes how each video packet depends on the other video packets belonging to the same video bitstream. One advantage of knowing such packet dependency is that the core network entities are able to prioritize important packets when network congestion occurs.

Optimizing the network resource allocation with a joint consideration of information from different layers (e.g. application model/application sensitivity, channel quality condition), also called Cross-Layer Design (CLD), has been extensively studied over the past years [2]. A good survey of CLD is presented in [3]. Most previous work focuses on throughput maximization [4]. The first work based on utility maximization was proposed for elastic applications [5]. The work in [6] applies the utility maximization framework to network resource allocation across different applications and uses the Mean Opinion Score (MOS), which was originally proposed for voice applications [7], as a common metric for user-perceived quality (Quality of Experience, QoE). The actual resource allocation depends on the objective function set by the network operator. For example, the optimization may target the maximum average user quality [6], or may try to achieve a similar quality for all users regardless of the application type and the channel quality condition [8]. Alternatively, one can set a minimum guaranteed quality for all users, and then adapt the resource allocation to achieve a common quality that is equal to or higher than the guaranteed quality [9], or so as to achieve maximum average quality [10].

To the best of our knowledge, none of the known cross-layer optimization approaches deals with the problem of how to avoid noticeable quality fluctuations over time. Even if the user-perceived quality is good on average, drastic quality changes lead to a negative impression of the service quality. Hence, temporal quality fluctuation is an important factor for wireless video transmission. In this paper, we apply the Just Noticeable Difference (JND) concept [11] to find a threshold for the perception of temporal video quality changes, and incorporate it into QoE-driven resource allocation optimization.
Figure 1 depicts our single cell scenario, in which each user accesses various video contents that are encoded at high quality and are stored at the Application Server (AS). In the core network, a QoE optimizer acts as a downlink resource allocator and as a controller for rate adaptation. Optimization is done based on lower layer information (e.g. average channel quality), the

978-1-61284-231-8/11/$26.00 2011 IEEE


objective function, and application utility functions, which are either stored in advance or sent along with the data stream. As an extension of conventional cross-layer optimization, our QoE-based objective function incorporates an additional constraint on the temporal video quality fluctuation. Let us explain this extension along with the drawbacks of two typical objective functions: (a) maximization of average user-perceived quality [6] and (b) max-min fairness [8]. With utility maximization, when the channel quality changes drastically over time, the user experiences perceivable changes of video quality and may be annoyed while watching the video. Specifically, when the channel quality condition becomes poor, the optimizer will not allocate any network resources to the user. But when the channel quality is very good, the optimizer will give the user a higher priority for network resource allocation. Thus, the service quality perceived by the user strongly depends on the channel condition. In fact, early work in the area of Variable Bit Rate (VBR) versus Constant Bit Rate (CBR) encoding of video has shown that users prefer constant quality to temporally fluctuating quality even if the average quality is lower [12]. In contrast, with the max-min fairness objective function, all users perceive the same quality, which would make the temporal quality smooth. However, this leads to minimal system efficiency in terms of network resource utilization, as most of the network resources are given to the user having a poor channel quality or to the user accessing a high-demand application. Considering the impact of temporal quality changes perceived by the user in the objective function of the network resource allocation problem hence has the potential to improve the overall user-perceived quality. We determine the JND for video quality fluctuation through subjective tests.
In case the temporal quality changes exceed the JND, an average human is able to perceive the difference of video quality. In other words, the JND tells us whether the quality of experience is affected, if there is a certain change of video quality. We propose in this paper a combination of utility maximization and smooth change of temporal video quality in order to compromise between them and to give flexibility to the network operator to prioritize the two different objectives.

II. RADIO LINK LAYER MODEL

We use the long term link layer model originally proposed in [9], which can be described as:

$$R_k = \alpha_k R_{\max,k}, \quad 0 \le \alpha_k \le 1, \quad \forall k \in \mathcal{K} \qquad (1)$$

Fig. 1. Target use case and network configuration considered in this paper.

are performed in TTI time frames (2 ms), the instantaneous data rate r is hence a function of Q. Assuming that only one user is scheduled at a TTI and the average channel quality Q is slowly changing, the maximum average data rate of each user can be estimated by observing the mean Channel Quality Indicator (CQI) over a period of time as follows:

$$R_{\max} = g(Q) \qquad (2)$$
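The link layer model of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. The function names and the CQI-to-TBS table below are hypothetical: the table values are placeholders chosen for illustration and are not the values standardized by 3GPP. The sketch also assumes that exactly one transport block is delivered per 2 ms TTI.

```python
TTI_SECONDS = 0.002  # HSDPA transmission time interval (2 ms)

# Hypothetical CQI -> transport block size (bits) lookup, standing in for g.
# These numbers are illustrative placeholders, NOT the 3GPP-standardized values.
HYPOTHETICAL_TBS_BITS = {1: 140, 5: 380, 10: 1260, 15: 3320, 20: 5900}


def max_rate_kbps(mean_cqi: int) -> float:
    """Eq. (2): R_max = g(Q), assuming one transport block per TTI."""
    tbs_bits = HYPOTHETICAL_TBS_BITS[mean_cqi]
    return tbs_bits / TTI_SECONDS / 1000.0


def allocated_rate_kbps(alpha: float, mean_cqi: int) -> float:
    """Eq. (1): R_k = alpha_k * R_max,k with 0 <= alpha_k <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * max_rate_kbps(mean_cqi)
```

With this placeholder table, a user whose mean CQI is 10 and who receives a quarter of the resources would be assigned a quarter of the corresponding maximum rate. A real implementation would replace the dictionary with the standardized table referenced in [13].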

In each TTI, one Transport Block (TB) is sent over the air. The Transport Block Size (TBS), which gives the number of bits to be transmitted in each TB, depends on the CQI of the user. In this paper, we use a Look-Up Table denoted by the relationship g to get the TBS, which is standardized in 3GPP [13].

III. APPLICATION LAYER MODEL

We use the MOS metric to measure the level of user satisfaction for the video being watched by the user as proposed in [6]. The video utility U is described as a function of transmission data rate and packet loss rate. Nevertheless, due to the HSDPA MAC-layer retransmission mechanism, providing a more reliable transmission over the wireless interface, we assume that all packets are transmitted successfully, and therefore, the video utility function U can be simplified as a function of transmission data rate as given below:

$$U = f(R), \quad f: \mathcal{R} \to MOS \qquad (3)$$

where $\mathcal{K}$ is the set of users, $\mathcal{K} = \{1, 2, \ldots, K\}$. $R_{\max,k}$ is the maximum average rate of user k when all resources are allocated to user k, and $\alpha_k$ is the normalized resource share of user k. We adopt this model in our OPNET-based HSDPA implementation, where adaptive modulation and coding (AMC) and hybrid automatic repeat-request (HARQ) are used for link adaptation and fast packet scheduling/retransmission at the base station, respectively. Since the adaptation and packet (re)transmission are dependent on the channel quality Q and

where $\mathcal{R}$ is a set of application data rates and MOS = [1, 4.5]. MOS 1 reflects an unacceptable application quality and MOS 4.5 refers to an excellent quality experienced by the user. One can vary the video data rate via source encoding with different quantization steps, using an open-loop transcoding with new quantization steps, or a less brute-force approach such as packet/layer dropping. Because of its simplicity of implementation, we achieve rate changes in our experimental evaluation by performing a simple transcoding, which fully decodes and re-encodes the video at the desired rate. At each video data rate, we measure the video quality using the Video Structural SIMilarity (VSSIM) index [14]. As proposed in [16], it is recommended to use a nonlinear regression function to transform the set of objective video quality values to a set


[Figure 2: utility U (MOS, 1 to 4.5) versus data rate (0 to 500 kbps) for the sequences News, MthrDotr, Foreman, Soccer, and Football.]

Fig. 2. Video utility functions for different video sequences obtained with open-loop transcoding.

the base station periodically sends the link-layer information discussed in Section II to the QoE optimizer module, which is located at or near the base station.

A. Utility maximization

With utility maximization, the optimizer searches for a resource allocation that maximizes the average utility of all K users. In a resource-constrained system, priority in allocating network resources is first given to the user having a good channel condition and accessing a low-demand application that gives high user-perceived quality for a small amount of network resources. The objective function is given as:

$$\alpha_{opt} = \arg\max_{\alpha \in \mathcal{A}} \frac{1}{K} \sum_{k=1}^{K} U_k(\alpha_k) \qquad (5)$$
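One way to search for an allocation that approximately maximizes Eq. (5) is a greedy, slice-by-slice search. The sketch below is an illustration under stated assumptions and not necessarily the greedy algorithm of [6]: the resources are divided into equal slices, each slice goes to the user with the largest utility gain, and the utility curves are hypothetical logarithmic MOS curves rather than measured ones. All function names are chosen for this example.

```python
import math
from typing import Callable, List

Utility = Callable[[float], float]  # maps a resource share alpha_k to a MOS value


def greedy_allocation(utilities: List[Utility], steps: int = 100) -> List[float]:
    """Hand out `steps` equal resource slices one at a time, each to the user
    whose utility increases most. The shares sum to 1 by construction."""
    counts = [0] * len(utilities)
    for _ in range(steps):
        gains = [
            u((c + 1) / steps) - u(c / steps)
            for u, c in zip(utilities, counts)
        ]
        best = max(range(len(utilities)), key=lambda k: gains[k])
        counts[best] += 1
    return [c / steps for c in counts]


def hypothetical_utility(r_max_kbps: float, demand_kbps: float) -> Utility:
    """Illustrative MOS curve in [1, 4.5]; an assumption, not a measured curve.
    `demand_kbps` controls how quickly the quality saturates with rate."""
    def u(alpha: float) -> float:
        rate = alpha * r_max_kbps  # Eq. (1)
        mos = 1.0 + 3.5 * math.log1p(rate / demand_kbps) / math.log1p(10.0)
        return min(4.5, mos)
    return u
```

For example, `greedy_allocation([hypothetical_utility(1000.0, 30.0), hypothetical_utility(500.0, 60.0)], steps=50)` returns one share per user. For concave utility curves such as these, the slice-wise greedy choice is optimal on the chosen grid; for non-concave measured curves it is only a heuristic.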


of predicted MOS values. For simplicity, we apply a linear mapping between the VSSIM and the MOS with an upper and lower bound, which is an approximation of the nonlinear regression function, as follows:

$$MOS = \begin{cases} 1, & \text{if } VSSIM < 0.7 \\ 12.5 \cdot VSSIM - 7.75, & \text{if } 0.7 \le VSSIM \le 0.98 \\ 4.5, & \text{if } VSSIM > 0.98 \end{cases} \qquad (4)$$

Figure 2 depicts examples of video utility curves for five different video sequences. In this example, we perform video transcoding from a reference video, for example News (350 kbps) and Soccer (450 kbps), to lower data rates. All videos have QCIF resolution and are encoded with H.264 AVC at 30 frames/sec. Figure 2 shows that Football is most sensitive to rate adaptation, and News is least sensitive. In other words, if the video sequence contains more dynamic video scenes, the user will see more impact on quality changes when performing rate adaptation.

IV. QOE-DRIVEN RESOURCE ALLOCATION OPTIMIZATION

As discussed in Section I, the outcome of the resource allocation depends on the objective function and the constraints imposed. In the following subsections, we discuss how we formulate the objective function with the goal of utility maximization, and how to apply the JND concept to the QoE-driven optimization problem in order to achieve a smooth temporal video quality while keeping the network resource usage (system efficiency) as high as possible. Note that the optimization is performed based on the assumptions that the video utility functions as shown in Figure 2 are precomputed at the streaming server and are signalled as side information along with the video bitstream, which will then be extracted in the core network for performing the QoE-driven optimization and necessary rate adaptation; and

subject to $\sum_{k=1}^{K} \alpha_k = 1$, where $\alpha_k$ is the fraction of network resources given to user k, and $\mathcal{A}$ is the set of all possible vectors of network resource shares across the users. $\alpha_{opt}$ is the optimal resource allocation tuple that maximizes the objective function. To determine the optimal resource allocation, we use a greedy search algorithm (GR) rather than the full search due to its low computational complexity. Furthermore, it has been shown that a greedy search leads to close-to-optimal results [6].

B. Temporal video quality fluctuation

As with any other stimulus whose change a human can perceive, there is a threshold for the temporal video quality change that humans are able to recognize. For example, a change in the weight of a carried object is only perceivable if it exceeds a certain threshold. Incorporating such a threshold into the objective function of network resource allocation improves the overall user-perceived quality over the whole period of accessing the service/application. Also, it gives more flexibility for network resource allocation. For instance, within the range of unperceivable video quality change, some of the network resources allocated to the user accessing a low-demand video or to the user having a good channel condition may be given to the user accessing a high-demand video or to the user having a bad channel condition, while the user giving up the resources is not aware of any quality change.

1) Subjective test: To find the Just Noticeable Difference (JND) for the change of temporal video quality, we have performed a subjective test with 30 persons using the forced-choice method specified in [15]. In this recognition testing, we present a stimulus (test video sequence) to the subject (user), and ask him/her "Do you recognize any changes of video quality?" Figure 3 depicts the GUI design that is used in our subjective test.
As specified in the ITU standard, each subject should not perform user tests for longer than 30 min, as human eyes become tired and the results are then no longer reliable. Due to this time constraint, two video sequences, Mother and Daughter (static scene) and Foreman (dynamic scene), are used in our test. For each video, we create 16 test


Fig. 3. Screenshot of GUI used in the subjective test.

TABLE I
TEST CONDITIONS FOR VIDEO SEQUENCES IN USER TEST

Video content          Rate (kbps)   Quality change (MOS)   Time of change (sec)
Mother and Daughter    [20;120]      [-0.23;0.33]           [2;4]
Foreman                [65;200]      [-0.39;0.28]           [2;4]

[Figure 4: CDF (0 to 1) versus magnitude of quality change (MOS) for the sequences Mthr&Dotr and Foreman.]

Fig. 4. CDF of magnitude of perceivable quality change
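A perception threshold can be read from a CDF such as the one in Fig. 4 as the magnitude of quality change at which a given fraction of subjects (50 percent in this paper) notice the change. The sketch below shows this under assumptions: each subject is summarized by the smallest magnitude of change he or she recognized, and the function names and sample values are hypothetical and are not the data collected in the subjective test.

```python
from typing import List, Sequence, Tuple


def empirical_cdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (value, cumulative fraction of subjects) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def jnd_threshold(per_subject_magnitudes: Sequence[float],
                  fraction: float = 0.5) -> float:
    """Smallest magnitude at which at least `fraction` of the subjects
    recognize the quality change."""
    for magnitude, cumulative in empirical_cdf(per_subject_magnitudes):
        if cumulative >= fraction:
            return magnitude
    raise ValueError("no magnitude reaches the requested fraction")


# Hypothetical per-subject detection magnitudes on the MOS scale.
example_magnitudes = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
example_jnd = jnd_threshold(example_magnitudes)  # third of six subjects: 0.20
```

Choosing a fraction other than 0.5 would make the threshold more or less conservative; this is a design choice left to the operator in this sketch.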

video sequences that are encoded at two different levels of video quality. For instance, if the length of a video is 8 sec, the video is encoded at MOS 3.0 quality for the first 4 seconds, and at MOS 4.0 quality for the last 4 seconds. In this case, the magnitude of the quality change is 1.0 MOS, and the quality change occurs at second 4. All test video sequences are encoded with H.264/AVC at 30 frames/sec, and have QCIF resolution. Table I gives an overview of the test conditions for all test sequences. Note that we use Variable Bit Rate (VBR) techniques, such as encoding the video with a fixed quantization step, to maintain the video quality for a certain period. Also, we calculate the video quality in terms of MOS using the VSSIM index as discussed in Section III. Figure 4 shows the Cumulative Distribution Function (CDF) of the absolute value of the quality change $\Delta$ from all persons participating in the subjective test. We see that the JND for Mother and Daughter and for Foreman is 0.21 MOS and 0.26 MOS, respectively. This is the threshold at which 15 persons (50 percent of all subjects) are able to recognize the change of temporal video quality. In this paper, we apply an average JND threshold of $\Delta_{th} = 0.23$ MOS to our QoE-driven resource allocation optimization.

2) Temporal quality smoothness maximization: We enhance the objective function in Eq. (5) by taking into account the threshold of perceivable quality change $\Delta_{th}$ as:

$$\alpha_{opt} = \arg\max_{\alpha \in \mathcal{A}} \frac{1}{K} \sum_{k=1}^{K} \left[ U_k(\alpha_k) - \lambda \, \phi(\Delta_k - \Delta_{th}) \right] \qquad (6)$$

subject to $\sum_{k=1}^{K} \alpha_k = 1$, where

$$\Delta_k = \left| U_k(\alpha_k(t-1)) - U_k(\alpha_k(t)) \right| \qquad (7)$$

$$\phi = \begin{cases} 0, & \text{if } \Delta_k < \Delta_{th} \\ 1, & \text{if } \Delta_k \ge \Delta_{th} \end{cases} \qquad (8)$$

t denotes time in seconds, and $U_k(\alpha_k(t-1))$ is the average MOS for user k with the fraction of allocated resources $\alpha_k$ during the preceding second. The subtracted element in Eq. (6) is regarded as a penalty term, which negatively affects the overall perceived quality if the temporal change of video quality exceeds the threshold $\Delta_{th}$. $\lambda$ is a weighting factor used for giving priority to the smoothness of temporal video quality. Using the enhanced objective function in Eq. (6), the optimizer allocates resources such that all active users (if possible) experience a smooth and unperceivable change of temporal video quality even in the presence of a drastic change of wireless channel condition, while keeping the system efficiency in terms of network resource utilization as high as possible. With the $\phi$ defined in Eq. (8), we use a linear function to penalize the conventional utility maximization. To increase the impact of the penalty term, an operator may use for example a quadratic function by defining $\phi$ as follows:

$$\phi = \begin{cases} 0, & \text{if } \Delta_k < \Delta_{th} \\ \beta \Delta_k + \gamma, & \text{if } \Delta_k \ge \Delta_{th} \end{cases} \qquad (9)$$

where $\beta$ and $\gamma$ are constant parameters that are set prior to network resource allocation optimization. For example, we use the values 100 and -22 for $\beta$ and $\gamma$, respectively. The optimal resource allocation with the new objective function is again obtained by using a greedy search algorithm.

V. SIMULATION RESULTS

We consider a resource-constrained single cell HSDPA scenario, in which six users access different video contents at high data rates as given in Figure 2. The parameters used in our simulations are given in Table II. The wireless channel model is based on the measured CQI trace representing different mobility schemes under different environments.
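For reference when reading the results, the penalized objective of Section IV-B can be sketched in a few lines of Python. This is a sketch under assumptions and not the OPNET implementation: the function names and the `weight` argument (standing for the weighting factor of the penalty term) are chosen for this example, and the objective is evaluated from given MOS values rather than from real utility curves.

```python
from typing import Callable, Sequence

DELTA_TH = 0.23  # average JND threshold on the MOS scale (Section IV-B)


def penalty_step(delta: float, delta_th: float = DELTA_TH) -> float:
    """Eq. (8): 0 below the perception threshold, 1 at or above it."""
    return 0.0 if delta < delta_th else 1.0


def penalty_scaled(delta: float, delta_th: float = DELTA_TH,
                   beta: float = 100.0, gamma: float = -22.0) -> float:
    """Eq. (9) with the example constants 100 and -22 given in the text."""
    return 0.0 if delta < delta_th else beta * delta + gamma


def objective(current_mos: Sequence[float], previous_mos: Sequence[float],
              weight: float,
              penalty: Callable[[float], float] = penalty_step) -> float:
    """Value of the penalized objective for one candidate allocation:
    the mean over users of U_k minus weight * penalty(delta_k)."""
    total = 0.0
    for now, before in zip(current_mos, previous_mos):
        delta = abs(before - now)  # Eq. (7): change relative to the last second
        total += now - weight * penalty(delta)
    return total / len(current_mos)
```

An optimizer would evaluate `objective` for each candidate allocation and keep the best one. With `penalty_scaled`, a change just at the threshold costs about as much as with `penalty_step`, while larger changes are penalized increasingly strongly.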




[Figure 5: mean utility U (MOS, 1 to 4) versus time (0 to 60 sec) for the schemes (1) No adapt, (2) MaxRate, (3) MaxMOS, (4) MaxMinMOS, (5) MaxMOSMinFlucC1, and (5) MaxMOSMinFlucC2.]

Fig. 5. Plot of mean utility of all users as a function of simulation time.

TABLE II
SIMULATION PARAMETERS

Total transmit power          15.8W
Power allocated to HS-DSCH    11W
Carrier Frequency             2GHz
User speed                    3km/h
Distance from Node B          500m - 1.8km
UE category                   6
Target BLER                   10%
CQI averaging cycle           1sec
RLC PDU size                  40byte
Scheduler                     Proportional Fair
Rvs                           {0, 30, ..., < 500}kbps
Video codec used              H.264
Loss concealment              Copy previous frame
Video rate shaping            Transcoding
Simulator                     OPNET 9.1 with NTT DoCoMo HSDPA plugin

For performance evaluation of the proposed schemes, we implement five schemes in our OPNET-based HSDPA simulator as follows:

1) No-adaptation: This is the default HSDPA mode. The system is left to run into overloaded situations (network congestion), as no application-layer rate adaptation is supported. If the data rate measured at the UE is lower than the data rate sent by the application server and the packet delay exceeds the playout buffer time (e.g. 2 seconds), then we assume that the user experiences a minimal quality level (MOS 1). This is because the user cannot enjoy watching the video continuously, as the video player will stop displaying the video due to late packet arrival.

2) Max-Rate: Based on the channel condition, adaptation is done so as to achieve maximum cell throughput regardless of the application type and the video content. In this case, the utility function is based on the average data rate $R_k$ as defined below:

$$U_k = R_k, \quad \forall k \in \mathcal{K} \qquad (10)$$

4) MaxMin-MOS: Based on the utility function defined in Eq. (11), max-min fairness allocates resources such that all users experience the same perceived quality regardless of channel condition and application sensitivity.

5) MaxMOS-MinFluc: In addition to utility maximization as done in MaxMOS, our proposed scheme in Eq. (6) performs the resource allocation such that the change of temporal video quality lies within the unperceivable threshold $\Delta_{th}$. Based on the results of the subjective test, we assume that users are able to perceive a change of video quality with the average threshold of 0.23 on the MOS scale for all video contents. The two cases of $\phi$ described in Eq. (8) and (9) are denoted as Case 1 (C1) and Case 2 (C2), respectively. These two cases are used to see the effect of giving higher priority to the temporal quality fluctuation objective.

Note that schemes 2 to 5 are application aware, and a simple transcoding is used for rate adaptation in order to avoid network congestion. We perform optimization and rate adaptation every second. Figure 5 shows the mean MOS over all users over the simulation period of 1 min. We see a significant gain of all application-aware schemes over the no-adaptation scheme. A further gain of 0.5 on the MOS scale is achieved when applying MOS-based optimization (schemes 3 to 5) when compared to the throughput maximization scheme. Among all MOS-based schemes, max-min fairness is the worst. The MaxMOS scheme and our proposed scheme perform approximately the same. However, in terms of smoothness of temporal video quality, the MaxMOS-MinFluc scheme outperforms the other approaches as depicted in Figure 6, which shows the CDF of the number of users experiencing changes of temporal video quality exceeding the threshold $\Delta_{th}$. The CDF is plotted after 20 simulation runs. In the CDF figure, it should be noted that the no-adaptation scheme results in a

Resources will be given to a user with good channel condition to achieve the highest data rate for the video that the user is accessing. If there are resources left, they will be given to the user who has the next best channel condition.

3) MaxMOS: Adaptation is done to maximize the mean user-perceived quality over all users. The utility function is a function of MOS as follows:

$$U_k = MOS_k(R_k), \quad \forall k \in \mathcal{K} \qquad (11)$$

Resources are first given to a user having a good channel condition and accessing a low-demand application.
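The difference between the Max-Rate, MaxMOS, and MaxMin-MOS objectives can be made concrete with a small two-user example. The sketch below is illustrative only: it uses an exhaustive search over a coarse grid instead of the greedy search used in the paper, it ignores the cap that the video's own maximum rate places on Max-Rate, and the MOS curve and all names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

User = Tuple[float, float]  # (r_max_kbps, demand_kbps), both hypothetical


def hypothetical_mos(rate_kbps: float, demand_kbps: float) -> float:
    """Illustrative MOS curve in [1, 4.5]; an assumption, not a measured curve."""
    mos = 1.0 + 3.5 * math.log1p(rate_kbps / demand_kbps) / math.log1p(10.0)
    return min(4.5, mos)


def user_mos(users: Sequence[User], alphas: Sequence[float]) -> List[float]:
    """MOS of each user for the given resource shares (Eqs. (1) and (11))."""
    return [hypothetical_mos(a * r_max, demand)
            for a, (r_max, demand) in zip(alphas, users)]


Objective = Callable[[Sequence[User], Sequence[float]], float]

OBJECTIVES: Dict[str, Objective] = {
    # Eq. (10): utility is the data rate, so the cell throughput is maximized.
    "MaxRate": lambda users, alphas: sum(a * r for a, (r, _) in zip(alphas, users)),
    # Eq. (11) with the mean over users as objective.
    "MaxMOS": lambda users, alphas: sum(user_mos(users, alphas)) / len(users),
    # Max-min fairness: raise the quality of the worst-off user.
    "MaxMinMOS": lambda users, alphas: min(user_mos(users, alphas)),
}


def best_two_user_split(users: Sequence[User], scheme: str,
                        steps: int = 100) -> Tuple[float, float]:
    """Exhaustively search the shares (alpha_1, 1 - alpha_1) on a grid."""
    objective = OBJECTIVES[scheme]
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda alphas: objective(users, alphas))
```

Calling `best_two_user_split([(1000.0, 30.0), (400.0, 60.0)], scheme)` for each scheme shows the expected pattern: the uncapped Max-Rate objective gives everything to the user with the better channel, whereas the two MOS-based objectives share the resources.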


[1] W. Tu, J. Chakareski, and E. Steinbach, "Rate-distortion optimized frame dropping for multi-user streaming and conversational video," Advances in Multimedia, Special Issue on Collaboration and Optimization for Multimedia Communications, vol. 2008, Article ID 628970, 2008.
[2] R. A. Berry and E. M. Yeh, "Cross-layer wireless resource allocation," IEEE Signal Processing Magazine, vol. 21, no. 5, pp. 59-68, Sep. 2004.
[3] V. Srivastava and M. Motani, "Cross-layer design: a survey and the road ahead," IEEE Communications Magazine, vol. 43, no. 12, pp. 112-119, Dec. 2005.
[4] V. Tsibonis, L. Georgiadis, and L. Tassiulas, "Exploiting wireless channel state information for throughput maximization," IEEE INFOCOM, San Francisco, California, USA, Apr. 2003.
[5] F. P. Kelly, "Charging and rate control for elastic traffic," European Transaction for Telecommunication, vol. 8, pp. 33-37, Jan. 1997.
[6] S. Khan, S. Thakolsri, E. Steinbach, and W. Kellerer, "QoE-based Cross-layer Optimization for Wireless Multiuser Systems," in Proc. 18th ITC Specialist Seminar on Quality of Experience, Karlskrona, Sweden, May 2008.
[7] International Telecommunication Union, "Method for subjective determination of transmission quality," ITU-T Recommendation P.800, Aug. 1996.
[8] B. Radunovic and J.-Y. L. Boudec, "A unified framework for max-min and min-max fairness with applications," IEEE/ACM Trans. on Networking, vol. 15, no. 5, pp. 1073-1083, Oct. 2007.
[9] A. Saul, "Wireless resource allocation with perceived quality fairness," IEEE Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, Nov. 2008.
[10] S. Thakolsri, S. Khan, E. Steinbach, and W. Kellerer, "QoE-Driven Cross-Layer Optimization for High Speed Downlink Packet Access," Journal of Communications, Special Issue on Multimedia Communications, Networking and Applications, vol. 4, no. 9, pp. 669-680, Oct. 2009.
[11] M. T. Nietzel, Introduction to Clinical Psychology, 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1991.
[12] A. Ortega and M. Khansari, "Rate control for video coding over variable bit rate channels with applications to wireless transmission," IEEE ICIP, Washington, DC, USA, Oct. 1995.
[13] Third Generation Partnership Project, "Physical layer procedures (FDD)," 3GPP, Sophia Antipolis, France, Technical Specification TS 25.214 V7.1.0, Jun. 2006.
[14] Z. Wang, L. Lu, and A. C. Bovik, "Video Quality Assessment Based on Structural Distortion Measurement," Signal Processing: Image Communication, vol. 19, no. 1, pp. 121-132, Feb. 2004.
[15] International Telecommunication Union, "Studies toward the unification of picture assessment methodology," ITU-R Report BT.1082-1, Jan. 1990.
[16] VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment," Mar. 2000.


[Figure 6: CDF versus number of users (0 to 6) for the schemes (1) No adapt, (2) MaxRate, (3) MaxMOS, (4) MaxMinMOS, (5) MaxMOSMinFlucC1, and (5) MaxMOSMinFlucC2.]

Fig. 6. CDF of number of users experiencing quality changes exceeding the experimentally determined perception threshold.

lower quality fluctuation than the MaxMOS and the MaxMin-MOS schemes, as many users perceive minimal quality for almost the whole period of the simulation time. For the MaxMin-MOS scheme, by contrast, if one user experiences a bad wireless channel condition at one moment due to his mobility or his location, such as at the edge of cell coverage, all other users experience a drop of their perceived quality so as to achieve similar quality for all users, which causes a quality change for all of them. Last, we see that applying scheme 5 with Case 2 further improves the result, as fewer users are able to perceive a temporal video quality fluctuation.

VI. CONCLUSION

We introduce a novel QoE-based objective function for cross-layer optimization of wireless video, which incorporates the change of temporal video quality and the corresponding human perception threshold into the overall user-perceived quality rating. The threshold is based on the Just Noticeable Difference (JND) concept and is derived by performing subjective tests. The proposed scheme is implemented in the resource allocation optimization across multiple users accessing different video contents and being present in the same wireless cell. The goal of the optimization is to achieve minimal perceivable change of temporal quality while at the same time keeping the average perceived quality of all users as high as possible. Results show that our proposed scheme achieves a better user-perceived quality compared to other schemes, as the users experience a smooth change of quality even in the presence of drastic changes of the wireless channel conditions. Future work includes a user test investigating the impact of the duration of a video on the average user-perceived quality, including the variation of spatiotemporal content.