
GAME-THEORETIC RATE-DISTORTION-COMPLEXITY OPTIMIZATION FOR HEVC

Anna Ukhanova, Simone Milani, Søren Forchhammer

DTU Fotonik, Technical University of Denmark, Kgs. Lyngby, Denmark
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
ABSTRACT

This paper presents an algorithm for rate-distortion-complexity optimization for the emerging High Efficiency Video Coding (HEVC) standard, whose high computational requirements call for low-complexity optimization algorithms. Optimization approaches need to support different complexity profiles in order to tailor the computational load to the hardware and power-supply resources of different devices. In this work, we optimize the quantization parameter and the partition depth in HEVC via a game-theoretic approach. The proposed rate control strategy alone provides a 0.2 dB improvement compared to the approach implemented in the HEVC reference software, while rate-distortion-complexity optimization allows very accurate complexity control and, at the same time, rate-distortion performance close to the optimal one.

Index Terms: Rate-distortion optimization, complexity, rate control, HEVC, game theory

1. INTRODUCTION

As video coding technologies evolve, channel capacities and achievable computational complexity grow in line with Moore's law. However, quality expectations, resolutions, and bandwidth demand are rising as well. Maximizing quality while minimizing bit rate, a trend clearly visible in video compression development over the past decades, remains central. The High Efficiency Video Coding (HEVC) standard continues the strategy of improving rate-distortion performance by adding new coding modes. At the same time, the complexity and power consumption of video compression algorithms have become two important issues for many mobile systems and handheld devices. Experimental studies show that the contribution of the encoder to the overall power consumption of video communication systems is high and usually exceeds the contribution of transmission power [1].
One of the goals of compression is to optimize the quality of the reconstructed video data, while the limited bandwidth of communication channels constrains the available bit rate. An optimal compression solution under a bit rate constraint can be found by rate-distortion optimization. In mobile video communications, power and computational complexity are two further constrained resources; therefore, it is desirable to perform rate-distortion-complexity (R-D-C) analysis and optimization [2]. Several such approaches have been designed for the H.264/AVC standard [3, 4, 5]. Since its successor, HEVC, has even higher computational complexity, R-D-C optimization is desirable for it as well. HEVC introduces the Coding Unit (CU), an analog of the macroblock in H.264/AVC, whose size at the deepest partition level defines the smallest block size that can be used for encoding. The complexity of HEVC depends strongly on the partition depth in the CU tree. One approach to complexity control for HEVC, based on dynamic adjustment of the partition depth, is described in [6]. In that work, the CU partition depth (or coding tree depth) is selectively constrained in order to satisfy the complexity limits. The idea rests on the assumption that co-located areas in adjacent frames are likely to behave similarly and, consequently, to have similar maximum depths; therefore, the depth can be restricted in a chosen number of frames based on the maximum depth values in the corresponding areas of previous frames. This method fulfills the target complexity with good precision. We propose to use Game Theory (GT) for R-D-C optimization and apply it specifically to the choice of the quantization parameter and to the control of the partition depth. The remainder of the paper is organized as follows. Section 2 discusses the optimization problem in video coding. Section 3 presents the GT framework behind our approach. Section 4 explains the rate control principles. Section 5 presents the proposed R-D-C optimization algorithm.
Performance results are shown in Section 6, and Section 7 concludes the paper.

978-1-4799-2341-0/13/$31.00 ©2013 IEEE

ICIP 2013

2. VIDEO CODING AND ITS OPTIMIZATION

Current video coders consist of a collection of coding tools characterized by different compression efficiencies and complexity requirements. Some of these tools are controlled by a set of parameters that permit modulating the compression rate according to constraints posed by external conditions (e.g., the available transmission or storage capacity, channel reliability, desired quality). This is the case of quantization, which is controlled by the Quantization Parameter (QP), whose possible values belong to a finite set and are associated with quantization step sizes. In other cases, parameters determine the number of possible configurations that need to be tested in order to find the most suitable one, and therefore their values affect the required complexity as well. The CU partition depth is one of these, since it controls the maximum subpartitioning level that can be applied to CUs to maximize coding performance. In general, it can be assumed that the higher the complexity, the better the final quality of the reconstructed sequence.

Since the characteristics of a video sequence change over time, optimization strategies tune the values of coding parameters according to the local characteristics. However, local choices affect the overall bit rate, distortion, and required complexity. Performing strong quantization on a certain frame reduces its quality and bit rate, but it also has strong implications for the following frames, since the coder uses that frame to predict them. The better the quality of the reconstructed frames, the better the temporal prediction; as a result, the residual signal has lower energy and can be coded with fewer bits. On the other hand, if the available bit rate is constrained, allocating more bits to a certain frame reduces the bits available for the others. This confronts the video codec with a dilemma: assign more bits to the initial frames in order to build a reliable reference for prediction, or compensate for inefficient prediction by assigning more bits to the following pictures. A similar problem arises for computational complexity in real-time or mobile applications.
Real-time coding requires that the coding operations for a frame be completed within one frame period, while mobile applications need to limit the computational complexity in order to reduce battery consumption and increase the autonomy of the device. From these premises, a video sequence can be seen as a collection of parts (frames, CUs) with similar but conflicting requirements in terms of rate, distortion, and complexity. To avoid conflicts, a joint optimization of all the parameters is desirable [7], but the size of the problem makes it prohibitive in terms of complexity. A possible solution is a distributed optimization algorithm based on GT principles.

3. GENERAL NON-COOPERATIVE GAMES

GT is used in many scientific fields, including economics, the social and political sciences, and computer science. The general structure of non-cooperative games [8] is widely used to model and solve multi-objective parameter optimization problems, where the optimization procedure needs to find the values of the configuration parameters that define the best trade-off among different conflicting requirements [9]. The GT framework has been successfully applied to
optimize resource allocation [10], channel access [11], and distributed optimization [12]. In video coding, GT has been used for rate-distortion optimization [13, 14]. In this work, we propose to use non-cooperative games for the R-D-C optimization of video coding parameters.

A game consists of the interaction of $n$ players that take decisions in order to maximize their own utility. The decision chosen by the $i$-th player at time $t$ is referred to as a strategy and is denoted $\sigma_{i,t}$. The set of strategies chosen by the $n$ players at time $t$ is the array $\sigma_t = [\sigma_{1,t}, \sigma_{2,t}, \dots, \sigma_{n,t}]$. The utility (or payoff) function $u(\sigma)$ depends on the array of strategies $\sigma$. The utility function $u_i(\sigma)$ of player $i$ is defined as

$$ u_i : C_1 \times \cdots \times C_n \to \mathbb{R}, \qquad \sigma \mapsto u_i(\sigma). \qquad (1) $$

A utility function is a criterion for comparing strategies: the set of strategies $\sigma^1$ is better than the set of strategies $\sigma^2$ if $u(\sigma^1) > u(\sigma^2)$. A game with $n$ players, where the $i$-th player chooses its strategy $\sigma_i$ from the set $C_i$ and has the utility function $u_i(\sigma)$ defined above, can be expressed in strategic form as $G = (C_1, \dots, C_n, u_1, \dots, u_n)$. For such a game, a configuration $(\sigma^*_1, \dots, \sigma^*_n)$ is a Nash equilibrium if each strategy $\sigma^*_i$ is a best response of player $i$ to the strategies $(\sigma^*_1, \dots, \sigma^*_{i-1}, \sigma^*_{i+1}, \dots, \sigma^*_n)$ of the other players, i.e.,

$$ u_i(\sigma^*_1, \dots, \sigma^*_i, \dots, \sigma^*_n) \ge u_i(\sigma^*_1, \dots, \sigma_i, \dots, \sigma^*_n) \quad \forall\, i \in \{1, \dots, n\},\ \forall\, \sigma_i \in C_i. \qquad (2) $$

4. RATE CONTROL

Rate allocation strategies assign to each frame in the current Group of Pictures (GOP) a target bit budget that must be respected as closely as possible. The frames at different positions in a GOP can be seen as players that compete with each other in a repeated game to maximize the final visual quality, given a fixed target bit rate for the current GOP.

Given a target bit rate $R_b$ for the input sequence (with frame rate $F_r$), the quadratic rate control strategy implemented in the reference software [15] computes a target bit rate $T_{0,i}$ for the $i$-th frame in the GOP at the beginning of its coding. The value $T_{0,i}$ depends on the target buffer occupancy, the actual buffer occupancy (which accounts for all previous bit allocation errors), and the parameters $R_b$ and $F_r$ (see [15] for more details). Hence, $T_{0,i}$ is computed considering only past events; possible future conditions are not forecast.

The proposed rate control strategy therefore builds an $N_p$-player game with $N_p = N - i$, where $N$ is the number of frames in the current GOP or sub-GOP. The strategy $t_{g,p} \in \mathcal{T}$ denotes the target bit rate requested by the $p$-th player, i.e., by the $(p+i-1)$-th frame ($p = 1, \dots, N_p$). The strategies of all frames/players are grouped in the array $\mathbf{t}_g = [t_{g,p}]$. Each player $p$ selects its desired target bit rate from a set $\mathcal{T}$ of $N_T$ candidate values placed around the initial target $T_{0,p+i-1}$ according to the normalized offset $(k - N_T/2)/N_T$, $k = 0, \dots, N_T - 1$, where $N_T$ is the number of strategies. For every player $p$, an approximate distortion measure associated with $\mathbf{t}_g$ can be computed as

$$ u_p(\mathbf{t}_g) = \begin{cases} \gamma_p \, \big(\Delta(t_{g,p})\big)^2 & \text{if } \sum_{k=1}^{N_p} t_{g,k} \text{ is within the tolerance } \epsilon \text{ of the GOP budget given by the initial targets } T_{0,k+i-1}, \\ 0 & \text{otherwise.} \end{cases} \qquad (3) $$

The function $\Delta(\cdot)$ maps bit rate values into quantization steps using the quadratic rate-distortion model of [15]. The parameter $\epsilon$ is a tolerance value, usually set to 0.5 for small GOPs and to 0.2 for long GOPs. The weight $\gamma_p$ models the impact of the distortion of the reconstructed frame on the whole GOP. Every player/frame aims at minimizing its own distortion function. The rate control algorithm looks for the configurations $\mathbf{t}_g$ that are Nash equilibria (NE), i.e., that satisfy (2) with the inequality reversed, since here each player minimizes $u_p$. Among these, the algorithm selects

$$ \mathbf{t}^*_g = \arg\min_{\mathbf{t}_g \text{ is NE}} \sum_{p=1}^{N_p} u_p(\mathbf{t}_g). \qquad (4) $$
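The selection in (3) and (4) can be illustrated with a brute-force sketch in Python. It is a toy under stated assumptions, not the authors' implementation: the multiplicative perturbation used to build the strategy set, the inverse rate-to-step mapping standing in for $\Delta(\cdot)$, the relative-deviation form of the budget check, and all numbers are hypothetical, as are the function names. Configurations outside the tolerance receive infinite cost instead of the 0 in (3), an assumption made so that the minimization in (4) cannot select them.

```python
import itertools
import math


def candidate_targets(t0, n_t):
    # Hypothetical strategy set: n_t targets around the initial target t0,
    # using the normalized offset (k - n_t/2)/n_t (multiplicative rule assumed).
    return [t0 * (1.0 + (k - n_t / 2) / n_t) for k in range(n_t)]


def q_step(rate, k_model=1.0e5):
    # Stand-in for Delta(.): step assumed inversely proportional to the budget.
    return k_model / rate


def costs(config, t0, gammas, eps):
    # Per-player distortion u_p(t_g) in the spirit of (3); budget violations get
    # infinite cost in this sketch so that (4) can never prefer them.
    ratio = sum(config) / sum(t0)
    if abs(ratio - 1.0) >= eps:
        return [math.inf] * len(config)
    return [g * q_step(t) ** 2 for g, t in zip(gammas, config)]


def is_nash(config, strategy_sets, t0, gammas, eps):
    # Nash equilibrium for minimizers: no player lowers its own cost alone.
    base = costs(config, t0, gammas, eps)
    for p, options in enumerate(strategy_sets):
        for alt in options:
            trial = list(config)
            trial[p] = alt
            if costs(trial, t0, gammas, eps)[p] < base[p]:
                return False
    return True


def game_rate_control(t0, gammas, n_t=5, eps=0.2):
    # Exhaustive search over all n_t ** len(t0) configurations, keeping the
    # feasible Nash equilibrium with the smallest total distortion, as in (4).
    strategy_sets = [candidate_targets(t, n_t) for t in t0]
    best, best_sum = None, math.inf
    for config in itertools.product(*strategy_sets):
        total = sum(costs(config, t0, gammas, eps))
        if math.isinf(total):
            continue  # outside the GOP budget tolerance
        if total < best_sum and is_nash(config, strategy_sets, t0, gammas, eps):
            best, best_sum = config, total
    return best, best_sum


if __name__ == "__main__":
    t0 = [40000.0, 30000.0, 30000.0]  # illustrative initial targets, in bits
    gammas = [1.0, 0.8, 0.6]          # illustrative weights gamma_p
    config, total = game_rate_control(t0, gammas)
    print("target for the current frame:", config[0])
```

Exhaustive enumeration costs $N_T^{N_p}$ utility evaluations, which is tolerable only for short GOPs or sub-GOPs; the first entry of the returned configuration plays the role of the target assigned to the current frame.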

The new target bit rate value $T_{g,i} = t^*_{g,1}$ (the strategy of the first player, i.e., of frame $i$) is assigned to the $i$-th frame. The rate control algorithm then converts it into an average QP value (via the quadratic model) and codes the frame. The choice of each single frame must be fair, since greedy behavior (allocating too many bits to itself) would lead to bit starvation for the following pictures; these would compensate for the loss by adopting a similarly greedy behavior, which in turn would decrease the bits available for the following GOPs.

5. RATE-DISTORTION-COMPLEXITY OPTIMIZATION

When additional resource constraints such as computational complexity apply, complexity management can be performed together with rate control. We propose a solution to this task for HEVC. In contrast to the approach described in [6], where only complexity is controlled, we provide rate-distortion optimization within the given complexity and bit budgets. HEVC introduces a new data structure, the CU tree, which has the partition depth $d$ as a parameter. This depth defines the smallest block size that can be used for encoding. The HEVC encoder tries encoding CUs at all depth levels in all possible configurations; although this yields efficient compression, it has a high computational complexity. Since the maximum depth cannot be changed while a sequence is being encoded, at most four complexity levels are available, corresponding to the four possible partition depth values. In our approach, we restrict the maximum partition depth for some frames. Using predictive techniques and GT methods, we can define the depth value independently for each encoded frame, which allows us to control the complexity and to encode the sequence within a given complexity budget.

In our problem statement, a game consists of the interaction of $n$ players (frames) that choose their strategies, i.e., the values of the partition depth, in order to maximize their own utility $u$. Each player $i$ chooses among $m$ partition depth values, so the set of joint configurations has cardinality $m^n$. For each strategy, we compute a utility function $u(\sigma)$ that takes into account both the predicted distortion and the predicted complexity of the following frame. We then search for the sets of strategies that are Nash equilibria; the one that gives the best payoff according to the prediction is the solution of the optimization task for the following frame.

The proposed algorithm works as follows. After encoding the first 4 frames with the initial depth $d_0$ and measuring their distortion and complexity (complexity being measured as encoding time), we apply the GT approach to choose the depth values for the next frames. Similarly to [16], we also estimate the scaling factors $S^{d}_{d_0}$ from $d_0$ to the other depth values $d$ for complexity ($C$) and for distortion, measured as PSNR quality ($Q$). We use the linear predictors (5) and (6) to model $C$ and $Q$. From the $C$ and $Q$ values of the first 4 frames and their Mean Absolute Difference (MAD) values, we estimate the unknown model parameters $\alpha$ and $\beta$ for $d_0$:

$$ C = \alpha_C \cdot \mathrm{MAD} + \beta_C, \qquad (5) $$

$$ Q = \alpha_Q \cdot \mathrm{MAD} + \beta_Q. \qquad (6) $$

Formulas (5) and (6) predict the $C$ and $Q$ values at depth $d_0$ for the next frames from their MAD values and the parameters $\alpha$ and $\beta$. These predictions are then scaled proportionally to obtain predictions for the other possible partition depths $d$:

$$ C_d(\alpha_C, \beta_C) = C_{d_0}(\alpha_C, \beta_C) \cdot S^{d}_{d_0}, \qquad (7) $$

$$ Q_d(\alpha_Q, \beta_Q) = Q_{d_0}(\alpha_Q, \beta_Q) \cdot S^{d}_{d_0}. \qquad (8) $$
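As an illustration of the predictors (5) to (8) and of the per-frame model update, the following Python sketch fits $\alpha$ and $\beta$ by ordinary least squares, rescales observations coded at depth $d$ back to $d_0$ before refitting, and scales the $d_0$ predictions to every candidate depth. The names `fit_linear`, `DepthPredictor`, and `choose_depth`, the scaling tables, and the numbers in the usage lines are hypothetical. The factors are kept in separate tables for $C$ and $Q$ as an assumption (equations (7) and (8) write both as $S^{d}_{d_0}$), and `choose_depth` is a greedy single-frame simplification used only to show how the predictions could feed a budgeted depth choice; it is not the game-theoretic search described above.

```python
def fit_linear(mads, values):
    # Ordinary least squares for value = alpha * MAD + beta, cf. (5) and (6).
    n = len(mads)
    mean_x = sum(mads) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in mads)
    if sxx == 0:
        raise ValueError("need at least two distinct MAD values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mads, values))
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x


class DepthPredictor:
    """Predicts complexity C and PSNR Q of the next frame for each depth d.

    scale_c[d] and scale_q[d] stand for S^d_{d0} in (7) and (8); they are
    hypothetical inputs here, with scale_*[d0] == 1.0 at the initial depth d0.
    """

    def __init__(self, scale_c, scale_q):
        self.scale_c, self.scale_q = scale_c, scale_q
        self.mads, self.c_d0, self.q_d0 = [], [], []
        self.params = None  # ((alpha_C, beta_C), (alpha_Q, beta_Q)) at d0

    def observe(self, mad, c, q, depth):
        # Rescale the measured C and Q of a frame coded at `depth` back to d0,
        # then refit alpha and beta on all frames seen so far.
        self.mads.append(mad)
        self.c_d0.append(c / self.scale_c[depth])
        self.q_d0.append(q / self.scale_q[depth])
        if len(set(self.mads)) >= 2:
            self.params = (fit_linear(self.mads, self.c_d0),
                           fit_linear(self.mads, self.q_d0))

    def predict(self, mad):
        # (5)/(6) at depth d0, then proportional scaling (7)/(8) to every depth.
        if self.params is None:
            raise RuntimeError("observe at least two frames with distinct MAD first")
        (a_c, b_c), (a_q, b_q) = self.params
        c0 = a_c * mad + b_c
        q0 = a_q * mad + b_q
        return {d: (c0 * self.scale_c[d], q0 * self.scale_q[d]) for d in self.scale_c}


def choose_depth(predictions, allowance):
    # Greedy single-frame rule (not the GT search): highest predicted Q among
    # depths whose predicted C fits the allowance, else the cheapest depth.
    feasible = {d: cq for d, cq in predictions.items() if cq[0] <= allowance}
    if not feasible:
        return min(predictions, key=lambda d: predictions[d][0])
    return max(feasible, key=lambda d: feasible[d][1])


if __name__ == "__main__":
    scale_c = {1: 0.35, 2: 0.6, 3: 0.85, 4: 1.0}  # hypothetical, d0 = 4
    scale_q = {1: 0.97, 2: 0.99, 3: 1.0, 4: 1.0}  # hypothetical
    pred = DepthPredictor(scale_c, scale_q)
    for mad, c, q in [(2.0, 1.9, 35.0), (2.5, 2.3, 34.6), (3.0, 2.8, 34.1), (3.5, 3.2, 33.7)]:
        pred.observe(mad, c, q, depth=4)
    budget_per_frame = 2.0  # hypothetical share of the remaining complexity budget
    print(choose_depth(pred.predict(2.8), budget_per_frame))
```

Refitting on all rescaled observations after every coded frame mirrors the regression update of $\alpha$ and $\beta$; in a complete loop, the measured complexity of each coded frame would also be subtracted from the remaining budget before the allowance for the next frame is computed.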

The values of $\alpha$ and $\beta$ are updated to new values $\alpha'$ and $\beta'$ by linear regression after each new frame is encoded with the chosen depth $d$, scaling the obtained values of $C$ and $Q$ back to $d_0$ if needed. The predicted complexity and distortion values are used to define the optimal partition depth for the following frame with the GT approach described above. After the frame is encoded with the chosen depth, the complexity budget for the remaining frames is updated using the actual encoding complexity of the encoded frame.

6. PERFORMANCE RESULTS

The proposed approach for rate control and R-D-C optimization was implemented in the HEVC Test Model (HM), version 7.0 [17]. We performed R-D-C optimization on 8 CIF video sequences (akiyo, city, coastguard, crew, foreman, mobile, news, soccer), which represent different kinds of content: high motion, texture, and low motion. 100 frames per sequence were used in our experiments. The sequences were encoded with 4 target complexity levels (60%, 70%, 80%, and 90%) and 4 initial QPs (28, 30, 32, 36), corresponding to target bit rates of 280, 240, 190, and 160 kbit/s, respectively. The GOP size is 4 and the CU size is 64×64.

A first set of experiments considered the rate control strategy only. Fig. 1 compares the average PSNR of the luminance component vs. the bit rate for the proposed strategy (labeled GT) and for the reference approach implemented in the HM software [17] (labeled HM). The proposed strategy yields an improvement of 0.2 dB on average, and it can additionally be combined with the R-D-C strategy.

Fig. 1: Rate-distortion performance of GT for sequences foreman (left) and soccer (right). [Plots of PSNR (dB) vs. rate (kbit/s); curves GT and HM.]

Performance results for the R-D-C optimization are shown in Fig. 2 and Table 1. The complexity is measured by processing time, and the actual complexity refers to the time actually spent encoding the sequence. 100% complexity corresponds to encoding the whole sequence with d = 4. As Fig. 2(a) shows, the proposed GT approach combined with prediction provides very accurate complexity control. Table 2 reports the mean deviation from the target complexity, compared with the results presented in [6]. Fig. 2(b) shows the rate-distortion performance of the proposed approach (labeled "solution by GT") for frames 90-97 of the sequence crew, encoded with 80% complexity and initial QP 32; for comparison, it also shows the results for all combinations of depth values for these frames.

Fig. 2: Performance of GT R-D-C optimization. (a) Relationship between the target and actual complexity [average actual complexity (%) vs. target complexity (%)]. (b) Example of optimality of the proposed solution [PSNR (dB) vs. bit rate (bpp); "all combinations" vs. "solution by GT"].

Table 2: Mean difference between target and actual complexity (%)

Target complexity     60%    70%    80%    90%
Proposed approach     0.70   0.64   0.87   0.91
Approach in [6]       2.80   2.92   3.12   4.28

Table 1: Average PSNR loss (dB) of the proposed configuration with respect to the optimal one, per target complexity level

Frames   Initial QP   60%     70%     80%     90%
20-27    28           0.043   0.033   0.029   0.041
20-27    30           0.019   0.045   0.087   0.048
20-27    32           0.064   0.081   0.077   0.044
20-27    36           0.015   0.064   0.080   0.000
90-97    28           0.021   0.072   0.054   0.108
90-97    30           0.001   0.040   0.062   0.029
90-97    32           0.027   0.045   0.046   0.018
90-97    36           0.028   0.043   0.115   0.034

Table 1 shows the performance results of the proposed
R-D-C optimization, i.e., the PSNR loss of the solution found by our algorithm with respect to the optimal one. The optimum is found by a full search over all combinations of depth values for the given frames that do not exceed the complexity limit. The rate, distortion, and complexity of each depth value for each frame were evaluated in advance; we therefore refer to this optimization as offline, which prevents its use in real-time encoding. Since a full search over all 100 frames of a sequence is infeasible, we chose two starting points in each of the 8 sequences (frames 20 and 90) and evaluated the performance on the 8 consecutive frames starting at each point.

7. CONCLUSION AND DISCUSSION

The presented work applies GT to the R-D-C optimization of an HEVC encoder. Our results show a small PSNR loss with respect to the solution that can be found by offline optimization. In contrast to [6], we not only decrease the complexity of HEVC but also provide a solution with rate-distortion performance close to the optimal one, while achieving the target complexity with very high precision. The proposed rate control for HEVC enables optimization under bit rate and complexity constraints at the same time and provides an average improvement of 0.2 dB compared with the standard HM approach. Our approach shows that the complexity of HEVC can be controlled, enabling its use in power-constrained devices. Optimizing the power consumption of a device through complexity control of the encoder requires access to the state of the device's hardware resources, so R-D-C control should be performed from a cross-layer optimization point of view. Future extensions of the proposed algorithm could include another layer, transmission, when compressed video data has to be sent over a network; such cross-layer optimization would yield computation-communication trade-offs.


8. REFERENCES

[1] Z. He, W. Cheng, and X. Chen, "Energy minimization of portable video communication devices based on power-rate-distortion optimization," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 596-608, 2008.
[2] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, "Power-rate-distortion analysis for wireless video communication under energy constraints," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 5, pp. 645-658, 2005.
[3] J. Støttrup-Andersen, S. Forchhammer, and S. M. Aghito, "Rate-distortion-complexity optimization of fast motion estimation in H.264/MPEG-4 AVC," IEEE Int'l Conf. Image Process. (ICIP), 2004.
[4] R. Vanam, E. A. Riskin, S. S. Hemami, and R. E. Ladner, "Distortion-complexity optimization of the H.264/MPEG-4 AVC encoder using the GBFOS algorithm," Data Compression Conf. (DCC), pp. 303-312, 2007.
[5] X. Li, M. Wien, and J.-R. Ohm, "Rate-complexity-distortion optimization for hybrid video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 7, pp. 957-970, 2011.
[6] G. Corrêa, P. Assuncao, L. A. da Silva Cruz, and L. Agostini, "Adaptive coding tree for complexity control of high efficiency video encoders," Picture Coding Symp. (PCS), pp. 425-428, 2012.
[7] S. Milani and G. Calvagno, "Low-complexity cross-layer optimization algorithm for video communication over wireless networks," IEEE Trans. Multimedia, vol. 11, no. 5, pp. 810-821, Aug. 2009.
[8] M. J. Osborne, An Introduction to Game Theory, Oxford University Press, 2003.
[9] S. Milani and G. Calvagno, "A game theory based classification for distributed downloading of multiple description coded videos," IEEE Int'l Conf. Image Process. (ICIP), pp. 3077-3080, 2009.
[10] G. Wei, A. V. Vasilakos, Y. Zheng, and N. Xiong, "A game-theoretic method of fair resource allocation for cloud computing services," J. Supercomput., vol. 54, no. 2, pp. 252-269, Nov. 2010.
[11] S. Rakshit and R. K. Guha, "Fair bandwidth sharing in distributed systems: A game-theoretic approach," IEEE Trans. Comput., vol. 54, no. 11, pp. 1384-1393, 2005.
[12] D. A. Smith, C. Shi, R. A. Berry, M. L. Honig, and W. Utschick, "Distributed resource allocation schemes: Pricing algorithms for power control and beamformer design in interference networks," IEEE Signal Process. Mag., vol. 26, no. 5, pp. 53-63, Sept. 2009.
[13] I. Ahmad and J. Luo, "On using game theory to optimize the rate control in video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 209-219, Feb. 2006.
[14] M. Tiwari, T. Groves, and P. Cosman, "Bit-rate allocation for multiple video streams using a pricing-based mechanism," IEEE Trans. Image Process., vol. 20, no. 11, pp. 3219-3230, Nov. 2011.
[15] Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, "JCTVC-A124: Samsung's response to the Call for Proposals on video compression technology," Dresden, Germany, 2010.
[16] G. Corrêa, P. Assuncao, L. Agostini, and L. A. da Silva Cruz, "Complexity control of high efficiency video encoders for power-constrained devices," IEEE Trans. Consum. Electron., vol. 57, no. 4, pp. 1866-1874, 2011.
[17] HEVC software repository (main at HHI), https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/.

