ANALYSIS OF VIDEO CODEC BUFFER AND DELAY UNDER TIME-VARYING CHANNEL

Zhifeng Chen and Yuriy Reznik

InterDigital Communications, LLC 9710 Scranton Rd. Suite 250, San Diego, CA 92121 USA
ABSTRACT

In this paper, we analyze the effect of time-varying channels on the video codec buffer, especially for low-delay applications. We derive sufficient conditions under which an encoder can design a bitstream for any time-varying channel without decoder buffer overflow or underflow. We then apply those conditions to design a bandwidth-adaptive rate control in x264 and test it with an LTE simulator. Our test results show significant improvement in delay and delay jitter over traditional leaky bucket models.

Index Terms: Leaky bucket model, HRD, VBV, Rate control, LTE, H.264/MPEG-4 AVC

1. INTRODUCTION

Thanks to advances in wireless networks and improvements in the processing and graphics capabilities of mobile devices, mobile video telephony is now becoming part of our daily lives [1]. Yet some technical challenges in the design of video phone applications remain. On one hand, such applications require low-delay, jitter-free delivery of video; on the other hand, they are constrained by the time-varying behavior of mobile networks.

In most video coding standards, a hypothetical reference decoder (HRD) [2] or a video buffering verifier (VBV) [3] is specified to help design conforming encoders and bitstreams. One major role of the HRD/VBV is to ensure that there is no underflow or overflow in the decoder coded picture buffer (CPB), where underflow causes delay jitter and overflow causes packet loss. In video encoders, leaky bucket models are usually adopted to control the encoded bit rate so that it conforms to the decoder HRD or VBV. Ref. [4] proposes a generalized HRD for H.264/AVC, in which several leaky bucket models are specified for a given bitstream instead of the single option of previous standards. The best leaky bucket model can then be selected by the communication system according to the network connection and delay requirements.

New wireless network standards, e.g. 4G LTE, provide much higher transmission rates than previous wireless standards.
Due to multi-path fading and multi-user characteristics, the transmission bit rate fluctuates significantly, from zero to dozens of megabits per second, every transmission time interval (TTI). Rapid changes in channel behavior make the problem of low-delay rate control for video encoding very challenging. However, the generalized HRD assumes that the channel has a constant bit rate (CBR) or a piece-wise variable bit rate (VBR), i.e. that the channel capacity changes much more slowly than the frame rate. To the best of our knowledge, no prior work rigorously proves the conditions that guide an encoder in designing a bitstream that still conforms to the HRD/VBV after being transmitted over a fast time-varying channel.

In this paper, we analyze the effect of time-varying channels on the buffer, especially for low-delay applications. We derive, for the first time, sufficient conditions under which an encoder can design a bitstream for any time-varying channel without buffer overflow or underflow. We then apply those conditions to design a bandwidth-adaptive rate control in x264 and test it with an LTE simulator. Our method requires an estimate of the channel bit rate. Both perfect and imperfect channel estimation are simulated. The test results provide a reference for how much improvement in delay, delay jitter and video quality over traditional CBR/VBR leaky bucket models can be gained if channel estimation is available to the encoder.

The rest of the paper is organized as follows. Section 2 offers background information and illustrates why traditional leaky bucket models fail if the channel bit rate changes rapidly. Section 3 proves the conditions for avoiding buffer overflow and underflow and explains the bandwidth-adaptive rate control method. Section 4 presents experimental results. Conclusions are drawn in Section 5.

2. PROBLEM STATEMENT

Fig. 1 shows the relationship between a leaky bucket model and the encoding schedule in a CBR case. In the H.264/AVC HRD, the decoding process is assumed to be instantaneous, i.e. the decoding schedule is a step function (or staircase function). Correspondingly, in Fig. 1 we also assume that the encoding process is instantaneous. Note that the method in this paper also applies to non-instantaneous encoding and decoding processes. In Fig. 1, the encoder buffer fullness is defined as $F_e(t) = \mathcal{S}_e(t) - \mathcal{S}_t(t)$, which is constrained by $B_e$, and the decoder buffer fullness is defined as $F_d(t) = \mathcal{S}_r(t) - \mathcal{S}_d(t)$, which is constrained by $B_d$. The leaky bucket model actually specifies a combination of $\{\max[R(t)], B_e, F_e(t_e^0)\}$ for the encoder, where $R(t)$ is the channel capacity over time. It has been proved that the

complement of the encoder buffer fullness is equal to the decoder buffer fullness in the CBR case [4]. Therefore, a bitstream designed to avoid overflow and underflow in the encoder under the corresponding leaky bucket model will conform to the HRD in the decoder. As a result, in video coding standards, HRD or VBV only specifies $\{\max[R(t)], B_d, F_d(t_d^0)\}$ for the decoder, where $F_d(t_d^0)$ is equivalent to $\delta_t^0 + \delta_d^0$. In Ref. [4], the authors also proved that the complement of the encoder buffer fullness is a tight (achievable) upper bound on the decoder buffer fullness in the piece-wise VBR case:

$$F_d(t) \le B_e - F_e(t). \tag{1}$$

Fig. 1. Leaky bucket model in the CBR case. $\mathcal{S}_e(t)$, $\mathcal{S}_t(t)$, $\mathcal{S}_r(t)$, $\mathcal{S}_d(t)$ show cumulative numbers of bits over time (or schedules) at the encoder, transmitter, receiver, and decoder correspondingly. $B_e$ and $B_d$ denote the buffer sizes of the encoder and decoder correspondingly. $L^k$ is the length of the $k$-th encoded frame, which is pushed into the encoder buffer at $t_e^k$ and pulled out of the decoder buffer at $t_d^k$. Quantities $\delta_e^k$, $\delta_t^k$, $\delta_p$, $\delta_d^k$ are delays due to encoder buffering, transmission, propagation, and decoder buffering correspondingly. $\Delta^k$ is the end-to-end delay.

Fig. 2. Behavior of the leaky bucket model under a time-varying channel. $\mathcal{S}_{t,CBR}(t)$ corresponds to the constant rate channel model assumed by the encoder, while $\mathcal{S}_{t,VBR}(t)$ shows the actual transmission schedule. Circled regions show encoder and decoder underflows caused by the changing behavior of the channel.

However, (1) fails, and both underflow and overflow happen, under time-varying channels, as shown in Fig. 2. Fig. 2 shows an example of decoder underflow caused by the transmission of frame 1 (the index begins from 0), although exactly the same encoding schedule as in Fig. 1 is used. This frame was regulated by the traditional leaky bucket model in order to meet the HRD, but it can no longer be delivered at the scheduled arrival time $\hat{t}_d^1$ due to the very low instantaneous channel bit rate. So the decoder has to wait until $t_d^1$ in order to start decoding this frame. This introduces decoder jitter, which, if it happens frequently, may be problematic for low-delay applications. Fig. 2 also shows possible decoder overflow and encoder underflow situations that can occur under time-varying channels. We will see these effects in the experimental section. These examples explain the motivation for our work, which is to derive sufficient conditions for avoiding buffer underflow and overflow under rapidly varying channels and, based on that, to design a bandwidth-adaptive rate control mechanism that can ensure HRD-compliant bitstreams.

3. ANALYSIS AND ALGORITHM

3.1. Basic constraints

From the previous discussion, it follows that in order to avoid overflow and underflow in the encoder buffer, the encoding schedule $\mathcal{S}_e(t)$ shall satisfy

$$\mathcal{S}_t(t) < \mathcal{S}_e(t) < \mathcal{S}_t(t) + B_e, \tag{2}$$

where $\mathcal{S}_t(t)$ depends on the channel capacity and the initial encoder delay $\delta_e^0$, as shown in Fig. 1. Similarly, in order to avoid overflow and underflow in the decoder buffer, the decoding schedule shall satisfy

$$\mathcal{S}_r(t) - B_d < \mathcal{S}_d(t) < \mathcal{S}_r(t), \tag{3}$$

where $\mathcal{S}_d(t)$ is a function of the initial decoder delay $\delta_d^0$. Therefore, $\delta_e^0$ and $\delta_d^0$ have a critical impact on the HRD performance. If data packets have a constant propagation delay $\delta_p$, as shown in Fig. 1 and Fig. 2, the reception and transmission schedules become connected as follows:

$$\mathcal{S}_r(t) = \mathcal{S}_t(t - \delta_p). \tag{4}$$

We next produce bounds for the encoding and decoding schedules for time-varying channels with a continuous function $R(t)$, which is the most general case and includes CBR and piece-wise VBR.

3.2. Constraints under time-varying channels

By $t_e^{k-}$ and $t_e^{k+}$ let us denote the time points right before and right after adding the bits of the $k$-th frame into the encoder buffer. It is easy to prove that a necessary and sufficient condition for (2) is

$$\mathcal{S}_e(t_e^{k-}) > \mathcal{S}_t(t_e^{k-}) \quad \text{and} \quad \mathcal{S}_e(t_e^{k+}) < \mathcal{S}_t(t_e^{k+}) + B_e, \tag{5}$$

where $\mathcal{S}_e(t_e^{k-})$ is the cumulative number of encoded bits at time $t_e^{k-}$ and $\mathcal{S}_t(t_e^{k-})$ is the cumulative number of transmitted bits at time $t_e^{k-}$.
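Condition (5) only needs to be evaluated at frame instants, which makes it easy to check numerically. The following minimal Python sketch does so for a channel given as bits per TTI. The function name `check_encoder_schedule`, the per-TTI discretization and all numbers in the usage note are assumptions made for illustration, not part of the paper's implementation; $\mathcal{S}_t$ is taken, as in the analysis, to be the cumulative capacity offered by the channel since $\delta_e^0$.

```python
def check_encoder_schedule(frame_bits, frame_ttis, rate, first_tx_tti, buf_size):
    """Check condition (5) at every frame instant.

    frame_bits[k] -- L^k, bits of the k-th frame
    frame_ttis[k] -- t_e^k, TTI index at which frame k enters the encoder buffer
    rate[t]       -- R(t), channel capacity in bits during TTI t
    first_tx_tti  -- delta_e^0, TTI index at which transmission starts
    buf_size      -- B_e, encoder buffer size in bits
    Returns a list of (frame index, "underflow" or "overflow") violations.
    """
    violations = []
    encoded = 0  # S_e(t_e^{k-}): bits encoded before frame k is added
    for k, (bits, t) in enumerate(zip(frame_bits, frame_ttis)):
        # S_t(t_e^k): capacity offered by the channel since delta_e^0
        sent = sum(rate[first_tx_tti:t])
        # First inequality of (5); it applies once at least one frame is queued
        if k > 0 and encoded <= sent:
            violations.append((k, "underflow"))
        encoded += bits  # S_e(t_e^{k+}): frame k has been added
        # Second inequality of (5)
        if encoded >= sent + buf_size:
            violations.append((k, "overflow"))
    return violations
```

With frames of 300, 250, 250 and 250 bits entering at TTIs 0, 25, 50 and 75, `first_tx_tti=5` and `buf_size=400`, a steady channel of 10 bits per TTI produces no violations, a channel that drops to zero during TTIs 25 to 49 produces overflows at frames 2 and 3, and a channel that jumps to 40 bits per TTI during the same interval produces underflows at frames 2 and 3. This mirrors the failure modes sketched in Fig. 2.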

Assume the time axis begins from the encoding of the first frame, that is, $t_e^0 = 0$. From Fig. 1 and Fig. 2, we can observe that

$$\mathcal{S}_e(t_e^{k-}) = \mathcal{S}_e(t_e^{(k-1)+}) = \sum_{i=0}^{k-1} L^i, \tag{6}$$

and

$$\mathcal{S}_t(t_e^{k-}) = \mathcal{S}_t(t_e^{k}) = \mathcal{S}_t(t_e^{k+}) = \int_{\delta_e^0}^{t_e^k} R(t)\,dt. \tag{7}$$

Therefore, (5) becomes

$$\mathcal{S}_t(t_e^{k+1}) < \sum_{i=0}^{k} L^i < \mathcal{S}_t(t_e^{k}) + B_e, \tag{8}$$

that is,

$$\int_{\delta_e^0}^{t_e^{k+1}} R(t)\,dt < \sum_{i=0}^{k} L^i < \int_{\delta_e^0}^{t_e^{k}} R(t)\,dt + B_e, \tag{9}$$

for all $k \geqslant 0$. Following the same arguments, it is easy to prove that a necessary and sufficient condition for (3) is

$$\mathcal{S}_r(t_d^{k+1}) - B_d < \sum_{i=0}^{k} L^i < \mathcal{S}_r(t_d^{k}). \tag{10}$$

In other words,

$$\int_{t_d^0 - \delta_t^0 - \delta_d^0}^{t_d^{k+1}} R(t)\,dt - B_d < \sum_{i=0}^{k} L^i < \int_{t_d^0 - \delta_t^0 - \delta_d^0}^{t_d^{k}} R(t)\,dt. \tag{11}$$

3.3. Sufficient conditions for $L^k$ and tightest upper bound for $B_e$ and $B_d$

Different from CBR and piece-wise VBR, (9) and (11) may not be satisfied simultaneously due to the time-varying $R(t)$. However, if the transmitter has the capability of channel estimation for the encoder, the estimated channel capacity $\hat{R}(t)$ can help design $L^k$ in the encoder to meet (9) and (11). In the following, we will 1) derive sufficient conditions for $L^k$ to avoid overflow and underflow in the encoder and decoder, and 2) determine the tightest upper bound for $B_e$ and $B_d$. In some cases, we also need to balance $L^k$ and $\Delta^k$.

From (4), we can rewrite the left side of (10) as

$$\mathcal{S}_r(t_d^{k+1}) - B_d = \mathcal{S}_t(t_e^{k+1} + \Delta^{k+1} - \delta_p) - B_d = \int_{\delta_e^0}^{t_e^{k+1}} R(t)\,dt + \int_{t_e^{k+1}}^{t_e^{k+1} + \Delta^{k+1} - \delta_p} R(t)\,dt - B_d. \tag{12}$$

By comparing this expression with the left side of (9), we observe that the first inequality in (11) is satisfied if both the first inequality in (9) and

$$\int_{t_e^{k+1}}^{t_e^{k+1} + \Delta^{k+1} - \delta_p} R(t)\,dt < B_d \tag{13}$$

are satisfied. From (13), it is easy to prove that the tightest upper bound of $B_d$ shall be $(\Delta_{\max} - \delta_p) \cdot R_{\max}$, where $\Delta_{\max} = \max_k \Delta^k$ and $R_{\max} = \max_t R(t)$.

The first inequality in (9) can be rewritten as

$$L^k > \int_{\delta_e^0}^{t_e^{k}} R(t)\,dt - \sum_{i=0}^{k-1} L^i + \int_{t_e^{k}}^{t_e^{k+1}} \hat{R}(t)\,dt. \tag{14}$$

From the definition of buffer fullness, we see that $F_e(t_e^{k-}) = \sum_{i=0}^{k-1} L^i - \int_{\delta_e^0}^{t_e^k} R(t)\,dt$. Therefore, (14) simplifies to

$$L^k > \int_{t_e^{k}}^{t_e^{k+1}} \hat{R}(t)\,dt - F_e(t_e^{k-}). \tag{15}$$

Therefore, by using $B_d = (\Delta_{\max} - \delta_p) \cdot R_{\max}$ and (15) for designing the encoding schedule, we can ensure that the resulting bitstreams will not cause overflow in the decoder buffer. Note that (15) also ensures no underflow in the encoder buffer.

Consider now the right side of (10). We note that

$$\mathcal{S}_r(t_d^{k}) = \mathcal{S}_t(t_e^{k} + \Delta^{k} - \delta_p) = \int_{\delta_e^0}^{t_e^{k}} R(t)\,dt + \int_{t_e^{k}}^{t_e^{k} + \Delta^{k} - \delta_p} R(t)\,dt. \tag{16}$$

By comparing this expression with the right side of (9), we know that $B_e$ should satisfy

$$B_e \geqslant \int_{t_e^{k}}^{t_e^{k} + \Delta^{k} - \delta_p} R(t)\,dt, \tag{17}$$

which can be ensured by setting $B_e = B_d$. By (16) and channel estimation, the second inequality in (10) can be rewritten as

$$L^k < \int_{t_e^{k}}^{t_e^{k} + \Delta^{k} - \delta_p} \hat{R}(t)\,dt - F_e(t_e^{k-}). \tag{18}$$

Condition (18) is sufficient for avoiding decoder buffer underflow; (18) and (17) together further ensure that the encoder has no overflow. Note that (15) and (18) are general forms for any channel. The thresholds for the CBR and piece-wise VBR cases can be easily obtained from them.

3.4. Discussion of delay-related constraints

From Fig. 1 and Fig. 2, we know that

$$\Delta^k = \delta_e^k + \delta_p + \delta_t^k + \delta_d^k. \tag{19}$$

In H.264/AVC, $\delta_e^0$ is called the initial cpb removal delay and $\delta_t^0 + \delta_d^0$ is called the initial cpb removal delay offset. Both of them are defined in the Buffering period SEI (supplemental enhancement information) message [2].
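The bounds (15) and (18) and the buffer size derived from (13) can be evaluated directly once the channel estimate is sampled per TTI. The sketch below is a minimal illustration under that assumption; the function names, the integer-TTI time base and the example values are hypothetical and chosen only to make the arithmetic visible.

```python
def frame_size_bounds(est_rate, t_k, frame_period, delay, prop_delay, fullness):
    """Return (low, high), the open interval for L^k given by (15) and (18).

    est_rate[t]  -- estimated channel capacity R_hat(t) in bits during TTI t
    t_k          -- t_e^k, TTI index at which frame k is encoded
    frame_period -- t_e^{k+1} - t_e^k in TTIs
    delay        -- target end-to-end delay Delta^k in TTIs
    prop_delay   -- propagation delay delta_p in TTIs
    fullness     -- encoder buffer fullness F_e(t_e^{k-}) in bits
    """
    low = sum(est_rate[t_k:t_k + frame_period]) - fullness          # (15)
    high = sum(est_rate[t_k:t_k + delay - prop_delay]) - fullness   # (18)
    return low, high


def decoder_buffer_size(max_delay, prop_delay, max_rate):
    """B_d = (Delta_max - delta_p) * R_max, the bound that follows from (13)."""
    return (max_delay - prop_delay) * max_rate
```

For a CBR channel with constant rate $R$ the interval reduces to $R \cdot (t_e^{k+1} - t_e^k) - F_e(t_e^{k-}) < L^k < R \cdot (\Delta^k - \delta_p) - F_e(t_e^{k-})$. For example, with 10 bits per TTI, a 25-TTI frame period, a 90-TTI delay target, a 10-TTI propagation delay and 100 bits already buffered, `frame_size_bounds` returns `(150, 700)`.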

From (15) and (18), it follows that the end-to-end delay $\Delta^k$ must satisfy

$$\Delta^k - \delta_p \geqslant t_e^{k+1} - t_e^{k}. \tag{20}$$

By (19) and (20), $\delta_e^k + \delta_t^k + \delta_d^k \geqslant t_e^{k+1} - t_e^k$; in other words, the sum of the initial cpb removal delay and the initial cpb removal delay offset in the SEI should be designed to be at least as large as the frame period. Notice that in (19), the decoder buffering delay should be non-negative, i.e. $\delta_d^k \ge 0$, and the transmission delay $\delta_t^k$ depends on the bits of the $k$-th frame, $L^k$. Therefore, the $k$-th frame bits $L^k$ should be allocated such that

$$\delta_t^k \le \Delta^k - \delta_p - \delta_e^k. \tag{21}$$

The H.264/AVC HRD also specifies two delay modes, i.e. strict delay (delay jitter is not allowable) and low delay (delay jitter is allowable). In the case of a strict delay bound, i.e. for strict delay mode, $\Delta^k = \Delta$, where $\Delta$ is a given nominal delay, so $\delta_t^k$ should satisfy $\delta_t^k \le \Delta - \delta_p - \delta_e^k$. In case delay jitter is allowable, i.e. for low delay mode, another delay-related constraint is to balance $L^k$ and $\Delta^k$. An emphasis on the choice of the best rate points $L^k$ may cause large delay jitter. On the other hand, an emphasis on $\Delta^k$ may cause large variation in the bits $L^k$ allocated to different frames and hence inconsistent video quality. For these cases, it might make more sense to strike a certain balance in the choice of constraints for $\Delta^k$ and $L^k$, which are directly related to the delay/delay jitter and the video quality/quality variation.

Note that for more general channel models where the propagation delay is time-varying, $\Delta^k$ should include the extra propagation delay of the $k$-th frame, $\delta_p^k - \delta_p^0$. In this case, $\delta_d^k$ can be extended to include the difference between $\delta_p^k$ and $\delta_p^0$.

3.5. Target frame bit estimation

Our final task is to define a rate assignment algorithm that meets conditions (15) and (18). Theoretically, there are infinitely many schedules $\mathcal{S}_e(t)$ that meet (15) and (18). A simple but robust method is to design $L^k$ as the average of the upper bound and the lower bound. That is, given the target end-to-end delay $\Delta^k$, the propagation delay $\delta_p$ and the frame rate $f_s$, to avoid overflow and underflow the encoder should, before encoding the $k$-th frame, design $L^k$ based on its buffer fullness $F_e(t_e^{k-})$ and the estimated channel capacity $\hat{R}(t)$ as follows:

$$L^k = \frac{1}{2}\left[\int_{t_e^k}^{t_e^k + \frac{1}{f_s}} \hat{R}(t)\,dt + \int_{t_e^k}^{t_e^k + \Delta^k - \delta_p} \hat{R}(t)\,dt\right] - F_e(t_e^{k-}). \tag{22}$$

This method allows maximum margins for both the high threshold and the low threshold and is therefore robust to imperfect channel estimation. The major limitation of this method is the frame-by-frame variation of quality that results from the time-varying channel and the strict low end-to-end delay constraint. Another method is to design $L^k$ to minimize the quality variation given (15) and (18). However, it requires very accurate rate-distortion models to estimate $L^k$ from the targeted quality, and some rate-distortion optimization is required [5, 6], which is beyond the scope of this paper. We will use the first method in this paper to show the performance of our bandwidth-adaptive rate control algorithm.

3.6. Target residual bit estimation

For a hybrid video coder with a block-based coding scheme, the encoded bits $L^k$ consist of residual bits $L^k_{resi}$, motion information bits $L^k_{mv}$, and other header information bits $L^k_{header}$. That is,

$$L^k = H^k_{resi} \cdot N_{resolution} + L^k_{mv} + L^k_{header}, \tag{23}$$

where $H^k_{resi}$ is the entropy of the residual (bits per pixel) and $N_{resolution}$ is the normalized video resolution considering color components. Compared to $H^k_{resi}$, $L^k_{mv}$ and $L^k_{header}$ are less affected by the quantization step size $Q^k$. Therefore, $L^k_{mv}$ and $L^k_{header}$ can first be estimated from the statistics of previous frames, and $H^k_{resi}$ can then be used to determine $Q^k$ with any bit rate model.

4. EXPERIMENTAL RESULTS

4.1. Experimental Setup

To test our rate adaptation logic we have employed an H.264/AVC codec coupled with an LTE channel simulator implemented according to the 3GPP TS 36.814 model. We have tested two encoders: the x264 encoder with its original rate control [7], and a modified version of the x264 encoder incorporating our proposed algorithm. Decoding of all bitstreams was done using the standard JM decoder [8].

All tests were performed using standard CIF and 720p resolution video sequences [9]. The first frame was encoded as an IDR frame and the following frames were encoded as P-frames with the number of reference frames equal to 3. Lookahead was disabled and the maximum size of all NAL packets was set to 1400 bytes. Both VBV and HRD options were enabled. For x264 VBR rate control, we set the initial delay and provided the mean and maximum channel rate, defined for each sequence and bit rate. For our rate control algorithm we used information from the simulator to obtain channel rate estimates with different channel feedback delays. We also set our rate control algorithm to meet several end-to-end delay bounds. The results are compared with both CBR and VBR in the original x264 encoder.

4.2. Results

An illustrative subset of our results is shown in Fig. 3. The delay bound (excluding the propagation delay $\delta_p$) is set to be very low, i.e. 90 ms. The LTE channel rate changes significantly every TTI, i.e. every 1 ms, as shown in the first column; the second column shows fluctuations of the video rate, the third column shows the peak signal-to-noise ratio (PSNR), and the last column shows the end-to-end delay.
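To make the link between a per-TTI channel trace, the frame sizes chosen by (22) and the resulting per-frame delay concrete, the following Python sketch simulates a FIFO transmitter that drains the encoder buffer at the channel rate. It is only an illustration under stated assumptions: the channel estimate is perfect, time is counted in integer TTIs, the transmitter sends at most the buffered bits in each TTI, and the names `midpoint_frame_bits`, `simulate` and `delay_stats` as well as the traces in the usage note are hypothetical. It is not the modified x264 rate control used in the experiments.

```python
from statistics import mean, pvariance


def midpoint_frame_bits(est_rate, t_k, frame_period, delay, prop_delay, fullness):
    """Rule (22): average of the bounds (15) and (18), clamped at zero."""
    low = sum(est_rate[t_k:t_k + frame_period])
    high = sum(est_rate[t_k:t_k + delay - prop_delay])
    return max(0.5 * (low + high) - fullness, 0.0)


def simulate(rate, n_frames, frame_period, delay, prop_delay=0):
    """Return (frame sizes in bits, per-frame end-to-end delays in TTIs)."""
    backlog = 0.0   # encoder buffer fullness F_e
    pending = []    # FIFO of [frame index, bits still to be sent]
    sizes, delays = [], {}
    for t in range(len(rate)):
        k = t // frame_period
        if t % frame_period == 0 and k < n_frames:
            # Perfect channel knowledge: the true trace is used as the estimate
            bits = midpoint_frame_bits(rate, t, frame_period, delay,
                                       prop_delay, backlog)
            sizes.append(bits)
            pending.append([k, bits])
            backlog += bits
        budget = rate[t]  # bits the channel can carry during this TTI
        while pending and budget > 0:
            used = min(pending[0][1], budget)
            budget -= used
            backlog -= used
            pending[0][1] -= used
            if pending[0][1] <= 0:
                done, _ = pending.pop(0)
                # Frame is complete at the end of TTI t, then propagates
                delays[done] = t + 1 - done * frame_period + prop_delay
    return sizes, [delays[i] for i in sorted(delays)]


def delay_stats(delays):
    """Mean delay and delay jitter, with jitter taken as the variance of delay."""
    return mean(delays), pvariance(delays)
```

For a constant trace of 10 bits per TTI with a 25-TTI frame period and a 90-TTI delay target, the sketch assigns 575 bits to the first frame and 250 bits to each later frame, and every frame is delivered after 58 TTIs. For a trace that alternates 30 TTIs at 10 bits with 20 TTIs of outage, the frame sizes follow the channel (425, 100, 200, 100 bits) and the delays alternate between 63 and 48 TTIs, all below the 90-TTI target.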

PSNR is shown both before and after aligning the decoded frames.

[Fig. 3 plots, one row per sequence, left to right: channel rate (kbits) vs. TTI index (1 ms per TTI); video rate (kbits) vs. frame index (40 frames per second), proposed and x264; PSNR (dB) vs. frame index, proposed and x264, before and after alignment; delay (ms) vs. frame index, proposed and x264.]

Fig. 3. Comparison of the behavior of x264 VBR and our proposed rate control schemes under time-varying channels. Results are produced using the following video sequences: (a) Foreman, CIF, (b) Mobile, CIF, and (c) Parkrun, 720p.

It can be observed that the end-to-end delay with the native x264 rate control varies significantly. For example, for the first sequence in Fig. 3 the delay grows up to 845 ms. This is because x264 VBR may allocate more bits to scene change frames to meet the HRD with the maximum channel capacity. However, because the instantaneous channel capacity is time-varying and may be much smaller than the maximum channel capacity, those scene change frames increase the delay and delay jitter. On the other hand, our proposed algorithm was able to maintain the end-to-end delay close to the 90 ms target.

In Fig. 3, x264 shows smaller PSNR variation than ours when the reconstructed frames in the encoder are used for the calculation. However, after aligning the decoded frames, x264 produces much worse PSNR and larger PSNR variation than our rate control algorithm at the rendering moment. This is because of the significant delay and delay jitter in the bitstreams produced by x264: the display has to use previous frames for rendering if frames are not received within the delay bound. In our algorithm, although the reconstructed frames in the encoder have a slightly larger PSNR variation, the resulting PSNR in the decoder is close to that in the encoder, since almost all frames can be received, decoded and rendered before the delay bound. Note that the PSNR variation in the encoder without considering the channel variation is the best that can be achieved in the decoder when the channel variation is considered. In other words, the best is achieved when the delay bound is infinite. In fact, we believe that neither way of comparing PSNR is good for low-delay applications. Instead, delay and delay jitter should be a better measure of performance in this case.

In Tables 1 and 2, delay-rate-PSNR results, instead of the traditional BD-PSNR/rate, are shown, since the comparison is between two three-dimensional surfaces instead of two-dimensional curves. Delay is calculated as $\Delta^k$ with $\delta_p = 0$ in (22), and delay jitter is the variance of the delay. Due to the space limit, only part of the data set is shown in Tables 1 and 2. In the VBR case, the bandwidth-adaptive rate control saves up to 84.1% of the delay and 94.4% of the delay jitter compared with the x264 rate control for the foreman-cif sequence. In the CBR case, it saves up to 70.5% of the delay and 87.5% of the delay jitter for the same sequence. However, the gain decreases as the end-to-end delay bound increases. If the end-to-end delay bound is at the level of seconds (as in streaming video), the adaptive method hardly provides any performance gain. In our future work, we will analyze how to balance delay and quality by using metrics that integrate both of them.

Tables 1 and 2. Performance comparison with x264 CBR and with x264 VBR, HRD/VBV enabled. For each of the sequences foreman-cif (400 kbps), mobile-cif (400 kbps), parkrun-720p (2000 kbps) and mobcal-720p (1000 kbps), at initial delays of 90 ms and 180 ms, each table lists PSNR (dB), delay (ms) and jitter (ms) for x264 and for the proposed method, together with the gain in delay and in jitter.

We also compared the performance under imperfect channel estimation, where the channel feedback has a certain delay, ranging from milliseconds to hundreds of milliseconds. Due to the space limit, we do not show those results here. Instead, we give our observations below. Even under imperfect channel estimation, our rate control still shows a remarkable gain over both x264 VBR and CBR rate control. Imperfect channel estimation has more impact on the very low end-to-end delay case, e.g. 90 ms, than on other end-to-end delay cases, e.g. 180/270/360 ms. This is because imperfect channel estimation causes a deviation between the estimated and the true encoder buffer delay and transmission delay. Since a higher target end-to-end delay has a higher tolerance to these deviations, there is little impact.

5. CONCLUSIONS

We derived sufficient conditions under which an encoder can produce a bitstream for any time-varying channel without decoder buffer overflow and underflow. We then applied those conditions to design a bandwidth-adaptive rate control with a low delay constraint. The algorithm facilitates the design of an encoding schedule that produces bitstreams conforming to the HRD model even when transmitted over a time-varying channel. The algorithm was implemented within the H.264/AVC video encoder and tested using an LTE simulator. Test results suggest that it achieves very good performance and tight control over end-to-end delay, as necessary for real-time applications. Our encoding schedule and rate control are applicable to future video coding standards, such as High Efficiency Video Coding (HEVC).

REFERENCES

[1] T. Wiegand and G. J. Sullivan, "The picturephone is here. Really," IEEE Spectrum, vol. 48, no. 9, pp. 50–54, September 2011.
[2] "ISO/IEC 14496-10 | ITU-T Rec. H.264, Advanced Video Coding for Generic Audiovisual Services," Nov. 2007.
[3] "ISO/IEC 13818-2, Information Technology – Generic Coding of Moving Pictures and Associated Audio Information: Video, Annex C, Video Buffering Verifier," 2000.
[4] J. Ribas-Corbera, P. A. Chou, and S. L. Regunathan, "A generalized hypothetical reference decoder for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 674–687, 2003.
[5] Zhifeng Chen and Dapeng Wu, "Rate-Distortion Optimized Cross-Layer Rate Control in Wireless Video Communication," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 3, pp. 352–365, 2012.
[6] Z. He and S. K. Mitra, "Optimum bit allocation and accurate rate control for video coding via ρ-domain source modeling," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 840–849, 2002.
[7] "x264 open source video encoder implementation," http://www.videolan.org/developers/x264.html.
[8] "H.264/AVC reference software [JM 16.0]," http://iphome.hhi.de/suehring/tml/download.
[9] "Collection of test video sequences in YUV format," http://media.xiph.org/video/derf.