
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 16, NO. 1, JANUARY 2006

Rate-Distortion Performance of H.264/AVC Compared to State-of-the-Art Video Codecs
Peter Lambert, Wesley De Neve, Philippe De Neve, Ingrid Moerman, Member, IEEE, Piet Demeester, Member, IEEE, and Rik Van de Walle, Member, IEEE

Abstract—In the domain of digital video coding, new technologies and solutions are emerging at a fast pace, targeting the needs of the evolving multimedia landscape. One of the questions that arises is how to assess these different video coding technologies in terms of compression efficiency. In this paper, several compression schemes are compared by means of peak signal-to-noise ratio (PSNR) and just noticeable difference (JND). The codecs examined are XviD 0.9.1 (conforming to the MPEG-4 Visual Simple Profile), DivX 5.1 (implementing the MPEG-4 Visual Advanced Simple Profile), Windows Media Video 9, MC-EZBC, and H.264/AVC AHM 2.0 (version JM 6.1 of the reference software, extended with rate control). The latter plays a key role in this comparison because the H.264/AVC standard can be considered the de facto benchmark in the field of digital video coding. The obtained results show that H.264/AVC AHM 2.0 outperforms current proprietary and standards-based implementations in almost all cases. Another observation is that the choice of a particular quality metric can influence general statements about the relation between the different codecs.

Index Terms—DivX, H.264/AVC, just noticeable difference (JND), JVT, MC-EZBC, peak signal-to-noise ratio (PSNR), quality, video, WMV9, XviD.

I. INTRODUCTION

MODERN multimedia applications often rely on advanced video compression technologies. This is especially true for video-enabled applications that are being deployed on embedded devices (limited storage and battery capacity) and/or in mobile environments (limited bandwidth, often charged by the volume of data transferred). Nevertheless, it is clear that very high coding efficiency is no longer the only decision parameter: this is emphasized by some recent initiatives [1], showing a clear interest in coding schemes that can be easily deployed in heterogeneous systems. Such environments not only require support for error resilience and scalability tools at the level of the codec, but also require support for negotiation and metadata mechanisms at the level of the framework that relies on the codec’s adaptation capabilities. Such an architecture is currently under development in the MPEG-21 standardization effort [2].

An issue closely related to the previous problem statement about efficient content representation is the objective assessment of video quality. The latter constitutes the main topic of this paper. By means of peak signal-to-noise ratio (PSNR) and just noticeable difference (JND), several state-of-the-art video compression schemes are compared in terms of delivered quality, making use of a large set of test sequences with distinct characteristics. In this comparison, the H.264/AVC video coding standard is at the centre of interest because the specification in question can be considered the de facto benchmark in the field of digital video coding. Our research also reflects the impact of a chosen quality metric on the comparison of the different video codecs.

The outline of the paper is as follows. After introducing the examined compression schemes in Section II, an overview of the applied methodology is provided in Section III. The obtained results are described in Section IV, while the conclusions are drawn in Section V.

II. VIDEO CODECS: OVERVIEW

The five codecs that were used in our tests are summed up in this section. Due to space constraints, the reader is referred to the references for further information on these codecs. A first codec is the Ad Hoc Model 2.0 (AHM 2.0) implementation of the H.264/AVC standard [3], [4], which extends the JM 6.1 implementation [5] with a rate control algorithm [6].
Two codecs are based on the MPEG-4 Visual specification [7], [8]: XviD 0.9.1 [9] and DivX 5.1 [10], [11]. The fourth codec is Windows Media Video 9 (WMV9) [12], [13]. A last codec is MC-EZBC (motion compensated embedded zero block coding), a scalable wavelet-based video codec developed by J. Woods et al. [14].

III. MATERIALS AND METHODS

A. Encoding Process

This section describes how the various codecs were configured and used in order to obtain the bit streams necessary for performing the various measurements.

Manuscript received January 16, 2004; revised May 13, 2005. This work was supported in part by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), in part by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), in part by the Fund for Scientific Research-Flanders (FWO-Flanders), in part by the Belgian Federal Science Policy Office (BFSPO), and in part by the European Union. This paper was recommended by H. Chen.
P. Lambert and W. De Neve are with the Multimedia Lab, Department of Electronics and Information Systems, Ghent University, Interdisciplinary Institute for Broadband Technology (IBBT), B-9050 Ghent, Belgium (e-mail: peter.lambert@ugent.be; wesley.deneve@ugent.be).
P. De Neve, I. Moerman, and P. Demeester are with the Intec Broadband Communication Networks Group, Department of Information Technology, Ghent University, Interdisciplinary Institute for Broadband Technology (IBBT)-IMEC, B-9050 Ghent, Belgium (e-mail: philippe.deneve@intec.ugent.be; ingrid.moerman@intec.ugent.be; piet.demeester@intec.ugent.be).
R. Van de Walle is with the Multimedia Laboratory, Department of Electronics and Information Systems, Ghent University, Interdisciplinary Institute for Broadband Technology (IBBT)-IMEC, B-9050 Ghent, Belgium (e-mail: rik.vandewalle@ugent.be).
Digital Object Identifier 10.1109/TCSVT.2005.857783

1051-8215/$20.00 © 2006 IEEE
Authorized licensed use limited to: Korea Advanced Institute of Science and Technology. Downloaded on July 13,2010 at 07:08:37 UTC from IEEE Xplore. Restrictions apply.


TABLE I SEQUENCES USED IN OUR EXPERIMENTS

TABLE II PARAMETER SETTINGS FOR THE MC-EZBC COMPRESSOR

As input, six progressive video sequences were used in raw YCbCr 4:2:0 format.¹ These were downloaded from the Hannover FTP server.² An overview of the sequences is given in Table I. The classes A, B, and C are defined in [8] and indicate the amount of movement and/or spatial detail in the corresponding sequences. Two different resolutions were used: the Quarter Common Intermediate Format (QCIF, 176 × 144) and the Common Intermediate Format (CIF, 352 × 288), thus resulting in 12 input video sequences.

These sequences were encoded by making use of constant bit rate coding (CBR). Thirty different target bit rates were used, ranging from very low to very high: 20, 40, 60, 80, 100, 200, 300, ..., 2500, and 2600 kbps. At each bit rate, encoding was performed at 30 frames per second.

The H.264/AVC bit streams conform to the Main Profile. The GOP structure is IBBBP and the GOP length is 16. For XviD, version 0.9.1 of the core library [9] was used, which provides an API for the encoding and decoding of video content. The source code contains a sample program that uses this interface. As such, it embodies an actual encoder and decoder. The only modification that was made in this code was the key frame interval, which was set to 16. The code of MC-EZBC was downloaded from the MPEG CVS server on May 13th, 2003. Each input video sequence was encoded once and then pulled several times (using the pull tool listed in the Appendix) in order to get decodable bit streams for all target bit rates.

Regarding WMV9 [13] and DivX 5.1, things are a bit more complicated since these compressors are part of an entire framework. In the case of WMV9, we made use of the VCM (Video Compression Manager) compatible encoder in our experiments. Since the encoders of WMV9 and DivX expect the source video data to be encapsulated in an AVI file, a custom-written tool was used to put planar YCbCr 4:2:0 video data in an AVI container.
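As a rough illustration of the raw input format, the following Python sketch computes the size of one planar YCbCr 4:2:0 frame at the two resolutions used here and splits a frame buffer into its three planes. It assumes 8-bit samples and a Y, Cb, Cr plane order; neither is stated in the text, and the function names are hypothetical rather than taken from the tool mentioned above.

```python
def frame_layout(width, height):
    """Return (luma_bytes, chroma_bytes_per_plane, total_bytes) for one frame.

    Assumption: 8-bit samples. In 4:2:0 sampling each chroma plane has half
    the luma resolution in both directions.
    """
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma, chroma, luma + 2 * chroma


def split_frame(buf, width, height):
    """Split one raw planar frame into (Y, Cb, Cr) byte strings.

    Assumption: planes are stored in the order Y, Cb, Cr.
    """
    luma, chroma, total = frame_layout(width, height)
    if len(buf) != total:
        raise ValueError("buffer does not hold exactly one frame")
    y = buf[:luma]
    cb = buf[luma:luma + chroma]
    cr = buf[luma + chroma:]
    return y, cb, cr


if __name__ == "__main__":
    for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
        print(name, frame_layout(w, h))
```

Under these assumptions a QCIF frame occupies 38 016 bytes and a CIF frame 152 064 bytes, of which only the first plane (luma) enters the PSNR-Y measurements.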
Furthermore, we modified VirtualDub 1.5.10 [15] in order to speed up the batch encoding process for WMV9 and DivX, and in order to preserve relevant information that is output by both compressors (such as the type and size of compressed frames, if available). For the decoding of the compressed video streams, a custom application was developed on top of the DirectShow API that produces planar YUV data. This way, a YUV-only graphics pipeline was achieved for all codecs. To conclude the discussion about the encoding process for WMV9 and DivX 5.1, we have chosen to create bit streams that
¹ YUV is used in this article as a collective term for digital color spaces based on the separation of luma and chroma.
² [Online]. Available: ftp://ftp.tnt.uni-hannover.de

TABLE III PARAMETER SETTINGS FOR THE H.264/AVC AHM 2.0 ENCODER

are compliant with the Main Profile in the case of WMV9. In the case of the DivX compressor, the quarter-pel and global motion compensation features were enabled during the generation of the bit streams. Because two-pass encoding is mainly used to improve the subjective quality (and not always the objective quality, e.g., PSNR), one-pass constant bit rate coding (CBR) was enabled for the WMV9 and DivX 5.1 encoders. The detailed settings for the different encoding parameters can be found in Tables II and III and in the Appendix.

A final remark about the encoding process concerns rate control. To test the efficiency of a video codec, one could just use variable bit rate coding (VBR) and compare various codecs in this manner. In that case, rate control is no longer part of the overall testing and the intrinsic efficiency of a particular codec might be addressed more accurately. However, even with VBR there are many parameters and uncertainties to be dealt with. For instance, because of the different nature of I, P, and B pictures, a different quantization scheme should be used


for each type of picture. It is not trivial to find an “optimal” mutual proportion of the respective quantizers (degree of freedom). Moreover, because of the certain amount of intelligence in CBR coding, the efficiency of a codec when using CBR could be even higher than when VBR is applied. We set up a modest comparison which confirms the latter statement at several (but not all) bit rates (DivX and WMV9). Whereas the use of VBR more or less cancels the possible negative impact of the rate control algorithm on the overall coding efficiency of a video codec, it has no effect on other important algorithms that are present in an encoder and that can affect the codec’s efficiency (for instance, a bad motion estimation algorithm can hurt the coding efficiency even more than a bad rate control algorithm). In this respect, we consider the rate control algorithm a fully fledged part of an encoder (as it will often be used in practice).

B. Quality Measurement

In the context of image processing and video technology, PSNR-Y is by far the most commonly used quality metric. PSNR-Y is the peak signal-to-noise ratio in the luma component. Color information (as available in the chroma components) is not taken into account. The PSNR-Y is calculated as defined in [4]. In order to get a PSNR value for an entire sequence, the average of the PSNR-Y values of the individual frames is calculated. Note that this is only one way to get a value for an entire sequence. Another method could be, for instance, to take the minimum of the individual PSNR-Y values (because a video sequence may be evaluated based on its worst part).

PSNR is based on a distance between two images [derived from the metric³ mean square error (MSE)] and does not take into account any property of the human visual system (HVS). There are quality metrics that do make use of these properties. An important and well-known metric of this kind is JND [16], the second quality metric used in the experiments.
The JND metric was developed by Sarnoff Corporation and relies on a vision model that provides estimates of the visibility of differences between original and distorted image sequences. This model is based on known physiological and psychophysical principles of human visual discrimination performance. The JND unit is defined such that 1 JND corresponds to a 75% probability that an observer viewing the two input images would be able to see the difference [17]. Similarly, a difference of 2 JND denotes a correspondingly higher probability. Note that increasing JND values correspond to a decreasing visual quality, whereas increasing PSNR values correspond to an increasing visual quality.

Sarnoff Corporation developed a tool, called JNDmetrix-IQ, that was used to obtain the so-called sequence JND. The values of this metric are used in the results section of this paper. The exact calculation of the JND values is not straightforward and will not be given here. The JNDmetrix technology is fully disclosed in [18]. The lack of a temporal dimension in both quality metrics can be considered a disadvantage (e.g., when measuring the quality of multilayered video coding schemes [19]).
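The sequence-level PSNR-Y described in this subsection can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used for the measurements: it assumes 8-bit luma samples (peak value 255) and the common definition PSNR = 10 log10(255² / MSE), whereas the exact definition used in the paper is the one in [4]. Frames are plain sequences of luma values, and all function names are hypothetical.

```python
import math

PEAK = 255  # assumption: 8-bit luma samples


def mse(ref, dist):
    """Mean square error between two luma planes of equal size."""
    if len(ref) != len(dist) or len(ref) == 0:
        raise ValueError("planes must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)


def psnr_y(ref, dist):
    """PSNR-Y of one frame in dB; infinite when the planes are identical."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(PEAK ** 2 / err)


def sequence_psnr_y(ref_frames, dist_frames):
    """Return (average, minimum) of the per-frame PSNR-Y values.

    The average is the sequence value described in the text; the minimum
    is the alternative that judges a sequence by its worst frame.
    """
    if len(ref_frames) != len(dist_frames) or len(ref_frames) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    values = [psnr_y(r, d) for r, d in zip(ref_frames, dist_frames)]
    return sum(values) / len(values), min(values)
```

Note that averaging per-frame PSNR values, as done here, is not the same as computing one PSNR from the MSE averaged over all frames; the sketch follows the per-frame averaging described in the text.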
³ In a strict mathematical context.

IV. EXPERIMENTAL RESULTS

In this results section, the coding efficiency of the various codecs is discussed. Due to the size of the experiments and space constraints, not all results can be presented. A subset of the results is given in Table IV and Table V. Concerning Table V, only 11 (out of 30) bit rates per sequence are incorporated and only 4 (out of 6) CIF sequences are enumerated. The sequences Silent and Mother and Daughter are left out because they contain a ribbon of black pixels. This causes the WMV9 codec to apply an extrapolation filter in order to get rid of this black ribbon, resulting in nonrelevant quality measurements. With respect to Table IV, the complete set of measurements for CIF is assembled and summarized in order to get a better overview of the results. For each sequence and quality metric, the median is given of the differences between the quality values of H.264/AVC and the corresponding values of a particular codec. If H.264/AVC has a better coding efficiency, this difference is positive for the PSNR metric and negative for the JND metric. This is similar to calculating the average distance between two rate-distortion curves and is, hence, a measure for comparing the coding efficiency of various codecs. Because H.264/AVC has become a de facto benchmark for video compression, it will play a key role in the discussions in the next subsections.

A. H.264/AVC Versus MC-EZBC

In this section, the coding efficiency of H.264/AVC AHM 2.0 and MC-EZBC is compared. The measurement results of both codecs can provide an assessment of the coding efficiency of current wavelet-based codecs compared to state-of-the-art single-layered codecs.

A first general remark is that, for certain bit rates, there are no measurement points for MC-EZBC (which is also the case for other codecs; see later in this section). For low bit rates, this means that MC-EZBC is not able to encode that particular video sequence at such low target bit rates. On the other hand, other codecs (see later in this section) would then produce a bit stream having the lowest possible bit rate (for that codec), thus producing identical bit streams for a certain range of given target bit rates. The same can happen for high bit rates. In the case of low bit rates, a codec may also decide to skip some frames.

For video sequences with low movement and/or low spatial detail (Head with Glasses, Mother and Daughter), we see that the quality values of the two codecs overlap each other to a certain extent. The results in Table IV indicate that MC-EZBC performs slightly better than H.264/AVC. These results show that, for this type of video content, wavelet-based video coding schemes can achieve a coding efficiency that is comparable with the coding efficiency of H.264/AVC.

For video sequences with a higher amount of movement (Foreman and Stefan), we see that H.264/AVC AHM 2.0 performs significantly better than MC-EZBC in terms of PSNR-Y at almost all bit rates. The JND metric yields the same relation between the codecs, but the relative difference is much smaller. We can observe that MC-EZBC, in contrast to H.264/AVC AHM 2.0, is not able to code these difficult sequences at bit


TABLE IV OVERALL GAINS IN CODING EFFICIENCY OF H.264/AVC (CIF)

Fig. 1. Extrapolation of black ribbon by WMV9 encoder (Stefan CIF). (a) Original. (b) WMV9.

rates below 200 kbps. One of the reasons for this is the fact that the coded bit streams have three levels of spatial scalability and four levels of temporal scalability, thus involving a certain amount of overhead which can be critical at low bit rates. For the Stefan sequence, even the average PSNR-Y values of XviD (MPEG-4 Visual SP) are higher than the ones of MC-EZBC at all bit rates. The disparity in delivered quality is limited for scenes characterized by high spatial detail and low amounts of movement (e.g., Mobile and Calendar: 0.74 dB and 0.02 JND).

An interesting phenomenon is observed when coding Mother and Daughter at QCIF resolution. In that case, MC-EZBC reaches its highest possible quality (53.1 dB; 0.78 JND) at 1600 kbps. On the other hand, H.264/AVC AHM 2.0 is able to use the higher bit budget to obtain much higher PSNR-Y and much lower JND values (reaching 59.9 dB; 0.50 JND) up to 2600 kbps. While it may be hard to see the difference between 53 dB and 60 dB, it illustrates that H.264/AVC AHM 2.0 has a very flexible quantization scheme which can reach very high PSNR values.

B. H.264/AVC Versus Proprietary Codecs

In this section, an overview is given of the coding efficiency of two commercial video codecs (WMV9 and DivX) and how they compare to H.264/AVC AHM 2.0. One of the goals of our experiments was to investigate the possibilities of H.264/AVC in terms of coding efficiency, compared with currently available proprietary codecs.

As mentioned before, the results for WMV9 contain some significant outliers for the sequences Mother and Daughter and Silent. Concerning Mother and Daughter, the two top pixel rows contain a black ribbon of two pixels in height that is partially extrapolated by the WMV9 compressor, thus ruining PSNR and JND values. Also for the Silent sequence, the top two pixel rows of the video area are heavily affected by WMV9 in such a way that a shaded gray ribbon is created. The reason for this phenomenon may be found in the usage of a psychovisual model incorporated in the WMV9 encoder. This could, however, not be verified by the authors. The very same anomaly is present in the Stefan sequences that are encoded by WMV9 (top right of the image), albeit not leading to extremely low quality measurements. The corresponding values in Table IV are printed in italics. To give a visual impression of this behavior, a screenshot of the top ten pixel rows of the first frame of the Stefan sequence is given in Fig. 1.

Based on the PSNR values in Table IV, H.264/AVC outperforms both DivX and WMV9. Moreover, H.264/AVC performs relatively better when encoding “complex” sequences. In other

words, the gains are higher when dealing with video sequences belonging to class C (compared to the ones in class B and, a fortiori, compared to the ones in class A). This, together with the previous paragraph, illustrates that the choice of video test sequences is a very important aspect when comparing the coding efficiency of video coding technologies.

Looking more closely at the results in Table IV, there are some peculiarities regarding the two quality metrics. First, the PSNR metric indicates that H.264/AVC has a better coding efficiency for the sequences Foreman and Head with Glasses. In contrast, the JND metric suggests that there are no differences between the three codecs in the case of Head with Glasses. On top of that, JND suggests for the Foreman sequence that DivX outperforms H.264/AVC and that the gains of H.264/AVC with respect to WMV9 are only marginal. A visual inspection of the sequence Head with Glasses coded by H.264/AVC, DivX, and WMV9 at bit rates above 1000 kbps shows that it is very hard to notice any difference between these coded sequences (confirming the zero-valued differences in JND), while PSNR indicates that there is an overall difference of 1.64 and 2.18 dB (see Table IV). In this case (high bit rate, high quality, and “simple” scene), the JND metric corresponds best to the actual perceived quality, whereas the differences detected by PSNR are mainly of academic interest. At bit rates below 1000 kbps, there are clearly visible differences between the sequences and the quality of the sequences coded by H.264/AVC is considerably higher, thus confirming the PSNR results. As a result, in situations involving low bit rates and low qualities, the JND metric may not be suitable.

A second remark is related to the Mobile and Calendar sequence. Whilst the PSNR metric implies that DivX has a better coding efficiency than WMV9 at all bit rates, the JND metric indicates the contrary (note the differences in the obtained bit rates in Table V). Again, a small subjective viewing test was set up; comparing the coded sequences of both encoders at various bit rates revealed that the version coded by WMV9 has a better visual quality, thus confirming the JND scores. The version coded by DivX contains more blocking artifacts that happen to be more annoying. The corresponding sequences coded by H.264/AVC have a significantly higher quality than DivX and WMV9, which is confirmed both by PSNR and JND.

Another observation is that WMV9 and DivX do not encode all the frames of a video sequence if the target bit rate is relatively low (300 kbps and below). Instead, a compressed frame, preceding one or more skipped frames, is displayed several times in a row in order to compensate for the frames that were not encoded. As a result, the frames that were actually encoded have a relatively good quality, but the motion smoothness deteriorates. Thus, these codecs make a clear trade-off between the sharpness of individual frames and the smoothness of the video sequence. As a matter of fact, this behavior can


TABLE V SUBSET OF QUALITY MEASUREMENTS FOR CIF SEQUENCES

be controlled by a parameter in the respective encoders (the default setting for this parameter was used in the experiments). Because coded sequences containing one or more skipped frames cannot be compared to the original by making use of PSNR or JND, these sequences were left out of the results (represented by dashes in Table V). In fact, there are no quality metrics that can cope with this kind of situation.

C. H.264/AVC Versus MPEG-4 Visual

The XviD implementation of MPEG-4 Visual SP has a coding efficiency that is in general similar to DivX, with the exception that the differences between the two are slightly larger when coding “complex” sequences (class C). Looking at Table V, we see that XviD can encode the test sequences at lower bit rates than the other codecs without skipping frames (with the exception of H.264/AVC), which is in line with the objectives of the Simple Profile of MPEG-4 Visual.

In Table IV, PSNR and JND contradict each other regarding Stefan when comparing H.264/AVC and XviD (this contradiction is present at bit rates up to 1400 kbps; see also Table V). A visual inspection (at those “contradicting” bit rates) shows that the sequences coded by H.264/AVC contain some bad parts (approximately 70 frames in total) with severe ghosting and blocking artifacts, whereas the sequences coded by XviD show a more stable but clearly lower quality during the entire sequence. The disparity between the two quality measures lies in the fact that JND penalizes those bad parts heavily, whereas they are averaged out by PSNR. Which of the two metrics is right depends on the preferences of the end user.

In the remainder of this section we try to verify one of the requirements of H.264/AVC, namely “having a capability goal of 50% or greater bit rate savings from H.263v2 (with Annexes


Fig. 2. Bit rate savings of H.264/AVC AHM 2.0 (H.264/AVC Main Profile) compared to DivX 5.1 (MPEG-4 Visual ASP). (a) PSNR, CIF; (b) averages per sequence (PSNR).

DFIJ&T) or MPEG-4 Advanced Simple Profile at all bit rates” [20]. In other words, H.264/AVC should use half of the bits to code a video sequence at the same quality compared with MPEG-4 Visual ASP. Using some straightforward calculations, this verification can be done based on our test results.

First, it is important to note that one cannot actually compare the two video coding standards in question: only implementations of those standardized specifications can be compared. As for H.264/AVC, only the modified version of the reference software was at our disposal because other implementations of the H.264/AVC specification are still in development. To verify the above-mentioned requirement, an implementation of MPEG-4 Visual ASP should be examined, being DivX 5.1 in our case.

The quality measurements cannot be used directly: for each bit rate, the resulting qualities were measured, whereas here the bit rates that correspond to a given measured quality are needed. These can be obtained by applying a linear interpolation in the following manner. For a given sequence and resolution, a quality $Q_D$ (corresponding with a bit stream coded by DivX at a bit rate $R_D$) is searched in the set of qualities $\{Q_{H,i}\}$ measured for H.264/AVC AHM 2.0 (corresponding with the same sequence and resolution), where $Q_{H,i}$ is obtained at bit rate $R_{H,i}$. Because an exact match is unlikely to exist, this results in two quality values $Q_{H,k}$ and $Q_{H,l}$ for which applies that

$$Q_{H,k} \le Q_D \le Q_{H,l} \tag{1}$$

and that there are no $Q_{H,m}$ and $Q_{H,n}$ such that

$$Q_{H,k} < Q_{H,m} \le Q_D \quad \text{or} \quad Q_D \le Q_{H,n} < Q_{H,l}.$$

Using these two values, the relative position $\alpha$ of $Q_D$ between $Q_{H,k}$ and $Q_{H,l}$ can be calculated:

$$\alpha = \frac{Q_D - Q_{H,k}}{Q_{H,l} - Q_{H,k}}. \tag{2}$$

The bit rate of H.264/AVC AHM 2.0 that corresponds with $Q_D$ is then given by

$$R_H = R_{H,k} + \alpha \, (R_{H,l} - R_{H,k}). \tag{3}$$

The actual bit rate saving of H.264/AVC AHM 2.0 compared to DivX, expressed as a percentage, is then

$$S = 100 \cdot \frac{R_D - R_H}{R_D}. \tag{4}$$

The value $S$ was calculated for all quality values (based on PSNR) of DivX, for all sequences, and for all resolutions. The results are shown in Fig. 2. In Fig. 2(a) the bit rate savings of H.264/AVC AHM 2.0 relative to DivX are shown per measured quality value (in terms of PSNR) and for all CIF sequences. We clearly see that the bit rate savings depend heavily on the type of content of the video. Even sequences of the same class do not necessarily yield similar bit rate savings (e.g., Mobile and Calendar, and Stefan). Significant bit rate savings are obtained for very low as well as for very high qualities. This confirms the fact that H.264/AVC is designed to support a wide range of bit rates.

The most remarkable aspect of Fig. 2(a) is the fact that the bit rate savings for Mobile and Calendar are higher than for the other sequences. Mobile and Calendar is the most complex scene we used in the experiments and it has the notorious reputation of being a benchmark for video coding algorithms. Many of the advanced features of H.264/AVC seem to encode Mobile and Calendar very efficiently.

In Fig. 2(b), the average bit rate savings for each sequence are shown. Putting everything together, H.264/AVC AHM 2.0 achieves an average bit rate saving of 42% at QCIF resolution and 38% at CIF resolution, if the quality is measured in terms of PSNR. Although these numbers do not clearly confirm the goal of achieving 50% bit rate savings, they do show that significant savings can be obtained by making use of tools available in the H.264/AVC specification. Note that the above calculations can be done based on the measured JND values in an analogous manner.

V. CONCLUSION

In this paper, an overview was given of the rate-distortion performance of five state-of-the-art video codec technologies in terms of PSNR and JND. The measurements show that the tools that are incorporated in the H.264/AVC standard make it possible to outperform current proprietary and standards-based implementations in almost all cases.
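As a rough illustration of how the bit rate savings discussed in Section IV-C can be derived from measured rate-quality points, the following Python sketch performs the linear interpolation between the two H.264/AVC measurements whose qualities bracket a given DivX quality. It is a sketch under stated assumptions, not the authors' tooling: the function names are hypothetical, the numbers in the usage note are made up for illustration, and a quality that falls outside the measured range is simply reported as not available.

```python
def interpolated_bit_rate(ref_points, target_quality):
    """Bit rate at which the reference codec reaches target_quality.

    ref_points is a list of (bit_rate_kbps, quality) pairs measured for the
    reference codec (here H.264/AVC). The two points whose qualities most
    tightly bracket target_quality are interpolated linearly. Returns None
    when target_quality lies outside the measured range.
    """
    pts = sorted(ref_points, key=lambda p: (p[1], p[0]))
    for (r_lo, q_lo), (r_hi, q_hi) in zip(pts, pts[1:]):
        if q_lo <= target_quality <= q_hi:
            if q_hi == q_lo:
                return r_lo
            alpha = (target_quality - q_lo) / (q_hi - q_lo)
            return r_lo + alpha * (r_hi - r_lo)
    return None


def bit_rate_saving_percent(ref_points, other_bit_rate, other_quality):
    """Saving of the reference codec relative to another codec, in percent."""
    ref_rate = interpolated_bit_rate(ref_points, other_quality)
    if ref_rate is None:
        return None
    return 100.0 * (other_bit_rate - ref_rate) / other_bit_rate
```

With the illustrative points (100 kbps, 30.0 dB), (200 kbps, 34.0 dB), and (300 kbps, 36.0 dB) for the reference codec, a competing bit stream of 300 kbps at 33.0 dB maps to an interpolated reference bit rate of 175 kbps, i.e., a saving of about 41.7%. For a metric such as JND, where lower values mean higher quality, the same bracketing applies to the JND values themselves.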


H.264/AVC’s goal of achieving a 50% bit rate saving is approached, but not fully met, with respect to the DivX 5.1 implementation of the MPEG-4 Visual Advanced Simple Profile if PSNR is used as quality metric (average savings of 42% at QCIF and 38% at CIF resolution). In some cases, the use of PSNR and JND as quality metric leads to different conclusions when comparing the coding efficiency of various codecs. Our results show that the JND metric is not to be used at low qualities, which leaves only PSNR as an acceptable (but imperfect) quality metric for those situations. At high qualities, on the other hand, JND is more consistent with the subjective quality than PSNR.

APPENDIX
VIDEO CODEC CONFIGURATIONS

This appendix covers the parameter settings of the five video codecs used in the tests. All given values apply to the Foreman test sequence (CIF resolution), encoded at a bit rate of 300 kbps and at a frame rate of 30 Hz. Given the settings, it should be straightforward to adapt them to the other video sequences and other bit rates.

DIVX 5.1
-bv1 300 -psy 0 -key 16 -b -g -q -sc 50 -pq 192 -vbv 3000000,3145728,2359296 -profile 0 -nf

WINDOWS MEDIA VIDEO 9
-v_interlace 0 -v_frameratedown 1:1 -v_mode 0 -v_bitrate 300000 -v_buffer 5000 -v_quality 80 -v_keydist 500 -v_profile MP -v_performance 100

XVID 0.9.1
./xvid_encraw -w 352 -h 288 -b 300 -f 30 -i foreman_cif.yuv -t 0 -n 300 -q 5 -m 1 -o out_tmp.mp4u -mt 1 -mv 0
./xvid_decraw -w 352 -h 288 -i out_tmp.mp4u -t 1 -d 1 -m 0

MC-EZBC (WOODS)
./pull mo.bit -r 300 -t 0 -s 0
./3dsbcde mo_300_V01.bit ../foreman_cif.yuv._decoded_frame_%04d.yuv ../foreman_cif.yuv._frame_%04d.yuv stats_foreman_cif_woods_0300.txt
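To illustrate how the settings above might be adapted to other resolutions and bit rates, the following Python sketch enumerates the thirty target bit rates and builds the XviD encoder command line from the Foreman template. Several points are assumptions rather than facts from the paper: the 100 kbps spacing between 200 and 2600 kbps (chosen because it yields exactly thirty values), the reading of the bit rate flag as -b, the file naming pattern extrapolated from foreman_cif.yuv, and the helper names themselves. All other flags, including -n 300, are copied verbatim and may need adjusting per sequence.

```python
# Resolutions named in Section III-A.
RESOLUTIONS = {"qcif": (176, 144), "cif": (352, 288)}


def target_bit_rates():
    """Thirty CBR targets in kbps: 20..100 in steps of 20, then 200..2600.

    Assumption: the second range uses a 100 kbps step, which is the spacing
    that gives the thirty values mentioned in the text.
    """
    return list(range(20, 101, 20)) + list(range(200, 2601, 100))


def xvid_encode_command(sequence, resolution, kbps):
    """Build an xvid_encraw command line from the Appendix template.

    Only -w, -h, -b, and -i are substituted. The '<sequence>_<resolution>.yuv'
    file name is a hypothetical pattern based on 'foreman_cif.yuv'.
    """
    width, height = RESOLUTIONS[resolution]
    return (
        f"./xvid_encraw -w {width} -h {height} -b {kbps} -f 30 "
        f"-i {sequence}_{resolution}.yuv -t 0 -n 300 -q 5 -m 1 "
        "-o out_tmp.mp4u -mt 1 -mv 0"
    )


if __name__ == "__main__":
    for kbps in target_bit_rates():
        print(xvid_encode_command("foreman", "cif", kbps))
```

Generating the command lines from one template keeps every run identical except for the substituted values, which matters when the resulting bit streams are compared across bit rates.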

REFERENCES
[1] Applications and Requirements for Scalable Video Coding, MPEG-document ISO/IEC JTC1/SC29/WG11 N5540, Mar. 2003.
[2] I. Burnett, R. Van de Walle, K. Hill, J. Bormans, and F. Pereira, “MPEG-21: goals and achievements,” IEEE Multimedia, vol. 10, no. 4, pp. 60–70, Oct. 2003.
[3] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[4] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, “Rate-constrained coder control and comparison of video coding standards,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 688–703, Jul. 2003.
[5] H.264/AVC Reference Software [Online]. Available: http://iphome.hhi.de/suehring/tml/download/
[6] Proposed Draft Description of Rate Control on JVT Standard, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16/Q.6, JVT-document JVT-F086, Dec. 2002.
[7] F. Pereira and T. Ebrahimi, Eds., The MPEG-4 Book. Englewood Cliffs, NJ: Prentice Hall, Jan. 2002.
[8] MPEG-4 Video Verification Model 19.0 (VM-19), MPEG-document ISO/IEC JTC1/SC29/WG11 N6184, Dec. 2003.
[9] XviD Software Package [Online]. Available: http://files.xvid.org/downloads/xvidcore-0.9.1.tar.gz
[10] DivXNetworks, Inc. [Online]. Available: http://www.divx.com/divx/
[11] The Official DivX 5.1 Guide [Online]. Available: http://www.divx.com/divx/
[12] “Windows Media 9 Series Capabilities and Benefits Overview,” Microsoft Corporation, Redmond, WA, Sep. 2002.
[13] Windows Media Video 9 VCM [Online]. Available: http://www.microsoft.com/windows/windowsmedia/9series/codecs/vcm.aspx
[14] S.-T. Hsiang and J. W. Woods, “Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank,” Signal Process.: Image Commun., vol. 16, pp. 705–724, May 2001.
[15] Virtual Dub [Online]. Available: http://www.virtualdub.org/
[16] “Measuring Image Quality: Sarnoff’s JNDmetrix Technology,” Sarnoff Corporation, Princeton, NJ, Jul. 2002.
[17] “JND: A Human Vision System Model for Objective Picture Quality Measurements,” Sarnoff Corporation, Princeton, NJ, Jun. 2001.
[18] “Objective perceptual video quality measurement using a JND-based full reference technique,” Alliance for Telecommunications Industry Solutions ATIS, Tech. Rep. T1.TR.PP.75-2001, Oct. 2001.
[19] S. Lerouge, P. Lambert, and R. Van de Walle, “Multi-criteria optimization for scalable bitstreams,” in Proc. 8th Int. Workshop Visual Content Processing and Representation, vol. 9, Madrid, Spain, 2003, pp. 122–130.
[20] Requirements for AVC Codec, MPEG Document, ISO/IEC JTC1/SC29/WG11 N4672, Mar. 2002.
