J. Vis. Commun. Image R. 17 (2006) 358–375

www.elsevier.com/locate/jvci

Flexible macroblock ordering in H.264/AVC
P. Lambert *, W. De Neve, Y. Dhondt, R. Van de Walle
Ghent University—IBBT, Multimedia Lab, Department of Electronics and Information Systems, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium

Received 30 April 2004; accepted 2 May 2005
Available online 11 August 2005

Abstract

H.264/AVC is a new standard for digital video compression jointly developed by ITU-T's Video Coding Experts Group (VCEG) and ISO/IEC's Moving Picture Experts Group (MPEG). Besides the numerous tools for efficient video coding, the H.264/AVC specification defines some new error resilience tools. One of them is flexible macroblock ordering (FMO), which is the main focus of this paper. An in-depth overview is given of the internals of FMO. Experiments are presented that demonstrate the benefits of FMO as an error resilience tool in the case of packet loss over IP networks. The flexibility of FMO comes with a certain overhead or cost; a quantitative assessment of this cost is presented for a number of scenarios. Besides pure error resilience, FMO can also be used for other purposes, which is likewise addressed in this paper.
© 2005 Elsevier Inc. All rights reserved.
Keywords: H.264/AVC; FMO; Error resilience; Video coding

1. Introduction

When looking at the development of previous video coding standards, one observes that more tools for error resilience are adopted with each new standard. The main reason for this is that more recent standards are also designed to be used
* Corresponding author. E-mail address: peter.lambert@ugent.be (P. Lambert).

1047-3203/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.jvcir.2005.05.008


in error-prone environments. Video coding standards such as MPEG-1 or H.262/MPEG-2 were mainly designed for storage and digital television and, hence, for reliable media. More recent standards such as H.263 [1] or MPEG-4 Visual have some extra features to combat possible transmission errors since they were also developed to be used in lossy networks. Likewise, H.264/AVC was designed to be deployed in a wide variety of networks and applications. As a result, one of the requirements was to adopt robust error resilience tools, as stated in [2]. For an overview of the H.264/AVC specification, the reader is referred to [3] or to other papers in this issue.

Flexible macroblock ordering (FMO) is the most striking new tool for error resilience within the H.264/AVC specification. It allows great flexibility in defining the coding order of macroblocks within a picture. A similar, albeit much simpler, concept is defined in H.263 Annex K (Slice Structured mode). This mode defines two submodes: rectangular slices and arbitrary slice ordering (ASO). The latter specifies that slices may appear in any order within the bit stream, whilst the former specifies that slices can occupy a rectangular region of a picture; such slices contain macroblocks in raster scan order within that region. Section 2 explains why FMO offers much more than the possibilities of these two submodes. We implemented some of the new possibilities that are supplied by FMO by modifying version JM7.3 of the reference software of H.264/AVC [4].

The fact that macroblocks can be arbitrarily grouped in slices offers new possibilities regarding error concealment. The benefits thereof can be assessed by simulating transmission errors on H.264/AVC bit streams and by calculating the quality of the decoded results by means of the peak signal-to-noise ratio (PSNR). In the case of severe losses, the PSNR value may be meaningless and a visual inspection of the result will be of greater value.
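As a reference for the quality measurements used below, the PSNR of a decoded picture relative to the original can be computed as follows (a minimal sketch in Python; the function name and the assumption of 8-bit samples are ours):

```python
import math

def psnr(original, decoded, max_value=255):
    """Peak signal-to-noise ratio between two equally sized sequences
    of 8-bit sample values (e.g., the luma plane of one frame)."""
    assert len(original) == len(decoded) and original
    # Mean squared error between the two planes.
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return float("inf")  # identical planes
    return 10 * math.log10(max_value ** 2 / mse)
```

Averaging this value over the luma plane of every decoded picture gives the Y-PSNR figures reported in Section 4.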
This is a first aspect of FMO that will be examined in this paper. A second aspect is the fact that the flexibility of FMO introduces extra syntax elements and hence some overhead in terms of bits. This overhead depends on many parameters. By encoding video sequences with varying parameters (once with and once without FMO), one can get a quantitative notion of the cost of FMO. This information is useful when one wants to make the trade-off between the extra overhead and improved error resilience. Measurements concerning this overhead are presented in the results section of this paper.

The rest of this paper is organized as follows. In the next section, an overview is given of the new tools for error resilience incorporated in the H.264/AVC specification. Section 3 provides an overview of the algorithmic changes necessary to implement and enable FMO, and how we did this based on the reference software. The results of our experiments are presented in Section 4 and the conclusions are given in Section 5.

2. Overview of error resilience tools in H.264/AVC

The H.264/AVC standard not only comprises tools for the efficient compression of digital video data, but it also specifies a number of tools that are aimed at improved error resilience. Many of these tools are present in previous video coding


standards and will not be described in this section. The reader is referred to [5] for a short overview of these tools. H.264/AVC introduces three new tools for error resilience, namely redundant slices, parameter sets, and flexible macroblock ordering. The latter two tools are extensively used in our tests, whereas the former will only be described briefly in this section. The available tools can be useful for the transmission of video over both IP networks [5] and wireless networks [6].

A first new tool is the use of redundant slices. A coded macroblock is by default contained in only one slice. By making use of redundant slices, an encoder can place one or more coded representations of a particular macroblock into the bit stream. It is important to note that this is not equivalent to duplicating packets (transport-based redundancy). Indeed, the redundant representations of a macroblock can be coded using different settings, such as the quantization parameter. If a decoder does not receive a certain macroblock, it can fall back on a redundant representation of the same macroblock (usually coded with a lower quality, i.e., a higher quantization parameter value).

A second tool is the concept of parameter sets, namely Sequence Parameter Sets (SPSs) and Picture Parameter Sets (PPSs). These sets contain information that is unlikely to change. An SPS applies to a series of consecutive coded pictures (defined as a coded video sequence), i.e., pictures between two IDR (Instantaneous Decoding Refresh) pictures. A PPS applies to all slices of one or more pictures within a coded video sequence. The use of parameter sets implies that infrequently changing information is not sent together with the coded representation of the video data.
Since every conforming H.264/AVC bit stream requires that at least one SPS and one PPS be available (via out-of-band transmission or embedded within the bit stream itself), the use of parameter sets is not a tool specifically aimed at error resilience, but clever use of it can notably enhance error resilience. For instance, parameter sets can be sent repeatedly within the bit stream or they can be sent separately over a reliable channel. Each slice header contains a pointer to the PPS in use and the PPS, in turn, has a reference to the SPS in use at that time.

A third tool, which is the main focus of this paper, is flexible macroblock ordering. While this tool is available in the Baseline Profile and in the Extended Profile of H.264/AVC, this paper also focuses on FMO in the context of the Main Profile (in this case, the resulting bit streams are non-compliant). Using FMO, macroblocks are no longer assigned to slices in raster scan order. Instead, each macroblock can be assigned freely to a specific slice group (a set of macroblocks that may itself be partitioned into one or more slices) using a macroblock allocation map (MBAmap). There can be up to eight slice groups in one picture and, within a slice group, macroblocks are coded in default scan order. The macroblocks within a certain slice group can, furthermore, be grouped into several slices. The case where there is only one slice group within a picture is identical to the case where no FMO is used at all. In other words, setting the parameter num_slice_groups to 1 disables FMO.

FMO is a very powerful tool for error resilience. For instance, slice groups can be constructed in such a way that, if one slice group is not available at the decoder, each 'lost' macroblock may be surrounded by macroblocks of other slice groups (above, below, right, and left). In that case, the missing macroblock can be reconstructed in a


very effective way using interpolation based on surrounding (available) sample values. This also keeps the effects of prediction drift under control compared to the case where no FMO is used. This technique is closely related to audio interleaving in the case of streaming audio [7]. An example of using FMO to interleave macroblocks is shown in Fig. 1. It is clear that, if a slice is lost, a spatial interpolation algorithm will be much more effective when FMO is applied. The use of FMO is, more generally speaking, an example of a multiple description code with each slice group acting as a description: slice groups are independent of each other and a picture can be decoded even if not all slice groups are available (or even if slices are missing within one or more slice groups). Another important quality of FMO-based error resilience is that, unlike techniques that require feedback from the communication channel (e.g., channel-adaptive source coding techniques), it is particularly well suited to real-time, ultra-low-delay applications (e.g., video conferencing).

H.264/AVC specifies seven types of FMO. FMO type 6 is the most general type, where the entire MBAmap is actually coded into a picture parameter set (if the signaled MBAmap is smaller than the picture size in macroblocks, one could decide to repeat the signaled MBAmap to avoid sending the whole MBAmap). The other six types are special cases where the coded representation of the MBAmap is much smaller due to certain patterns in the MBAmap. Examples of these six types are illustrated in Fig. 2.

In the case of FMO type 0, each slice group has a maximum number of macroblocks that it can contain in raster scan order before another slice group is started (interleaved slice groups). Coded in a PPS are the number of slice groups and an integer value for each slice group by means of the syntax elements run_length_minus1[iGroup]. FMO type 1 is also known as scattered slices or dispersed slices.
Macroblocks are assigned to a slice group based on the total number of slice groups and the following formula, which maps macroblock number i to a slice group number:

    slice group of i = ((i mod w) + (((i / w) · n) / 2)) mod n,

Fig. 1. Using FMO to interleave macroblocks (1 slice lost).


Fig. 2. Different types of FMO.

where n is the number of slice groups (num_slice_groups_minus1 + 1; coded in a PPS) and w is the width of the picture in terms of macroblocks (pic_width_in_mbs_minus1 + 1; coded in an SPS). Note that the operator "/" denotes an integer division with truncation. The MBAmap does not have to be coded because this formula is known to both the encoder and the decoder. Fig. 2B shows the simplest situation with only two slice groups, whereas Fig. 1B shows an example with four slice groups.

FMO type 2 uses one or more rectangular slice groups and a background. The rectangular slice groups are allowed to overlap each other, but macroblocks in overlapping areas can be part of only one slice group. The order in which the slice groups are declared in a PPS determines which slice group a macroblock resides in. The macroblock numbers of the top left macroblock and the bottom right macroblock of each slice group are coded into a PPS by means of the syntax elements top_left[iGroup] and bottom_right[iGroup].

FMO types 3, 4, and 5 are commonly known as evolving slice groups. In these situations, the configuration of slice groups can change with every picture by making use of certain periodic patterns. Notwithstanding the fact that the slice group configuration is to be coded in a PPS, a syntax element called slice_group_change_cycle is included in the header of each slice to keep track of the current position within the cycle of changes. The reason for this is that it would be inefficient to insert a PPS for every picture in the bit stream to reflect these changes. Another motivation to code this into the slice header is that this syntax element describes something that can be different for every picture and is, hence, not appropriate to include in a PPS.
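The type-1 mapping given above translates directly into code (an illustrative Python sketch using the definitions of n and w stated above; the function name is ours):

```python
def dispersed_slice_group_map(w, h, n):
    """Slice group map for FMO type 1 (dispersed/scattered slices).

    w, h: picture width and height in macroblocks; n: number of
    slice groups. Implements
        slice group of i = ((i mod w) + (((i / w) * n) / 2)) mod n,
    with "/" as integer division with truncation."""
    return [((i % w) + (((i // w) * n) // 2)) % n for i in range(w * h)]
```

For two slice groups this produces a checkerboard pattern, as in Fig. 2B; with n = 1 every macroblock ends up in slice group 0, i.e., FMO is effectively disabled.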


The latter three types of FMO have exactly two slice groups, one of which grows by a number of macroblocks at the expense of the other. In the case of FMO type 3 (box-out), one slice group starts off at the center macroblock and grows by a number of macroblocks (slice_group_change_rate_minus1 + 1; coded in a PPS) in a rotational manner. The direction can be clockwise or counterclockwise depending on the value of the syntax element slice_group_change_direction_flag (coded in a PPS). FMO type 4 indicates that one slice group evolves in a raster scan manner. In other words, a certain number of macroblocks (slice_group_change_rate_minus1 + 1 to be more precise) is added to this slice group in raster scan order or in reverse raster scan order (again depending on the syntax element slice_group_change_direction_flag). This means that one slice group will evolve from top to bottom or vice versa. FMO type 5 is the horizontal counterpart of FMO type 4: one slice group will evolve from left to right or vice versa. The same syntax elements are used as in type 4 to code the direction and the size of the successive enlargements.

Besides pure error resilience, FMO can also be used for other purposes. For instance, rectangular slice groups can be used as regions of interest (ROIs) containing 'interesting' parts of a picture within a video sequence. The areas outside the ROIs could then be coded with a lower quality or even omitted to save bandwidth. It is even possible to change the coordinates of the ROIs using picture parameter sets within the bit stream. FMO can also be used to create isolated regions within video sequences [8]. To do so, the encoder not only has to disable intra prediction across slice boundaries, but also has to restrict the motion vectors in such a way that they only refer to decoded macroblocks belonging to the same slice group.
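The raster-scan evolving behaviour of FMO type 4 described above can be illustrated with a small sketch (Python; the function and its simplified treatment of the change cycle are our own illustration, not the normative map derivation of the specification):

```python
def raster_evolving_map(w, h, change_rate, change_cycle, reverse=False):
    """Sketch of an FMO type 4 (raster-scan evolving) slice group map.

    Slice group 0 contains the first change_rate * change_cycle
    macroblocks in (reverse) raster scan order; all remaining
    macroblocks belong to slice group 1. change_rate corresponds to
    slice_group_change_rate_minus1 + 1, change_cycle to the
    slice_group_change_cycle signaled in the slice header, and the
    reverse flag mimics slice_group_change_direction_flag."""
    size = min(change_rate * change_cycle, w * h)
    mbamap = [1] * (w * h)
    for k in range(size):
        # Grow from the start (raster scan) or from the end (reverse).
        mbamap[(w * h - 1 - k) if reverse else k] = 0
    return mbamap
```

Increasing change_cycle from picture to picture makes slice group 0 sweep over the picture from top to bottom (or vice versa), as described in the text.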
Note that an encoder may use the motion-constrained slice group set Supplemental Enhancement Information (SEI) message to signal to the decoder that it has restricted motion vectors in this way (see Section D.1.19 in [9]). Slice groups that are encoded in this way are fully self-contained¹, which, in turn, improves error resilience. This fits very well if every picture has the same number of slice groups. If the number of slice groups fluctuates throughout the video sequence, isolated regions will introduce many intra coded regions, namely in those pictures where the number of slice groups changes. Those intra coded regions will have an impact on the coding performance. Another drawback is that the coding performance could also drop due to a smaller or dispersed search space for inter prediction. A less flexible technique for isolated regions was defined in H.263 Annex R. The application example of video mixing in a Multipoint Control Unit (MCU) is particularly interesting in this context.

¹ With the notable exception of the deblocking filter, which crosses slice boundaries; this deblocking filter can be turned off by changing the parameter disable_deblocking_filter_idc.


3. Implementation of FMO

In this section, we discuss, from an implementation point of view, the conceptual differences in an H.264/AVC encoder and decoder when FMO is used. There are no major differences between an FMO-enabled H.264/AVC decoder and an H.264/AVC decoder without support for FMO. A high-level overview of the operation of an H.264/AVC decoder is given in Fig. 3 by means of pseudo code, something that is also reflected by our changes to the reference software. The pseudo code that is given in this section is based on the pseudo code that can be found in [10,11]. Both decoders have to decode macroblocks slice-by-slice. The only difference in the case of FMO is that the decoder has to check to which slice group a macroblock belongs and has to determine the number of the next macroblock in the current slice. It does this by making use of the MBAmap.

The algorithm at the encoder side is a bit more complicated because macroblocks are no longer assigned to slices in raster scan order (see Fig. 4). For each picture that is to be encoded, our encoder checks if an update of the MBAmap is necessary (defined in a configuration file). If so, a PPS has to be constructed and inserted into the bit stream. After that, our encoder calculates the macroblock numbers that are part of each slice group; encodes the slice groups one after the other; and, if necessary, further arranges the macroblocks into slices within the current slice group. It is important that the encoder does not perform intra prediction across slice boundaries to guarantee that slices are self-contained (note that slice group boundaries are also slice boundaries). On top of that, the encoder needs to reset the contexts of the context-adaptive entropy coding scheme.
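The extra decoder-side step described above — using the MBAmap to find the next macroblock of the current slice group — can be sketched as follows (illustrative Python; the names are ours, not JM identifiers):

```python
def next_mb_in_slice_group(mbamap, current_mb):
    """Return the number of the next macroblock that belongs to the
    same slice group as current_mb, or None at the end of the picture.
    mbamap[i] gives the slice group of macroblock i."""
    group = mbamap[current_mb]
    for i in range(current_mb + 1, len(mbamap)):
        if mbamap[i] == group:
            return i
    return None
```

Without FMO, the equivalent of this function is simply current_mb + 1, which is exactly the difference between the two decoder loops of Fig. 3.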

Fig. 3. Pseudo code of H.264/AVC decoder without (left) and with (right) FMO.


Fig. 4. Pseudo code of H.264/AVC encoder without (left) and with (right) FMO.

To experiment with different types of FMO, we modified version JM7.3 [4] of the reference software of H.264/AVC (the latest version at the time of the experiments). Our current implementation of the encoder allows one to encode a video sequence using all seven types of FMO. Furthermore, it is possible to change the slice group configuration or even the FMO type in the course of time by inserting PPSs into the bit stream. This is done by means of a configuration file with an in-house developed syntax which is read by the encoder.

Our modified decoder is able to decode bit streams encoded with any type of FMO. This decoder also has some error concealment capabilities: missing macroblocks are reconstructed using an interpolation technique available in the reference software; slices may arrive out of order; and the decoder can handle the loss of any given coded slice. The interpolation algorithm is explained in [12]. Note that most of the functionality to decode FMO was already present in the JM7.3 decoder, but our modifications prevent this decoder from crashing if slices are lost or received out of order. Also note that, in the meantime, the latest version of the reference software (at the time of writing) has support for FMO at the encoder (with the restriction that only one PPS can be created per coded sequence).


4. Experimental results

4.1. Visual quality enhancements in case of packet loss

In this section, we describe how FMO type 1 can be used to perform efficient error concealment, illustrated by means of an experiment. In what follows, a description is given of how some bit streams were encoded; how they were put through a packet loss simulation; and how they were decoded with some error concealment techniques.

The video sequences Stefan and Table were first extended by concatenating six copies of each, generating video sequences of 60 s at 30 frames/s. These extended sequences were encoded by making use of a slice size of 50 macroblocks; a GOP (Group of Pictures) length of 18 pictures; GOP structures of IBBP and IP; and quantization parameter (QP) values of 40/40/42 and 28/28/30 (for I/P/B slices). Note that the chosen QP values represent a very low quality and a more moderate quality, respectively, and that a QP value representing a very high quality was not incorporated in the test due to time constraints (the meaning of quality also depends on the application in question). This was done once without FMO and once with FMO type 1 using four slice groups. Also note that the first picture of the sequence was intra coded to compensate for the artificial scene cut. It is important to know that Context-Adaptive Binary Arithmetic Coding (CABAC) was used as the entropy coding scheme. This means that the resulting bit streams are non-compliant; hence, these experiments investigate FMO in the context of the Main Profile of H.264/AVC. This test setup is similar to the one described in [10], where FMO is examined in the context of the Baseline Profile.

Packet loss was simulated in the following way. The size of every NAL unit of the generated bit streams (Annex B syntax) is smaller than 1500 bytes, which is a typical size for the Maximum Transfer Unit (MTU) when transmitting data over the Internet.
Because of that, we assume that every Network Abstraction Layer (NAL) unit will be transmitted as the payload of exactly one network packet. As a result, all simulations of packet loss are done at the level of NAL units within the generated bit streams, namely by dropping certain NAL units. Note that every coded slice is put in a separate NAL unit. Based on a measurement methodology described in [13,14], we chose 1, 2, and 5% as typical values for average packet loss on the Internet. The simulation only addresses uniform packet loss and does not cover burst packet loss. Because FMO is used as a tool that facilitates spatial error concealment, it is beneficial if at least a few packets per picture arrive. When all packets of a given picture are lost in a burst of losses, techniques other than FMO should be used to conceal this kind of error. By making use of a random number generator, a packet loss pattern was generated for each of the three values of packet loss by means of a bit-map file. These three files were used to corrupt each encoded bit stream, in order to make a fair comparison. Note that NAL units containing an SPS or a PPS were never dropped during the simulation.

For the decoding of the corrupted bit streams, we used our modified version of the reference software decoder, as described in Section 3. If the decoder detects that a


NAL unit is missing, the macroblocks of the corresponding slice are reconstructed using a spatial interpolation algorithm provided by the reference software [12]. If no FMO was used, then an area covered by connected macroblocks has to be interpolated, whereas the missing macroblocks are dispersed if FMO was applied by the encoder. Interpolation of large areas could possibly be performed more efficiently along the temporal axis by using decoded macroblocks that are present in a previous picture. However, because the reference software would have to be substantially rewritten, we did not apply this kind of error concealment. Note that, within the scope of this paper, the aim of these experiments was not to simulate current error-prone networks with the highest possible accuracy, nor to present state-of-the-art error concealment techniques, but rather to illustrate the benefits of FMO as an error resilience tool for H.264/AVC video coding. With improved FMO-enabled encoders and more intelligent decoders, the gains of FMO will be even greater than the results that are presented in this section.

The results of the experiments are illustrated in Fig. 5 by means of the average Y-PSNR of each decoded bit stream. The abbreviation HQ (resp. LQ) in the legend stands for high quality (resp. low quality), which means that a low (resp. high) quantization parameter was used. In all graphs, we observe that the difference in average PSNR between the cases with and without FMO becomes bigger when the percentage of uniform packet loss rises. The differences in PSNR are also bigger when the bit streams are encoded with high quality. The gains in Y-PSNR in the case

Fig. 5. Decrease in objective quality due to uniform packet loss.


of Stefan (resp. Table) range from 0.2 to 3.4 dB (resp. 0.2–2.1 dB) with an average of 1.9 dB (resp. 1.3 dB). Another observation is that, in all cases, the fluctuation of the PSNR in the course of time is much larger when no FMO is used: both the standard deviation of the PSNR and the difference between maximum and minimum PSNR are larger within each decoded video sequence. This is given in Table 1.

It is important to note that average PSNR may not be the best or most accurate measure to express the resulting visual quality. Therefore, a small informal subjective test was conducted to evaluate the human perception of the decoded bit streams by means of some expert viewers. The most annoying aspect was the 'flickering' due to the reconstruction of lost slices. This effect is much worse when no FMO is applied, mainly because the errors are located along the entire width of the video pane. In all cases, the viewing subjects rated the FMO coded video sequences as more pleasant to watch. A screenshot of a typical frame is shown in Fig. 6.

A last remark about the presented experiment is that the comparison between the cases with and without FMO is, among other things, based on the use of equal (and constant) quantization parameters. Because FMO introduces some overhead (see next sections), the resulting bit rates of the cases with and without FMO are not the same. A more correct method would be to make sure that these bit rates are the same. However, this would require an intelligent rate control mechanism at the encoder side, which we do not have at our disposal. Nevertheless, we tried to estimate the differences in PSNR for bit streams having the same bit rate, once encoded with and once encoded without FMO. This was done by trial and error.
Table 1
Standard deviation and range of PSNR with and without FMO (dB)

Stefan
                       1%                 2%                 5%
                       std.dev.  max–min  std.dev.  max–min  std.dev.  max–min
FMO     IBBP   HQ      2.6       14.2     3.8       16.5     4.3       18.0
               LQ      1.1        6.1     1.4        7.9     1.8        8.9
        IPPP   HQ      3.9       14.6     4.5       18.6     4.3       18.9
               LQ      1.4        6.5     1.9        8.9     2.2        9.6
No FMO  IBBP   HQ      4.2       18.2     5.0       20.3     5.3       21.4
               LQ      1.7        9.0     2.3       11.5     2.6       13.0
        IPPP   HQ      5.3       18.5     5.5       21.8     4.7       22.4
               LQ      2.3        9.6     2.8       12.1     2.8       13.5

Table
                       1%                 2%                 5%
                       std.dev.  max–min  std.dev.  max–min  std.dev.  max–min
FMO     IBBP   HQ      2.7       13.2     3.8       14.5     3.8       17.7
               LQ      1.2        7.6     1.8        8.5     2.0       11.3
        IPPP   HQ      3.7       14.8     4.1       16.4     3.7       18.2
               LQ      1.7        8.8     2.2       11.4     2.4       11.8
No FMO  IBBP   HQ      3.8       16.5     4.7       17.5     4.4       19.6
               LQ      1.7       10.1     2.6       11.2     2.7       13.2
        IPPP   HQ      4.6       16.5     4.4       19.2     4.1       20.9
               LQ      2.4       19.4     2.8       12.7     2.8       15.1
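The corruption step of the simulation described above — uniformly dropping coded-slice NAL units while always keeping parameter sets — can be sketched as follows (illustrative Python; the function is ours, and only the SPS/PPS NAL unit type codes 7 and 8 come from the specification):

```python
import random

# NAL unit type codes from the H.264/AVC specification:
# 7 = sequence parameter set, 8 = picture parameter set.
PARAMETER_SET_TYPES = {7, 8}

def drop_nal_units(nal_units, loss_rate, seed=0):
    """Uniformly drop NAL units with probability loss_rate, but never
    drop SPS/PPS NAL units. nal_units is a list of
    (nal_unit_type, payload) pairs; a filtered list is returned."""
    rng = random.Random(seed)  # fixed seed -> reproducible loss pattern
    kept = []
    for nal_type, payload in nal_units:
        if nal_type in PARAMETER_SET_TYPES or rng.random() >= loss_rate:
            kept.append((nal_type, payload))
    return kept
```

Fixing the seed plays the role of the pre-generated bit-map files in our setup: the same loss pattern can then be applied to every encoded bit stream for a fair comparison.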


Fig. 6. Picture 11 from Stefan, 5% packet loss, no B slices, high quality.

By decreasing the quantization parameters step-by-step, the bit rate of the bit stream without FMO was compared with the bit rate of the corresponding bit stream with FMO until a match was found within a 1% margin of error. The average PSNR rose by 0.9 dB, which means that the dotted lines of the graphs in Fig. 5 would rise. This would mean that FMO is only useful if the packet loss exceeds 2% (which is confirmed by most of the graphs in Fig. 5). Again, this is only an indication because the PSNR of all cases could be raised by making use of an efficient rate control algorithm.

4.2. Overhead of FMO

In this section, we present the results of a number of experiments that were set up to get an accurate estimate of the overhead introduced by FMO in a number of scenarios. To be more specific, two video sequences (Stefan and Table at CIF resolution) were encoded multiple times with varying parameters, both with and without the use of FMO. The cost of FMO in terms of bits can then be examined by looking at the sizes of the bit streams that were encoded with the same parameters, with the exception of the syntax elements that are related to FMO. It is important to note that no rate control algorithm was applied in the encoder; a fixed set of quantization parameters was chosen instead.

In a first series of tests, FMO type 2 was used by the encoder to 'follow' an object of interest in the video sequence by means of a rectangular ROI. The object in question was not tracked in an automatic way; rather, we implemented a tool that allows one to manually specify the ROI frame-by-frame. The changing coordinates of the rectangle throughout the video sequence are given in Table 2. Note that our encoder does not encode the ROI as an isolated region as explained in Section 2. Implementing this functionality would require far-reaching modifications in the reference software of H.264/AVC.
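The rectangular-ROI configuration of Table 2 can be turned into an MBAmap as follows (an illustrative Python sketch; the helper mirrors the top_left[iGroup] and bottom_right[iGroup] syntax elements from Section 2 but is not the reference software's code):

```python
def roi_mbamap(w, h, top_left, bottom_right):
    """MBAmap for FMO type 2 with a single rectangular slice group.

    w, h: picture size in macroblocks; top_left and bottom_right are
    the macroblock numbers of the ROI corners (as in the
    top_left[iGroup] and bottom_right[iGroup] syntax elements).
    ROI macroblocks go to slice group 0, the background to group 1."""
    x0, y0 = top_left % w, top_left // w
    x1, y1 = bottom_right % w, bottom_right // w
    return [0 if (x0 <= i % w <= x1 and y0 <= i // w <= y1) else 1
            for i in range(w * h)]
```

For Stefan at CIF resolution (22 × 18 macroblocks), the first row of Table 2 would correspond to roi_mbamap(22, 18, 99, 348).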
It is, however, possible to encode the entire ROI using only intra prediction, but this has a severe impact on the coding performance. All bit streams are coded with a GOP length of 18 (if applicable); CABAC as the entropy


Table 2
Coordinates (top left and bottom right macroblock number) of the changing ROI for Stefan

Frame number    Coordinates
0–5             (99, 348)
6–11            (96, 345)
12–33           (69, 344)
34–64           (69, 341)
65–94           (93, 345)
95–176          (73, 350)
177–190         (114, 327)
191–226         (68, 342)
227–235         (93, 346)
236–247         (95, 349)
248–300         (98, 351)

coding algorithm; and five reference frames. The encoding parameters that were varied are the GOP structure (I only, IP, IBP, and IBBP), the quantization parameter (28/28/30 and 40/40/42 for I/P/B slices), the number of macroblocks per slice (50, 100, and 400), and whether or not FMO was used. Note that a picture at CIF resolution contains 396 macroblocks. The results of these tests are summarized in Table 3. It should be noted that more accurate results could be obtained by varying a larger number of encoding parameters or by using more values for those parameters. However, processing time becomes an issue if one considers a fairly large number of cases. Nevertheless, these results give a good impression of the overhead of FMO.

A first observation is that the overhead of FMO is higher if there are few slices in a picture (i.e., many macroblocks per slice). Indeed, if there is only one slice per picture (without FMO), then the use of FMO implies the creation of at least one extra slice to fill the slice group embodying the ROI, which has a relatively high cost. This overhead is much larger compared to the case where there are already many slices

Table 3
Relative cost of FMO type 2 for CIF resolution with different slice sizes (%)

                       Stefan (MBs per slice)    Table (MBs per slice)
GOP    QP (I/P/B)      50      100     400       50      100     400
I      28/—/—           0.4     0.7    1.0       −0.5     0.0    0.5
       40/—/—           0.8     1.4    1.8       −0.7     0.6    1.6
IBP    28/28/30         0.6     1.7    4.1        0.2     0.4    1.0
       40/40/42         1.8     2.5    7.0        1.6     2.9    4.5
IBBP   28/28/30         0.1     0.5    3.8        0.1     0.3    0.9
       40/40/42         0.6     0.8    8.6        2.0     3.0    5.0
IPP    28/28/—         −0.3    −0.3    2.3        0.1     0.4    0.9
       40/40/—          0.6     1.9    7.7        1.4     2.1    4.1


per picture. In the latter case, the fact that FMO uses one or two more slices does not make much difference. Another observation is that the quantization parameter, and hence the bit rate, also has a clear impact on the relative overhead of FMO. To be more precise, the relative overhead of FMO is larger at low bit rates. If a higher QP is used, then fewer bits are used for the coding of the video data while the overhead of FMO (slice headers and extra PPSs) remains constant. Comparing the cases with a high and a low QP in Table 3, one observes that the relative overhead differs significantly in almost all cases. In contrast to the slice size and the quantization parameter, the GOP structure does not have a clear impact on the overhead of FMO.

The median (resp. mean) of all costs in this test is 0.9% (resp. 1.7%), which means that in this experiment FMO introduced a typical overhead of approximately 1%. There are some high costs (max. 8.6%), resulting in a standard deviation of 2.0. An interesting peculiarity is the fact that the cost of FMO is negative in some cases. This means that the 'fixed' overhead of extra PPSs is counterbalanced by gains in coding efficiency, presumably caused by the altered slice boundaries. These are side effects that are based on coincidence and, hence, cannot be generalized. It can be concluded that the overhead introduced by FMO in this scenario is more than acceptable given the extra possibilities that can be exploited. Notwithstanding the fact that the results are valid for FMO in the context of the Main Profile, they can be an indication of the overhead in the same scenario using another Profile.

This scenario of a time-varying rectangular ROI can also be achieved by using FMO type 6. Because the slice structure is identical to the case where FMO type 2 is applied, the coding efficiency will be exactly the same. The only change in overhead will be the coding of the MBAmap in the PPSs.
Every macroblock is assigned to a slice group using three bits (there is a maximum of eight slice groups per picture). As a result, the overhead of FMO type 6 compared to FMO type 2 is constant when the same slice groups are used. In our tests, this boils down to 529 bytes for the Stefan sequence and 439 bytes for the Table sequence. In general, these numbers represent a negligible overhead compared to the total number of bytes in a coded sequence.

An often-mentioned application of FMO is the concept of scattered slices (FMO type 1). In this configuration, a macroblock of a certain slice group is surrounded by macroblocks of one or more other slice groups. In the case where two slice groups are used, the macroblocks are arranged in a checkerboard fashion. Organizing the macroblocks by means of scattered slices has very attractive error resilience properties, as explained in Section 2. The question that arises is, again, what the cost thereof is in terms of bit rate overhead. To gain some insight into this problem, a test similar to the one described above was conducted. The video sequences Stefan and Table were encoded several times with different GOP structures (I only, IP, IBP, and IBBP); different quantization parameters (28/28/30 and 40/40/42 for I/P/B slices); different slice sizes (50, 100, and 400 macroblocks); different numbers of slice groups (2, 4, and 6); and, of course, with and without FMO. Note that the case with 400 macroblocks per

slice corresponds with one slice per picture without FMO and with one slice per slice group when FMO is applied. Also note that the GOP length was 18 in all cases. The test was performed once with CAVLC as entropy coding scheme (Baseline Profile without B slices and Extended Profile with B slices) and once with CABAC (as mentioned before, FMO in the context of the Main Profile). The results of this experiment are summarized in Tables 4 and 5. All costs mentioned in these tables are relative to the corresponding case where no FMO was used; in other words, all other parameters are identical. The bit rates of the cases without FMO are also given in the tables for reference. Note that a more 'comprehensive' comparison could be made if an accurate rate control mechanism were at our disposal, because one could then compare the various cases on a per-bit-rate basis.

An obvious observation is that the cost of FMO is significantly higher than in the situation where FMO type 2 was used to accomplish a ROI. Note that, in this scenario, there is only one SPS and one PPS because the configuration of the slices and slice groups remains constant throughout the video sequence. This means that the overhead is due to extra slice headers; to a reduced coding efficiency because intra prediction cannot cross slice boundaries; and to the fact that
Table 4
Relative cost of FMO type 1 for CIF resolution (%)—CABAC. The kbps columns give the bit rates of the corresponding no-FMO cases.

                                      Stefan                   Table
                                         # slice groups           # slice groups
GOP    QP (I/P/B)  Slice size   kbps     2    4    6     kbps     2    4    6
I      28/—/—          50       4313    10   10   11     2944    10   11   12
                      100       4255    10   11   11     2887    11   12   13
                      400       4212    11   12   12     2845    12   14   15
       40/—/—          50       1374    14   15   17      802    22   23   26
                      100       1336    15   16   18      762    24   25   28
                      400       1309    16   19   20      734    27   30   33
IBP    28/28/30        50       1392    10   17   18      517     6    7   10
                      100       1323    12   21   22      497     7    9   10
                      400       1268    16   27   28      483     8   12   14
       40/40/42        50        330    16   26   31      124    12   14   26
                      100        293    21   36   40      108    15   17   24
                      400        272    26   47   51       95    22   33   41
IBBP   28/28/30        50       1351     9   16   17      495     7    8   11
                      100       1277    12   21   22      475     7    9   11
                      400       1214    16   27   28      461     9   12   15
       40/40/42        50        319    16   25   30      120    12   14   26
                      100        284    20   35   39      103    15   18   25
                      400        264    26   46   50       91    21   33   41
IP     28/28/—         50       1499     7   14   15      588     6    8   10
                      100       1439     9   17   18      568     7    9   10
                      400       1382    12   22   23      552     8   12   13
       40/40/—         50        315    19   35   40      134    14   17   27
                      100        282    20   45   49      118    16   20   26
                      400        248    31   65   70      105    23   34   42
Table 5
Relative cost of FMO type 1 for CIF resolution (%)—CAVLC. The kbps columns give the bit rates of the corresponding no-FMO cases.

                                      Stefan                   Table
                                         # slice groups           # slice groups
GOP    QP (I/P/B)  Slice size   kbps     2    4    6     kbps     2    4    6
I      28/—/—          50       5065    15   15   15     3447    19   19   20
                      100       5010    16   16   16     3386    21   21   21
                      400       4965    17   17   17     3341    23   23   23
       40/—/—          50       1666    23   23   23      984    35   35   36
                      100       1630    25   25   25      947    39   39   40
                      400       1601    26   27   27      918    43   44   44
IBP    28/28/30        50       1664    11   18   18      603     9   11   13
                      100       1590    13   22   23      585    10   12   13
                      400       1524    17   28   28      571    11   14   15
       40/40/42        50        397    19   30   33      148    17   19   29
                      100        366    22   37   39      131    20   24   29
                      400        339    28   48   50      119    27   37   42
IBBP   28/28/30        50       1610    10   17   18      580     9   11   13
                      100       1544    12   21   22      561    10   12   13
                      400       1476    16   27   27      547    12   15   16
       40/40/42        50        389    17   28   31      143    17   19   29
                      100        361    19   34   36      127    21   24   29
                      400        332    26   46   48      115    27   37   42
IP     28/28/—         50       1774     8   15   16      684     8   10   12
                      100       1705    10   19   20      664     9   12   13
                      400       1648    12   23   24      650    10   13   14
       40/40/—         50        378    22   41   45      160    18   21   30
                      100        336    26   55   57      143    22   26   30
                      400        300    37   74   76      130    28   39   44
the contexts of CAVLC or CABAC are broken across slice boundaries (which might have an even bigger impact on the coding efficiency than the broken intra prediction). For instance, a checkerboard-like FMO pattern is effective for error resilience, but it represents the worst possible case for coding efficiency because all intra-frame prediction mechanisms are broken. A better trade-off between error resilience and coding efficiency could be achieved by clustering small groups of macroblocks into slice groups.

The number of macroblocks in a slice and the quantization parameter have the same kind of effect on the overhead of FMO (and again for the same reasons). Just as in the case of a ROI, the relative overhead of FMO heavily depends on the bit rate, i.e., on the quantization parameters used to encode the video. It is important to note that very large percentages (more than 35%) only occur when a very low quality is used (QP = 40), and they may hence not be representative of many applications. The overhead in the cases with a more moderate quality is much lower and much more acceptable for an error resilience tool. For higher qualities, it can be expected that the overhead will become even lower. As a result, one has to be careful when interpreting the numbers in Tables 4 and 5.

When using scattered slices, one can choose to use up to eight slice groups. The more slice groups are used,

the more overhead is produced, because every slice group entails the creation of at least one slice and, hence, a slice header. A quantitative indication of this effect can be found by comparing the numbers of Tables 4 and 5 in a column-wise manner. Note that the resulting overhead is also a consequence of the fact that the contexts of CABAC or CAVLC are broken across slice boundaries. The latter may explain why the difference in overhead between the 2 and 4 slice group cases is often larger than the difference between the 4 and 6 slice group cases. Indeed, from a certain number of slice groups onward (in Tables 4 and 5, this number would be four), the 'strength' of the contexts may be damaged so badly that it does not get much worse even if more slice boundaries are introduced.

Looking at the tables above, the overhead of FMO is also related to the GOP structure that was used to encode the video sequence. Using only intra prediction leads to relatively high bit rates because the coding efficiency is limited in that case; as a result, the relative overhead is rather small. When inter prediction is also used, the relative overhead of FMO is higher on average because of the higher coding efficiency. There are no notable differences between the cases with one and two consecutive B pictures (to be more precise: pictures that consist only of B slices).

We conclude this section with an important remark. The comparisons in Tables 4 and 5 are based on an equal slice size in the cases with and without FMO, not on an equal number of slices per picture. If we calculate the number of slices, we see that slice sizes of 50, 100, and 400 macroblocks correspond with 8, 4, and 1 slice(s) if no FMO is used. In case FMO is used, the numbers of slices are respectively 8, 4, and 2 (2 slice groups); 8, 4, and 4 (4 slice groups); and 12, 6, and 6 (6 slice groups).
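The scattered-slice arrangement discussed in this section can be made concrete with a short sketch. The formula below follows the dispersed slice group map generation process of the H.264/AVC specification as we read it; the function name is ours:

```python
def dispersed_map(pic_width_in_mbs, pic_height_in_mbs, num_slice_groups):
    """Macroblock-to-slice-group map for scattered slices (FMO type 1),
    computed per map unit (macroblock) in raster scan order."""
    return [[(x + (y * num_slice_groups) // 2) % num_slice_groups
             for x in range(pic_width_in_mbs)]
            for y in range(pic_height_in_mbs)]

# With two slice groups the macroblocks form a checkerboard:
for row in dispersed_map(6, 4, 2):
    print(row)
# [0, 1, 0, 1, 0, 1]
# [1, 0, 1, 0, 1, 0]
# [0, 1, 0, 1, 0, 1]
# [1, 0, 1, 0, 1, 0]
```

With more slice groups, the macroblocks of each group are still interleaved with those of the other groups, so every macroblock keeps correctly received neighbors after the loss of a single slice.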
We observe that in the case of a slice size of 400 macroblocks there is a significant difference in the number of slices within a picture (up to a factor of 6). This means that the percentages given in Tables 4 and 5 not only comprise the overhead (or cost) of FMO, but also an overhead due to extra slice headers, which explains why the percentages may seem very high. For instance, if we compare (in Table 4) the FMO cases of Stefan having four slice groups and a slice size of 400 macroblocks with the no-FMO cases having a slice size of 100 macroblocks (i.e., both cases have four slices per picture), the percentages (12, 19, 27, 47, 27, 46, 22, and 65) would become (11, 16, 24, 36, 22, 32, 20, and 47). The latter percentages give a more precise idea of the actual loss in coding efficiency introduced by FMO (see above). Note that the percentages as given in the two tables are particularly useful if the slice size is specified (i.e., limited) by the MTU of the network. The percentages as calculated in this paragraph can be derived from the numbers given in Tables 4 and 5.
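The recalculation performed in this paragraph can be sketched as a small helper (the function name is ours; because the percentages in the tables are rounded to integers, the results are approximate):

```python
def rebaseline_cost(cost_pct, rate_fmo_baseline_kbps, rate_equal_slices_kbps):
    """Convert a relative FMO cost measured against the no-FMO case with
    the same slice size into a cost measured against the no-FMO case with
    the same number of slices per picture.  Both rates are the no-FMO
    bit rates (kbps) of the two reference cases."""
    absolute_rate = (1.0 + cost_pct / 100.0) * rate_fmo_baseline_kbps
    return (absolute_rate / rate_equal_slices_kbps - 1.0) * 100.0

# Stefan, I-only GOP, QP 28, 4 slice groups, slice size 400 (Table 4):
# 12% against the 400-macroblock no-FMO case (4212 kbps) becomes roughly
# 11% against the 100-macroblock no-FMO case (4255 kbps), which also has
# four slices per picture.
print(round(rebaseline_cost(12, 4212, 4255)))  # -> 11
```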

5. Conclusion

This paper provides a description of some novel error resilience tools that are incorporated in the H.264/AVC specification. One of them is flexible macroblock ordering, of which a detailed overview is given. Also, our FMO-enabled H.264/AVC encoder and decoder is presented and discussed from an implementation point of view. Using our modified version of the reference software of H.264/AVC, some experiments were conducted to measure several aspects of FMO (also in the context of the Main Profile of H.264/AVC). First, it is shown that FMO enhances both the objective and the subjective visual quality of video sequences when the corresponding bit stream is subject to packet loss, especially at higher loss rates. Our measurements also show that the overhead introduced by FMO is acceptable in most cases compared to the benefits FMO brings along. Some marginal configurations, however, show that this overhead can become an issue.

Acknowledgments

The research activities described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research-Flanders (FWO-Flanders), the Belgian Federal Science Policy Office (BFSPO), and the European Union.

References

[1] International Telecommunication Union (ITU), Geneva, Switzerland, ITU-T Rec. H.263 (Video coding for low bit rate communication), February 1998. URL http://www.itu.int.
[2] JVT, Requirements for AVC codec, JVT-C156, May 2002. URL ftp://standards.polycom.com/2002_05_Fairfax/JVT-C156.doc.
[3] T. Wiegand, G.J. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circ. Syst. Video Technol. 13 (7) (2003) 560–576.
[4] JVT/AVC reference software, http://iphome.hhi.de/suehring/tml/download/.
[5] S. Wenger, H.264/AVC over IP, IEEE Trans. Circ. Syst. Video Technol. 13 (7) (2003) 645–656.
[6] T. Stockhammer, M.M. Hannuksela, T. Wiegand, H.264/AVC in wireless environments, IEEE Trans. Circ. Syst. Video Technol. 13 (7) (2003) 657–673.
[7] C. Perkins, O. Hodson, V. Hardman, A survey of packet loss recovery techniques for streaming audio, IEEE Network 12 (5) (1998) 40–48.
[8] M.M. Hannuksela, Y.-K. Wang, M. Gabbouj, Isolated regions in video coding, IEEE Trans. Multimedia 6 (2) (2004) 259–267.
[9] JVT, Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), JVT-G050r1, May 2003. URL ftp://standards.polycom.com/2003_03_Pattaya/JVT-G050r1.zip.
[10] S. Wenger, M. Horowitz, Flexible MB ordering—a new error resilience tool for IP-based video, in: International Workshop on Digital Communications (IWDC 2002), Capri, Italy, 2002.
[11] JVT, FMO 101, JVT-D063, July 2002. URL ftp://standards.polycom.com/2002_07_Klagenfurt/JVT-D063.doc.
[12] JVT, Non-normative error concealment algorithms, JVT-N62, September 2001. URL http://ftp3.itu.ch/av-arch/video-site/0109_San/VCEG-N62.doc.
[13] S. Vanhastel, B. Duysburgh, P. Demeester, Performance measurements on the current internet, in: 7th IFIP ATM&IP Workshop, Antwerp, Belgium, 1999.
[14] B. Duysburgh, An active networking based service for the distribution of voice with transcoding capabilities in the multicast tree, Ph.D. dissertation, Ghent University, 2004.