Comparison of Prediction Schemes With Motion Information Reuse For Low Complexity Spatial Scalability
Koen De Wolf, Robbie De Sutter, Wesley De Neve, and Rik Van de Walle
Ghent University - IBBT, Department of Electronics and Information Systems - Multimedia Lab, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium
ABSTRACT
Three low complexity algorithms that allow spatial scalability in the context of video coding are presented in this paper. We discuss the feasibility of reusing motion and residual texture information of the base layer in the enhancement layer. The prediction errors that arise from the discussed filters and schemes are evaluated in terms of the Mean of Absolute Differences (MAD). For the interpolation of the decoded pictures of the base layer, the presented 6-tap and bicubic filters perform significantly better than the bilinear and nearest neighbor filters. In contrast, when reusing the motion vector field and the error pictures of the base layer, the bilinear filter performs best for the interpolation of residual texture information. In general, reusing the motion vector field and the error pictures of the base layer gives the lowest prediction errors. However, our tests showed that for some sequences with regions of complex motion activity, interpolating the decoded picture of the base layer gives the best result. This means that an encoder should compare all possible prediction schemes combined with all interpolation filters in order to achieve optimal prediction. Obviously, this would not be possible for real-time content creation.

Keywords: Scalable video coding, multi-layer motion prediction, spatial scalability, interpolation filters
1. INTRODUCTION
The Moving Picture Experts Group (MPEG, ISO/IEC JTC 1/SC 29/WG 11) and the Video Coding Experts Group (VCEG, ITU-T SG16) recently started exploring the field of Scalable Video Coding (SVC).1 The proposed technologies mainly focus on temporal, spatial, and quality scalability. At the moment, several indications exist that the high complexity of the algorithms might be a problem for (real-time) content creation.2-4 Motion estimation and compensation is considered the main temporal decorrelation technique for interpicture prediction in conventional video compression schemes. This technique is typically one of the most complex and time-consuming parts of a video encoder. In scenarios where live content is streamed in a scalable manner, it may not be possible to determine the ideal motion vectors for all spatial resolution levels due to the time complexity of the motion estimation process. In this paper, we present three different low complexity prediction schemes for which we discuss the possibility of reusing motion vectors and residual texture information. The latter are gathered at the lowest spatial resolution (generally called the base layer) for the prediction of higher resolution versions (enhancement layers). In these schemes, interpolation filters will be used for the prediction of high resolution pictures, both for the upsampling of the decoded pictures and for the upsampling of the prediction errors of the base layer. Therefore, an important issue will be the choice of interpolation filters. In particular, the question is whether an increase of the number of filter taps - and therefore also an increase of the complexity of the filter
Further author information: (Send correspondence to Koen De Wolf) Koen De Wolf: E-mail: koen.dewolf@ugent.be, Telephone: +32 9 331 49 57 Robbie De Sutter: E-mail: robbie.desutter@ugent.be, Telephone: +32 9 331 49 59 Wesley De Neve: E-mail: wesley.deneve@ugent.be, Telephone: +32 9 331 49 57 Rik Van de Walle: E-mail: rik.vandewalle@ugent.be, Telephone:+32 9 331 49 12
Visual Communications and Image Processing 2005, edited by Shipeng Li, Fernando Pereira, Heung-Yeung Shum, Andrew G. Tescher, Proc. of SPIE Vol. 5960 (SPIE, Bellingham, WA, 2005) 0277-786X/05/$15 doi: 10.1117/12.633358 Proc. of SPIE Vol. 5960 59605L-1
- will lead to a better prediction. In this paper, four interpolation filters with a different number of filter taps will be used, in combination with one decimation filter. The mentioned schemes and filters are all evaluated in the context of H.264/AVC. This state-of-the-art video coding standard achieves up to 50% bit rate savings for equivalent perceptual quality compared to prior standards.5 This video coding specification now also serves as a base for the development of several scalable video coding compression algorithms.6 This paper is organized as follows. Section 2 discusses some important issues concerning spatial scalability and describes the prediction schemes and the interpolation filters used for their evaluation. Section 3 discusses the conducted experiments and results. Final conclusions and some remarks are made in section 4.
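To make the tap-count trade-off concrete, the following one-dimensional 2x upsampling sketch contrasts a 1-tap (nearest neighbor), 2-tap (bilinear), and 6-tap interpolation. The helper names are our own; the 6-tap kernel (1, -5, 20, 20, -5, 1)/32 is the half-pel filter defined in H.264/AVC.

```python
def upsample2x_nearest(row):
    # Each low-resolution sample is simply repeated.
    out = []
    for s in row:
        out += [s, s]
    return out

def upsample2x_bilinear(row):
    # Even output positions copy the input; odd positions average
    # the two neighbouring input samples (edge sample clamped).
    out = []
    for i, s in enumerate(row):
        out.append(s)
        right = row[min(i + 1, len(row) - 1)]
        out.append((s + right) / 2)
    return out

def upsample2x_sixtap(row):
    # H.264/AVC half-pel kernel (1, -5, 20, 20, -5, 1) / 32,
    # with border samples clamped (edge extension).
    taps = (1, -5, 20, 20, -5, 1)
    n = len(row)
    out = []
    for i in range(n):
        out.append(row[i])
        acc = sum(t * row[max(0, min(n - 1, i - 2 + k))]
                  for k, t in enumerate(taps))
        out.append(acc / 32)
    return out
```

More taps approximate an ideal low-pass filter more closely, but every interpolated sample then costs proportionally more multiply-accumulate operations.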
[Figure: block diagram of the base layer encoder - decimation by 2, motion estimation (MV), memory, entropy coding, and interpolation towards the enhancement layer.]
In case a block of a picture in the base layer is intra predicted and therefore no motion information is available, the corresponding block in the enhancement layer can also be predicted using techniques such as intra macroblock prediction.5,9 The intra prediction mode of the block from the base layer may serve as an estimate of the best intra prediction mode for the associated block in the enhancement layer. In that case, no extra parameters regarding the applied intra prediction mode of that block need to be encoded for the enhancement layer.

The motion vectors of the base layer can be further refined in order to obtain better coding performance.6,10 However, in our schemes no further refinement is present. This means that the precision of the motion vectors remains the same for both spatial layers; i.e., for the luminance component, motion vectors have 1/4-pixel accuracy at the base layer resolution, which corresponds to 1/2-pixel accuracy at the enhancement layer resolution. Motion compensation in the enhancement layer is done on macroblock partitions that are doubled in size compared to the corresponding macroblock partitions of the base layer.

The difference between the original high resolution pictures and the predicted pictures (using one of the discussed schemes) can be coded using the same concepts, techniques, and partition modes as defined in the H.264/AVC specification:5,9 in particular, integer-valued transformations (DCT and Hadamard), quantization, and context adaptive entropy coding. Doing so allows the reuse of already available components at the encoder. To allow quality scalability, the obtained residual pictures can be coded using a Fine-Granular Scalability (FGS) based coding mechanism; this is not discussed in this paper. The effect on temporal scalability is tackled by Schwarz et al.11
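The motion information reuse described above can be sketched as follows; the function names are hypothetical and assume dyadic (factor 2) spatial scalability with vectors stored in quarter-pel units.

```python
def derive_enh_mv(base_mv):
    # base_mv is (mvx, mvy) in quarter-pel units at base resolution.
    # Doubling the picture size doubles the displacement in pixels,
    # so the reused vector is scaled by 2; the results are always
    # even quarter-pel values, i.e. half-pel accuracy at the
    # enhancement resolution when no refinement is performed.
    mvx, mvy = base_mv
    return (2 * mvx, 2 * mvy)

def derive_enh_partition(base_part):
    # A macroblock partition (width, height) in the base layer maps
    # to a partition doubled in each dimension in the enhancement layer.
    w, h = base_part
    return (2 * w, 2 * h)
```

Because no refinement step follows, deriving the enhancement-layer motion field is a constant-time lookup per partition rather than a new motion search.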
[Figure: block diagram of schemes 2 & 3 - enhancement layer: input video, transformation & quantization, inverse quantization & inverse transformation, entropy coding; base layer: decimation by 2, motion estimation, entropy coding.]
3.2. Evaluation
The Mean of Absolute Differences (MAD) between the obtained predicted pictures and the original spatial high resolution pictures is determined. These values are used for the evaluation of the prediction schemes. This gives us the average deviation per pixel between the predicted picture and the picture to be encoded. Furthermore, in Fig. 4 and 5, the differences between the measured MAD values are plotted in terms of percentages. The relative MAD deviation of MAD_A compared to MAD_B is given by
We define an I-picture in the context of H.264/AVC as a picture consisting entirely of intra predicted slices.
Table 1. Test sequences.
Stefan: Tennis player; camera is following the player; high motion from camera and subject; complex textures.
Foreman: Man presenting a construction yard; moderate movement from both the subject and the camera.
Mother & Daughter: Mother and daughter sitting in front of a camera; no camera movement.
Bus: Riding bus; camera is following the bus; homogeneous motion of camera (panning) and subject.
Crew: Astronauts followed by the camera; lots of flash lights.
Table 2. Test conditions.
Sequences: Bus, Crew, Foreman, Mother & Daughter, Stefan
Intra period: 32, 300
QP base layer: 16, 32
QP enhancement layer: 0, 12, 24, 36, 48
relative MAD deviation = (MAD_A - MAD_B) / MAD_B * 100.

This allows us to conclude how certain configurations perform relative to one another. The most interesting observations are given in the next section.
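As a minimal sketch of this evaluation metric (our own helper names, operating on flat lists of pixel values):

```python
def mad(pred, orig):
    # Mean of Absolute Differences: average per-pixel deviation
    # between the predicted picture and the picture to be encoded.
    assert len(pred) == len(orig)
    return sum(abs(p - o) for p, o in zip(pred, orig)) / len(pred)

def relative_mad_deviation(mad_a, mad_b):
    # Deviation of MAD_A relative to MAD_B, expressed in percent.
    return (mad_a - mad_b) / mad_b * 100
```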
3.3. Observations
3.3.1. Scheme 1
In Fig. 3, we see that there is only a small difference between the bilinear, the bicubic, and the 6-tap filter when using scheme 1. As could be expected, the nearest neighbor filter performs worst for all sequences, with MAD values up to 25% higher than those of the other three interpolation filters. Also note that the bicubic filter performs best on average for all tested sequences. This may be because 6 filter taps are too many for the interpolation of detailed textures, as can clearly be seen for the Mother & Daughter sequence (hair) and the Foreman sequence (construction yard).

3.3.2. Scheme 2
In Fig. 4, the deviation of the MAD compared with the bicubic filter is plotted for both the Stefan and Bus sequences. We have chosen the bicubic filter as a reference since it performs best of the four, as seen in the previous section. The impact of the interpolation filters is only visible in the first pictures following an I-picture; the deviation of the MAD stabilizes for the subsequent pictures. Although the bilinear filter performs second worst for the interpolation of I-pictures (see Fig. 3), its interpolation result combined with motion-compensated coding produces the best prediction for all other pictures (see Fig. 4). The fact that motion compensation is done on the decoded pictures of the enhancement layer results in rather small prediction errors. This is especially true for sequences with low to moderate motion activity.
3.3.3. Scheme 3
Adding the interpolated residual of the base layer - as described in scheme 3 - gives lower prediction errors for sequences with substantial motion activity. In Fig. 5, we plotted the results of the different interpolation filters for the error pictures relative to the MAD values of the bilinear filter. For all plotted configurations, I-pictures are interpolated using the bicubic filter for the same reason as explained in 3.3.2; the fixed quantization parameter of the base layer and the enhancement layer is 16 and 24 respectively. As can be seen in Fig. 5, the bilinear interpolation of the error picture gives the best result. The bicubic and 6-tap filters perform similarly. For some sequences (e.g., Mother & Daughter) the nearest neighbor filter performs second best, whilst for others (e.g., Bus and Crew) it performs worst. This is due to the fact that for the former sequence there is no camera movement, while for the latter two sequences the camera pans. This finding is confirmed by the results of the Foreman sequence. In the first half of the sequence there is hardly any camera movement; for this part the nearest neighbor filter performs well. In the second half of the sequence, when the camera starts to pan, the performance of the nearest neighbor filter drops. Overall, scheme 3 is particularly beneficial for sequences with high motion activity or abrupt changes in luminance, such as the flashing lights in the Crew sequence, as can be seen in Fig. 6. In Fig. 8, the predicted pictures using scheme 2 and scheme 3 are plotted next to the original picture during a flashlight. A distinct prediction improvement can be seen on the wall in the right picture. Adding the interpolated residual also softens block artifacts created by the motion compensation process.

3.3.4. Group of Pictures length
For scheme 1, the Group of Pictures (GOP) length has no influence on the quality of the prediction, as no motion compensated prediction is applied.
The occurrence of an I-picture for schemes 2 and 3 has an adverse impact on the first few predicted pictures immediately following that I-picture. This is due to the fact that I-pictures of the enhancement layer are predicted by interpolating the corresponding I-picture of the base layer. This effect disappears for the subsequent pictures because of the coding loop in the enhancement layer.

3.3.5. Quantization
The quantization parameter of the base layer has an obvious impact on the quality of the interpolated picture used as prediction in the enhancement layer. In the top left graph of Fig. 7 we see that for the Foreman sequence, using 32 as fixed quantization parameter instead of 16 increases the MAD by about 33%. Lower quantization for the coding of the error picture in the enhancement layer results in more accurate motion compensated prediction. Although PSNR values of the decoded pictures in the enhancement layer may be higher for lower quantization parameters, MAD values do not decrease proportionally, as can clearly be seen in Fig. 7. For instance, the difference between QP=0 and QP=12 can hardly be seen, and the difference between QP=12 and QP=24 is less than 1. On the other hand, for higher quantization the MAD seems to accumulate (QP=36, QP=48). This may be caused by the fact that for such high quantization values, the MAD of the decoded high resolution picture is higher than the MAD of the predicted picture.

3.3.6. Context adaptive interpolation
If we look at the left graph in Fig. 6, we see that for a sequence with low motion activity (e.g., Mother & Daughter), schemes 2 and 3 outperform scheme 1. As can be seen in the graph on the right, for some sequences (e.g., Foreman), scheme 1 performs better in regions with complex motion activity.
For this graph we used the bicubic filter for the interpolation of the decoded pictures and the bilinear filter for the interpolation of the error pictures of the base layer as these filters tend to give best results (see sect. 3.3.1 and 3.3.3).
A GOP groups I-, B- and P-pictures into a specified sequence in order to reduce the temporal redundancy. A GOP typically starts or ends with an I-picture. A group can be made of different lengths to suit the type of video being encoded. Note that in our experiments no B-pictures were used.
Ideally, an encoder should compare all possible prediction schemes combined with all interpolation filters in order to achieve optimal prediction. This, of course, has a major impact on the complexity of the encoder: using the discussed filters and schemes, 36 configurations would already have to be evaluated for every macroblock. Obviously, this would not be possible for real-time content creation.

3.3.7. Joint Scalable Video Model
As already mentioned above, MPEG and VCEG are developing a scalable extension on top of the state-of-the-art H.264/AVC standard. In this specification, motion vectors can be refined for prediction at higher spatial resolutions. This allows better coding performance, but at the cost of a higher complexity, and results in more modes to be evaluated.
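The exhaustive per-macroblock comparison discussed above can be sketched as a brute-force search; `predict` and `mad_fn` are hypothetical callables standing in for the actual prediction process and error measurement.

```python
def best_configuration(configs, predict, mad_fn):
    # Brute-force mode decision sketch: evaluate every
    # (prediction scheme, interpolation filter) configuration and
    # keep the one whose prediction has the lowest MAD.
    best_cfg, best_mad = None, float("inf")
    for cfg in configs:
        prediction = predict(cfg)   # hypothetical prediction for this config
        m = mad_fn(prediction)      # MAD against the original macroblock
        if m < best_mad:
            best_cfg, best_mad = cfg, m
    return best_cfg, best_mad
```

The cost is linear in the number of configurations per macroblock, which is exactly why evaluating all of them is impractical for real-time content creation.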
Figure 3. Scheme 1; QP base layer=16; Intra period=300; Sequences: Mother & Daughter (top left), Bus (top right), Foreman (bottom left) and Crew (bottom right).
Figure 4. Scheme 2; QP base layer=32; QP enhancement layer=36; Intra period=32; Sequences: Stefan (left) and Bus (right).
Figure 5. Scheme 3; QP base layer=16; QP enhancement layer=24; Intra period=300; Sequences: Mother & Daughter (top left), Bus (top right), Foreman (bottom left) and Crew (bottom right).
Figure 6. Comparison of schemes; QP base layer=32; QP enhancement layer=24; Intra period=300; Interpolation filters: bicubic (decoded picture) & bilinear (error picture); Sequences: Mother & Daughter (left) and Stefan (right).
Figure 7. Influence of the quantization parameter on the accuracy of the prediction. Intra period=300; Sequence: Foreman; interpolation decoded pictures: bicubic; interpolation error pictures: bilinear; Scheme 1 (top left), Scheme 2 (top right), Scheme 3 (bottom).
Figure 8. Crew sequence picture 2. Original (left), predicted by scheme 2 (center) and by scheme 3 (right).
ACKNOWLEDGMENTS
The research activities that have been described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research-Flanders (FWO-Flanders), the Belgian Federal Science Policy Office (BFSPO), and the European Union.
REFERENCES
1. ISO/IEC JTC1/SC29/WG11, "Applications and requirements for scalable video coding." ISO/IEC JTC1/SC29/WG11 N6880, January 2005.
2. F. Wu, S. Li, R. Yan, X. Sun, and Y.-Q. Zhang, "Efficient and universal scalable video coding," in IEEE International Conference on Image Processing (ICIP), 2, pp. 37-40, 2002.
3. G. Landge, M. van der Schaar, and V. Akella, "Complexity analysis of scalable motion-compensated wavelet video decoders," in Applications of Digital Image Processing XXVII, A. G. Tescher, ed., Proc. SPIE 5558, pp. 444-453, 2004.
4. S. Saponara, C. Blanch, K. Denolf, and J. Bormans, "The JVT advanced video coding standard: Complexity and performance analysis on a tool-by-tool basis," in IEEE Workshop Packet Video (PV'03), 2003.
5. T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits and Systems for Video Technology 13, pp. 560-576, 2003.
6. J. Reichel, M. Wien, and H. Schwarz, eds., Joint Scalable Video Model JSVM 1, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, 2005.
7. G. Van der Auwera, A. Munteanu, P. Schelkens, and J. Cornelis, "Bottom-up motion compensated prediction in the wavelet domain for spatially scalable video coding," IEE Electronics Letters 38(21), pp. 1251-1253, 2002.
8. M. Mrak, G. Abhayaratne, and E. Izquierdo, "Scalable generation and coding of motion vectors for highly scalable video coding," in Picture Coding Symposium 2004 (PCS-04), 2004.
9. JVT, "Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)." JVT-G050r1, May 2003.
10. M. Mrak, G. Abhayaratne, and E. Izquierdo, "On the influence of motion vector precision limiting in scalable video coding," in International Conference on Signal Processing (ICSP'04), Proc. ICSP 2, pp. 1143-1146, 2004.
11. H. Schwarz, D. Marpe, and T. Wiegand, "MCTF and scalability extension of H.264/AVC," in Picture Coding Symposium 2004 (PCS-04), 2004.
12. G. Bjøntegaard, "Motion compensation with 1/4 pixel accuracy." ITU-T SG16/Q15, February 2000.
13. K. De Wolf, Y. Dhondt, J. De Cock, and R. Van de Walle, "Complexity analysis of interpolation filters for scalable video coding," in To appear in: Proceedings of Euromedia 2005, 2005.
14. H. Schwarz, D. Marpe, and T. Wiegand, "Scalable extension of H.264/AVC," ISO/IEC JTC 1/SC 29/WG11 MPEG2004/M10569/S03, 2004.