
Temporal Video Segmentation on H.264/AVC Compressed Bitstreams


Sarah De Bruyne¹, Wesley De Neve¹, Koen De Wolf¹, Davy De Schrijver¹, Piet Verhoeve², and Rik Van de Walle¹

¹ Department of Electronics and Information Systems - Multimedia Lab, Ghent University - IBBT, Gaston Crommenlaan 8 bus 201, B-9050 Ledeberg-Ghent, Belgium
sarah.debruyne@ugent.be, http://multimedialab.elis.ugent.be
² Televic, Belgium

Abstract. In this paper, a novel method for temporal video segmentation on H.264/AVC-compliant video bitstreams is presented. As the H.264/AVC standard contains several new and extended features, the characteristics of the coded frames are different from former video specifications. Therefore, previous shot detection algorithms are not directly applicable to H.264/AVC compressed video bitstreams. We present a new concept, in particular, Temporal Prediction Types, by combining two features: the different macroblock types and the corresponding display numbers of the reference frames. Based on this concept and the amount of intra-coded macroblocks, our novel shot boundary detection algorithm is proposed. Experimental results show that this method achieves high performance for cuts as well as for gradual changes.

1 Introduction

Recent advances in multimedia coding technology, combined with the growth of the internet, as well as the advent of digital television, have resulted in the widespread use and availability of digital video. As a consequence, many terabytes of multimedia data are stored in databases, often insufficiently cataloged and only accessible by sequential scanning. This has led to an increasing demand for fast access to relevant data, making technologies and tools for the efficient browsing and retrieval of digital video of paramount importance.

The prerequisite step to achieve video content analysis is the automatic parsing of the content into visually coherent segments, called shots, separated by shot boundaries [1]. The definition of a shot change is important to stress, since object or camera motion may drastically change the content of a video sequence. A shot is defined as a sequence of frames continuously captured from the same camera [2]. According to whether the transition between consecutive shots is abrupt or not, boundaries are classified as cuts or gradual transitions, respectively.

Algorithms for shot boundary detection can be roughly classified into two major groups, depending on whether the operations are done on uncompressed data


or whether they work directly with compressed-domain features. The two major video segmentation approaches operating in the uncompressed domain are based on color histogram differences [3] and changes in edge characteristics [4]. On the other hand, full decompression of the encoded video and the associated computational overhead can be avoided by using compressed-domain features only. Since most video data are compressed to preserve storage space and reduce bandwidth, we focus on methods operating in the compressed domain. Existing techniques in this domain mostly concentrate on the MPEG-1 Video, MPEG-2 Video, and MPEG-4 Visual standards. These algorithms are for the most part based on the correlation of DC coefficients [5], macroblock prediction type information [6,7], or the bit consumption (or bit rate) of a frame [8].

Due to the compression performance of the newest video compression standard, H.264/AVC [9], more video content will probably be encoded in this format. This video specification possesses features like intra prediction in the spatial domain and multiple reference frames, which were not included in previous standards. In this paper, we investigate whether the earlier mentioned compressed-domain methods are still applicable to H.264/AVC compressed data. Since these methods turn out to be inadequate, we propose a new shot detection algorithm for H.264/AVC compressed video.

The outline of this paper is as follows. In Sect. 2, the main characteristics of H.264/AVC are elaborated from a high-level point of view. Section 3 discusses the influences of these characteristics on existing compressed-domain algorithms. A new shot boundary detection algorithm based on temporal prediction types is proposed in Sect. 4. Section 5 discusses a number of performance results obtained by our method. Finally, Sect. 6 concludes this paper.

2 Design Aspects of H.264/AVC

The H.264/AVC specification contains a lot of new technical features compared with prior standards for digital video coding [9]. With respect to shot boundary detection, H.264/AVC has three important design aspects, which are either new or extended compared to previous standards: intra prediction, slice types, and multi-picture motion-compensated prediction.

In contrast to prior video coding standards, intra prediction in H.264/AVC is conducted in the spatial domain, by referring to neighboring samples of previously-decoded blocks [9]. Two primary types of intra coding are supported: Intra_4×4 and Intra_16×16 prediction. In Intra_4×4 mode, each 4×4 luma block is predicted separately. This mode is well suited for coding parts of a picture with significant detail. The Intra_16×16 mode uses a 16×16 luma block and is more suited for coding very smooth areas of a picture. Another intra coding mode, I_PCM, enables the transmission of the values of the encoded samples without prediction or transformation. Furthermore, in the H.264/AVC Fidelity Range Extensions (FRExt) amendment, Intra_8×8 is introduced. The latter two types are hardly used and are therefore not supported in the following algorithms, although these algorithms can easily be extended to cope with these prediction types as well.


In addition, each picture is partitioned into MBs, which are organized in slices [9]. H.264/AVC supports five different slice types. In I slices, all MBs are coded using intra prediction. Prior-coded images can be used as a prediction signal for MBs of the predictive-coded P and B slices. Whereas P MB partitions can utilize only one frame to refer to, B MB partitions can use two reference frames. The remaining two slice types, SP and SI, which are specified for efficient switching between bitstreams coded at various bit rates, are rarely used.

The third design aspect, multi-picture motion-compensated prediction [9], enables efficient coding by allowing an encoder to select the best reference picture(s) among a larger number of pictures that have been decoded and stored in a buffer. Figure 1 illustrates this concept. A multi-picture buffer can contain both short term and long term reference pictures and allows reference pictures containing B slices. When using inter prediction for a MB in a P (or B) slice, the reference index (or indices) are transmitted for every motion-compensated 16×16, 16×8, 8×16, or 8×8 luma block.

Fig. 1. Multi-picture motion-compensated prediction. In addition to the motion vector, picture reference parameters are transmitted [9].
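For the algorithms described in the remainder of this paper, the only per-block information that matters is whether a block is intra coded and, if not, the display numbers of its reference pictures. As an illustration only, the following minimal Python sketch shows one possible parsed representation of this information; the class and field names are hypothetical and do not correspond to any particular decoder API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class MBPartition:
    """One motion-compensated luma block of 16x16, 16x8, 8x16, or 8x8 pixels.

    ref_display_numbers holds the display numbers of the pictures this
    partition refers to: empty for intra-coded blocks, one entry for
    single-reference prediction, two entries for bi-predictive blocks.
    """
    is_intra: bool
    ref_display_numbers: List[int] = field(default_factory=list)

@dataclass
class ParsedFrame:
    """Per-frame information extracted while parsing the bitstream."""
    display_number: int            # derived from the Picture Order Count (POC)
    slice_type: str                # 'I', 'P', or 'B' (assuming uniform slice types)
    partitions: List[MBPartition] = field(default_factory=list)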

3 Temporal Segmentation Algorithms for H.264/AVC

In this section, we verify whether the existing compressed-domain algorithms are still applicable to H.264/AVC compressed video bitstreams, keeping the new and improved characteristics of this specification in mind.

3.1 DC Coefficients

Shot boundary detection methods in the compressed domain often use DC coefficients to generate DC images [5]. For intra-coded MBs, these DC coefficients represent the average energy of a block (i.e., 8×8 pixels) and can be extracted directly from the MPEG compressed data. For P and B frames, the DC coefficients of the referred regions in the reference frame are used to obtain the corresponding DC image. Based on these DC images, shot detection algorithms, such as color histogram comparisons, can be directly transferred to the compressed domain.

Unlike previous MPEG standards, DC coefficients of intra-coded MBs in H.264/AVC only represent an energy difference between the current block and the adjacent pixels instead of the energy average. In case we want to apply the


proposed algorithm, we need to calculate the predicted energy from adjacent pixels to obtain the average energy. Therefore, almost full decoding is inevitable, which diminishes the advantages of this compressed-domain method.

3.2 Bit Rate

In previous coding standards, frames located at a shot boundary consist for the greater part of intra-coded MBs, using prediction conducted in the transform domain only. This leads to high peaks in the bit rate, which makes shot boundary detection possible [8]. When looking at H.264/AVC-compliant bitstreams, frames coinciding with shot boundaries normally have much lower bit rates than those of MPEG-2 Video, for example, due to the intra prediction in the spatial domain. The height of these peaks decreases, which makes it more difficult to distinguish between shot boundaries and movement. From these observations, one can conclude that this algorithm is hard to apply to H.264/AVC.

3.3 Macroblock Prediction Type Information

The distribution of the different MB prediction types [6,7] was used to detect shot boundaries in previous coding standards. This method exploits the decisions made by the encoder in the motion estimation phase, which results in specific characteristics of the MB type information whenever shot boundaries occur. As shown in Fig. 2, when a B frame does not belong to the same shot as one of its reference frames, the B frame will hardly refer to this reference frame. It is clear that the amount of the different prediction types of the MBs in a B frame can be applied to define a metric for locating possible shot boundaries.


Fig. 2. Possible positions of a shot boundary

Due to the multi-picture motion-compensated prediction and the possibility of B frames to be used as reference pictures, the existing methods based on macroblock prediction types cannot directly be applied to H.264/AVC. However, features such as MB types, MB partitions, and the display numbers of reference pictures contain important information regarding the semantic structure of a video sequence. Moreover, these features can be extracted directly from the compressed data. In Sect. 4, a shot boundary detection algorithm is presented based on the above mentioned features.

4 Shot Boundary Detection in H.264/AVC

Within a video sequence, a continuous strong inter-frame correlation is present, as long as no significant changes occur. As a consequence, the different prediction


types and the direction of the reference frames in a frame can be applied to define a metric for locating possible shot boundaries. To determine the direction of the reference frames, the display number of the frames needs to be checked. This number represents the location of a frame in the decoded bitstream and can be derived from the Picture Order Count (POC) of the frame and the display number of the last frame prior to the previous Instantaneous Decoding Refresh (IDR) picture [9]. By comparing the display number of the current frame and the reference frames, we can derive whether the reference frames are displayed before or after the current frame.

In the context of shot boundary detection, we present the concept Temporal Prediction Types, combining the different macroblock types and the direction of the reference frames. Each MB type in a P or B slice corresponds to a specific partitioning of the MB into fixed-size blocks. For each macroblock partition, the prediction mode and the reference index or indices can be chosen separately. As each MB partition corresponds to exactly one prediction mode and the smallest partition size is 8×8 pixels, the following discussion is based on these 8×8 blocks. Depending on the prediction mode of a MB partition, this partition consists of zero to two reference indices. In case no reference pictures are used, we speak of intra temporal prediction. Partitions that use only one reference picture belong to one of these two temporal prediction types:

- Forward temporal prediction in case the display number of the referred frame precedes the display number of the current frame.
- Backward temporal prediction in case the current frame is prior to the referred frame.

This subdivision is used for MB partitions in a P slice or for partitions in a B slice that only use one reference frame. In case a MB partition in a B slice refers to two frames, the following classification is applied:

- Forward temporal prediction in case the display numbers of both referred frames are prior to the current frame.
- Backward temporal prediction in case the current frame is displayed earlier than both referred frames.
- Bi-directional temporal prediction in case the current frame is located in between the reference frames (which is very similar to the well-known concept used for B frames in MPEG-2 Video).

Summarized, we have four possible temporal prediction types, i.e., intra, forward, backward, and bi-directional temporal prediction. According to the specification, it is allowed to construct coded pictures that consist of a mixture of different types of slices. However, in current applications where shot boundary detection is applicable, frames will normally be composed of slices with similar slice types. Therefore, in the remainder of this paper, we will refer to I, P, and B slice coded pictures as I, P, and B frames.
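As an illustration, this classification can be written down directly in terms of display numbers. The Python sketch below reuses the hypothetical MBPartition structure introduced in Sect. 2 and is not meant as a normative implementation.

def temporal_prediction_type(partition, current_display_number):
    """Classify a block into one of the four temporal prediction types:
    'intra', 'forward', 'backward', or 'bidirectional'."""
    if partition.is_intra:
        return 'intra'
    refs = partition.ref_display_numbers
    if all(r < current_display_number for r in refs):
        return 'forward'        # all references displayed before the current frame
    if all(r > current_display_number for r in refs):
        return 'backward'       # all references displayed after the current frame
    return 'bidirectional'      # the current frame lies between its references

For a partition in a P slice, refs contains a single display number, so only the intra, forward, and backward outcomes can occur, exactly as described above.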


As mentioned before, there are two major types of shot changes: abrupt and gradual transitions. Since their characteristics are divergent, the detection of these transitions needs to be separated.

4.1 Detection of Abrupt Changes

One could expect that abrupt changes always occur at I frames, but it should be mentioned that this notion is not enough to detect shot boundaries. This is due to the fact that a certain type of I frames, in particular IDR pictures, is often used as random access points in a video. Therefore, I frames do not always correspond to shot boundaries. Further, depending on the encoder characteristics or the application area, the GOP structure of the video can either be fixed or adapted to the content, which can result in shot boundaries occurring at P or B frames. Considering this observation, a distinction is drawn between I, P, and B frames in order to detect the transitions.

Shot Boundaries Located at an I Frame. All MBs in an I frame are coded without referring to other pictures within the video sequence. As a consequence, they do not represent the temporal correlation between the current frame (F_i) and the previously displayed frame (F_{i-1}). However, in case this previous frame is a P or B frame, it contains interesting information, such as the temporal prediction types of the blocks. When the percentage of blocks with backward and bi-directional temporal prediction in F_{i-1} is large, there is a high correlation between F_{i-1} and F_i. As a consequence, the amount of blocks with intra and forward temporal prediction is low and the chance that a shot boundary is located between these two frames is very small. On the other hand, when the previous frame does not refer to the current frame, which results in a high percentage of intra and forward temporal predicted blocks, we cannot conclude that these two frames belong to different shots. During the encoding of the previous frame, for example, the current and following frames are not always at hand in the multi-picture buffer. In this case, backward and bi-directional temporal prediction in the previous frame are impossible.

To solve this problem, a second condition, based on the distribution of the intra prediction modes within two successive I frames, is added [10]. (These two I frames do not need to be located next to each other, as there can also be P and B frames in between them.) Whereas MBs with Intra_16×16 prediction are more suited for coding very smooth areas of a picture, those with Intra_4×4 prediction are used for parts of a picture with significant detail, as can be seen in Fig. 3. When two successive I frames belong to different shots, the distribution of the intra prediction modes of the two frames will differ strongly. Consequently, comparing the intra prediction modes of the consecutive I frames at corresponding positions will reflect the similarity of the frames. However, when there are fast moving objects or camera motion, this approach would lead to false alarms. Instead, the MBs are grouped in sets of 5×5 MBs, called sub-blocks, which are then compared to each other.



Fig. 3. The distribution of intra prediction modes

Now, let S^k be the set of MBs included within the k-th sub-block of an I frame, and let i denote the current and j the previous I frame. The decision function between two consecutive I frames can then be defined as follows:

δ(i) = (1/#MB) · Σ_k | Σ_{l ∈ S^k} Mode4×4_l(i) - Σ_{l ∈ S^k} Mode4×4_l(j) |    (1)

where Mode4×4_l(·) equals 1 if MB l is coded with Intra_4×4 prediction and 0 otherwise, and #MB is the number of MBs in a frame.
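A possible realization of (1) is sketched below in Python, assuming that the Intra_4×4 usage of both I frames is available as a per-macroblock 0/1 list in raster order; the function and parameter names are illustrative only.

def intra_mode_dissimilarity(modes_i, modes_j, mbs_per_row, sub_block=5):
    """Sketch of Eq. (1): compare the Intra_4x4 usage of two consecutive
    I frames per sub-block of 5x5 macroblocks.

    modes_i, modes_j: one entry per macroblock (raster order), 1 if the MB
    uses Intra_4x4 prediction and 0 if it uses Intra_16x16 prediction.
    """
    assert len(modes_i) == len(modes_j)
    num_mbs = len(modes_i)
    mb_rows = num_mbs // mbs_per_row

    total = 0
    for top in range(0, mb_rows, sub_block):
        for left in range(0, mbs_per_row, sub_block):
            sum_i = sum_j = 0
            for r in range(top, min(top + sub_block, mb_rows)):
                for c in range(left, min(left + sub_block, mbs_per_row)):
                    sum_i += modes_i[r * mbs_per_row + c]
                    sum_j += modes_j[r * mbs_per_row + c]
            total += abs(sum_i - sum_j)
    return total / num_mbs   # delta(i), to be compared against threshold T2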

If the percentage of blocks with intra and forward temporal prediction in the previous frame is higher than a predefined threshold T1, and the dissimilarity δ(i) between the current frame and its preceding I frame is higher than a second predefined threshold T2, we declare a shot boundary located at the current I frame. The values of both thresholds were selected in order to maximize the performance of the algorithm and were set to 80% and 15%, respectively.

Shot Boundaries Located at a P or B Frame. P and B frames, in contrast to I frames, use temporal prediction to exploit the similarity between consecutive frames in a shot. In case the current frame is the first frame of a new shot, this frame will have hardly any resemblance to the previously displayed frames. Therefore, the current frame will mainly contain blocks with intra and backward temporal prediction. Blocks in the previous frame, on the other hand, will mostly use intra and forward temporal prediction. Bi-directional temporal prediction will hardly be present in this situation, since this type is only advantageous when the content of the neighboring pictures is similar. If the percentage of blocks with intra and forward temporal prediction in the previous frame and the percentage of blocks with intra and backward temporal prediction in the current frame are both higher than the predefined threshold T1, we declare a shot boundary located at the current P or B frame. This threshold is the same as for I frames, as the principle behind the metric is similar.

It is insufficient to take only the percentage of intra-coded blocks into account. In case a future displayed frame is already coded and stored in the buffer at the moment the current frame is coded, this future frame can be used as a reference. This reference picture will represent the content of the new shot, which makes the use of backward temporal predicted blocks in the current frame preferable to intra-coded MBs.


Generally speaking, the intra mode is only used to code a MB when motion estimation gives no satisfactory results. In H.264/AVC, even if a block can be predicted well temporally, the encoder might prefer intra coding when the block can be better predicted from adjacent pixels than by temporal prediction. As a result, statistical information based on the percentage of intra-coded MBs alone is insufficient to detect shot boundaries. By making use of the distribution of the different temporal prediction types, a more accurate detection of shot boundaries can be accomplished.

Summary. Let α(i), φ(i), β(i), and ω(i) be the number of blocks with intra, forward, backward, and bi-directional temporal prediction, respectively, let i and i-1 denote the current and previous frame, and let #B be the number of blocks in a frame. Using (1), the detection of abrupt transitions can be summarized as follows:

if (f_i is an I frame)
    if ( (1/#B) * (α(i-1) + φ(i-1)) > T1  and  δ(i) > T2 )
        { declare a shot boundary }

if (f_i is a P or B frame)
    if ( (1/#B) * (α(i-1) + φ(i-1)) > T1  and  (1/#B) * (α(i) + β(i)) > T1 )
        { declare a shot boundary }

4.2 Detection of Gradual Changes

Another challenge is the detection of gradual changes, as they take place over a variable number of frames and consist of a great variety of special effects. A characteristic present during most gradual changes is the increasing amount of intra-coded MBs. The distribution of the percentage of intra-coded MBs in a frame is connected with the duration of the transition. If the transition consists of only a few frames, the mutual frame difference is relatively big and most frames will consist of intra-coded MBs. In case the transition is spread out over a longer interval, the resemblance is higher and therefore, a lot of B frames may use bi-directional temporal prediction as well.

To smooth the metric defined by the percentage of intra-coded MBs and to diminish spurious peaks, a filter with a Gaussian impulse response is applied. The result can be seen in Fig. 4(a). In contrast to the detection of abrupt changes, a fixed threshold cannot be applied in the context of gradual changes, since the height of the peaks in this metric is linked to the duration of the transition. Instead, we make use of two variable thresholds Ta and Tb based on characteristics of preceding frames. Within a shot, the frame-to-frame differences are normally lower than during a gradual change. Therefore, the mean and variation of the metric over a number of preceding frames are taken into account to determine the adaptive threshold Ta. Once a frame is found which exceeds this threshold Ta (Fig. 4(b)), the following frames are examined to determine whether or not they also belong to the transition. This is done by comparing each frame to a threshold Tb based on the mean and variation of the previous frames belonging to the gradual change (Fig. 4(c)). When the value of the smoothed metric for this frame is below the threshold Tb, the end of the gradual change is found.

Fig. 4. Gradual changes. (a) Smoothing of the gradual metric, (b) detection of the beginning of a gradual change, (c) detection of the end of a gradual change.

For both thresholds, a lower boundary and a minimal variation are taken into account to avoid small elevations in a smooth area being wrongly considered as a shot boundary. Without this adjustment, a gradual transition would be falsely detected around frame 86. Furthermore, the angles of inclination corresponding to the flanks of the gradual changes are taken into consideration to determine the actual length of the transition. Afterwards, the duration of the obtained transition is examined to remove false alarms, such as abrupt transitions or fixed I frames. The detection of gradual transitions is executed before the detection of abrupt changes to avoid that gradual changes would be falsely considered to be multiple abrupt changes. In Fig. 4, for example, the narrow peaks at frames 63 and 142 correspond to abrupt changes, while the wide peaks around frames 45 and 122 represent gradual changes.

For video sequences coded with former MPEG standards, shot detection algorithms in the compressed domain were not able to distinguish the different types of gradual changes. Since H.264/AVC supports several intra coding types, a distinction can be made. In smooth frames, most of the time, MBs using Intra_16×16 or Skipped mode are utilized. By examining the distribution of the MB coding types, a distinction is made between fade-ins, fade-outs, and other gradual changes.

Nowadays, long term reference pictures belonging to previous shots are seldom used. As computational power increases tremendously and more intelligent encoding algorithms are developed, these long term reference pictures could be



used in the future to store the backgrounds of recurring scenes. As a result, our algorithm needs to be extended, since forward temporal prediction to such a long term reference frame can then be used in the first coded frame belonging to a new shot. The display numbers therefore need to be compared to the previously detected shot boundary.
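To make the gradual-change detection more tangible, the following Python sketch shows one way to realize the Gaussian smoothing of the intra percentage and an adaptive threshold based on the mean and variation of preceding frames. The kernel parameters, lower boundary, minimal variation, and scaling factor are illustrative values only, not the settings used in our experiments.

import math

def gaussian_smooth(values, sigma=2.0, radius=4):
    """Smooth the per-frame intra percentage with a Gaussian impulse response."""
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), len(values) - 1)   # clamp at sequence borders
            acc += kernel[k + radius] * values[j]
        out.append(acc / norm)
    return out

def adaptive_threshold(history, min_floor=0.2, min_dev=0.05, factor=3.0):
    """Threshold based on the mean and variation of preceding frames, with a
    lower boundary and a minimal variation (all constants are illustrative)."""
    mean = sum(history) / len(history)
    dev = max(min_dev, (sum((v - mean) ** 2 for v in history) / len(history)) ** 0.5)
    return max(min_floor, mean + factor * dev)

A frame whose smoothed value exceeds the threshold computed over the preceding frames marks a candidate start of a gradual change (threshold Ta); the same construction, applied to the frames already assigned to the transition, yields the stopping threshold Tb.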

5 Experiments

To evaluate the performance of the proposed algorithm, experiments have been carried out on several kinds of video sequences. Five trailers with a resolution around 848×448 pixels were selected, as they are brimming with abrupt and gradual transitions and contain a lot of special effects. Friends with Money mainly contains shots with lots of moving objects and camera motion, alternated with dialogs. She's the Man, Little Miss Sunshine, and Accepted are all trailers brimming with all kinds of shot changes, variations in light intensity, and motion. Especially Basic Instinct 2 is a challenge, as it is full of motion, gradual changes, et cetera. These sequences were coded with variable as well as with fixed GOP structures in order to evaluate the influence hereof on the algorithm.

5.1 Performance

The evaluation of the proposed algorithm is performed by comparing the results with the ground truth. For this purpose, the recall and precision ratios based on the number of correct detections (Detects), missed detections (MDs), and false alarms (FAs) are applied:

Recall = Detects / (Detects + MDs)        Precision = Detects / (Detects + FAs)
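These ratios are straightforward to compute; the small helper below (hypothetical names) returns both as percentages.

def recall_precision(detects, missed, false_alarms):
    """Recall and precision as defined above, in percent."""
    recall = 100.0 * detects / (detects + missed)
    precision = 100.0 * detects / (detects + false_alarms)
    return recall, precision

# Example: 76 correct detections, 5 missed boundaries, and 12 false alarms
# give a recall of 93.83% and a precision of 86.36%, consistent with the CUT
# figures for Little Miss Sunshine with a variable GOP structure in Table 1.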

In Table 1, the performance of the proposed algorithm is presented for the above-mentioned video sequences coded with a variable GOP structure based on the content of the video. This table also depicts the performance for these video sequences coded with a fixed GOP structure described by the regular expression IB(PB)* and an intra period of 20 and 200 frames.

Table 1. Performance based on Recall (%) and Precision (%) of the algorithm on sequences coded with a variable as well as a fixed GOP structure. A distinction is made between abrupt changes (CUT) and gradual changes (GC).

                                # original shots        CUT                    GC
Test sequence                    CUT     GC       Precision  Recall     Precision  Recall
Variable GOP structure
  Friends with Money              48      1         96.00    100.00       50.00    100.00
  She's the Man                  120     41         95.83     95.83       89.13    100.00
  Little Miss Sunshine            81     24         86.36     93.83       92.00     95.83
  Accepted                       117      6         94.12     95.73       38.46     83.33
  Basic Instinct 2                91     47         86.46     91.21       91.49     91.49
Fixed GOP structure: Intra period 20
  Friends with Money              48      1        100.00     97.92       50.00    100.00
  She's the Man                  120     41         96.46     90.83       95.24     97.56
  Little Miss Sunshine            81     24        100.00     97.53       81.48     91.67
  Accepted                       117      6         95.87     99.15       41.67     83.33
  Basic Instinct 2                91     47         95.18     86.81       97.06     70.21
Fixed GOP structure: Intra period 200
  Friends with Money              48      1        100.00     95.83      100.00    100.00
  She's the Man                  120     41         96.61     95.00       88.89     97.56
  Little Miss Sunshine            81     24         95.12     96.30       91.30     87.50
  Accepted                       117      6         93.60    100.00       60.00    100.00
  Basic Instinct 2                91     47         92.05     89.01       89.13     87.23

This table shows that the proposed algorithm performs well for video sequences coded with a variable as well as with a fixed GOP structure. The causes of the missed detections and the false alarms are similar in both cases. For these test results, the major part of the missed detections is caused by long gradual changes, since there is almost no difference between two consecutive frames. This is a problem which most shot boundary detection algorithms have to cope with. Furthermore, brief shots containing quite a lot of motion will sometimes be considered as a gradual change between the previous and the following shot, as their characteristics bear resemblance to gradual changes. Consequently, such a shot will not be detected.

The false alarms have various causes. Sudden changes in light intensity, such as lightning, explosions, or camera flashlights, often lead to false alarms. This is due to the fact that the current image cannot be predicted from previous reference frames since the luminance highly differs. Afterwards, future frames could use reference frames located before the light intensity change for prediction. However, nowadays, encoded video sequences usually do not employ a large reference picture buffer and therefore do not contain these reference frames. When a shot contains lots of movement, originating from objects or the camera, false alarms will sometimes occur. Due to this motion, successive frames will have less similarity and it will be more difficult for the encoder to find a good prediction. This leads to a lot of intra-coded MBs, and therefore, the structure of the MB type information in successive frames bears resemblance to gradual changes. Experiments have shown that looking at the distribution of the motion vectors does not offer a solution to this problem, since the vectors do not always give a good representation of the real movement. Here, a trade-off must be made between recall and precision.

6 Conclusion

This paper introduces an algorithm for automatic shot boundary detection on H.264/AVC-compliant video bitstreams. To this end, a new concept, Temporal Prediction Types, was presented, combining two features available in a compressed bitstream, i.e., the different macroblock types and the corresponding



display numbers of the reference frames. These features can easily be extracted from compressed data, making the decompression of the bitstream unnecessary and thereby avoiding computational overhead. Moreover, the experimental results show that the performance is promising for sequences coded with fixed as well as with variable GOP structures.

Acknowledgements
The research activities as described in this paper were funded by Ghent University, the Interdisciplinary Institute for Broadband Technology (IBBT), the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research-Flanders (FWO-Flanders), the Belgian Federal Science Policy Office (BFSPO), and the European Union.

References
1. Gargi, U., Kasturi, R., Strayer, S.: Performance Characterization of Video-Shot-Change Detection Methods. IEEE Transactions on Circuits and Systems for Video Technology 10(1) (2000) 1-13
2. Lelescu, D., Schonfeld, D.: Statistical Sequential Analysis for Real-Time Video Scene Change Detection on Compressed Multimedia Bitstream. IEEE Transactions on Multimedia 5(1) (2003) 106-117
3. Zhang, H.J., Kankanhalli, A., Smoliar, S.: Automatic Partitioning of Full-Motion Video. Multimedia Systems 1(1) (1993) 10-28
4. Zabih, R., Miller, J., Mai, K.: A Feature-Based Algorithm for Detecting and Classifying Scene Breaks. In: Proceedings of ACM 95. (1995) 189-200
5. Yeo, B.L., Liu, B.: Rapid Scene Analysis on Compressed Video. IEEE Transactions on Circuits and Systems for Video Technology 5(6) (1995) 533-544
6. Pei, S.C., Chou, Y.Z.: Efficient MPEG Compressed Video Analysis Using Macroblock Type Information. IEEE Transactions on Multimedia 1(4) (1999) 321-333
7. De Bruyne, S., De Wolf, K., De Neve, W., Verhoeve, P., Van de Walle, R.: Shot Boundary Detection Using Macroblock Prediction Type Information. In: Proceedings of WIAMIS 06. (2006) 205-208
8. Li, H., Liu, G., Zhang, Z., Li, Y.: Adaptive Scene-Detection Algorithm for VBR Video Stream. IEEE Transactions on Multimedia 6(4) (2004) 624-633
9. Wiegand, T., Sullivan, G., Bjontegaard, G., Luthra, A.: Overview of the H.264/AVC Video Coding Standard. IEEE Transactions on Circuits and Systems for Video Technology 13(7) (2003) 560-576
10. Kim, S.M., Byun, J., Won, C.: A Scene Change Detection in H.264/AVC Compression Domain. In: Proceedings of PCM 05. (2005) 1072-1082
