
2011 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), December 7-9, 2011

Automatic Comic Strip Generation Using Extracted Keyframes from Cartoon Animation

Pakpoom Tanapichet1, Nagul Cooharojananone2, Rajalida Lipikorn*3
Department of Mathematics and Computer Science
Faculty of Science, Chulalongkorn University
Bangkok 10330, Thailand
1 pakpoom.t@student.chula.ac.th, 2 nagul.c@chula.ac.th, 3 rajalida.l@chula.ac.th
* Corresponding author

Abstract-This paper proposes a novel method to generate comic strips from cartoon animation with the aim of covering more content in each panel and presenting it more systematically. General video summarization techniques usually drop important content because of their restriction to the video aspect ratio; this paper therefore proposes a new method that uses panorama technology to include more detail in each keyframe. The concept is to obtain keyframes of various sizes generated from panning and non-panning shots. Each generated keyframe is treated as a comic strip panel, and the panels are organized in comic book style. The results of the proposed method are comic strip pages consisting of panels derived from keyframes generated from various cartoon animations; these pages are more aesthetic than lining ordinary keyframes up, and their contents are similar to those of the respective comic adaptations.

Keywords-video summarization; cartoon animation; comic strip; panorama image; shot boundary; optical flow; panel organizing

I. INTRODUCTION

Video summarization has become a popular topic among researchers because it allows users who consume digital videos to gather specific content from a video quickly and efficiently. Several video summarization techniques have been introduced recently. Most of them tend to select the most suitable frames (also known as "keyframes") from a video and use them as representatives. These techniques restrict themselves to the video aspect ratio [1], [2], [3], [4], [5], [6], [7]. The keyframes are usually selected based on user-defined importance and are taken directly from a shot sequence, so they maintain the frame dimensions dictated by the video aspect ratio [1], [2], [3], [4], [5], [6], [7].

The problem occurs when a shot sequence pans for a period of time: some important elements may be missing from the keyframe if only one keyframe is selected. On the contrary, if two keyframes are selected to cover all the content, elements that are supposed to appear in the same keyframe are separated into two keyframes because of the aspect-ratio restriction. This can cause the continuous content in those consecutive keyframes to be misinterpreted, as shown in Fig. 1(a)-(b). These existing techniques also present the keyframes without neat organization.

Figure 1. Comparison between (a) and (b) fixed-aspect-ratio keyframes and (c) a panorama keyframe with two important elements circled.

In this paper, a new technique to cover multiple important contents from cartoon animation is introduced. The proposed technique does not select keyframes from a collection of frames, but generates new keyframes according to rule-based optical flow. The proposed method generates panorama keyframes when the camera is detected to be panning, combining multiple important elements into one keyframe as shown in Fig. 1(c). As a result, the keyframes representing one video have their dimensions altered depending on the optical flow, and the panorama keyframes are expected to be wider or taller.

Fig. 1(c) shows a panorama keyframe generated by the proposed technique. It includes every important element from Fig. 1(a) and 1(b). A frame restricted to the aspect ratio is obviously unable to contain both important elements, while a panorama keyframe can depict the continuity of information better.

The main objective of this paper is to propose a new technique that makes use of these generated panorama keyframes by converting each of them into a comic strip panel. These panels are arranged into a page of comic strips that gives readers a more aesthetic representation.

II. RELATED WORKS AND INFORMATION

In general, research in the video summarization field deals with shots and frames. It is deployed to extract only the important information in order to save time and storage space. This can be achieved by determining the importance of each frame and discarding the unimportant frames, leaving only the keyframes that are important.




The latest approaches in the video summarization field concentrate on the importance of the elements represented in each keyframe. One approach [1], [2] proposed a histogram with false detection, and another approach is based on entropy difference [3]. These approaches manage to retrieve the keyframes they define as important. Another two approaches improved efficiency by reducing the number of keyframes [4] and by taking audio features into account [5]. There is also an effort to make the algorithm work online [6]. Nevertheless, all of these approaches keep the aspect ratio of the keyframes constant. They usually drop information that goes beyond the frame boundary, or try to cover it by splitting a shot sequence into multiple keyframes unnecessarily. Some approaches are not strict about the size of the keyframes, but still restrict themselves to the aspect ratio [7].

There are some characteristics that differentiate cartoon animations from general movie videos and that motivate us to propose this method. Some existing algorithms deal with luminance, perspective, dimensions, or view angles in movies; these cannot be applied to cartoon animations, which are sometimes drawn to be physically surreal, and, unlike movies, cartoon animations usually do not contain long shot sequences. Shot boundary detection is a technique that can be applied to separate a video into shot sequences based on shot transitions. The shot boundary detection proposed by Le et al. [8] manages to satisfy general cases of shot transitions, so their method is applied to detect shot boundaries for general cases in our proposed method. Shot sequences can also be separated by using optical flow, which is a field of motion vectors between two consecutive frames. In this paper, the classical Lucas-Kanade method [9] is implemented to retrieve the optical flows of the sample videos.
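
The paper gives no listing for this step. As a rough sketch (assuming OpenCV and NumPy are available), the dominant per-frame motion could be estimated with the pyramidal Lucas-Kanade tracker as follows; the function name, the corner-detection settings, and the use of the median as the "global" vector are our illustrative assumptions, not part of the original method.

import cv2
import numpy as np

def global_flow(prev_gray, curr_gray, min_points=20):
    """Estimate the dominant (global) motion vector between two grayscale
    frames using sparse Lucas-Kanade optical flow. Returns (dx, dy) or None."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=8)
    if pts is None or len(pts) < min_points:
        return None
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good = status.ravel() == 1
    if good.sum() < min_points:
        return None
    vectors = (nxt[good] - pts[good]).reshape(-1, 2)
    # The paper treats the motion shared by most vectors as the global flow;
    # the median is one robust way to approximate that dominant motion.
    return tuple(np.median(vectors, axis=0))
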
To address the problem formulated in this paper, a panorama stitching technique [10] that stitches two or more overlapping panning frames together is applied. The result is a frame that contains the contents of all frames involved in the stitching. It usually has a different aspect ratio from the input images, expectedly wider or taller depending on the direction of panning.
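
The stitching in the paper follows [10]; as a stand-in, the sketch below uses OpenCV's generic Stitcher in scan mode, which is our assumption rather than the mosaic optimization actually used.

import cv2

def stitch_frames(frames):
    """Stitch an ordered list of overlapping BGR frames into one panorama.
    Returns the stitched image, or None if stitching fails."""
    stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)  # SCANS mode suits planar panning shots
    status, pano = stitcher.stitch(frames)
    return pano if status == cv2.Stitcher_OK else None
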
III. PROPOSED METHOD

This section describes how the comic strips are generated and organized. First, the input video has its time codes marked as described in section A. A collection of frames between two consecutive marked time codes is referred to as a "shot sequence" from now on. Each shot sequence is then used to generate a keyframe using panorama stitching as proposed in section B. These keyframes are treated as comic panels and arranged into comic strips as described in section C.

A. Time Code Marking

In this step, shot sequences are formed by using shot boundaries and optical flow to mark time codes. Shot boundary detection is applied to separate two shot sequences when they contain distinct contents, whilst optical flow is applied to separate two shot sequences created by camera panning; a sketch combining both rules follows the list below.

• Shot Boundary: The time code is marked based on shot transitions. A time code is marked whenever a shot transition occurs according to Le et al. [8], as shown in Fig. 2(a). Since a shot transition occurs where the contents of two consecutive frames change abruptly, a panorama image stitched across such frames would give a garbled keyframe.

• Optical Flow: While the proposed method uses shot boundaries, it also needs optical flow to mark time codes when there is no shot transition but a shot sequence pans in multiple directions. If a shot sequence is panning, most of the vectors in the optical flow, also called the "global optical flow", point in the same direction with the same magnitude, and the objects in the shot sequence appear to move in the opposite direction. Optical flow vectors pointing in other directions are called local optical flow. If a panning sequence changes direction, the global optical flow changes its direction as well. Since each shot sequence should contain a shot panning in only one direction, a time code is marked whenever the global optical flow changes direction. Fig. 2(b) shows a frame sequence with black arrows indicating the panning direction; a time code is marked when the shot sequence pans to the left and then suddenly changes direction downward, dividing the sequence into two shot sequences.

Figure 2. Examples of time code marking: (a) marking using shot boundary, (b) marking using optical flow.
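
A minimal sketch of the combined marking rule is given below. The helpers is_shot_transition (standing in for the detector of Le et al. [8]) and global_flow (from the earlier sketch), as well as the 45-degree direction threshold, are assumptions made for illustration only.

import math

def direction_changed(v1, v2, angle_thresh_deg=45.0):
    """True if two global-flow vectors point in clearly different directions."""
    if v1 is None or v2 is None:
        return False
    a1 = math.atan2(v1[1], v1[0])
    a2 = math.atan2(v2[1], v2[0])
    diff = abs(a1 - a2)
    diff = min(diff, 2 * math.pi - diff)
    return math.degrees(diff) > angle_thresh_deg

def mark_time_codes(frames, is_shot_transition, global_flow):
    """Return the frame indices at which a new shot sequence begins."""
    marks = [0]
    prev_flow = None
    for i in range(1, len(frames)):
        flow = global_flow(frames[i - 1], frames[i])
        if is_shot_transition(frames[i - 1], frames[i]) or direction_changed(prev_flow, flow):
            marks.append(i)       # a new shot sequence starts at frame i
            prev_flow = None      # do not compare flow across the cut
        elif flow is not None:
            prev_flow = flow
    return marks
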


Figure 3. Optical flow visualization: (a) the camera is panning downward, (b) the camera is zooming in, (c) the camera is panning to the right.

Fig. 3 shows a visualization of optical flows, where the arrows illustrate the motion of objects moving in the opposite direction to the camera motion. The length of each arrow indicates the magnitude of the motion vector. Fig. 3(a) illustrates the global optical flow pointing upward while the camera pans down. Fig. 3(b) illustrates the optical flow pointing out toward the boundary of the frame while the camera zooms in. Fig. 3(c) illustrates the global optical flow pointing to the left while the camera pans to the right. In both panning cases, all the vectors of the global optical flow point in the same direction with the same magnitude.
B. Keyframe Generating

In this step, keyframes are generated from the shot sequences obtained in section A using rule-based optical flow. Each shot sequence should contain a global optical flow, which usually represents the major component of a frame. The proposed method generates one keyframe for each shot sequence in order to cover as much of the information featured in the video as possible. Shot sequences are classified into three main categories:

1) Non-panning: When the global optical flow of a shot sequence stays still, the shot sequence is said to be "non-panning". There are also cases in which the optical flow points in many directions, making the global optical flow undetermined; in other words, the panning direction is unknown, and such a shot sequence is assumed not to be panning. When a shot sequence is classified as non-panning, only one frame of the shot sequence is selected as the keyframe. Since some object motion may occur during the shot sequence, it is suitable to select the middle frame to depict the objects during their movement.

2) Panning: When the global optical flow of a shot sequence points in one direction with a stable magnitude, the shot sequence is said to be "panning". If the shot sequence is panning, the algorithm generates a panorama keyframe. In order to generate such a keyframe, the first and the last frames of the shot sequence are selected, with additional frames as needed. Once the first frame is selected, the magnitude of the displacement accumulated from the most recently selected frame is determined; if it is greater than half of the frame width (or height), the algorithm selects the current frame as an additional frame. The algorithm then uses the same criterion to recursively select additional frames as necessary until the last frame of the shot sequence is reached. The selected frames are then stitched together [10]. The keyframe generated from this type of shot sequence is a panoramic frame that covers every element in the respective panning sequence.

3) Zooming: This type of shot sequence occurs when most of the optical flow vectors converge on or diverge from some specific location, and the depicted objects are shrinking or enlarging. When zooming is detected, only one frame is selected to represent the shot sequence: the first frame when the shot sequence is zooming in, and the last frame when the shot sequence is zooming out. In either case, the frame that covers the most content is selected as the keyframe.
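
Putting the three categories together, the following sketch shows one possible keyframe-generation routine. It assumes the shot sequence has already been classified (the radiating-flow zoom detection is not reproduced), that frames are OpenCV images, that flows[k] is the global flow between frames k and k+1, and that stitch_frames() from the earlier sketch is available; the half-frame displacement rule follows the panning description above, and the category names are ours.

def generate_keyframe(frames, category, flows):
    """Generate one keyframe for a classified shot sequence (a sketch)."""
    if category == "non-panning":
        return frames[len(frames) // 2]      # middle frame
    if category == "zooming-in":
        return frames[0]                     # first frame covers the most content
    if category == "zooming-out":
        return frames[-1]                    # last frame covers the most content
    # Panning: select a new frame whenever the accumulated camera displacement
    # since the last selected frame exceeds half the frame width (or height),
    # then stitch the selected frames into a panorama keyframe.
    h, w = frames[0].shape[:2]
    selected = [frames[0]]
    acc_x = acc_y = 0.0
    for frame, flow in zip(frames[1:], flows):
        if flow is None:
            continue
        acc_x += flow[0]
        acc_y += flow[1]
        if abs(acc_x) > w / 2 or abs(acc_y) > h / 2:
            selected.append(frame)
            acc_x = acc_y = 0.0
    if selected[-1] is not frames[-1]:
        selected.append(frames[-1])          # always include the last frame
    pano = stitch_frames(selected)
    return pano if pano is not None else frames[len(frames) // 2]
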
C. Panel Organizing

After the keyframes are generated in section B, they are organized into a comic strip for a systematic representation; each keyframe obtained from section B becomes a comic panel in this section. Initially, panel organizing is restricted to no more than two columns and four rows for a clear depiction; otherwise, the panels would be too small. Each page of a comic strip is initialized to the standard dimensions of comic books, which is 2:3 (width by height).

There are three types of arrangements, as shown in Fig. 4, and there are five types of classified panels:

1) Normal Panel: obtained from a non-panning keyframe. Panels C, D, and E in Fig. 4 are examples of normal panels.
2) Wide Panel: obtained from a horizontal panning keyframe. Panel A in Fig. 4 is an example of a wide panel.
3) Semi-Wide Panel: obtained from a vertical panning keyframe with dimensions wider than 11:9.
4) Square Panel: obtained from a vertical panning keyframe with dimensions narrower than 11:9 but not narrower than a square. Panel B in Fig. 4 is an example of a square panel.
5) Tall Panel: obtained from a vertical panning keyframe with dimensions narrower than a square. Panel F in Fig. 4 is an example of a tall panel.

Figure 4. Types of arrangements (single, double, and sidebar), illustrated with panels A-F.

It is regulated that a page contains four rows for optimization, except in cases where a page contains no more than one double arrangement. When a panel is in a single arrangement, it consumes more space on a page, so the number of rows on that page is reduced to three in order to fit the space.

With the definitions and initial settings mentioned earlier, panel organizing is executed as described in Algorithm 1, where P denotes all generated panels, Pi is the ith panel, and n is the number of panels obtained from the previous sections. The algorithm starts from the first panel P1 in the collection, and the value of i is incremented along the process. This algorithm is designed to organize the panels using general comics as an example, while trying to optimize the page space as much as possible. It also orders the panels so as to reduce confusion as much as possible.
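
As a concrete reading of the five panel types, the small helper below maps a keyframe's dimensions and panning direction to a panel type. The function name and the pan_direction argument are illustrative; the 11:9 and square thresholds come directly from the definitions above.

def classify_panel(width, height, pan_direction):
    """Map a generated keyframe to one of the five panel types (a sketch).
    pan_direction is 'none', 'horizontal', or 'vertical'."""
    ratio = width / height
    if pan_direction == "none":
        return "normal"
    if pan_direction == "horizontal":
        return "wide"
    # Vertical panning: the taller the keyframe, the narrower the ratio.
    if ratio > 11 / 9:
        return "semi-wide"
    if ratio >= 1.0:
        return "square"
    return "tall"

For example, classify_panel(1280, 720, "none") returns "normal", while a vertically panned 1280x1600 keyframe (ratio 0.8) returns "tall".
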


Algorithm 1. Panel Organizing

panelOrganizing(P)
  i = 1
  while i < n
    if (isWide(Pi) || isSemiwide(Pi))
      setSingle(Pi)
      i = i + 1
    else if (isNormal(Pi) && isNormal(Pi+1))
      if (isNormal(Pi+2) && isNormal(Pi+3))
        setDouble(Pi, Pi+1)
        setSingle(Pi+2)
        i = i + 3
      else if (isTall(Pi+2))
        setLeftsidebar(Pi+2)
        i = i + 3
      else if (isSquare(Pi+2) || isSquare(Pi+3))
        setDouble(Pi, Pi+1)
        setDouble(Pi+2, Pi+3)
        i = i + 4
      else
        setSingle(Pi, Pi+1, Pi+2)
        i = i + 3
    else if (isNormal(Pi) && (isSquare(Pi+1) || isTall(Pi+1)))
      setDouble(Pi, Pi+1)
      i = i + 2
    else if (isSquare(Pi))
      setDouble(Pi, Pi+1)
      i = i + 2
    else if (isTall(Pi))
      if (isNormal(Pi+1) && isNormal(Pi+2))
        setRightsidebar(Pi)
        i = i + 3
      else if (isSquare(Pi+1) || isTall(Pi+1))
        setDouble(Pi, Pi+1)
        i = i + 2
      else
        setSingle(Pi, Pi+1, Pi+2)
        i = i + 3
    else
      setSingle(Pi, Pi+1, Pi+2)
      i = i + 3
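
For readers who prefer runnable code, the following is a near-literal Python transcription of Algorithm 1 under stated assumptions: indices are zero-based, panel_type() and the four layout primitives are left abstract (the paper does not define them beyond Fig. 4), and bounds guards are our addition so the loop cannot index past the end of the panel list.

def organize_panels(panels, panel_type, layout):
    """Sketch of Algorithm 1. panel_type(p) returns 'normal', 'wide',
    'semi-wide', 'square' or 'tall'; layout supplies set_single, set_double,
    set_left_sidebar and set_right_sidebar, each accepting one or more panels."""
    n = len(panels)

    def kind(j):
        # Out-of-range positions report '' so trailing panels fall through
        # to the catch-all single arrangement instead of raising IndexError.
        return panel_type(panels[j]) if j < n else ""

    i = 0
    while i < n:
        if kind(i) in ("wide", "semi-wide"):
            layout.set_single(panels[i])
            i += 1
        elif kind(i) == "normal" and kind(i + 1) == "normal":
            if kind(i + 2) == "normal" and kind(i + 3) == "normal":
                layout.set_double(panels[i], panels[i + 1])
                layout.set_single(panels[i + 2])
                i += 3
            elif kind(i + 2) == "tall":
                # The listing only places the tall panel; the two normal panels
                # are presumably laid out beside the left sidebar.
                layout.set_left_sidebar(panels[i + 2])
                i += 3
            elif kind(i + 2) == "square" or kind(i + 3) == "square":
                layout.set_double(panels[i], panels[i + 1])
                layout.set_double(*panels[i + 2:i + 4])
                i += 4
            else:
                layout.set_single(*panels[i:i + 3])
                i += 3
        elif kind(i) == "normal" and kind(i + 1) in ("square", "tall"):
            layout.set_double(panels[i], panels[i + 1])
            i += 2
        elif kind(i) == "square":
            layout.set_double(*panels[i:i + 2])
            i += 2
        elif kind(i) == "tall":
            if kind(i + 1) == "normal" and kind(i + 2) == "normal":
                layout.set_right_sidebar(panels[i])
                i += 3
            elif kind(i + 1) in ("square", "tall"):
                layout.set_double(panels[i], panels[i + 1])
                i += 2
            else:
                layout.set_single(*panels[i:i + 3])
                i += 3
        else:
            layout.set_single(*panels[i:i + 3])
            i += 3

A concrete layout object would implement the four primitives according to the single, double, and sidebar arrangements of Fig. 4 and the three-or-four-row page rule described above.
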
IV. EXPERIMENTAL RESULTS AND DISCUSSION

The accuracy and performance of the proposed method are evaluated on various types of cartoon animation videos, and some limitations of the keyframe generating step are also discussed in this section. The proposed method is evaluated on content accuracy using the comic book counterparts of the samples. Note that the sample cartoon animation videos are chosen under the criterion that they have their own comic book adaptations with the same contents, so that the adaptations can be used as an ideal model against which to compare the results of the proposed method.

The sample videos are decoded into MPEG videos that contain only chunks of frames, without the audio track. Each video is 20 minutes long and consists of 28,800 frames. There are four major types of sample videos for testing: high-quality action, low-quality action, high-quality non-action, and low-quality non-action, where each type has different characteristics. Action videos usually contain more shot transitions and more dynamic shots than non-action videos, while the quality of a video determines how elaborately the frames are depicted; a high-quality video usually comes with detailed motion that gives better detail in the resulting keyframes.

Figure 5. Comparison between (a) a shot sequence and (b) a stitched panorama keyframe.

Fig. 5 shows how the proposed method passes three selected frames from the shot sequence in Fig. 5(a) into the algorithm and generates a panorama keyframe that contains all the important elements, as seen in Fig. 5(b). It can be seen that the keyframe looks naturally stitched. The proposed keyframe generating process gives good results when a shot sequence is panning with a minimal amount of local optical flow, i.e., when objects are not moving too much while the camera is panning.


When some objects are moving during the panning scene, the stitched frames may produce a panorama keyframe that contains extra or distorted elements. Fig. 6 shows a keyframe with some distortion of the girl's hair. This is because the algorithm detects a panning shot sequence and captures the first and the last frames from the shot sequence while the girl's hair is swaying. Such a panorama keyframe is classified as a distorted keyframe.

Figure 6. An example of a distorted panorama keyframe: (a) the hair is swaying while the camera is panning up; (b) panorama keyframe generated from two frames, with the distorted points circled.

Figure 7. Comparison between (a) the comic book version and (b) results from the panel organizing algorithm.

Fig. 7 shows how accurately the contents are generated by the proposed method compared with the comic book adaptation, where the circled numbers represent the order of the panels in the comic book and the order of the organized panels obtained from the proposed method. Note that Japanese comic books are read from right to left and top to bottom, and so are the proposed method's organized panels. This comparison shows that most of the comic strips generated from cartoon animation by the proposed method have their contents matched to those of the respective comic adaptation. The order of the scenes and their contents are depicted and ordered accurately according to their examples. Some frames even have their aspect ratios altered by the panorama process to match their respective panels (pair no. 8).

The degree of accuracy is shown in Table I. The numbers in the Matched Panels column denote the total number of generated strip panels whose contents match exactly with their respective comic adaptation panels; panels are considered matched when they share the same contents in a common-sense way. The numbers in the Ideal Panels column denote the total number of panels featured in the comic books, together with the degree of accuracy when compared with the number of matched panels. Likewise, the numbers in the Generated Panels column denote the total number of strip panels generated by the proposed method, together with the degree of accuracy when compared with the number of matched panels.

TABLE I. GENERATED KEYFRAMES COMPARED WITH COMIC PANELS.

Video Source     Matched Panels   Ideal Panels            Generated Panels
                                  Number   % Accuracy     Number   % Accuracy
HQ-action             320          342       93.57         387       82.69
LQ-action             349          366       95.36         432       80.79
HQ-non-action         232          245       94.69         255       90.98
LQ-non-action         214          224       95.54         271       78.97
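
For clarity, each accuracy figure in Table I is simply the matched-panel count divided by the corresponding total; the short check below (variable names are ours) reproduces the first row.

matched, ideal, generated = 320, 342, 387             # HQ-action row of Table I
ideal_accuracy = 100.0 * matched / ideal               # matched / ideal panels
generated_accuracy = 100.0 * matched / generated       # matched / generated panels
print(round(ideal_accuracy, 2), round(generated_accuracy, 2))   # 93.57 82.69
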
These results show that the degree of accuracy in the Generated Panels column is lower because the number of keyframes generated along the process is always greater than the number of comic panels: when cartoons are animated, they usually use several shot sequences to cover content that can be compressed into one panel in a comic book. Nevertheless, the degree of accuracy in the Generated Panels column is less significant than that in the Ideal Panels column, because the main objective of the proposed method is to cover the contents of the ideal model as much as possible. The surplus panels beyond those contents usually add more detail, and the proposed method manages to cover at least 90% of the ideal contents in all cases. The comic strips are neatly generated, and their order is easy to follow without confusion.

V. CONCLUSIONS

This paper proposes a novel method to generate comic strips from cartoon animations. Considering that the existing fixed-aspect-ratio video summarization methods [1], [2], [3], [4], [5], [6], [7] lack content coverage, this paper proposes to generate panorama keyframes with the aim of covering more content and representing it more systematically. The method is designed especially for video summarization of cartoon animation and exploits characteristics that distinguish it from other types of video media.

The proposed method first marks the time codes of a cartoon animation video based on shot boundaries and optical flow direction to separate periods of time into shot sequences. Each shot sequence is then passed through the keyframe generating algorithm. Using the optical flow information, the algorithm classifies shot sequences and generates keyframes of various sizes. Each of these keyframes is then treated as a comic strip panel and organized into comic pages according to the proposed algorithm, which aims to optimize space usage.

The results are examined by comparing the comic strips obtained from the proposed method with their comic book adaptations. The results show that the generated comic strips manage to cover most of the contents featured in the comic books. Almost all of the panels have contents that exactly match those in the ideal model, which represents how the results are supposed to look. The generated comic strips are also easy for users to follow and comprehend.


REFERENCES

[1] B. Ionescu, V. Buzuloiu, P. Lambert, and D. Coquin, "Improved Cut Detection for the Segmentation of Animation Movies," International Conference on Acoustics, Speech, and Signal Processing. IEEE Computer Society, Honolulu, Hawaii, 2007.
[2] B. Ionescu, P. Lambert, D. Coquin, and V. Buzuloiu, "The Cut Detection Issue in the Animation Movie Domain," Journal of Multimedia 2(4), Academy Publisher, 2007.
[3] M. Mentzelopoulos and A. Psarrou, "Key-Frame Extraction Algorithm using Entropy Difference," International Workshop on Multimedia Information Retrieval. ACM, New York, 2004.
[4] E. Dumont and B. Merialdo, "Sequence Alignment for Redundancy Removal in Video Rushes Summarization," Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. ACM, Vancouver, British Columbia, 2008.
[5] Y. Liu, Y. Liu, J. Ren, and K.C.C. Chan, "Rushes Video Summarization using Audio-Visual Information and Sequence Alignment," Proceedings of the 2nd ACM TRECVid Video Summarization Workshop. ACM, Vancouver, British Columbia, 2008.
[6] W. Abd-Almageed, "Online, Simultaneous Shot Boundary Detection and Key Frame Extraction for Sports Videos Using Rank Tracing," 15th IEEE International Conference on Image Processing. IEEE Computer Society, San Diego, California, 2008.
[7] J. Boreczky, A. Girgensohn, G. Golovchinsky, and S. Uchihashi, "An Interactive Comic Book Presentation for Exploring Video," CHI Letters, ACM, The Hague, The Netherlands, 2000.
[8] D.D. Le, S. Satoh, D.N. Thanh, and A.D. Duc, "A Text Segmentation Based Approach to Video Shot Boundary Detection," International Workshop on Multimedia Signal Processing. IEEE Computer Society, Cairns, Queensland, 2008.
[9] B.D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," Proceedings of the Image Understanding Workshop, Washington, D.C., 1981, pp. 121-130.
[10] S.J. Ha, H.I. Koo, S.H. Lee, N.I. Cho, and S.K. Kim, "Panorama Mosaic Optimization for Mobile Camera Systems," IEEE Transactions on Consumer Electronics 53(4), IEEE Computer Society, 2007.

