
2013 12th International Conference on Document Analysis and Recognition

Specific Comic Character Detection Using Local Feature Matching
Weihan Sun∗ , Jean-Christophe BURIE† , Jean-Marc OGIER† and Koichi Kise∗
∗ Department of Computer Science and Intelligent Systems, Osaka Prefecture University, Osaka, Japan
Email: sunweihan@m.cs.osakafu-u.ac.jp, kise@cs.osakafu-u.ac.jp
† L3i laboratory, University of La Rochelle, Avenue Michel Crépeau, 17042 La Rochelle cedex 1, France
Email: {jean-christophe.burie, jean-marc.ogier}@univ-lr.fr

Abstract—Comic books are a kind of storytelling graphic publication mainly expressed by abstract line drawings. As clues to the story line, comic characters play an important role in the story, and their detection is an essential part of comic book analysis. The task includes (1) locating characters in comic pages and (2) identifying them, which is called specific character detection. Across the different scenes of a comic book, one specific character can be drawn with various expressions, coupled with rotations, occlusions and other perspective drawing effects, which makes detection challenging. In this paper, we focus on features that are stable under these transformations and propose a framework to detect them. Specifically, discriminative features are selected as detectors characterizing the characters, on the basis of a training dataset. Based on these detectors, the drawings of the same characters in different scenes can be detected. The methodology has been tested and validated on 6 titles of comics. Despite the drastic changes between scenes, the proposed method achieved detection of 70% of the comic characters.

Keywords—comic book, comic analysis, comic character, specific character detection, local feature matching

I. INTRODUCTION
Comic books are a kind of narrative graphic publication made up of comic pages composed of separate frames, each representing an individual scene. As shown in Fig. 1, comics are basically drawn with distinct straight and curved lines in few colors. Since characters can be drawn in a surreal way, with dialogues in text balloons, comic books provide visual images of stories. Therefore, they have a large audience throughout the world. With the development of digital techniques, comics are no longer limited to printed publications: they are also converted to eComics (digitized comics) and distributed through the Internet, and they make up 75% of the entire eBook market [1]. The explosion of mobile devices offers new opportunities to develop interactive reading of comics through the analysis of their content.

Fig. 1. An example of comic pages.

Besides being important reading material, comics, as the ninth art, also have a strong impact on the fields of education [2], culture promotion [3], entertainment [4] and so on. For these applications, and more specifically for the implementation of new reading paradigms based on interactivity between content and reader, comic analysis techniques are required. Based on the typical structure of comics, many methods have been proposed in the literature. In et al. [5] proposed a frame decomposition method for viewing comics on cell phones. Ho et al. [6] analysed the contents of comics by panel and speech balloon extraction. Tanaka et al. [7] analysed the layouts of comics to detect scene divisions. However, for applications which require bridging the semantic gap between drawings and stories, such as the translation from comics to novels, these methods are not sufficient.

In comic books, cartoon characters are an essential part and usually act as cues to how the story unfolds. As a consequence, their detection in comic pages is important for any semantic analysis. This detection task includes (1) locating characters in comic pages and (2) identifying them, which corresponds to specific character detection. Unlike specific object retrieval [8], besides rotations, occlusions and perspective transformations, the drawings of a specific comic character, such as its facial expressions or poses, may differ considerably across the numerous scenes of a comic book. The task is quite similar to the detection task of the Visual Object Classes (VOC) Challenge [9]. However, we have to deal with abstract line drawings in few colors.

For comic character detection, Sun and Kise [10] focused on face regions and proposed a method to detect similar

1520-5363/13 $26.00 © 2013 IEEE


DOI 10.1109/ICDAR.2013.62
copies of manga (Japanese style comics) characters. However, this approach cannot be applied in a context where characters are not uniquely represented by their faces; a larger view of the whole body is required to represent them. In American and European comics especially, instead of distinctive faces, features from other parts of the characters, such as bodies, clothes and decorations, are typically used for their representation.

In this paper, we propose a framework to detect specific characters considering all parts of the characters. Our first stage consists in defining a training dataset allowing each character to be characterized by local features. The training dataset relies on bounding boxes surrounding specific characters. Using local feature matching, discriminative features are selected as detectors in order to detect the drawings of the same characters in different scenes (or panels). Experiments have been carried out on 6 titles of comics; as a result, 70% of the comic characters were correctly detected by the proposed method. The contribution of this paper is two-fold: (1) discriminative features from all parts of specific comic characters are revealed, and (2) their extraction and their use for the detection of specific comic characters are detailed.

The rest of this paper is arranged as follows: Section 2 provides an overview of the approach and Section 3 introduces the local feature matching method. Sections 4 and 5 describe the details of the detector training and the character detection. Experiments and results are shown in Section 6. Finally, conclusions and future work are given in Section 7.

II. APPROACH OUTLINE

To detect specific comic characters, we propose a framework using local feature matching. The approach includes two parts, the detector training and the character detection, as shown in Fig. 2.

Fig. 2. Outline of the proposed method.

For the detector training step, we first build a training dataset of comic pages. Frame decomposition divides the comic pages into individual frames. In each frame, the object characters are labelled by bounding boxes. Meanwhile, feature extraction is applied to every frame. By local feature matching, discriminative features are selected as representative detectors for each character.

On the other hand, the query comic pages are decomposed into frames by the same processing. Then, feature extraction and image segmentation are applied. Finally, by scoring the feature matchings between the query frames and the detectors, the regions belonging to the specific characters are reported by the system.

Fig. 3. Examples of the processing. (a) shows detected SIFT keypoints. (b) shows a character labelled by a bounding box. (c) shows the segmentation of a frame (red regions are the ground truth of the object character).

III. LOCAL FEATURE MATCHING

The fundamental idea of the proposed method is local feature matching. As explained earlier, drawings of the same comic character change from scene to scene, e.g. in facial expression or pose, but some parts remain characteristic and allow the character to be recognized. To obtain these parts, we need to find the similar features belonging to the same character in different scenes. We propose to apply local feature matching for this purpose, since local features are robust to image transformations.

In this research, we apply the SIFT (Scale-Invariant Feature Transform) [12] algorithm to detect the local features; it has been proved invariant to image rotation, scaling, translation and partial illumination change. The algorithm includes two parts: (1) keypoint detection from Difference of Gaussians (DoG) images at multiple scales, and (2) feature description based on the regions around the keypoints. Through this feature extraction, we obtain a group of features F_i = {x_i, y_i, f_i} from the whole image, as shown in Fig. 3(a), where (x_i, y_i) is the position of keypoint i and f_i is the corresponding 128-dimensional feature vector.

For the matching of similar keypoints between two images I_1 and I_2, Euclidean distances between their feature vectors are used. Denoting by q a feature vector extracted from I_1, a reliable matching is defined as follows:
• q is matched only with its nearest neighbor p_1 in I_2;
• D(q, p_1) < T_1, where D(q, p_1) is the distance between q and p_1 and T_1 is a threshold;
• in addition, we apply the distance-ratio strategy between the first and second nearest neighbors, as in [12], to make the matching more discriminative: q and p_1 are matched iff D(q, p_1)/D(q, p_2) < T_2, where p_2 is the second nearest neighbor of q and T_2 is a threshold.

In this research, T_1 and T_2 are set empirically. Examples of local feature matchings are shown in Fig. 4. Similar features are matched not only on the characters but also in the backgrounds.
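As a concrete illustration, the two matching criteria of Section III (the absolute distance threshold T_1 and the nearest-neighbor distance ratio T_2) can be sketched in a few lines of NumPy. This is a minimal sketch on raw descriptor arrays, not the paper's implementation, and the threshold values are illustrative placeholders, since the paper sets T_1 and T_2 empirically.

```python
import numpy as np

def match_features(desc1, desc2, t1=0.5, t2=0.8):
    """Match each descriptor q of image I1 to image I2 using two criteria:
    q pairs only with its nearest neighbor p1, D(q, p1) must be below the
    absolute threshold T1, and the ratio D(q, p1)/D(q, p2) to the second
    nearest neighbor must be below T2."""
    desc1 = np.asarray(desc1, dtype=float)
    desc2 = np.asarray(desc2, dtype=float)
    matches = []
    for qi, q in enumerate(desc1):
        dists = np.linalg.norm(desc2 - q, axis=1)  # Euclidean distances to all of I2
        p1, p2 = np.argsort(dists)[:2]             # first and second nearest neighbors
        if dists[p1] < t1 and dists[p1] / dists[p2] < t2:
            matches.append((qi, int(p1)))          # a reliable matching
    return matches
```

In the paper the descriptors are 128-dimensional SIFT vectors; any array of feature vectors works here.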

IV. DETECTOR TRAINING

To define the detectors of specific comic characters, we need to select their discriminative features from all the extracted features. The selection is based on the frequency of features in the comics. As shown in Fig. 1, the drawings of the main characters appear in most of the frames. In other words, their features recur throughout the comics, and the occurrence frequency of their stable features is relatively higher than that of unstable ones.

The similar features are found by local feature matching and selected by their frequency in the training dataset of the comics. Specifically, we first collect some comic pages as the training dataset, in which the object characters are labelled by bounding boxes, as shown in Fig. 3(b). Next, the comic pages are decomposed into frames using the method in [11]. Between every two frames, local feature matching is performed to obtain the similar features. As shown in Fig. 4, many similar features are detected on the same character in different frames, while some features are also matched on other parts, such as text, tones and other characters. To ensure that the matched features are the ones we need, we define matchings whose two features are both extracted from bounding boxes of the same character as positive matchings, and the others as negative matchings.

After matching all the frames in the training dataset, the number of positive matchings N_pos and the number of negative matchings N_neg are counted for every keypoint. Discriminative features should have high-frequency positive matchings and low-frequency negative matchings. On the other hand, patterns with high-frequency negative matchings, like tones and texts, have an adverse effect on the detection. In this research, not only are positive features selected, but negative ones are also used to suppress false positive detections. Positive and negative detectors are defined as follows:

    positive: N_pos − N_neg > T_pos
    negative: N_neg − N_pos > T_neg    (1)

where T_pos and T_neg are thresholds chosen by experiments (Section VI-B). To balance positive and negative detectors, the weight W^i_pos of a positive detector and the weight W^i_neg of a negative detector are defined as

    W^i_pos = sqrt(M_neg / M_pos) · M^i
    W^i_neg = sqrt(M_pos / M_neg) · M^i    (2)

where M^i represents the number of matchings of detector i, and M_pos and M_neg are the total numbers of positive and negative matchings, respectively.

Fig. 4. Examples of local feature matching between two scenes (frames). The similar SIFT keypoints are connected by lines (matchings). Not only features from the character, but also some similar ones from the backgrounds are matched.

V. CHARACTER DETECTION

In the detection step, the same frame decomposition and feature extraction are applied to the query comic pages. For each frame, we apply the normalized cut algorithm [13] for segmentation. In this research, the boundary detector of [14] is used instead of the Canny detector. N is set to 20; therefore, each frame is segmented into 20 regions, as shown in Fig. 3(c).

On the other hand, the features of the query frames are matched with the detectors by the matching method described in Section III. The score S_r of region r is defined as

    S_r = Σ W^r_pos − Σ W^r_neg    (3)

where W^r_pos and W^r_neg represent the weights of the positive and negative detectors matched with region r. The regions with a score S_r > T_s are reported as the detected results. T_s is a threshold decided by experiments (Section VI-B).

VI. EXPERIMENTS

A. Conditions

To prove the effectiveness of the proposed method, we collected 6 titles of French comics¹. For each title, there are 10 comic pages, of which 8 pages (8 panels per page = 64 panels) were used as the training dataset and the remaining 2 pages (16 panels) were treated as queries. The queries therefore contain the same object characters as the training dataset, but with different expressions. For each title, we labelled one of the main characters by bounding boxes. The ground truths are defined as the regions whose centroids fall inside the bounding boxes, such as the red regions shown in Fig. 3(c). The results are reported by the recall and precision of the detected regions, defined as

    Recall = R_c / R_g,  Precision = R_c / R_d

where R_c, R_d and R_g represent the numbers of correctly detected regions, detected regions and ground-truth regions, respectively. The thresholds of the proposed method were decided in the first experiment (Section VI-B) on one title of comics. Then, we tested the performance of the proposed method on the other comics in the second experiment (Section VI-C).

¹The titles are 'Bubblegôm Gôm', 'Cosmozone', 'Les aventures de MouetteMan et TeckelBoy', 'Les Noeils', 'Zig Et Puce' and 'Game Over'. The pages are extracted from the eBDtheque database [15].
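Putting the formulas together, the selection of positive and negative detectors (Eq. 1), their balanced weighting (Eq. 2) and the region scoring of Section V (Eq. 3) can be sketched as follows. This is a minimal NumPy sketch, not the paper's code: the per-keypoint counts and the region-to-detector matches are assumed inputs, and reading M^i as the matching count of detector i is our interpretation of the notation in Eq. (2).

```python
import numpy as np

def train_detectors(n_pos, n_neg, t_pos=1, t_neg=5):
    """Eq. (1): a keypoint is a positive detector if Npos - Nneg > Tpos,
    and a negative detector if Nneg - Npos > Tneg.  Eq. (2): weights are
    scaled so the total positive weight equals the total negative weight."""
    n_pos = np.asarray(n_pos, dtype=float)
    n_neg = np.asarray(n_neg, dtype=float)
    pos = n_pos - n_neg > t_pos
    neg = n_neg - n_pos > t_neg
    m_pos = n_pos[pos].sum()                     # total positive matchings
    m_neg = n_neg[neg].sum()                     # total negative matchings
    w_pos = np.sqrt(m_neg / m_pos) * n_pos[pos]  # balanced weights (Eq. 2)
    w_neg = np.sqrt(m_pos / m_neg) * n_neg[neg]
    return pos, neg, w_pos, w_neg

def region_score(w_pos_matched, w_neg_matched):
    """Eq. (3): a region's score is the summed weight of its matched
    positive detectors minus that of its matched negative detectors."""
    return float(np.sum(w_pos_matched) - np.sum(w_neg_matched))
```

Regions whose score exceeds T_s would then be reported as detections; the balancing in Eq. (2) guarantees that the summed positive and negative weights over the whole training set are equal.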

TABLE I. PARAMETER SETTING.
Ts:   1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Tpos: 1, 2, 3, 4, 5
Tneg: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Fig. 5. Interpolated recall-precision graph of detected regions for title 1.

TABLE II. RESULTS OF DETECTED REGIONS.
Title          1     2     3     4     5     6
Recall [%]     23    36    50    33.3  23.6  47
Precision [%]  72    81.8  68    100   57    97.8

B. Threshold selection

Fig. 6. Examples of detectors. (The images are from the training dataset. The red keypoints are positive detectors, and the green ones are negative detectors. The characters full of negative detectors, on the left of (b) and (c), are not object characters.)

First, we selected the parameters of the proposed method based on its performance on one title of comics (title 1). Tests with the other titles showed that slight variations of the thresholds do not affect the detection results. The parameter settings are shown in Table I. With the different combinations of T_s, T_pos and T_neg, we obtained the interpolated recall-precision graph shown in Fig. 5.

In most cases, recall and precision can be traded off by adjusting the parameters. For character detection, if one region is correctly detected, we can recognize the character and obtain its approximate position. Therefore, we focused on the high-precision part of the graph and chose the parameters achieving the maximum F-measure with precision above 70%. T_pos, T_neg and T_s were set to 1, 5 and 3, respectively, for the rest of the experiments.
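The parameter choice just described — maximum F-measure subject to a precision floor — can be sketched as follows. The dictionary mapping threshold triples (T_pos, T_neg, T_s) to (recall, precision) pairs is a hypothetical stand-in for the grid-search results behind Fig. 5.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)

def pick_thresholds(results, min_precision=0.7):
    """Pick the (Tpos, Tneg, Ts) triple maximizing the F-measure among
    the parameter combinations whose precision exceeds min_precision."""
    eligible = {k: rp for k, rp in results.items() if rp[1] > min_precision}
    return max(eligible, key=lambda k: f_measure(*eligible[k]))
```

With real grid-search output, the returned triple plays the role of the (1, 5, 3) setting used for the remaining experiments.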
C. Character detection

In this experiment, we tested the proposed method on the other titles of comics.

Based on the parameters issuing from the first experiment, we trained the detectors for the 6 titles. Examples of the detectors are shown in Fig. 6: the red keypoints are positive detectors, and the green ones are negative detectors. We can see that (1) the positive detectors are located not only on the face regions but also on other parts of the characters, and (2) many negative detectors are extracted from parts like texts, tones and non-object characters.

Then, we applied the detectors to detect the object characters in the queries. The recall and precision of the detected regions are shown in Table II. Except for title 5, the proposed method achieved over 20% recall with precision above 68%.

Examples of the detection results are shown in Fig. 7. The red, green and blue regions represent correctly detected, false positive and false negative regions, respectively. The red keypoints are matched with positive detectors, and the green ones with negative detectors. We can see that:
• for most of the characters, at least one part of their drawings is correctly detected;
• the negative detectors successfully suppressed false positive detections: as shown in Fig. 7(f), (g) and (h), although some background features are matched with positive detectors, the corresponding regions are not falsely detected, thanks to the matchings with negative detectors.

The low recall of the detected regions is mainly due to the fact that some characters are segmented into small regions containing no discriminative features. In addition, the drawings of comic characters change drastically between frames. As shown in Fig. 7(h), which is from title 5 and contains the same character as Fig. 7(g), the character wears a hat that hides the features of its hair; meanwhile, grass patterns in the background have similar features, which led to false positive detections.

Although many regions were not detected, most of the characters contain similar features matched by positive detectors. In terms of character detection (at least one part of the character detected), 70% of the characters were

Fig. 7. Examples of detection results. (The red regions are correctly detected regions. The green regions are false positive regions and the blue ones false negative regions. The red keypoints are matched with positive detectors and the green keypoints with negative detectors.)

successfully detected by the proposed method. The character detection rates per title are shown in Table III.

TABLE III. RESULTS OF CHARACTER DETECTION.
Title               1     2     3     4     5     6
Detection rate [%]  78    85.7  100   100   64.7  61

VII. CONCLUSION AND FUTURE WORK

In this paper, we focused on the problem of specific comic character detection, which is important for comic analysis. It is a difficult task because of the abstract expressions and the large variability and transformations of comic characters. Considering the parts of the characters that are stable across different scenes, we proposed a framework to extract discriminative features and detect specific characters by local feature matching. Through experiments, we revealed the discriminative features of specific comic characters, by which the characters can be detected even under strong changes between scenes.

The future work for this research includes:
• importing other features and the spatial organization of keypoints for the description of comic characters;
• increasing the recall and precision of the proposed method;
• applying a more precise segmentation method for comics;
• building a larger dataset to test the proposed method;
• comparing with other object detection methods.

ACKNOWLEDGMENT

This research was supported in part by the Grant-in-Aid for Scientific Research (B)(22300062) and JSPS Fellows (248683) from the Japan Society for the Promotion of Science (JSPS). This work is also supported by the region Poitou-Charentes (France), the General Council of Charente Maritime (France) and the town of La Rochelle (France).

REFERENCES

[1] "eBook Business 2008 Report", Impress R&D, 2009.
[2] http://www.teachingcomics.org
[3] http://www.animenewsnetwork.com/news/2007-02-13/japanese-government-promote-anime-and-manga
[4] http://goodereader.com/blog/electronic-readers/digital-reading-allows-game-backstory-to-become-comics/
[5] Y. In, T. Oie, M. Higuchi, S. Kawasaki, A. Koike and H. Murakami, "Fast frame decomposition and sorting by contour tracing for mobile phone comic images", International Journal of Systems Applications, Engineering and Development, vol. 5(2), pp. 216–223, 2011.
[6] A. Ho, J. Burie and J. Ogier, "Comics page structure analysis based on automatic panel extraction", Proceedings of the 9th International Workshop on Graphics Recognition, 2011.
[7] T. Tanaka, K. Shoji, F. Toyama and J. Miyamichi, "Layout Analysis of Tree-structured Scene Frames in Comic Images", Proceedings of the 20th Int. Joint Conf. on Artificial Intelligence, pp. 2885–2890, 2007.
[8] R. Arandjelović and A. Zisserman, "Multiple queries for large scale specific object retrieval", British Machine Vision Conference, pp. 92.1–92.11, 2012.
[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman, "The Pascal Visual Object Classes (VOC) Challenge", International Journal of Computer Vision, vol. 88(2), pp. 303–338, 2010.
[10] W. Sun and K. Kise, "Similar Partial Copy Detection of Line Drawings Using a Cascade Classifier and Feature Matching", International Workshop on Computational Forensics, pp. 121–132, 2010.
[11] C. Rigaud, N. Tsopze, J. Burie and J. Ogier, "Robust Frame and Text Extraction From Comic Books", Lecture Notes in Computer Science, special issue GREC'11, vol. 7423, pp. 129–138, 2011.
[12] D. G. Lowe, "Distinctive Image Features From Scale-invariant Keypoints", International Journal of Computer Vision, vol. 60(2), pp. 91–110, 2004.
[13] J. Shi and J. Malik, "Normalized Cuts and Image Segmentation", IEEE Trans. PAMI, vol. 22(8), pp. 888–905, 2000.
[14] D. Martin, C. Fowlkes and J. Malik, "Learning to Detect Natural Image Boundaries Using Brightness and Texture", NIPS 02, 2002.
[15] eBDtheque database, L3i, University of La Rochelle, http://ebdtheque.univ-lr.fr/database/

