I. INTRODUCTION
Comic books are a kind of narrative graphic publication made up of comic pages divided into separate frames, each representing an individual scene. As shown in Fig. 1, comics are basically drawn with distinct straight and curved lines in few colors. Since characters can be drawn in a surreal way, with dialogues in text balloons, comic books can provide visual images of stories. Therefore, they have a large audience throughout the world. With the development of digital techniques, comics are no longer limited to printed publications: they are also converted to eComics (digitized comics) and distributed through the Internet, and eComics account for 75% of the entire eBook market [1]. The explosion of mobile devices offers new opportunities to develop interactive reading of comics through the analysis of their content.

Fig. 1. An example of comic pages.

Besides being an important reading material, comics, as the ninth art, also have a strong impact on fields such as education [2], cultural promotion [3] and entertainment [4]. For these applications, and more specifically for implementing new paradigms of reading based on interactivity between content and reader, comic analysis techniques are required. Based on the typical structure of comics, many methods have been proposed in the literature. In et al. [5] proposed a frame decomposition method for viewing comics on cell phones. Ho et al. [6] analyzed the contents of comics by extracting panels and speech balloons. Tanaka et al. [7] analyzed the layouts of comics to detect scene divisions. However, these methods are not sufficient for applications that must bridge the semantic gap between drawings and stories, such as translating comics into novels.

In comic books, cartoon characters are essential and usually act as cues as the story unfolds. Consequently, their detection in comic pages is important for any semantic analysis. This detection task includes (1) locating characters in comic pages and (2) identifying them, which corresponds to specific character detection. Unlike in specific object retrieval [8], besides rotations, occlusions and perspective transformations, the drawings of a specific comic character, such as facial expressions or poses, may differ considerably across the numerous scenes of a comic book. The task is therefore quite similar to the detection task of the Visual Object Classes (VOC) Challenge [9]. However, we have to deal with abstract line drawings with few colors.

For comic character detection, Sun and Kise [10] focused on the characters' face regions and proposed a method to detect similar
Authorized licensed use limited to: Kwame Nkrumah Univ of Science and Technology. Downloaded on April 07,2024 at 07:39:48 UTC from IEEE Xplore. Restrictions apply.
Fig. 4. Examples of local feature matching between two scenes (frames). Similar SIFT keypoints are connected by lines (matchings). Not only features from the character but also some similar features from the backgrounds are matched.

IV. DETECTOR TRAINING

To define the detectors of specific comic characters, we need to select their discriminative features from all the extracted features. The selection is based on the frequencies of features in the comics. As shown in Fig. 1, the drawings of the main characters appear in most of the frames. In other words, their features recur throughout the comics, and the occurrence frequency of their stable features is relatively higher than that of unstable ones.

Similar features are found by local feature matching and selected according to their frequency in the training dataset. Specifically, we first collect some comic pages as the training dataset, in which the object characters are labelled by bounding boxes, as shown in Fig. 3(b). Next, the pages are decomposed into frames using the method in [11]. Local feature matching is then performed between every pair of frames to obtain similar features. As shown in Fig. 4, many similar features are detected from the same character in different frames, while some features from other parts, such as text, tones and other characters, are also matched. To make sure that the matched features are the ones we need, we define a matching as positive if both matched features are extracted from bounding boxes of the same character, and as negative otherwise.

After matching all the frames in the training dataset, the number of positive matchings $N_{pos}$ and the number of negative matchings $N_{neg}$ are counted for every keypoint. Discriminative features should have many positive matchings and few negative matchings. Conversely, patterns with many negative matchings, such as tones and text, have an adverse effect on the detection. In this research, we therefore select not only positive features but also negative ones, which are used to suppress false positive detections. The criteria for positive and negative detectors are defined as follows:

$$
\begin{cases}
\text{positive:} & N_{pos} - N_{neg} > T_{pos} \\
\text{negative:} & N_{neg} - N_{pos} > T_{neg}
\end{cases}
\tag{1}
$$

where $T_{pos}$ and $T_{neg}$ are thresholds chosen experimentally (Section VI-B). To balance positive and negative detectors, the weights of positive detectors $W^{i}_{pos}$ and negative detectors $W^{i}_{neg}$ are defined as

$$
\begin{cases}
W^{i}_{pos} = \dfrac{M_{neg}\, M^{i}_{pos}}{M_{pos}} \\[6pt]
W^{i}_{neg} = \dfrac{M_{pos}\, M^{i}_{neg}}{M_{neg}}
\end{cases}
\tag{2}
$$

where $M_{pos}$ and $M_{neg}$ are the total numbers of positive and negative matchings, respectively.

V. CHARACTER DETECTION

In the detection step, the same frame decomposition and feature extraction are applied to the query comic pages. Each frame is segmented with the normalized cut algorithm [13], using the boundary detector in [14] instead of the Canny detector. The number of segments $N$ is set to 20; therefore, each frame is segmented into 20 regions, as shown in Fig. 3(c).

Meanwhile, the features from the query frames are matched with the detectors using the same matching method described in Section III. The score $S_r$ of region $r$ is defined as

$$
S_r = W^{r}_{pos} - W^{r}_{neg}
\tag{3}
$$

where $W^{r}_{pos}$ and $W^{r}_{neg}$ represent the weights of the positive and negative detectors matched with region $r$. Regions with a score $S_r > T_s$ are reported as detection results, where $T_s$ is a threshold determined experimentally (Section VI-B).

VI. EXPERIMENTS

A. Conditions

To demonstrate the effectiveness of the proposed method, we collected 6 titles of French comics¹. For each title there are 10 comic pages, of which 8 pages (8 panels per page, i.e., 64 panels) were used as the training dataset and the remaining 2 pages (16 panels) were used as queries. Therefore, the queries contain the same object characters as the training dataset but with different expressions. For each title, we labelled one of the main characters with bounding boxes. The ground truths are defined as the regions whose centroids lie inside the bounding boxes, such as the red regions shown in Fig. 3(c). The results are reported as the recall and precision of the detected regions, defined as

$$
\mathrm{Recall} = \frac{R_c}{R_g}, \qquad \mathrm{Precision} = \frac{R_c}{R_d}
$$

where $R_c$, $R_d$ and $R_g$ represent the numbers of correctly detected regions, detected regions and ground-truth regions, respectively. The thresholds of the proposed method were determined in the first experiment (Section VI-B) using one title. Then, we tested the performance of the proposed method on the other titles in the second experiment (Section VI-C).

¹ They are 'Bubblegôm Gôm', 'Cosmozone', 'Les aventures de MouetteMan et TeckelBoy', 'Les Noeils', 'Zig Et Puce' and 'Game Over'. These pages are extracted from the eBDtheque database [15].
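The detector training of Section IV can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical. The sketch makes four further assumptions:

- Keypoints are hashable IDs.
- `label` maps each training keypoint to the labelled character whose bounding box contains it, or to `None`.
- Each matching adds one count to both of its keypoints.
- $M^{i}_{pos}$ and $M^{i}_{neg}$ in Eq. (2) denote detector $i$'s own positive and negative matching counts, i.e., its $N_{pos}$ and $N_{neg}$. The text does not state this explicitly.

```python
from collections import Counter


def count_matchings(matchings, label):
    """Count positive/negative matchings per keypoint (N_pos, N_neg) and in total (M_pos, M_neg).

    matchings -- iterable of (keypoint_a, keypoint_b) pairs from frame-to-frame matching
    label     -- keypoint -> labelled character id, or None outside every bounding box
    """
    n_pos, n_neg = Counter(), Counter()
    m_pos = m_neg = 0
    for a, b in matchings:
        # Positive only if both features lie in boxes of the same labelled character.
        positive = label.get(a) is not None and label.get(a) == label.get(b)
        if positive:
            m_pos += 1
        else:
            m_neg += 1
        counter = n_pos if positive else n_neg
        for k in (a, b):  # assumption: a matching counts for both of its keypoints
            counter[k] += 1
    return n_pos, n_neg, m_pos, m_neg


def select_detectors(keypoints, n_pos, n_neg, t_pos, t_neg):
    """Eq. (1): positive if N_pos - N_neg > T_pos, negative if N_neg - N_pos > T_neg."""
    positives = [k for k in keypoints if n_pos[k] - n_neg[k] > t_pos]
    negatives = [k for k in keypoints if n_neg[k] - n_pos[k] > t_neg]
    return positives, negatives


def detector_weights(positives, negatives, n_pos, n_neg, m_pos, m_neg):
    """Eq. (2), assuming M^i_pos / M^i_neg are detector i's own matching counts."""
    # A non-empty positive (negative) list implies m_pos (m_neg) > 0 for thresholds >= 0.
    w_pos = {k: m_neg * n_pos[k] / m_pos for k in positives}
    w_neg = {k: m_pos * n_neg[k] / m_neg for k in negatives}
    return w_pos, w_neg
```

The weighting scales positive detectors by $M_{neg}/M_{pos}$ and negative detectors by $M_{pos}/M_{neg}$. When background matchings vastly outnumber character matchings, the two kinds of detector therefore carry comparable influence at detection time.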
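The region scoring of Section V (Eq. (3)) can be sketched as follows. This is a hypothetical helper that does not reproduce the normalized cut segmentation. It makes three assumptions:

- `region_of` is a precomputed map from each query feature to its region ID.
- `matches` lists `(feature, detector)` pairs.
- $W^{r}_{pos}$ and $W^{r}_{neg}$ are the sums of the weights of all positive and negative detectors matched inside region $r$.

```python
from collections import defaultdict


def score_regions(matches, region_of, w_pos, w_neg):
    """Compute S_r = W_pos^r - W_neg^r for every region (Eq. 3).

    matches      -- iterable of (query_feature, detector) pairs (assumed format)
    region_of    -- query_feature -> region id, e.g. one of 20 normalized-cut segments
    w_pos, w_neg -- detector -> weight, as produced during training
    """
    pos_sum = defaultdict(float)
    neg_sum = defaultdict(float)
    for feature, detector in matches:
        region = region_of[feature]
        if detector in w_pos:
            pos_sum[region] += w_pos[detector]
        elif detector in w_neg:
            neg_sum[region] += w_neg[detector]
    # Regions without any matched detector get a score of 0.0.
    return {r: pos_sum[r] - neg_sum[r] for r in set(region_of.values())}


def detect_regions(scores, t_s):
    """Report the regions whose score exceeds the threshold T_s."""
    return sorted(r for r, s in scores.items() if s > t_s)
```

A region that matches a few positive detectors can still be rejected if it also matches enough negative detectors. This is how the negative detectors suppress false positives on text, tones and backgrounds.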
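The evaluation measures of Section VI-A and the parameter search of Section VI-B can be sketched as below. The search picks the maximum F-measure among settings with precision above 70%, over the grid of Table I. Two assumptions apply:

- `evaluate` is a hypothetical callback that runs detection with the given thresholds and returns $(R_c, R_d, R_g)$.
- The F-measure is assumed to be the standard harmonic mean of recall and precision (F1). The text does not specify which variant was used.

```python
from itertools import product


def recall_precision(r_c, r_d, r_g):
    """Recall = R_c / R_g, Precision = R_c / R_d (0.0 when a denominator is 0)."""
    recall = r_c / r_g if r_g else 0.0
    precision = r_c / r_d if r_d else 0.0
    return recall, precision


def f_measure(recall, precision):
    """Assumed F1: harmonic mean of recall and precision."""
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


def select_thresholds(evaluate, min_precision=0.70):
    """Grid search over Table I for the max F-measure with precision above the floor.

    evaluate -- hypothetical callback (t_s, t_pos, t_neg) -> (R_c, R_d, R_g)
    Returns ((t_s, t_pos, t_neg), f), or (None, 0.0) if no setting qualifies.
    """
    best, best_f = None, 0.0
    # Candidate values from Table I: T_s in 1..10, T_pos in 1..5, T_neg in 1..10.
    for t_s, t_pos, t_neg in product(range(1, 11), range(1, 6), range(1, 11)):
        recall, precision = recall_precision(*evaluate(t_s, t_pos, t_neg))
        if precision <= min_precision:
            continue  # keep only the high-precision part of the curve
        f = f_measure(recall, precision)
        if best is None or f > best_f:
            best, best_f = (t_s, t_pos, t_neg), f
    return best, best_f
```

Returning `None` makes it explicit when no setting reaches the precision floor, rather than silently falling back to the best low-precision setting.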
TABLE I. PARAMETER SETTINGS.
Parameter    Candidate values
$T_s$        1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$T_{pos}$    1, 2, 3, 4, 5
$T_{neg}$    1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Fig. 5. Interpolated recall-precision graph of detected regions for title 1.

B. Threshold selection

First, we selected the parameters of the proposed method based on its performance on one title (title 1). Tests with other titles showed that slight variations of the thresholds do not affect the detection results. The parameter settings are shown in Table I. With different combinations of $T_s$, $T_{pos}$ and $T_{neg}$, we obtained the interpolated recall-precision graph shown in Fig. 5.

In most cases, recall and precision can be traded off against each other by adjusting the parameters. For character detection, if one region is correctly detected, we can recognize the character and obtain its approximate position. Therefore, we focused on the high-precision part of the graph and chose the parameters that achieved the maximum F-measure with precision above 70%. $T_{pos}$, $T_{neg}$ and $T_s$ were set to 1, 5 and 3, respectively, for the rest of the experiments.

C. Character detection

In this experiment, we tested the proposed method on the other titles.

Using the parameters obtained in the first experiment, we trained the detectors for the 6 titles. Examples of the detectors are shown in Fig. 6, where the red keypoints are positive detectors and the green ones are negative detectors. We can see that (1) the positive detectors are located not only on the face regions but also on other parts of the characters, and (2) many negative detectors are extracted from parts such as text, tones and non-object characters.

Then, we applied the detectors to detect the object characters in the queries. The recall and precision of the detected regions are shown in Table II. Except for title 5, the proposed method achieved over 20% recall with precision above 68%.

Fig. 6. Examples of detectors. (The images are from the training dataset. The red keypoints are positive detectors, and the green ones are negative detectors. The characters covered with negative detectors (on the left of (b) and (c)) are not object characters.)

Examples of the detection results are shown in Fig. 7. The red, green and blue regions represent correctly detected, false positive and false negative regions, respectively. The red keypoints are matched with positive detectors, and the green ones are matched with negative detectors. We can see that

• For most of the characters, at least one part of their drawings is correctly detected.
• The negative detectors successfully suppressed false positive detections. As shown in Fig. 7(f), (g) and (h), although some background features are matched with positive detectors, these regions are not falsely detected because they are also matched with negative detectors.

The low recall of the detected regions is mainly due to the fact that some characters are segmented into small regions that contain no discriminative features. In addition, the drawings of comic characters can change drastically from frame to frame. For example, Fig. 7(h), which is from title 5 and contains the same character as Fig. 7(g), shows the character wearing a hat that covers the features of its hair, while grass patterns in the background have similar features, which led to false positive detections.

Although many regions were not detected, most of the characters contain features similar to the positive detectors. If a character is counted as detected when at least one of its parts is detected, 70% of the characters were