
IJDAR (2017) 20:59–68

DOI 10.1007/s10032-016-0277-z

ORIGINAL PAPER

Chinese calligraphic style representation for recognition


Gao Pengcheng1 · Gu Gang1 · Wu Jiangqin1 · Wei Baogang1

Received: 29 July 2015 / Revised: 28 January 2016 / Accepted: 1 December 2016 / Published online: 18 January 2017
© Springer-Verlag Berlin Heidelberg 2017

Abstract Chinese calligraphy draws a lot of attention for its beauty and elegance. The various styles of calligraphic characters make calligraphy even more charming. But it is not always easy to recognize a calligraphic style correctly, especially for beginners. In this paper, an automatic calligraphic style representation for recognition method is proposed. Three kinds of features are extracted to represent the calligraphic characters. Two of them are typical hand-designed features: the global feature, GIST, and the local feature, the scale invariant feature transform (SIFT). The third is the deep feature, which is extracted by a deep convolutional neural network (CNN). The state-of-the-art classifier, the modified quadratic discriminant function (MQDF), was employed to perform recognition. We evaluated our method on two calligraphic character datasets: the unconstrained real-world calligraphic character dataset (CCD) and SCL (the standard calligraphic character library). We also compare MQDF with two other classifiers, the support vector machine and the neural network. In our experiments, all three kinds of features are evaluated with all three classifiers, finding that the deep feature is the best feature for calligraphic style recognition. We also fine-tune the deep CNN (alex-net) of Krizhevsky et al. (Advances in Neural Information Processing Systems, pp. 1097–1105, 2012) to perform calligraphic style recognition. It turns out our method achieves about equal accuracy compared with the fine-tuned alex-net but with much less training time. Furthermore, the style discrimination evaluation algorithm is developed to evaluate the discriminability of styles quantitatively.

Keywords Calligraphy · Style recognition · Global feature · Local feature · Deep feature

Corresponding author: Wu Jiangqin
wujq@zju.edu.cn

Gao Pengcheng
gaopengcheng@zju.edu.cn

Gu Gang
gugang@zju.edu.cn

Wei Baogang
wbg@zju.edu.cn

1 College of Computer Science, Zhejiang University, Hangzhou, China

1 Introduction

Chinese calligraphy is one of the finest Chinese art forms and an important and inseparable part of Chinese history. Its delicate esthetic effects are generally considered to be unique among all calligraphic arts. In the last decade, numerous collections of historical Chinese calligraphic works have been digitized and stored, great progress has been made in the area of computer-aided calligraphy research, and several novel approaches have been proposed to address calligraphic problems, including calligraphy processing and analysis [2], calligraphic character retrieval [3,4] and recognition [5–7], visualization [8] and specific style rendering [9]. Automatic Chinese calligraphic character generation methods [10,11] and a calligraphy beautification method [12] have also been developed.

Lots of people are fascinated by the beauty of calligraphy. The various styles of calligraphic characters make calligraphy even more charming. For the same character, there are always several ways of writing, each with its own flavor, as shown in Fig. 1. But how can the style of a calligraphic character be described? Generally, the style of Chinese calligraphic characters depends mainly on the posture of the strokes and the structure of the characters. The width of characters in different styles is also different. As we can see in Fig. 1, the strokes of a seal script character are much thinner than the strokes of a clerical script character. Those who are familiar with Chinese calligraphy can easily tell the differences between styles, but for beginners this is not always the case.


Fig. 1 Five main styles of the same character. From left to right: seal script, clerical script, standard script, semi-cursive script, and cursive script

In CADAL,¹ which is the biggest digital library in China, plenty of digital calligraphic character images are collected. Due to the large scale of the collection, an automatic method for labeling all these calligraphic images is needed to provide better service to users. But how to represent the style of calligraphic characters computationally is still a challenging problem.

¹ China Academic Digital Associate Library - http://www.cadal.zju.edu.cn.

In this paper, different kinds of features are evaluated quantitatively for the representation of calligraphic style. Qualitative visualization is performed to show how well each kind of feature represents calligraphic style. Furthermore, an automatic recognition method is used to perform style recognition on the different kinds of features.

The remainder of this paper is organized as follows: Related work is summarized in Sect. 2. Section 3 gives the details of feature extraction. Section 4 describes the classifiers used in this paper. Experimental results and analysis are given in Sect. 5, and Sect. 6 concludes.

2 Related work

For printed characters, character style is more like the character font. Font recognition is a fundamental problem in document analysis and recognition and is closely related to optical character recognition (OCR) techniques. For a printed character, the first step is to classify the font of the character and then recognize the character within the single-font character library; this reduces the influence of alternative shapes for each class of font. Besides, font recognition can be used to achieve automatic typesetting. In [13,14], character features based on texture analysis and wavelet transform were extracted to perform font recognition. In [15], sparse discriminative information preservation (SDIP) was proposed to select printed character features for font recognition. Furthermore, a recurrent neural network was introduced to solve font recognition with noisy data in [16]. All these methods focus on printed characters, which are usually uniform. But for handwritten calligraphic characters, the strokes are highly variable, which makes style recognition a great challenge.

It is very natural to extract features from the calligraphic characters. There are mainly two categories of hand-engineered features for image representation: global features and local features. One popular global feature is the Gabor feature. Zhuang et al. proposed the latent style model to discover writing styles in calligraphic works [17]. In their approach, they used a 2D Gabor filter to extract texture features and verified its discriminability for calligraphy writing style classification; the recognition ratio is more than 56% for each style. They also used the Gabor feature to classify the writing style of Chinese words in [18]. This shows that the Gabor feature is feasible for discriminating different calligraphy writing styles, but the performance still needs to be improved. Another global feature, the GIST descriptor, originally proposed in [19], has recently shown good results for calligraphic character image search [6]. It does not require any form of segmentation and uses a low-dimensional vector to represent the image. Besides global features, local features are also adopted in many applications. One of the most famous local features is the scale invariant feature transform (SIFT) descriptor. The SIFT descriptor was originally developed by David Lowe [20] and has been used successfully in recognition, stitching and many other applications because of its robustness. Another local feature, PCA-SIFT [21], proposed by Yan Ke, uses principal component analysis (PCA) to normalize the gradient patch instead of a histogram. Herbert Bay [22] presented a faster method called SURF, which uses a Fast-Hessian detector. Rublee [23] also proposed the ORB feature based on the well-known FAST keypoint detector [24] and the BRIEF descriptor [25]. All these local feature extraction methods usually comprise two stages. In the first stage, interesting salient points are detected. In the second stage, descriptors are computed to characterize the appearance of the local regions surrounding these salient points. For calligraphic character images, we wish to find keypoints that contain style information.

It is always challenging to discover effective representations that capture salient semantics for perceptual learning. The conventional visual representation, based on flat feature representations involving quantized gradient filters, has achieved impressive performance but has likely plateaued in recent years. Besides all the above hand-engineered representations, there are also features generated by deep models. It has long been argued that deep or layered compositional architectures should be able to capture salient aspects of a given domain [26,27]. The deep convolutional neural networks (CNN) proposed in [1] have achieved competition-winning numbers on large benchmark datasets consisting of more than one million images.


But in some other tasks, the applicability of deep CNNs is greatly limited because of the difficulty of collecting large-scale image datasets. To address this problem, [28] proposed transferring image representations learned with CNNs on large datasets to other tasks with limited training data.

In this paper, we choose three kinds of features to represent the style of calligraphic characters. The typical global feature, the GIST descriptor, and the typical local feature, the SIFT descriptor, are evaluated. The last kind of feature is extracted by a deep network and is called the deep feature in this paper; it is generated by the model used in [27]. With all three features extracted, at least two important questions remain:

• Which feature captures the calligraphic style information better: the hand-designed features or the deep feature?
• How are the calligraphic styles related to each other? Is there a clear distinction between different styles?

We address these questions both qualitatively and quantitatively, via visualizations of the different kinds of features and experimental comparison to the current state-of-the-art method, in the following sections.

3 Style representation

In this section, three kinds of features are extracted to capture the calligraphic style information from the character images, in which some hidden patterns of style information are expected. Performance based on these features will be evaluated in the following sections.

3.1 Global feature

As the typical global feature, GIST, which was initially proposed in [19], is widely applied in image representation [6,29,30]. The GIST descriptor gives a holistic view of the character image. Because GIST includes all levels of visual information, ranging from low-level features (e.g., contours) to intermediate (e.g., shapes) and high-level information (e.g., activation of semantic knowledge), it can be represented at both perceptual and conceptual levels. Perceptual GIST refers to the structural representation of the calligraphic character built during perception. Conceptual GIST includes the semantic information that is inferred while viewing a calligraphic character or shortly after the character has disappeared from view. Conceptual GIST is enriched and modified as the perceptual information bubbles up from early stages of visual processing. The calligraphic style information may be captured by the GIST descriptor.

For the global feature, GIST descriptor extraction, the image is decomposed by a bank of multi-scale oriented filters. Here we use 8 orientations and 3 scales. Next, the output magnitude of each filter is averaged over 9 non-overlapping windows arranged on a 3 × 3 grid. So the final character image representation is a 3 × 8 × 9 = 216 dimensional vector, which can be seen as a point p in R^d (d = 216). Figure 2 shows two binary calligraphic character images and their corresponding GIST descriptors.

Fig. 2 Calligraphic character images and their corresponding GIST descriptors
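As a rough illustration of this 8-orientation, 3-scale, 3 × 3-grid pipeline, the sketch below builds a simplified Gabor filter bank with NumPy and SciPy. It is a minimal approximation under assumed scale and frequency settings, not the exact filter bank of [19].

```python
import numpy as np
from scipy import ndimage

def gist_descriptor(img, n_scales=3, n_orient=8, grid=3):
    """Simplified GIST: filter with multi-scale oriented Gabor filters,
    then average each response magnitude over a grid x grid layout."""
    img = img.astype(np.float64)
    h, w = img.shape
    feats = []
    for s in range(n_scales):
        sigma = 2.0 * (2 ** s)      # assumed envelope width per scale
        freq = 0.25 / (2 ** s)      # assumed center frequency per scale
        for o in range(n_orient):
            theta = np.pi * o / n_orient
            half = int(3 * sigma)
            ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
            xr = xs * np.cos(theta) + ys * np.sin(theta)
            # real part of a Gabor kernel at this scale and orientation
            kernel = (np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
                      * np.cos(2 * np.pi * freq * xr))
            resp = np.abs(ndimage.convolve(img, kernel, mode='nearest'))
            # average the response magnitude over non-overlapping windows
            for gy in range(grid):
                for gx in range(grid):
                    block = resp[gy * h // grid:(gy + 1) * h // grid,
                                 gx * w // grid:(gx + 1) * w // grid]
                    feats.append(block.mean())
    return np.array(feats)          # 3 scales * 8 orientations * 9 = 216-d
```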
3.2 Local feature

As the typical local feature, the SIFT descriptor is successfully used in recognition, stitching and many other applications because of its robustness. Like many other local feature extraction methods, SIFT descriptor extraction includes two stages. In the first stage, interesting salient points are detected. In the second stage, descriptors are computed to characterize the appearance of the local regions surrounding these salient points. For calligraphic character images, we wish to find keypoints that contain style information and then extract features from the regions around the keypoints.

For the local feature, SIFT descriptor extraction, keypoints are detected first. Then the gradient magnitude and orientation at each sample point in a 16 × 16 window around the keypoint location are computed and weighted by a Gaussian window. We use 4 × 4 descriptors computed from the 16 × 16 sample window; each descriptor accumulates 8 orientations by summarizing the contents of a 4 × 4 subregion. So for each keypoint of the character image, the dimension of the keypoint descriptor is 4 × 4 × 8 = 128. Figure 3 gives an example of SIFT keypoint descriptors. Among all these keypoint descriptors, style information may be captured by the local features.

Fig. 3 Calligraphic character image and its corresponding SIFT keypoint descriptors
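In practice, this two-stage extraction is available off the shelf; a minimal sketch using OpenCV follows, where the file path and helper name are illustrative.

```python
import cv2

def sift_keypoint_descriptors(image_path):
    """Detect SIFT keypoints on a character image and compute their
    128-d descriptors (4 x 4 subregions x 8 orientation bins each)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()                      # detector + descriptor, as in [20]
    keypoints, descriptors = sift.detectAndCompute(img, None)
    return keypoints, descriptors                 # descriptors: (num_keypoints, 128)
```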


Fig. 4 Visualization of mid-layer features. From left to right: original character image; first convolutional layer output (9 channels of 96); fifth convolutional layer output (9 channels of 256); fifth layer output after ReLU (9 channels of 256)

3.3 Deep feature

As for perceptual learning, it has long been argued that deep or layered compositional architectures should be able to capture salient aspects of a given domain [26,27]. In recent years, the use of deep convolutional neural networks (CNN) for image classification [1] and object detection [31,32] has led to significant gains in accuracy. These deep CNN models usually contain several layers, including convolutional layers and fully connected layers. Layer by layer, the features transform nonlinearly from the bottom level to the upper semantic level. We wish to use these deep CNNs to capture salient semantics, which may yield effective calligraphic style representations.

For the deep feature extraction, we adopted the DeCAF [27] framework and employed the deep convolutional neural network architecture proposed in [1]. This network is composed of five successive convolutional layers C1…C5 followed by three fully connected layers FC6…FC8. The three fully connected layers are computed by Y_6 = \sigma(W_6 Y_5 + B_6), Y_7 = \sigma(W_7 Y_6 + B_7) and Y_8 = \psi(W_8 Y_7 + B_8), where Y_k denotes the output of the k-th layer, W_k and B_k are the trainable parameters of the k-th layer, and \sigma(X)[i] = \max(0, X[i]) and \psi(X)[i] = e^{X[i]} / \sum_j e^{X[j]} are the "ReLU" and "SoftMax" nonlinear activation functions. More details of the CNN architecture are given in [1]. We reuse layers trained on the ImageNet dataset as the pre-trained parameters to compute the mid-level image representation. As the experiments in [27] show, the deep feature generated at layer FC6 performs better than the features of other layers, so we choose the layer FC6 feature as our deep feature, which has 4096 dimensions. This is much higher-dimensional than the GIST and SIFT descriptors. In our experiments, we reduce the deep feature to 216 dimensions, the same as the GIST descriptor, using principal component analysis (PCA). Figure 4 gives some visualizations of mid-layer features.
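For illustration, the FC6 activations can be read off any ImageNet-pretrained AlexNet. The sketch below uses torchvision's AlexNet and scikit-learn's PCA as stand-ins for the DeCAF/Caffe pipeline used in the paper, so the exact activations will differ; preprocessing (resizing to 224 × 224 and ImageNet normalization) is assumed.

```python
import torch
from torchvision import models
from sklearn.decomposition import PCA

# ImageNet-pretrained AlexNet; torchvision's model is a close stand-in
# for the Caffe/DeCAF network used in the paper.
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

def fc6_features(batch):
    """Return FC6 activations (4096-d, after ReLU) for a batch of
    normalized 3x224x224 character image tensors."""
    with torch.no_grad():
        x = net.features(batch)              # C1...C5
        x = torch.flatten(net.avgpool(x), 1)
        for layer in net.classifier[:3]:     # Dropout, FC6 Linear, ReLU
            x = layer(x)
    return x.numpy()

# Reduce the 4096-d deep feature to 216 dimensions, matching GIST.
# train_feats is an (N, 4096) array of FC6 activations of training images:
# pca = PCA(n_components=216).fit(train_feats)
# deep_feats = pca.transform(train_feats)
```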
4 Style recognition and discrimination evaluation

After extracting the features of the character images, the next step of style recognition is classification. By assigning the input sample to the category with maximum posterior probability, statistical methods based on the Bayes decision rule are widely used in pattern recognition problems. The modified quadratic discriminant function (MQDF) proposed by Kimura [33] has proved to be a state-of-the-art classifier for handwritten character recognition. Inspired by this, we apply the MQDF to the recognition of calligraphic styles for the first time.

According to Bayes decision theory, by assuming the probability density function of each class to be multivariate Gaussian with equal a priori probabilities, the quadratic discriminant function (QDF) can be obtained as:

F_{QDF}(x, i) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) + \log |\Sigma_i|    (1)

where \mu_i \in \mathbb{R}^d and \Sigma_i \in \mathbb{R}^{d \times d} denote the mean vector and the covariance matrix of class i. The QDF can be used as a distance metric in the sense that the class of minimum distance is assigned to the input pattern. The MQDF modifies the QDF by the K-L transform and by smoothing the minor eigenvalues to improve the computational efficiency and classification performance. By the K-L transform, the covariance matrix \Sigma_i can be diagonalized as

\Sigma_i = \Phi_i \Lambda_i \Phi_i^T    (2)

where \Lambda_i = \mathrm{diag}[\lambda_{i1}, \ldots, \lambda_{id}] with \lambda_{ij}, j = 1, \ldots, d, being the eigenvalues (ordered in non-ascending order) of \Sigma_i, and \Phi_i = [\phi_{i1}, \ldots, \phi_{id}] with \phi_{ij}, j = 1, \ldots, d, being the corresponding eigenvectors. \Phi_i is orthonormal such that \Phi_i^T \Phi_i = I.

According to Eq. 1, the QDF can be rewritten in terms of eigenvectors and eigenvalues:

F_{QDF}(x, i) = \sum_{j=1}^{d} \frac{1}{\lambda_{ij}} [\phi_{ij}^T (x - \mu_i)]^2 + \sum_{j=1}^{d} \log \lambda_{ij}    (3)

By replacing the minor eigenvalues \lambda_{ij} (j > k) with a constant \delta_i to stabilize the generalization performance, the MQDF is obtained as

F_{MQDF}(x, i) = \sum_{j=1}^{k} \left( \frac{1}{\lambda_{ij}} - \frac{1}{\delta_i} \right) [\phi_{ij}^T (x - \mu_i)]^2 + \frac{1}{\delta_i} \| x - \mu_i \|_2^2 + \sum_{j=1}^{k} \log \lambda_{ij} + (d - k) \log \delta_i    (4)

where k denotes the number of principal axes. The decision rule of MQDF for an input pattern x is

x \in \mathrm{class} \; \arg \min_{i=1,\ldots,M} F_{MQDF}(x, i)    (5)
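Equations (4) and (5) translate almost line-for-line into NumPy. The sketch below is a minimal per-class implementation; the class interface is illustrative, and the choices of k and δ_0 (cross-validated in the paper, see below) are left to the caller.

```python
import numpy as np

class MQDFClassifier:
    """Minimal modified quadratic discriminant function (Eq. 4)."""

    def __init__(self, k, delta0):
        self.k = k            # number of principal axes kept per class
        self.delta0 = delta0  # unified minor-eigenvalue constant delta_0
        self.params = {}

    def fit(self, X, y):
        # Estimate mean, leading eigenvalues and eigenvectors per class.
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            lam, phi = np.linalg.eigh(np.cov(Xc, rowvar=False))
            order = np.argsort(lam)[::-1]          # non-ascending, as in Eq. 2
            lam, phi = lam[order], phi[:, order]
            self.params[c] = (mu, lam[:self.k], phi[:, :self.k])
        return self

    def distance(self, x, c):
        # F_MQDF(x, i) of Eq. 4.
        mu, lam, phi = self.params[c]
        d = x.shape[0]
        diff = x - mu
        proj = phi.T @ diff                        # phi_ij^T (x - mu_i)
        return (np.sum((1.0 / lam - 1.0 / self.delta0) * proj ** 2)
                + diff @ diff / self.delta0
                + np.sum(np.log(lam))
                + (d - self.k) * np.log(self.delta0))

    def predict(self, x):
        # Eq. 5: assign the class with minimum MQDF distance.
        return min(self.params, key=lambda c: self.distance(x, c))
```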


In our implementation of MQDF, we use a unified minor eigenvalue \delta_i = \delta_0 for all classes and optimize this parameter by cross-validation on the training data. Inspired by [34], we set \delta_0 = \frac{\beta}{Md} \sum_{i=1}^{M} \sum_{j=1}^{d} \lambda_{ij} with \beta selected from [0,1].

It is also important to find out how the different styles relate to each other. Algorithm 1 shows the style discrimination evaluation (SDE) method, which is developed to evaluate the discriminability of styles based on the MQDF classifier. Here we discard the wrongly classified test cases. The distance between different styles is defined by a penalty on the MQDF rank results.

Algorithm 1 Style Discrimination Evaluation (SDE)
Input: trained MQDF classifiers for the five styles, MQDF_seal for the seal script, MQDF_clerical for the clerical script, MQDF_standard for the standard script, MQDF_semi for the semi-cursive script and MQDF_cursive for the cursive script; labeled data x_i (i = 1, ..., n) with attribute style;
1: initialize S = {seal, clerical, standard, semi, cursive};
2: initialize dist(style_1, style_2) = 0, (style_1, style_2 ∈ S);
3: initialize count(style_1, style_2) = 0, (style_1, style_2 ∈ S);
4: for i from 1 to n do
5:   for j in S do
6:     if x_i is correctly classified then
7:       dist(x_i.style, j) += Penalty(F_MQDF_j(x_i, j));
8:       count(x_i.style, j) += 1;
9:     end if
10:  end for
11: end for
12: for style_1, style_2 in S do
13:   Distance(style_1, style_2) = (dist(style_1, style_2) + dist(style_2, style_1)) / (count(style_1, style_2) + count(style_2, style_1));
14: end for
Output: Distance(style_1, style_2), (style_1, style_2 ∈ S).
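A direct transcription of Algorithm 1 might look as follows. The penalty function is left as a parameter because the paper defines it only as a penalty on the MQDF rank results; all helper names are illustrative, not from the authors' code.

```python
from collections import defaultdict
from itertools import combinations

def style_discrimination_evaluation(classifiers, samples, penalty):
    """Sketch of Algorithm 1 (SDE). `classifiers` maps each style in S to
    its trained MQDF distance function; `samples` is a list of (x, style)
    pairs; `penalty` turns an MQDF score into a rank penalty."""
    styles = list(classifiers)
    dist = defaultdict(float)
    count = defaultdict(int)
    for x, true_style in samples:
        predicted = min(styles, key=lambda s: classifiers[s](x))
        if predicted != true_style:   # discard wrongly classified cases
            continue
        for j in styles:
            dist[(true_style, j)] += penalty(classifiers[j](x))
            count[(true_style, j)] += 1
    # symmetric, count-normalized distance for each pair of styles
    return {(a, b): (dist[(a, b)] + dist[(b, a)])
                    / (count[(a, b)] + count[(b, a)])
            for a, b in combinations(styles, 2)}
```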
5 Experiments and analysis

5.1 Data preparation

In CADAL, numerous collections of historical Chinese calligraphic works are digitized and stored. A big database, the Calligraphic Character Dictionary (CCD), has been built, which contains calligraphic character images (more than 110,000 character images: 8035 seal script, 12,193 clerical script, 49,481 standard script, 30,915 semi-cursive script and 10,113 cursive script) labeled with semantic meaning by calligraphic experts. We evaluated the feature performance both on the unconstrained Chinese calligraphic character dataset CCD and on the Standard Character Library (SCL, containing more than 18,770 character images, with more than 3800 character images per style); both contain five different styles of calligraphic characters, namely seal script, clerical script, standard script, semi-cursive script and cursive script. Figure 5 shows examples from CCD and SCL. The main difference between SCL and CCD is that the calligraphic characters in SCL are uniform while those in CCD are unconstrained, written by real people. Both datasets can be downloaded from the CADAL calligraphy system.²

² CADAL calligraphy system - http://www.cadal.zju.edu.cn/NewCalligraphy.

In our experiments, the following questions were addressed:

• Which feature captures the calligraphic style information better: the hand-designed features or the deep feature?
• How are the calligraphic styles related to each other? Are there any clear differences between different styles?

We first visualized all three kinds of features to find their semantic clusters and then evaluated the performance of the different features on SCL and CCD, respectively. We also evaluated the performance of different classifiers on the style recognition task using the deep feature. At last, the relations between different styles were also evaluated.

5.2 Feature visualization and performance evaluation

For all the characters in SCL and CCD, the SIFT descriptor, GIST descriptor and deep feature were extracted. We evaluated the performance of the three kinds of features both on CCD and on the SCL character library. t-SNE [35] was employed to visualize all three kinds of features. By projecting these high-dimensional data to 2 dimensions, we can clearly discover some patterns. Figure 6 gives the visualization of the SIFT descriptors: there are barely any clear differences between calligraphic styles, whether on CCD or SCL. The situation is better for the GIST descriptor, as shown in Fig. 6, but different styles can still be easily confused. Compared with the SIFT and GIST descriptors, the deep feature captures much more significant semantic patterns of the calligraphic styles, as shown in Fig. 6. By visualizing all three kinds of features, we find that the deep feature is much more suitable for calligraphic style recognition.
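The projection can be reproduced with scikit-learn's t-SNE implementation; a minimal sketch, where the feature matrix and style labels are placeholders:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(feats, labels):
    """Project features (N x D array) to 2-D with t-SNE [35] and
    scatter-plot them colored by style label (length-N list)."""
    emb = TSNE(n_components=2).fit_transform(feats)
    for style in sorted(set(labels)):
        idx = [i for i, s in enumerate(labels) if s == style]
        plt.scatter(emb[idx, 0], emb[idx, 1], s=4, label=style)
    plt.legend()
    plt.show()
```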
We also employed classifiers to evaluate the performance of the three kinds of features. The MQDF classifier was employed to perform style recognition. For the SCL character dataset, we chose 3400 examples of each calligraphic style to train the MQDF classifier and another 450 examples of each style for testing. For the real-world data, we chose 4000 examples of each calligraphic style from CCD for training and another 1000 examples of each style for testing. All the test and training examples were chosen randomly. The recognition performance of the different features on SCL and CCD is shown in Tables 1 and 2, respectively. It is important to note that for style recognition using the SIFT feature, we predict a label for every keypoint from its descriptor and then, for the character-level style prediction, take the style with the maximum number of keypoint votes as the final character style.


Fig. 5 Calligraphic character examples from a CCD, b SCL. From top line to bottom line: cursive, clerical, seal, semi-cursive and standard

As we can see in Table 1, the deep feature performs best on SCL. For the semi-cursive and clerical scripts, the recognition rate is 100%, and the recognition rates of the other three styles are almost 100%. The local feature, the SIFT descriptor, is not stable on the style recognition task: for the semi-cursive and clerical scripts, its performance is much lower than that of the deep feature. The global feature, the GIST descriptor, also achieves promising performance but still a slightly lower recognition rate than the deep feature. Table 2 gives the performance of the different features on CCD. The performance of the deep feature drops slightly compared to SCL but is still the best. This is mainly because of the complexity of unconstrained real-world calligraphic characters.
We also evaluated the performance of different classifiers on the deep feature for calligraphic style recognition. We employed LibSVM [36] with a radial basis function (RBF) kernel. We selected the optimal value of γ in the RBF kernel and the penalty parameter C after three rounds of fivefold cross-validation on the training data; in our experiments, we set γ = 0.07 and C = 1. We also implemented neural networks with one hidden layer based on the MATLAB toolkit. The networks consist of three layers: the input layer (216 units), the hidden layer (different numbers of hidden units were tested, including 5, 10, 15, 20, 25, 30, 35, 40 and 60 units) and the output layer (3 output units, using binary codes to represent the 5 calligraphic styles). Among the networks with different numbers of hidden units, the one with the best performance (10 hidden units) was selected to perform calligraphic style recognition, as sketched below.
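Both baselines are straightforward to reproduce with scikit-learn, whose SVC wraps LibSVM; a sketch with the paper's reported hyperparameters (the MLP substitutes plain class labels for the paper's 3-bit binary output coding, and max_iter is an assumption):

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# RBF-kernel SVM with the paper's selected hyperparameters.
svm = SVC(kernel='rbf', gamma=0.07, C=1.0)

# One-hidden-layer network with the best-performing size (10 hidden units).
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)

# X_train: (N, 216) PCA-reduced deep features; y_train: style labels.
# svm.fit(X_train, y_train)
# mlp.fit(X_train, y_train)
```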
As we can see in Tables 3 and 4, the three classifiers achieve quite close performance using the deep feature on SCL, with MQDF performing slightly better than the other two classifiers in average recognition rate. This shows that the deep feature captures the salient semantic style information of SCL quite well. The recognition accuracy drops on CCD, which may be caused by the variety of writing styles in CCD.

In our experiments, we also fine-tuned alex-net [1] to perform calligraphic style recognition on SCL and CCD, respectively. Alex-net is a fully supervised deep CNN that won the ImageNet image classification challenge in 2012. In our fine-tuned alex-net, we keep the architecture the same as the original network except for the last output layer: the last layer of the original network, which has 1000 output units, is replaced by 5 softmax output units corresponding to the five calligraphic styles. The parameters of the top 7 layers were initialized with the original ones, and the last softmax layer's parameters were initialized randomly. During the fine-tuning process, we set the learning rate of the top 7 layers smaller than that of the last layer, so that the old layers change very slowly with the new data while the new last layer learns fast. We fine-tuned the alex-net on SCL for more than 20,000 iterations and on CCD for more than 50,000 iterations. It took about 20 seconds every 20 iterations during fine-tuning on a GPU (GeForce GTX TITAN).


Fig. 6 Feature visualization on CCD (left) and SCL (right), colored by style (cursive, semi-cursive, seal, clerical, standard): a feature visualization of SIFT descriptor; b feature visualization of GIST descriptor; c feature visualization of deep feature


Table 1 Performance of different features on SCL using MQDF (recognition rate, %)

                      SIFT    GIST    Deep feature
Standard script       98.3    97.0    99.6
Semi-cursive script   69.2    96.5    100
Seal script           97.5    97.3    99.7
Clerical script       61.6    95.8    100
Cursive script        95.4    94.3    99.6
Average               84.40   96.18   99.78

Table 2 Performance of different features on CCD using MQDF (recognition rate, %)

                      SIFT    GIST    Deep feature
Standard script       91.8    94.2    99.7
Semi-cursive script   26.7    83.1    97.2
Seal script           79.1    89.3    89.7
Clerical script       85.8    91.9    97.4
Cursive script        77.9    84.2    87.1
Average               72.26   88.54   94.22

Table 3 Performance of different classifiers using deep feature on SCL (recognition rate, %)

                      SVM     Neural network   MQDF
Standard script       99.3    97.3             99.6
Semi-cursive script   96.6    97.1             100
Seal script           100     99.3             99.7
Clerical script       99.3    99.2             100
Cursive script        95.6    99.1             99.6
Average               98.16   98.40            99.78

Table 4 Performance of different classifiers using deep feature on CCD (recognition rate, %)

                      SVM     Neural network   MQDF
Standard script       99.4    99.1             99.7
Semi-cursive script   92.7    86.6             97.2
Seal script           83.6    83.4             89.7
Clerical script       98.7    94.1             97.4
Cursive script        89.3    85.8             87.1
Average               92.74   89.80            94.22

Tables 5 and 6 give the performance of the fine-tuned alex-net on the calligraphic style recognition task. Our method achieves performance very close to the fine-tuned alex-net on SCL; on CCD, the fine-tuned alex-net achieves slightly better performance than our method. It is worth noting that our method is quite flexible and much easier to implement, while the alex-net took dozens of hours to deploy and fine-tune on the new data.

Table 5 Recognition performance of MQDF with deep feature (DF) and fine-tuned alex-net on SCL (recognition rate, %)

                      DF+MQDF   Fine-tuned alex-net
Standard script       99.6      99.9
Semi-cursive script   100       99.5
Seal script           99.7      99.8
Clerical script       100       99.8
Cursive script        99.6      100
Average               99.78     99.80

Table 6 Recognition performance of MQDF with deep feature (DF) and fine-tuned alex-net on CCD (recognition rate, %)

                      DF+MQDF   Fine-tuned alex-net
Standard script       99.7      94.2
Semi-cursive script   97.2      92.2
Seal script           89.7      98.0
Clerical script       97.4      97.0
Cursive script        87.1      93.6
Average               94.22     95.00

5.3 Style discrimination evaluation

After addressing the question of which kind of feature and classifier are most suitable for calligraphic character style representation and recognition, another question arises: how are the different calligraphic styles related to each other? SDE (Algorithm 1) was employed to calculate the average distance between different styles. Table 7 gives the average distances between styles on SCL and CCD, respectively. All these distances are normalized by the penalty of the rank result. Two styles with a smaller distance are more similar than a pair with a larger distance.


Table 7 Distances between styles on SCL (top-right triangle) and CCD (bottom-left triangle); a smaller number means the two styles are more similar

              Standard   Semi-cursive   Seal   Clerical   Cursive
Standard      –          2.27           2.35   3.66       1.9
Semi-cursive  3.3        –              3.27   2.36       1.04
Seal          2.31       2.6            –      2.26       2.7
Clerical      3.42       1.4            2.5    –          3.22
Cursive       3.76       1.36           1.52   2.73       –

The smallest distance in each triangle marks the two most similar styles.

As we can see in Table 7, the top-right triangle holds the distances on SCL and the bottom-left triangle the distances on CCD. The table shows that the cursive and semi-cursive scripts are the most similar, both on SCL and on CCD. This is easily confirmed in Fig. 7, where we visualize only the cursive and semi-cursive scripts using the deep feature: no matter whether on CCD or SCL, there is a large overlap between the two scripts.

Fig. 7 Overlapping of cursive and semi-cursive on CCD (left) and SCL (right)

5.4 Analysis and discussion

From the experiments above, we can see that on both CCD and SCL, the deep feature captures more style information than the SIFT and GIST descriptors. This can be concluded directly from the visualizations of the three kinds of features. Still, the performance on CCD is a little lower than on SCL. This is mainly because the characters of the same style in SCL are uniform, whereas the real-world characters in CCD are unconstrained: characters of the same style in CCD may be written by different people, so there are still slight differences between characters of the same style.

6 Conclusion

In this paper, an automatic calligraphic style representation for recognition method was proposed. Three kinds of features were extracted: the typical global feature, the GIST descriptor; the typical local feature, the SIFT descriptor; and the deep feature. Three kinds of classifiers, MQDF, SVM and NN, were employed to perform recognition. We evaluated our method on two calligraphic character datasets: CCD (the unconstrained real-world calligraphic character dataset) and SCL (the standard calligraphic character library). Our experiments show that the deep feature outperforms the other two kinds of features on the calligraphic style recognition task when using MQDF, achieving 99.78% average recognition accuracy on SCL and slightly lower accuracy (94.22% on average) on CCD. We also fine-tuned alex-net to perform calligraphic style recognition; it turns out our method achieves about equal accuracy compared with the fine-tuned alex-net but with much less training time. Furthermore, the SDE algorithm was employed to evaluate the relations between different styles, finding that the cursive and semi-cursive scripts are the most similar, both on SCL and on CCD.


Acknowledgements This work is supported by the National Natural Science Foundation of China (No. 61379073) and the CADAL Project and Research Center, Zhejiang University.

References

1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
2. Wang, S., Lee, H.: Dual-binarization and anisotropic diffusion of chinese characters in calligraphy documents. In: Proceedings. Sixth International Conference on Document Analysis and Recognition, IEEE, pp. 271–275 (2001)
3. Zhuang, Y., Zhang, X., Wu, J., Lu, X.: Retrieval of chinese calligraphic character image. In: Advances in Multimedia Information Processing-PCM 2004, pp. 17–24. Springer (2005)
4. Pengcheng, G., Jiangqin, W., Yuan, L., Yang, X., Tianjiao, M., Baogang, W.: Fast image-based chinese calligraphic character retrieval on large scale data. In: 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), IEEE, pp. 211–220 (2014)
5. Yu, K., Wu, J., Zhuang, Y.: Skeleton-based recognition of chinese calligraphic character image. In: Advances in Multimedia Information Processing-PCM 2008, pp. 228–237. Springer (2008)
6. Lin, Y., Wu, J., Gao, P., Xia, Y., Mao, T.: Lsh-based large scale chinese calligraphic character recognition. In: Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 323–330. ACM (2013)
7. Pengcheng, G., Jiangqin, W., Yuan, L., Yang, X., Tianjiao, M.: Fast chinese calligraphic character recognition with large-scale data. In: Multimedia Tools and Applications, pp. 1–18 (2014)
8. Wu, Y., Zhuang, Y., Pan, Y., Wu, J.: Web based chinese calligraphy learning with 3-d visualization method. In: 2006 IEEE International Conference on Multimedia and Expo, IEEE, pp. 2073–2076 (2006)
9. Zhang, Z., Wu, J., Yu, K.: Chinese calligraphy specific style rendering system. In: Proceedings of the 10th Annual Joint Conference on Digital Libraries, pp. 99–108. ACM (2010)
10. Xia, Y., Wu, J., Gao, P., Lin, Y., Mao, T.: Ontology-based model for chinese calligraphy synthesis. In: Computer Graphics Forum, vol. 32, pp. 11–20. The Eurographics Association and Blackwell Publishing Ltd. (2013)
11. Xu, S., Lau, F., Cheung, W.K., Pan, Y.: Automatic generation of artistic chinese calligraphy. IEEE Intell. Syst. 20(3), 32–39 (2005)
12. Li, H., Liu, P., Xu, S., Lin, S.: Calligraphy beautification method for chinese handwritings. In: 2012 Fourth International Conference on Digital Home (ICDH), pp. 122–127. IEEE (2012)
13. Zhu, Y., Tan, T., Wang, Y.: Font recognition based on global texture analysis. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1192–1200 (2001)
14. Ding, X., Chen, L., Wu, T.: Character independent font recognition on a single chinese character. IEEE Trans. Pattern Anal. Mach. Intell. 29(2), 195–204 (2007)
15. Tao, D., Jin, L., Zhang, S., Yang, Z., Wang, Y.: Sparse discriminative information preservation for chinese character font categorization. Neurocomputing 129, 159–167 (2014)
16. Tao, D., Lin, X., Jin, L., et al.: Principal component 2-D long short-term memory for font recognition on single Chinese characters. IEEE Trans. Cybern. 46(3), 756–765 (2016)
17. Zhuang, Y., Lu, W., Wu, J.: Latent style model: discovering writing styles for calligraphy works. J. Vis. Commun. Image Represent. 20(2), 84–96 (2009)
18. Lu, W., Zhuang, Y., Wu, J.: Discovering calligraphy style relationships by supervised learning weighted random walk model. Multimedia Syst. 15(4), 221–242 (2009)
19. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)
20. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
21. Ke, Y., Sukthankar, R.: Pca-sift: a more distinctive representation for local image descriptors. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004, vol. 2, pp. II–506. IEEE (2004)
22. Bay, H., Tuytelaars, T., Van Gool, L.: Surf: speeded up robust features. In: Computer Vision–ECCV 2006, pp. 404–417. Springer (2006)
23. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: Orb: an efficient alternative to sift or surf. In: 2011 IEEE International Conference on Computer Vision (ICCV), pp. 2564–2571. IEEE (2011)
24. Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Computer Vision–ECCV 2006, pp. 430–443. Springer (2006)
25. Calonder, M., Lepetit, V., Strecha, C., Fua, P.: Brief: binary robust independent elementary features. In: Computer Vision–ECCV 2010, pp. 778–792. Springer (2010)
26. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
27. Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., Darrell, T.: Decaf: a deep convolutional activation feature for generic visual recognition. In: Proceedings of The 31st International Conference on Machine Learning, pp. 647–655 (2014)
28. Oquab, M., Bottou, L., Laptev, I., et al.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
29. Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: Advances in Neural Information Processing Systems, pp. 1753–1760 (2008)
30. Erin Liong, V., Lu, J., Wang, G., Moulin, P., Zhou, J.: Deep hashing for compact binary codes learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2475–2483 (2015)
31. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587. IEEE (2014)
32. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013)
33. Kimura, F., Takashina, K., Tsuruoka, S., Miyake, Y.: Modified quadratic discriminant functions and the application to chinese character recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1, 149–153 (1987)
34. Liu, C.L., Yin, F., Wang, D.H., et al.: Online and offline handwritten Chinese character recognition: benchmarking on databases. Pattern Recogn. 46(1), 155–162 (2013)
35. Van der Maaten, L., Hinton, G.: Visualizing data using t-sne. J. Mach. Learn. Res. 9, 2579–2605 (2008)
36. Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

