
Vis Comput

DOI 10.1007/s00371-016-1265-5

ORIGINAL ARTICLE

3D cartoon face generation by local deformation mapping


Jingyong Zhou¹ · Xin Tong² · Zicheng Liu³ · Baining Guo¹,²

© Springer-Verlag Berlin Heidelberg 2016

Abstract We present a data-driven method for automatically generating a 3D cartoon of a real 3D face. Given a sparse set of 3D real faces and their corresponding cartoon faces modeled by an artist, our method models the face in each subspace as the deformation of its nearby exemplars and learns a mapping between the deformations defined by the real faces and their cartoon counterparts. To reduce the exemplars needed for learning, we regress a collection of linear mappings defined locally in both face geometry and identity spaces and develop a progressive scheme for users to gradually add new exemplars for training. At runtime, our method first finds the nearby exemplars of an input real face and then constructs the result cartoon face from the corresponding cartoon faces of the nearby real face exemplars and the local deformations mapped from the real face subspace. Our method greatly simplifies the cartoon generation process by learning artistic styles from a sparse set of exemplars. We validate the efficiency and effectiveness of our method by applying it to faces of different facial features. Results demonstrate that our method not only preserves the artistic style of the exemplars, but also keeps the unique facial geometric features of different identities.

Keywords 3D cartoon generation · Local deformation mapping · Data-driven method

Electronic supplementary material The online version of this article (doi:10.1007/s00371-016-1265-5) contains supplementary material, which is available to authorized users.

✉ Jingyong Zhou
zh-y10@mails.tsinghua.edu.cn; mikezjy@gmail.com

Xin Tong
xtong@microsoft.com

Zicheng Liu
zliu@microsoft.com

Baining Guo
bainguo@microsoft.com

¹ Tsinghua University, Beijing, China
² Microsoft Research, Beijing, China
³ Microsoft Research, Redmond, USA

1 Introduction

Three-dimensional cartoon or caricature face models are useful for many graphics and multimedia applications, including immersive communication, entertainment, games, and virtual and augmented reality. A carefully designed 3D cartoon face can achieve a good balance between detailed facial features and artistic style and thus efficiently avoid the "uncanny valley" effect. However, manually creating a personalized 3D cartoon face for a specific person is a tedious task even for a skilled artist. Therefore, a set of methods has been proposed for 3D cartoon generation.

Rule-based methods generate the face caricature by exaggerating the face features that deviate from the mean of all human faces. Intensive user interactions are needed to control the face deformation for creating cartoon results in a consistent style. It is also difficult to generalize these methods to different artistic styles. Data-driven methods formulate the cartoon generation process as a mapping from the subspace of real faces to the subspace of 3D cartoon faces in a specific artistic style and then learn the mapping from a set of real faces and their corresponding cartoon models created by an artist. Although these methods are automatic and can be easily adapted to different artistic styles, a large number of cartoon exemplars is always needed for learning, which is still a labor-intensive task for artists.


In this paper, we present a data-driven method for automatically generating a 3D cartoon of a real face from a sparse set of exemplars. Given a sparse set of 3D real faces and their corresponding 3D cartoon models designed by an artist, we represent the face in each subspace with the deformations from its several nearby exemplars to itself. We also assume that the structures of the two subspaces spanned by the real faces and the corresponding cartoon results are similar. Specifically, for each real face, the corresponding cartoon models of its nearby exemplars are also the nearby exemplars of its cartoon result. We thus learn a mapping that transforms the deformations between a real face and its nearby exemplars to the deformations between the corresponding cartoon face and its nearby exemplars in the cartoon subspace. To efficiently learn the artistic style from a sparse set of cartoon exemplars, we subdivide the face geometry into several regions and regress a set of linear mappings for each region, where each mapping is defined locally for the deformations around an exemplar. We further develop a progressive training scheme for an artist to gradually add faces to the training dataset and update the mapping until the result is good enough. At runtime, we first find the nearby exemplars of an input face and then map its local deformations to the cartoon subspace. After that, we construct the 3D cartoon result from the cartoon counterparts of the nearby real face exemplars and the mapped deformations. Our method is designed for generating personalized cartoon faces for different identities, so we assume the input faces are in a neutral expression. Because the caricature of a real face is mainly determined by facial geometry, our method focuses on modeling the cartoon face geometry and assigns a pre-defined albedo map to the result cartoon model.

Compared to previous data-driven methods that need a large amount of training data, our approach learns the local deformation mappings from a sparse set of exemplars and thus efficiently reduces the artists' workload. It can be easily generalized to different artistic styles with the mapping learned from cartoon exemplars modeled by different artists. We validate our method and demonstrate its effectiveness and efficiency with real faces of different facial features and the mappings learned from two datasets. Results demonstrate that our method not only preserves the personalized facial features of each input, but also maintains a consistent artistic style across different subjects.

In summary, the technical contributions of our paper are:

• An automatic method for generating a 3D cartoon model of a real 3D face.
• A local deformation mapping for cartoon generation that can be learned from a sparse set of exemplars.
• A progressive scheme for learning the local deformation mapping from a gradually expanded training dataset.

2 Related work

In this section, we only discuss previous face caricature solutions that are most related to our work. Please refer to [19] for a comprehensive survey of caricature generation methods and [4] for recent progress on photorealistic 3D face modeling.

Rule-based methods follow the "Exaggerating the Difference From the Mean" (EDFM) rule and exaggerate the face features that are different from the mean of all real faces for cartoon generation. Akleman and his colleague [1,2] presented interactive methods for users to manually pick and exaggerate facial features for 2D and 3D caricature generation. Kondo et al. [12] automatically detected face features from 3D face images and exaggerated the difference between the detected features and the mean face for 3D caricature creation. Fujiwara et al. [9,10] presented a similar PICASSO system for 3D caricature generation. Chen et al. [7] exaggerated images of a real face captured from different views and fused them to generate a 3D caricature. Liu et al. [16] deformed the face feature points and face contour according to the EDFM rule and then reconstructed the 3D caricature model with manifold RBFs.

Mo et al. [18] extended the EDFM rule by taking the variance of facial features into consideration. Wang et al. [23] modeled the neutral face features and expressional deformations of all real faces as low-dimensional manifolds and then exaggerated the face features and expressional deformation of an input face based on their probabilities on the manifolds. Mingming et al. [17] exaggerated face features at both global and local levels, where the input face is first classified and deformed with pre-defined templates and then refined by exaggerating facial features that are different from the mean. Most recently, Sucontphunt [20,21] modeled a 3D artistic face by blending a real face with an artistic model in a PCA space spanned by both real faces and artistic models.

Although these rule-based methods can directly generate a caricature from a 2D image or 3D model of a real face without any exemplars, user interactions are always needed to control the exaggerations of different facial features. Moreover, it is not clear how to generalize these methods for creating cartoons or caricatures in different artistic styles.

Data-driven methods learn a mapping between the real face subspace and the cartoon subspace and then apply it to the input real face to generate the result caricature. Chen et al. [6] and Liang et al. [14] generated 2D cartoons from face images with a mapping learned from a set of training images and their associated 2D sketches drawn by an artist.

For 3D cartoon generation, Xie et al. [24] developed an interactive system, where a rough 3D cartoon is designed by sampling a PCA space learned from 100 3D cartoon models and then refined by sampling a low-dimensional manifold learned from 1000 2D cartoon images.


Fig. 1 System overview. Our system consists of two phases: the off-line training stage and the runtime cartoon generation stage

Li et al. [13] constructed two local linear embeddings (LLE) for modeling the real face and cartoon subspaces from 110 exemplars and regressed a neural network for the mapping between the two subspaces from the same exemplars. Liu et al. [15] also applied LLE for modeling the two subspaces and modeled the mapping with a regressive manifold regularization that is learned from 100 real and cartoon face pairs and hundreds of unmatched real face images and 3D cartoon models. All these data-driven methods require hundreds of real faces and their corresponding cartoon models for building the mapping from the real face subspace to the cartoon face subspace. Our method learns the caricature model from a sparse set of exemplars and thus greatly simplifies the cartoon generation process.

Clarke et al. [8] encoded the artistic styles of 3D caricatures with a pseudo stress–strain model and learned the "virtual" physical and material properties of the model from one pair of 2D face images and corresponding caricature images. However, this method is difficult to extend for modeling other artistic styles.

Deformation transfer [22] directly transferred the deformation gradient between two source meshes to a target mesh. Semantic deformation transfer (SDT) [3] encoded a source mesh as a linear combination of LRI coordinates of exemplar source meshes and then directly transferred the weights to the target LRI coordinates computed from the corresponding exemplar target meshes. Although these methods can be used for cartoon generation, directly transferring the deformations in the real face subspace to the cartoon subspace breaks the artistic effects of the cartoon faces. On the contrary, our method transfers the deformations in the real face subspace to the cartoon face subspace via a mapping learned from exemplars and thus preserves the artistic effects of the 3D cartoons. We compare our method with SDT for cartoon generation and demonstrate the advantage of our method in Sect. 7.

3 System overview

We define the real face subspace and the cartoon face subspace as the two subspaces spanned by all possible real faces and all possible cartoon faces in a specific artistic style. To compute a 3D cartoon model of an input 3D face, our method computes a mapping between these two subspaces.

As shown in Fig. 1, our system consists of two stages: an off-line training stage and a runtime cartoon generation stage. Given a small set of 3D real face exemplars $A_i$ and their corresponding cartoon faces $B_i$ modeled by an artist, our method learns the mapping between the real face subspace and the cartoon subspace in the off-line training stage. To this end, we first register all real face and cartoon face exemplars and then subdivide the face into subregions. For each region, we regard each pair of a real exemplar region and its corresponding cartoon exemplar region as an input/output pair of our system and compute their deformations to the neighbor exemplars in the local deformation construction step. After that, we regress the mapping $M_i^r$ for each exemplar region pair from the deformations of all exemplar region pairs. Our progressive training scheme gradually brings new exemplar pairs into the training dataset and repeats the training process until the result mapping becomes stable and satisfies the artist's need.

In the runtime cartoon generation stage, we register an input real face with the exemplars and then subdivide it into subregions. For each input face region, we first compute its deformations to the neighbor exemplars and then map them to the cartoon face subspace to obtain the deformations of the result cartoon model. Finally, we construct the result 3D cartoon model from the cartoon neighbor exemplars and the mapped deformations in the cartoon construction step.


4 Off-line local deformation mapping training

Preprocessing Our method requires that the real face and the cartoon face models are registered with each other and share the same mesh topology. We thus register each exemplar face mesh with a reference face and then resample the face mesh with the reference mesh topology. To this end, we manually specify a small number of feature points on the reference face and their corresponding points on the exemplar face mesh. We then use these landmark points as constraints to deform the reference face to the exemplar face mesh via the non-rigid Iterative Closest Point (ICP) technique described in [22]. After non-rigid ICP, the deformed reference face is aligned to the exemplar face shape. We then resample the exemplar face with the reference face topology by replacing each vertex position in the reference face with the position of its closest point on the exemplar face. We repeat this process for all face exemplars so that they are registered with each other and share the same mesh topology.
We then manually subdivide the face meshes into $N_R = 12$ regions, each of which represents relatively independent facial shape variations. As shown in Fig. 2, the neighboring regions overlap with each other to make sure that the reconstructed face mesh is still smooth after each region is deformed separately. Since all face meshes share the same topology, we directly apply this subdivision scheme to all real and cartoon face meshes. Our method computes the local deformation mapping for each face region $r$ separately.

Fig. 2 Face partitions on a real face (a) and a cartoon face (b)

Local deformation construction In this step, we construct local deformations for each pair of real face exemplar region $A_i^r$ and its corresponding cartoon face exemplar region $B_i^r$ (Fig. 3).

Fig. 3 Local deformation mapping for each facial region

To this end, we first find the $K$ nearest neighbor exemplars $N(A_i^r) = \{A_j^r \mid j = 0, 1, 2, \ldots, K-1\}$ for region $A_i^r$, where the distance between two face regions is computed as the sum of the Euclidean distances of all corresponding mesh vertices in region $R(r)$. The deformation $s_{i,j}^r$ between $A_i^r$ and a neighbor exemplar region $A_j^r$ is then modeled with the deformation gradients of all triangles in the region as in [22]. Specifically, the deformation gradient $s_{i,j}^r(x)$ from a triangle $\tilde{x}$ in $A_j^r$ to the corresponding triangle $x$ in $A_i^r$ is defined by a $3 \times 3$ affine transformation matrix $V(x)\tilde{V}^{-1}(\tilde{x})$, where

$$V(x) = \begin{bmatrix} v_2 - v_1 & v_3 - v_1 & v_4 - v_1 \end{bmatrix}, \qquad \tilde{V}(\tilde{x}) = \begin{bmatrix} \tilde{v}_2 - \tilde{v}_1 & \tilde{v}_3 - \tilde{v}_1 & \tilde{v}_4 - \tilde{v}_1 \end{bmatrix}. \quad (1)$$

Here $(v_1, v_2, v_3)$ are the 3D positions of the three vertices of triangle $x$, and $(\tilde{v}_1, \tilde{v}_2, \tilde{v}_3)$ are the 3D positions of the three corresponding vertices of triangle $\tilde{x}$. $v_4$ is computed by

$$v_4 = v_1 + (v_2 - v_1) \times (v_3 - v_1) \,/\, \sqrt{|(v_2 - v_1) \times (v_3 - v_1)|} \quad (2)$$

and $\tilde{v}_4$ is computed from $(\tilde{v}_1, \tilde{v}_2, \tilde{v}_3)$ in the same way. In our work, we represent the deformation gradient matrix of each triangle as a nine-dimensional vector.
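In code, the per-triangle deformation gradient of Eqs. (1) and (2) can be computed as in the following NumPy sketch (the function names are illustrative, not from our implementation):

```python
import numpy as np

def fourth_vertex(v1, v2, v3):
    # Eq. (2): offset v1 by the scaled triangle normal so that the
    # deformation is also constrained in the normal direction.
    n = np.cross(v2 - v1, v3 - v1)
    return v1 + n / np.sqrt(np.linalg.norm(n))

def frame(v1, v2, v3):
    # Eq. (1): 3x3 matrix of the two edge vectors and the normal offset.
    v4 = fourth_vertex(v1, v2, v3)
    return np.column_stack((v2 - v1, v3 - v1, v4 - v1))

def deformation_gradient(src_tri, dst_tri):
    """Deformation gradient from triangle src_tri (x~) to dst_tri (x),
    returned as the nine-dimensional vector used in the paper.

    src_tri, dst_tri: (3, 3) arrays with one vertex position per row.
    """
    V = frame(*dst_tri)         # V(x)
    V_tilde = frame(*src_tri)   # V~(x~)
    return (V @ np.linalg.inv(V_tilde)).reshape(9)
```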
For the corresponding cartoon face exemplar $B_i^r$, we directly obtain its neighbor exemplars $N(B_i^r) = \{B_j^r \mid j = 0, 1, 2, \ldots, K-1\}$, in which each $B_j$ is the corresponding cartoon exemplar of $A_j$. After that, we model the deformation $t_{i,j}^r$ between $B_i^r$ and a neighbor exemplar $B_j^r$ by computing the deformation gradients $t_{i,j}^r(x)$ of all triangles $x$ (Fig. 3).

We repeat the local deformation construction process for all regions of each exemplar and obtain $K$ deformations $s_i^r$ in the real space and their corresponding deformations $t_i^r$ in the cartoon space for each exemplar region pair $(A_i^r, B_i^r)$ in the training dataset

$$s_i^r = \{ s_{i,j}^r(x) \mid x \in R(r),\; j = 0, \ldots, K-1 \}, \qquad t_i^r = \{ t_{i,j}^r(x) \mid x \in R(r),\; j = 0, \ldots, K-1 \}, \quad (3)$$

where $s_{i,j}^r(x)$ and $t_{i,j}^r(x)$ are the nine-dimensional vectors of the deformation gradient for each triangle $x$ in the region.

Local deformation mapping regression In this step, we train a set of local deformation mappings $M_i^r$ for transforming the deformations around an exemplar region $A_i^r$ to the deformations around the corresponding cartoon exemplar region $B_i^r$ (Fig. 3).


For this purpose, we assume that all triangles in the region share the same linear mapping between their deformation gradients. Specifically, each element of the vector $t_{i,j}^r(x)$ is a linear combination of all the elements in $s_{i,j}^r(x)$. With a nine-dimensional offset vector, $M_i^r$ is a $9 \times 10$ matrix with 90 unknowns. Thus, for all the triangles $x$ in the region $r$, we have

$$t_{i,j}^r(x) = M_i^r \begin{bmatrix} s_{i,j}^r(x) \\ 1 \end{bmatrix}. \quad (4)$$

Based on this assumption, we compute the local deformation mappings $M_i^r$ by optimizing

$$\arg\min_{M_i^r} \sum_{i=0}^{n-1} E_d(M_i^r) + \lambda_t E_t(M_i^r) + \lambda_s E_s(M_i^r), \quad (5)$$

where $n$ is the number of exemplar pairs in the training dataset. The first term $E_d$ is the data term measuring the sum of the differences between $t_i^r$ and the transformed $s_i^r$ for all triangles and all $n$ input exemplars

$$E_d(M_i^r) = \sum_{j=0}^{K-1} w^r(j, i) \sum_{x \in R(r)} \left\| t_{i,j}^r(x) - M_i^r \begin{bmatrix} s_{i,j}^r(x) \\ 1 \end{bmatrix} \right\|^2, \quad (6)$$

where $w^r(j, i) = 1/\mathrm{dist}(A_i^r, A_j^r)^2$ is a weighting function that decreases as the distance between the two face regions $A_i^r$ and $A_j^r$ increases. The second fidelity term $E_t$ is defined as

$$E_t(M_i^r) = \left\| I - M_i^r \begin{bmatrix} I \\ 1 \end{bmatrix} \right\|^2, \quad (7)$$

which maps the identity deformation gradient of the real face to the identity deformation gradient $I$ in the cartoon space so that the training exemplar can be well approximated by the result local deformation mapping. The identity deformation gradient vector $I$ is defined as $I = [1, 0, 0, 0, 1, 0, 0, 0, 1]^T$. The third smoothness term $E_s$ is defined by

$$E_s(M_i^r) = \sum_{k=0, k \neq i}^{n-1} w^r(k, i) \left\| M_i^r - M_k^r \right\|^2, \quad (8)$$

which ensures that the local deformation mappings of similar exemplars are similar. $\lambda_t$ and $\lambda_s$ are the weights of $E_t$ and $E_s$, respectively. We set $\lambda_t = 100$ and $\lambda_s = 1.0$ in our implementation.

Since minimizing Eq. (5) amounts to solving a sparse linear system, we solve it with a sparse linear solver (CHOLMOD in our implementation) to obtain the local deformation mappings.
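The sketch below illustrates the regression for a single mapping $M_i^r$, stacking the data term $E_d$ and the fidelity term $E_t$ into one least-squares problem. The smoothness term $E_s$, which couples all mappings into one large sparse system, is omitted for brevity, so this is a simplified reading of Eq. (5) rather than the exact CHOLMOD-based solver.

```python
import numpy as np

def fit_local_mapping(s_list, t_list, weights, lambda_t=100.0):
    """Fit one 9x10 local mapping M_i^r (Eq. 4) for a single exemplar region.

    s_list, t_list: K arrays of shape (T, 9) holding the real/cartoon
        deformation gradients for the K neighbors (T triangles per region).
    weights: length-K array of w^r(j, i) = 1 / dist(A_i^r, A_j^r)^2.
    """
    rows_A, rows_b = [], []
    for s, t, w in zip(s_list, t_list, weights):
        s_h = np.hstack([s, np.ones((s.shape[0], 1))])    # homogeneous [s; 1]
        rows_A.append(np.sqrt(w) * s_h)                   # data term E_d
        rows_b.append(np.sqrt(w) * t)
    identity = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1.0])
    rows_A.append(np.sqrt(lambda_t) * np.append(identity, 1.0)[None, :])
    rows_b.append(np.sqrt(lambda_t) * identity[None, :])  # fidelity term E_t
    A, b = np.vstack(rows_A), np.vstack(rows_b)
    # Solve min ||A M^T - b||^2 column by column for the 10x9 matrix M^T.
    M_T, *_ = np.linalg.lstsq(A, b, rcond=None)
    return M_T.T                                          # the 9x10 mapping
```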
5 Runtime cartoon generation

Preprocessing Given an input 3D real face $A_c$, we first register it with the reference face and then resample the input face shape with the reference face mesh so that the input has the same topology as the exemplars. After that, we subdivide the input face into several regions using the face partition mask defined in Fig. 2.

Local deformation mapping For each face region $A_c^r$, we first project it into the real face subspace constructed by the same regions of all exemplars to find its $K$ nearest neighbor exemplars $\{A_j^r \mid j = 0, \ldots, K-1\}$. The $K$ neighbor exemplars $\{B_j^r \mid j = 0, \ldots, K-1\}$ of the cartoon result $B_c$ are then defined by the corresponding cartoon exemplars of the real face exemplars in $\{A_j^r \mid j = 0, \ldots, K-1\}$.

After that, we compute the deformation gradients between $A_c^r$ and each neighbor exemplar $A_j^r$. For each triangle $x$ in the region, we obtain $K$ deformation gradients $s^r(x) = \{s_j^r(x) \mid j = 0, \ldots, K-1\}$. Then we map the deformation gradients of each triangle $x$ in the input face region $A_c^r$ to the deformation gradients of the same triangle in the cartoon face region $B_c^r$ by

$$t_j^r(x) = M_{Ind(j)}^r \begin{bmatrix} s_j^r(x) \\ 1 \end{bmatrix}, \quad (9)$$

where $M_{Ind(j)}^r$ is the local deformation mapping for the neighbor exemplar $A_j^r$, and $Ind(j)$ gives the index of the $j$th neighbor exemplar in the whole exemplar set.

We repeat the local deformation mapping for all face regions to obtain the deformation gradients of all triangles of the result cartoon model $B_c$. Note that for a triangle in the overlapped regions, we need to map the deformation gradient $t_j^r(x)$ for each overlapped region separately.


Cartoon construction Given the deformation gradients $t_j^r(x)$ of all triangles $x$ of the result cartoon model $B_c$, we construct the result cartoon mesh from these deformation gradient constraints by optimizing

$$\arg\min_{v \in B_c} \sum_{j=0}^{K-1} \sum_{r=0}^{N_R-1} \sum_{x \in R(r)} w^r(j) \left\| V(x) \tilde{V}_j^{-1}(\tilde{x}) - t_j^r(x) \right\|^2, \quad (10)$$

where $v$ denotes the 3D positions of all the vertices in the result cartoon model $B_c$, and $\tilde{x}$ denotes the corresponding triangle of $x$ on the neighbor exemplar $B_j^r$. $V(x)\tilde{V}_j^{-1}(\tilde{x})$ is the vector form of the deformation gradient from $\tilde{x}$ to $x$, and $t_j^r(x)$ is the deformation gradient from $\tilde{x}$ to $x$ computed by the local deformation mapping. Here $V(x)$ and $\tilde{V}_j(\tilde{x})$ are the two matrices computed from the 3D vertex positions of triangles $x$ and $\tilde{x}$, respectively, according to Eq. (1). $w^r(j) = 1/\mathrm{dist}(A^r, A_j^r)^2$ is the weight function that decreases as the distance from $A^r$ to $A_j^r$ increases.

Since this optimization can be formulated as a sparse linear system, we solve for all vertex positions of the result cartoon model with the CHOLMOD sparse linear solver. Also note that since this function is translation invariant, we set the mean of the result cartoon model $B_c$ to be the origin.
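To make the reconstruction step concrete, the sketch below solves a stripped-down version of Eq. (10) for a single region and a single neighbor exemplar: it constrains only the two edge vectors of each triangle (dropping the $v_4$ normal column of Eq. (1)) and pins one vertex instead of centering the mean, whereas the full weighted system over all regions and neighbors is solved with CHOLMOD.

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix
from scipy.sparse.linalg import lsqr

def reconstruct_mesh(n_verts, tris, src_verts, target_grads):
    """Recover vertex positions whose triangle edges match the mapped
    deformation gradients applied to a source (exemplar) mesh.

    tris: (T, 3) vertex indices; src_verts: (n_verts, 3) positions of the
    neighbor cartoon exemplar; target_grads: (T, 9) mapped gradients t(x).
    """
    T = len(tris)
    A = lil_matrix((2 * T + 1, n_verts))
    b = np.zeros((2 * T + 1, 3))
    for k, (i1, i2, i3) in enumerate(tris):
        G = target_grads[k].reshape(3, 3)
        # Target edge vectors: the mapped gradient applied to source edges.
        b[2 * k] = G @ (src_verts[i2] - src_verts[i1])
        b[2 * k + 1] = G @ (src_verts[i3] - src_verts[i1])
        A[2 * k, i2], A[2 * k, i1] = 1.0, -1.0
        A[2 * k + 1, i3], A[2 * k + 1, i1] = 1.0, -1.0
    A[2 * T, 0] = 1.0  # pin vertex 0 to remove the translational freedom
    A = csr_matrix(A)
    # One sparse least-squares solve per coordinate axis.
    return np.column_stack([lsqr(A, b[:, c])[0] for c in range(3)])
```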

6 Progressive training

Our method learns the artistic style from a set of exemplars and then applies it to generate cartoons for new input real faces. For this purpose, the number of exemplars in the training dataset should be large enough so that the learned local deformation mapping can create consistent cartoon results for different identities. Meanwhile, we want to keep the training dataset as compact as possible to minimize the artist's manual work. To this end, we develop a progressive training scheme that allows the artist to gradually add cartoon exemplars to the training dataset until the result local deformation mapping is good enough.

Given a dataset that includes 50 real faces with distinct facial geometry features, we start our training process by first selecting $K + 1$ real faces as the initial exemplars by minimum-pairwise-distance maximization, where the distance between two faces is measured by the sum of the Euclidean distances of all the regions of the two face meshes. We then ask the artist to manually create the cartoon faces of these exemplars and construct the local deformation mappings from the initial face exemplars as described in Sect. 4. After that, we apply the result local deformation mappings to all 50 real faces and generate their cartoon models. We invite the artist to check the results, pick the real face that most mismatches the artist's expectation, and refine its cartoon model. We then add this real/cartoon face pair to the training dataset and update the local deformation mappings. Finally, we update the cartoon results for the 50 real faces with the new local deformation mappings and compute the difference between the current cartoon results and the ones generated in the previous step. We repeat this process until the differences become stable and the artist is satisfied with the quality of the cartoon results generated by the current local deformation mappings. Figure 4 plots the average vertex difference of the 50 cartoon faces for two exemplar sets in different artistic styles. As the number of exemplars increases, the difference quickly decreases and becomes stable, which means the result local deformation mappings can well predict the cartoon models generated by the artist and thus lead to less refinement by the artist.

Fig. 4 The average vertex difference (normalized by the mesh bounding box) of the cartoon results of all 50 real faces for different numbers of exemplars
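As an illustration of the initial selection above, one common greedy realization of minimum-pairwise-distance maximization is farthest-point selection, sketched below; this particular greedy scheme and its seeding rule are assumptions for illustration, not necessarily the exact procedure used.

```python
import numpy as np

def select_initial_exemplars(faces, count):
    """Greedily pick `count` faces so that the minimum pairwise distance
    of the selected set stays as large as possible.

    faces: (F, N, 3) vertex arrays of the candidate real faces; the face
    distance is the sum of vertex-wise Euclidean distances.
    """
    D = np.array([[np.linalg.norm(a - b, axis=1).sum() for b in faces]
                  for a in faces])
    chosen = [int(np.argmax(D.sum(axis=1)))]  # seed with the most distinct face
    while len(chosen) < count:
        min_d = D[:, chosen].min(axis=1)      # distance to nearest chosen face
        min_d[chosen] = -1.0                  # never re-pick a chosen face
        chosen.append(int(np.argmax(min_d)))
    return chosen
```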
To help the artist quickly identify the face that needs to be refined at each iteration, we develop a visualization tool for the artist to quickly explore the cartoon results of the 50 faces. As shown in Fig. 5, after the user selects one facial region, we map the 50 real faces to a set of 2D points and determine their 2D layout via a force-directed graph drawing algorithm [11]. The distances between the 2D points are proportional to the distances between the selected facial regions of the real faces. Therefore, the artist can easily compare the cartoon results of faces with similar facial features to check the style consistency of the cartoon results.

Fig. 5 The visualization tool for exploring cartoon results. The left window shows the 2D points mapped from 50 real faces, while the right one shows the real face and its cartoon result for the selected 2D point
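A minimal sketch of the tool's 2D embedding, assuming a NetworkX-style implementation of the Kamada–Kawai force-directed layout [11] (the specific library is an assumption; any layout that respects target distances would do):

```python
import networkx as nx

def layout_faces(region_dists):
    """Embed faces as 2D points whose distances follow the pairwise
    distances of the selected facial region.

    region_dists: (F, F) symmetric matrix of region distances.
    Returns a dict mapping face index to a 2D position.
    """
    n = len(region_dists)
    graph = nx.complete_graph(n)
    target = {i: {j: float(region_dists[i][j]) for j in range(n) if j != i}
              for i in range(n)}
    return nx.kamada_kawai_layout(graph, dist=target)
```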
7 Experimental results

Algorithm performance We have implemented our method in C++ on a PC with an Intel i7 3.40 GHz eight-core CPU and 16 GB memory and evaluated our solution with two sets of exemplars in different artistic styles. As shown in Fig. 6, the Style I set exaggerates the facial features moderately, while the Style II set exaggerates the facial features more strongly. For the Style II set, the eyeballs, teeth and tongue of the cartoons designed by the artist are used for rendering only, not for local deformation mapping. Table 1 lists the statistics of the two training datasets and the performance of our method. For both exemplar sets, our method takes about three to four seconds to regress the local deformation mappings from the final exemplar set. At runtime, it takes less than one second to generate a 3D cartoon model from an input real 3D face.

Comparisons We demonstrate the advantage of our method by comparing it with Li et al.'s method [13] and Semantic Deformation Transfer (SDT) [3], where the Style II exemplar set is used for all three methods.


Fig. 6 Several exemplars in the two exemplar sets. a The real 3D faces. b The corresponding 3D cartoon models in the Style I set. c The corresponding 3D cartoon models in the Style II set

Table 1 Statistics of the two exemplar sets and the performance of our method

  Exemplar set                  Style I    Style II
  Number of exemplars           13         15
  Vertex number of face mesh    15,424     25,356
  Training time                 2.67 s     3.80 s
  Cartoon generation time       0.3 s      0.5 s

Fig. 7 Comparison between our method and Li et al.'s method [13]. a Input real face. b Result generated by our method. c Result generated by Li's method. d, e An exemplar pair with a similar eye shape to (a). f The real face reconstructed by LLE

Fig. 8 Comparison between our method and SDT. a, d Input real faces. b, e Results generated by our method. c, f Results generated by semantic deformation transfer

Figure 7b, c illustrates the results generated by our method and by Li et al.'s method for an input real face (Fig. 7a). To implement Li's method, we selected the sigmoid function as the activation function and set the hidden node number to 150 in ELM training as in [13]. The local neighborhood size in LLE is set to four for both the real face and cartoon face subspaces, and the reduced dimension of LLE is 8. For reference, we also show an exemplar face in Fig. 7d that shares a similar eye shape with the input. Its cartoon counterpart modeled by the artist is shown in Fig. 7e. Note that our local deformation mapping successfully learns the artistic style from the sparse exemplars and generates a consistent eye shape in the cartoon result. On the contrary, Li et al.'s method fails to preserve the artistic style of the eye shape exhibited in the exemplar because the LLE learned from sparse exemplars cannot faithfully reconstruct the faces in either the real or the cartoon face subspace (Fig. 7f).

Figure 8 compares the results generated by our method (Fig. 8b, e) and by SDT (Fig. 8c, f) for two input real faces (Fig. 8a, d). We used our region partition scheme (Fig. 2) for the patch-based LRI representation in the SDT implementation. By computing the weights locally with nearby exemplars for each region, our method well preserves the differences between the eyes and noses of the two real faces in the cartoon results. On the contrary, the SDT method uses the same weights computed from all exemplars for the LRIs of all regions and thus fails to maintain these personalized facial features in the cartoon results.

Since the perception of the similarity between real and cartoon faces is quite subjective, we also conducted an informal user study to demonstrate the advantage of our method.


To this end, we randomly selected 14 subjects (9 males and 5 females, ages ranging from 7 to 60) from the FaceWarehouse dataset [5] and generated their cartoon faces with our method, Li's method, and SDT, respectively. In the user study, we display the photos of each subject and the three cartoon faces in random order on one page and ask each user to select the cartoon face that is most similar to the photo. The rendered images of all the test cases and the UI for the user study are shown in the supplemental material. In total, 38 users participated in the user study. As shown in Fig. 9, the results generated by our method are selected as the most similar ones much more frequently (69.7%) than the ones generated by Li's method (12.4%) and SDT (17.9%).

Fig. 9 Results of the user study

Method validation We validate our local deformation mapping scheme by comparing our method with alternative solutions in which each component is replaced with a naive solution.

Figure 10 compares our method with a naive method that directly transfers the deformation gradients of the input real face to the cartoon result without mapping. The direct transfer method transfers all facial geometry details of an input real face (such as the nasolabial folds marked in the red box) to the cartoon model and thus violates the artistic style learned from the exemplars. On the contrary, our mapping method successfully maintains the artistic style in the cartoon results and avoids these undesired geometric details in the result cartoon model.

Fig. 10 Our method vs. direct transfer. a Input real faces. b Cartoon results generated by our method. c Cartoon results generated by direct deformation transfer

We then validate our local mapping scheme in face identity space by training one deformation mapping for each face region from all exemplars. Figure 11 compares the results generated by our method and by this alternative solution. Compared to the results generated by our method, the results generated by the global mapping scheme exhibit less variation and thus cannot preserve personalized face features such as face shape and length.

Fig. 11 Local vs. global mapping. a, d Input real faces. b, e Cartoon results generated by our method. c, f The results generated by global deformation mapping in face identity space

We also validate our local mapping scheme in face geometry space by training the local deformation mapping collection without face partition. Figure 12 compares the results generated by our method with the ones generated by this alternative approach. Note that our method can well preserve the personalized face features of each real face (e.g., the big eyes in the top row and the small eyes in the second row), while the method with the full-face-region local deformation mappings fails to maintain these personalized features in its cartoon results. In the supplemental material, we also compare the results generated by our method with different settings of the neighborhood size $K$.

Fig. 12 Full face vs. regional mapping. a, d Input real faces. b, e Cartoon results generated by our method. c, f Cartoon faces generated by local deformation mappings regressed from the full face without partition

Cartoon generation results Figure 13 illustrates cartoon models in two different artistic styles, which are generated by our method for a set of real faces in the USF 3D face dataset.
123
3D cartoon face generation by local deformation mapping

Fig. 13 The cartoon results generated by our method. a The input real faces. b, c The results generated by local deformation mappings learned from the Style I set. d, e The results generated by local deformation mappings learned from the Style II set. c, e The 3D cartoon avatars with faces automatically generated by our method and hair and bodies manually designed by the artist


For visualization, we also integrated the result cartoon faces with 3D hair and body models manually designed by the artists who created all the exemplars. With the local deformation mappings learned from sparse exemplars, the results generated by our method not only preserve the personalized face features of each input, but also maintain an artistic style consistent with the input exemplars.

8 Conclusion and future work

We have developed a novel local deformation mapping technique to automatically generate a 3D cartoon face for any given 3D real face. The local nonlinear deformations and collections of local deformation mappings presented in our method efficiently model the relationship between the two subspaces spanned by real and cartoon faces from a sparse set of exemplars. Our method generates artistic-looking 3D cartoons while preserving the facial characteristics of individuals. It is fast and has the potential for real-time applications.

Although our method works well for generating 3D cartoons of real faces with variant shape features, it still has some limitations that call for future work. Our system only takes the shape differences of face features into consideration and ignores the detailed appearance variations among both real faces and cartoon faces. We would like to investigate how to extend our method for modeling the appearance variations of real faces and cartoon faces and the mapping between them. In addition, our current training dataset does not include hair. It would be interesting to develop efficient methods for generating 3D cartoons of real hair so that we could generate complete 3D cartoon heads. Finally, we would like to extend our method for generating animated 3D cartoons of a real face with different expressions.

Acknowledgments We thank the reviewers for their constructive comments and suggestions. We also thank the Graphics and Parallel Processing Lab of Zhejiang University for sharing the FaceWarehouse dataset for our research. All cartoon exemplars used in our system were created by Xing Zhao.

References

1. Akleman, E.: Making caricatures with morphing. In: ACM SIGGRAPH 97 Visual Proceedings: The Art and Interdisciplinary Programs of SIGGRAPH '97, p. 145. ACM (1997)
2. Akleman, E., Reisch, J.: Modeling expressive 3D caricatures. In: ACM SIGGRAPH 2004 Sketches, p. 61. ACM (2004)
3. Baran, I., Vlasic, D., Grinspun, E., Popović, J.: Semantic deformation transfer. ACM Trans. Graph. 28(3), Article 36 (2009)
4. Bradley, D., Heidrich, W., Popa, T., Sheffer, A.: High resolution passive facial performance capture. ACM Trans. Graph. (Proc. SIGGRAPH) 29(3) (2010)
5. Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: FaceWarehouse: a 3D facial expression database for visual computing. IEEE Trans. Vis. Comput. Graph. 20(3), 413–425 (2014)
6. Chen, H., Xu, Y.Q., Shum, H.Y., Zhu, S.C., Zheng, N.N.: Example-based facial sketch generation with non-parametric sampling. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, pp. 433–438. IEEE (2001)
7. Chen, Y.L., Liao, W.H., Chiang, P.Y.: Generation of 3D caricature by fusing caricature images. In: IEEE International Conference on Systems, Man and Cybernetics (SMC '06), vol. 1, pp. 866–871. IEEE (2006)
8. Clarke, L., Chen, M., Mora, B.: Automatic generation of 3D caricatures based on artistic deformation styles. IEEE Trans. Vis. Comput. Graph. 17(6), 808–821 (2011)
9. Fujiwara, T., Koshimizu, H., Fujimura, K., Kihara, H., Noguchi, Y., Ishikawa, N.: 3D modeling system of human face and full 3D facial caricaturing. In: Proceedings of the Third International Conference on 3-D Digital Imaging and Modeling, pp. 385–392. IEEE (2001)
10. Fujiwara, T., Nishihara, T., Tominaga, M., Kato, K., Murakami, K., Koshimizu, H.: On the detection of feature points of 3D facial image and its application to 3D facial caricature. In: Proceedings of the Second International Conference on 3-D Digital Imaging and Modeling, pp. 490–496. IEEE (1999)
11. Kamada, T., Kawai, S.: An algorithm for drawing general undirected graphs. Inf. Process. Lett. 31(1), 7–15 (1989)
12. Kondo, T., Murakami, K., Koshimizu, H.: From coarse to fine correspondence of 3-D facial images and its application to 3-D facial caricaturing. In: Proceedings of the International Conference on Recent Advances in 3-D Digital Imaging and Modeling, pp. 283–288. IEEE (1997)
13. Li, P., Chen, Y., Liu, J., Fu, G.: 3D caricature generation by manifold learning. In: IEEE International Conference on Multimedia and Expo, pp. 941–944. IEEE (2008)
14. Liang, L., Chen, H., Xu, Y.Q., Shum, H.Y.: Example-based caricature generation with exaggeration. In: Proceedings of the 10th Pacific Conference on Computer Graphics and Applications, pp. 386–393. IEEE (2002)
15. Liu, J., Chen, Y., Miao, C., Xie, J., Ling, C.X., Gao, X., Gao, W.: Semi-supervised learning in reconstructed manifold space for 3D caricature generation. Comput. Graph. Forum 28, 2104–2116 (2009)
16. Liu, S., Wang, J., Zhang, M., Wang, Z.: Three-dimensional cartoon facial animation based on art rules. Vis. Comput. 29(11), 1135–1149 (2013)
17. Mingming, Z., Shoukuai, L., Jiajun, W., Huaqing, S., Zhigeng, P.: The 3D caricature face modeling based on aesthetic formulae. In: Proceedings of the 9th ACM SIGGRAPH Conference on Virtual-Reality Continuum and its Applications in Industry, pp. 191–198. ACM (2010)
18. Mo, Z., Lewis, J., Neumann, U.: Improved automatic caricature by feature normalization and exaggeration. In: ACM SIGGRAPH 2004 Sketches, p. 57. ACM (2004)
19. Sadimon, S.B., Sunar, M.S., Mohamad, D., Haron, H.: Computer generated caricature: a survey. In: 2010 International Conference on Cyberworlds (CW), pp. 383–390. IEEE (2010)
20. Sucontphunt, T.: 3D artistic face transformation with identity preservation. In: Smart Graphics, pp. 154–165. Springer, New York (2014)
21. Sucontphunt, T.: A practical approach for identity-embodied 3D artistic face modeling. Int. J. Comput. Games Technol. 2014, 7 (2014)
22. Sumner, R.W., Popović, J.: Deformation transfer for triangle meshes. ACM Trans. Graph. 23(3), 399–405 (2004)
23. Wang, S.F., Lai, S.H.: Manifold-based 3D face caricature generation with individualized facial feature extraction. Comput. Graph. Forum 29, 2161–2168 (2010)
24. Xie, J., Chen, Y., Liu, J., Miao, C., Gao, X.: Interactive 3D caricature generation based on double sampling. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 745–748. ACM (2009)


Jingyong Zhou is currently a Ph.D. candidate in Computer Science and Technology at the Institute for Advanced Study of Tsinghua University. His adviser is Prof. Baining Guo. He obtained his bachelor's degree in Electronic Engineering from Tsinghua University in 2010. During his studies, he interned at Microsoft Research Asia for over four years. His research interests include cartoon face modeling and animation.

Xin Tong is a principal researcher in the Internet Graphics Group of Microsoft Research Asia. He obtained his Ph.D. degree in Computer Graphics from Tsinghua University in 1999; his Ph.D. thesis is about hardware-assisted volume rendering. He received his BS and Master's degrees in Computer Science from Zhejiang University in 1993 and 1996, respectively. His research interests include appearance modeling and rendering, texture synthesis, and image-based modeling and rendering. Specifically, his research concentrates on studying the underlying principles of material-light interaction and light transport, and on developing efficient methods for appearance modeling and rendering.

Zicheng Liu received his Ph.D. in computer science from Princeton University in 1996. He is currently a principal researcher at Microsoft Research, Redmond, Washington. Before joining Microsoft Research, he worked at Silicon Graphics Inc. as a member of technical staff. His current research interests include human activity recognition, 3D face modeling and animation, and multimedia signal processing. He is an affiliate professor in the Department of Electrical Engineering, University of Washington, and an IEEE distinguished lecturer from 2015 to 2016. He is a fellow of the IEEE.

Baining Guo is Assistant Managing Director of Microsoft Research Asia, where he also serves as the head of the graphics lab. Prior to joining Microsoft Research China in 1999, he was a senior staff researcher with the Microcomputer Research Labs of Intel Corporation in Santa Clara, California. He received his Ph.D. and MS from Cornell University and his BS from Beijing University. He is a fellow of IEEE and ACM. His research interests include computer graphics, visualization, natural user interfaces, and statistical learning. He is particularly well known for his work on texture and reflectance modeling, real-time rendering, and geometry modeling.
