
Master’s Thesis
Many-to-many feature point correspondence establishment for
region-based facial expression cloning
Xing, Qing
Department of Electrical Engineering and Computer Science
Division of Computer Science
Korea Advanced Institute of Science and Technology
2006
Many-to-many feature point correspondence
establishment for region-based facial
expression cloning
Advisor : Professor Shin, Sung Yong
by
Xing, Qing
Department of Electrical Engineering and Computer Science
Division of Computer Science
Korea Advanced Institute of Science and Technology
A thesis submitted to the faculty of the Korea Advanced Institute
of Science and Technology in partial fulfillment of the requirements
for the degree of Master of Engineering in the Department of Electrical
Engineering and Computer Science, Division of Computer Science
Daejeon, Korea
2006. 6. 16.
Approved by
Professor Shin, Sung Yong
Advisor
This thesis has been approved by the thesis committee of the Korea
Advanced Institute of Science and Technology as a Master's thesis.

June 16, 2006

Committee Chair    Sung Yong Shin     (signature)
Committee Member   Otfried Cheong     (signature)
Committee Member   Frederic Cordier   (signature)
MCS
20044365
Xing, Qing. Many-to-many feature point correspondence establishment
for region-based facial expression cloning. Department of Electrical Engineering and Com-
puter Science, Division of Computer Science. 2006. 30p. Advisor: Prof. Shin,
Sung Yong. Text in English.
Abstract
In this thesis, we propose a method to establish a many-to-many feature point correspon-
dence for region-based facial expression cloning. By exploiting the movement coherency of
feature points, we first construct a many-to-many feature point matching across the source
and target face models. Then we extract super nodes from the relationship between source
and target feature points. Source super nodes that show a strong movement coherency are
grouped into concrete regions. The source face region segmentation result is transferred to
the target face via the one-to-one super node correspondence. After we obtain corresponding
regions on the source and target faces, we classify face mesh vertices into different regions
for later facial animation cloning. Since our method reveals the natural many-to-many
feature point correspondence between the source and target faces, each region is adaptively
sampled by a varying number of feature points. Hence the region segmentation result can
preserve more mesh deformation information.
Contents
Abstract i
Contents iii
List of Tables v
List of Figures vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Related Works 6
3 Region Segmentation 8
3.1 Many-to-many Feature Point Matching . . . . . . . . . . . . . . . . . . . . 8
3.2 Super Node Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Source Super Node Grouping . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Region Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.5 Vertex Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4 Experimental Results 19
4.1 Key Model Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Key Model Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.3 Cloning Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5 Conclusion 26
Summary (in Korean) 27
References 28
List of Tables
4.1 Key model specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 Self-cloning Errors for Man . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Performance comparison with Park et al.’s . . . . . . . . . . . . . . . . . . 24
List of Figures
1.1 Source and target regions may have different density of feature points. The
red dots on the source mouth region are matched with the four dots on the
target mouth region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Facial expression cloning system overview . . . . . . . . . . . . . . . . . . 4
1.3 Region segmentation operations . . . . . . . . . . . . . . . . . . . . . . . 4
3.1 Extracted feature points on the source and target faces. . . . . . . . . . . . 9
3.2 Feature point relationship graph. Each connected component in this graph
has a pair of corresponding source and target super nodes. . . . . . . . . . . 11
3.3 Feature point matching results comparison. Black dots are unmatched fea-
ture points. (a) one-to-one matching result in [16], (b) our many-to-many
matching result. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Region segmentation result compared with [16]. (a) Park’s result (b) Our
result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 Vertex-region coherency is defined as the maximum of vertex-feature point
coherencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.6 (a) Partition of feature points (b) Vertex classification result . . . . . . . . 18
4.1 Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon . . . . . . . . . 19
4.2 14 key models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Comparison of feature point matching results. . . . . . . . . . . . . . . . . 22
4.4 Region segmentation results on other face models. . . . . . . . . . . . . . . 23
4.5 Expression cloning results . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1. Introduction
1.1 Motivation
Computer animated characters are now indispensable components of computer games, movies,
web pages, and various human-computer interface designs. To make these animated virtual
characters lively and convincing, people need to produce realistic facial expressions, which
play the most important role in delivering emotions. Traditionally, facial animation has
been produced largely by keyframe techniques. Skilled artists manually sculpt keyframe
faces every two or three frames for an animation consisting of tens of thousands of frames.
Furthermore, they have to repeat a similar operation on different faces. Although it guaran-
tees the best quality animation, this process is painstaking and tedious for artists and costly
for animation producers. While large studios or production houses can afford to hire hun-
dreds of animators to make feature films, it is not feasible for low budget or interactive
applications.
As facial animation libraries are becoming rich, the reuse of animation data has been a
recurring issue. Noh and Neumann[14] presented a data-driven approach to facial animation
known as expression cloning which transferred a source model’s facial expressions in an
input animation to a target face model. They first computed displacement vectors for source
vertices. Then they modified the vectors based on 3D morphing between the source and
target face meshes. The modified vectors were applied to target vertices to deform the
target neutral face mesh. This method works well only when the source and target faces
share similar topology and the displacement of vertices from the neutral mesh is small.
Pyun et al. [20] proposed a blend shape approach based on scattered data interpolation.
Given source key models and their corresponding target key models, the face model at each
frame of an input animation is expressed as a weighted sum of source key models, and the
weight values of source key models are applied to blend the corresponding target key mod-
els to obtain the face model at the same frame of the output animation. This approach is
computationally stable and efficient, as pointed out in [13]. What makes it more attractive is
that the blend shape approach doesn't require source and target face models to have similar
topology, or the source animation to have small deformations from the neutral face. This versatility
cannot be achieved by physics-based or morph-based approaches. By providing source
and target key models and assigning the correspondence, the animator can incorporate her
creativity into the output animation. However, in [20] they blended each key model as a
whole entity. So the number of key models grows combinatorially as the number of facial
aspects such as emotions, phonemes and facial gestures increases.
Park et al.'s feature-based approach [15] is an extension of Pyun et al.'s work and pushed
the utilization of example data to the extreme. Later, their region-based approach [16] auto-
mated the process of region segmentation, which greatly reduced an artist's workload. By
analyzing source and target key models, their system automatically segments the source and
target faces into corresponding regions. They apply the blend shape scheme [20] to each of
the regions separately and composite the results to get a seamless output face model. The
resulting animation preserves the facial expressions of the source face as well as the charac-
teristic features of the target examples. With a small number of key models, the region-based
approach can generate diverse facial expressions like asymmetric gestures while enjoying
the inherent advantages of blend shape approach.
In the region-based approach [16], it’s critical to segment source and target face models
into coherently moving regions. The corresponding source and target regions must have a
strong correlation such that similar source and target expressions can be generated by using
the same set of blending weights. In the previous work [16], they assumed the cross-model
feature point correspondence is a one-to-one mapping. They transferred the source feature
point grouping result to the target face via this one-to-one source and target feature point
mapping. Thus, the corresponding source and target regions were sampled by the same
number of feature points. However, source and target face models may have significantly
different geometric characteristics or mesh resolutions. So for regions containing the same
facial feature, the density of feature points varies dramatically.
As illustrated in Figure 1.1, let’s assume the source face has a big mouth or the mesh
in the mouth region has many fine details. Hence, we extract a lot of feature points in the
source mouth region. On the other hand, the target face doesn’t have many mesh details in
the mouth region. We extract only four feature points. Under the assumption of one-to-one
correspondence, only four source feature points are matched with the four target feature
points. Thus the source mouth region is sampled by only four feature points. A great deal
of information is lost. The same holds for the left eye region on the source and target
faces.

Figure 1.1: Source and target regions may have different density of feature points. The red
dots on the source mouth region are matched with the four dots on the target mouth region.
Our observation is that one source (target) feature point doesn't necessarily move
coherently with only one target (source) feature point. Rather, it moves similarly to a
small group of feature points. We propose to establish a many-to-many cross-model feature
point correspondence. Under this correspondence, corresponding source and target regions
are adaptively sampled by different numbers of feature points. Our segmentation method
maximizes the correlation between the source and target regions.
1.2 Overview
Following the framework of region-based approach [16], our facial expression cloning sys-
tem consists of two parts, analysis and synthesis, as illustrated in Figure 1.2. Our contribu-
tion lies in the region segmentation part. Figure 1.3 shows the sequence of operations in the
region segmentation module.
The analysis part is preprocessing which is performed only once. Regarding a face
mesh as a mass-spring network, the system automatically picks feature points on source
and target faces. Feature points are defined as vertices which have local maximum spring
potential energies.

Figure 1.2: Facial expression cloning system overview

Figure 1.3: Region segmentation operations

We establish a many-to-many correspondence between source and target
feature points by running the hospitals/residents algorithm [7] in two directions.
Since a small group of source feature points move coherently with a small group of target
feature points, we name coherently moving source and target feature point groups as super
nodes. The one-to-one super node correspondence is set up by finding connected compo-
nents in a graph which embodies the source and target feature point relationship. Similar
to Park’s [16] feature point grouping, we group source super nodes into concrete regions.
The source region segmentation result is easily transferred to the target face via the super
node correspondence. Now we are ready to classify every vertex to regions according to the
vertex-region correlation. The last preprocessing step is to place each region in the param-
eter space, which is a standard technique for blend shape-based facial expression cloning.
Readers can refer to Pyun's paper [20] for technical details.
The task in the synthesis part is to transfer expressions from the source face to the target
face at runtime. There are three steps in this part, parameter extraction, key shape blending,
and region composition. Park's work [16] has treated this part rather well. Hence, we use
their techniques directly.
The remainder of this thesis is organized as follows: In Chapter 2, we review related
works. Chapter 3 describes our region segmentation method in detail. We show experi-
mental results in Chapter 4. Finally, we conclude our work and suggest future research in
Chapter 5.
2. Related Works
Realistic facial animation remains a fundamental challenge in computer graphics. Begin-
ning with Parke’s pioneering work [17], extensive research has been dedicated to this field.
Williams [23] first proposed a performance-driven facial animation. Noh and Neumann
[14] addressed the problem of facial expression cloning to reuse facial animation data. In
fact, performance-driven animation can be regarded as a type of expression cloning from
an input image sequence to a 3D face model. We focus on recent results closely related to
facial expression cloning besides those already mentioned in Chapter 1. A comprehensive
overview can be found in the well-known facial animation book by Parke and Waters [18].
Blend shape scheme: Following Williams’ work, there have been many approaches
in performance-driven animation [10, 19, 2, 6, 4, 11, 1, 5, 3]. For our purposes, the most
notable are blend shape approaches [10, 19, 2, 6, 1, 5, 3], in which a set of example models
are blended to obtain an output model. In general, the blending weights are computed by
least squares fitting [10, 19, 2, 6, 1, 5, 3]. From the observation that the deformation space of
a face model is well approximated by a low-dimensional linear space, a series of research
results on facial expression cloning have been presented based on a blend shape scheme
with scattered data interpolation [20, 13, 15, 16]. The favorable advantages are stated in
Chapter 1.
Region segmentation: While being robust and efficient, the main difficulty of blend
shape approaches is an exponential growth rate of the number of key models with respect
to the number of facial attributes. Kleiser [9] applied a blend shape scheme to manually-
segmented regions and then combined the results to synthesize a facial animation. Joshi et
al. [8] automatically segmented a single face model based on a deformation map. Inspired
by these approaches, Park et al. [15] proposed a method for segmenting a face model into
a predefined number of regions, provided with a set of feature points manually specified on
each face feature. The idea was to classify vertices into regions, each containing a face fea-
ture, according to the movement coherency of each vertex with respect to the feature points
in each region. Park et al. [16] further explored this idea to automate the whole process of
feature-based expression cloning, which greatly reduces the animator’s burden. They ad-
dressed three issues of automatic processing: the extraction, correspondence establishment,
and grouping of the feature points on source and target face models.
Multi-linear model: Vlasic et al. [22] proposed a method based on a multi-linear hu-
man face model to map video-recorded performances of one individual to facial animations
of another. This method is a generalization of blend shape approach and thus can be trivially
adapted to facial expression cloning. As a general data-driven tool, a reasonable multi-linear
model requires a large number of face models with different attributes. Moreover, the multi-
linear model is not quite adequate to address specific issues in facial expression cloning such
as asymmetric facial gestures and topological independence between source and target face
models.
Mesh deformation transfer: Sumner and Popović [21] proposed to transfer the deformation
of a triangular mesh to another triangular mesh. This method can also be applied to
facial expression cloning. Unlike blend shape approaches [20, 13, 15, 16], the method does
not require any key models besides a source and a target face mesh. Instead, the animator
manually provides facial feature points and their correspondence between source and target
models. Without using key face models, however, it is hard to incorporate the animator’s
intention into the output animation. Another limitation is that the source and target models
should share the same topology although their meshes may be different in both vertex count
and connectivity.
3. Region Segmentation
In this chapter, we explain in detail our new method to segment the source and target faces
into corresponding regions which will be synthesized individually at runtime. We first estab-
lish a many-to-many correspondence between source and target feature points by analyzing
their movement coherence. Then we extract super nodes from the relationship graph of the
source and target feature points. Next we segment the source face into regions by grouping
source super nodes. Since the source and target super nodes have one-to-one correspon-
dence, the grouping result is transferred from the source face to the target face. Finally
every vertex of the face mesh is classified to one or more regions according to its movement
coherence with the regions. In general, a face model is symmetrical with respect to the ver-
tical bisecting plane. Assuming that both halves of the face model have similar deformation
capability, we only do analysis on a half face and reflect the regions to the other half.
3.1 Many-to-many Feature Point Matching
We extract feature points on the source and target faces by using the method in [16]. Figure
3.1 shows the feature points extracted on the source and target faces. We want to find their
correspondence so that corresponding feature points move similarly when the source and
target faces show the same expression.
From a single source feature point's view, it does not necessarily move similarly to
only one target feature point. Rather, it moves similarly to a small group of target feature
points. The situation is the same for a target feature point. So we set up this many-to-many
correspondence in two steps. In the first step, we find corresponding target feature points
for every source feature point. In the second step, we find corresponding source feature
points for every target feature point. The two steps are symmetric. We mainly describe the
first step as follows.
We use the equation proposed in [16] to measure the movement coherency c_{jk} for a
source feature point v_j and a target feature point v_k.

Figure 3.1: Extracted feature points on the source and target faces.

c_{jk} = \left( \frac{1}{N} \sum_{i=0}^{N-1} s^i_{jk} \right)^{w_1} \cdot
         \left( \frac{1}{N} \sum_{i=0}^{N-1} \theta^i_{jk} \right)^{w_2} \cdot
         \left( d^0_{jk} \right)^{w_3}                                        (3.1)
where
s^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\
  1 - \dfrac{\left|\,\|v^i_j - v^0_j\| - \|v^i_k - v^0_k\|\,\right|}
            {\max\{\|v^i_j - v^0_j\|,\ \|v^i_k - v^0_k\|\}} & \text{otherwise}
\end{cases}

\theta^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\
  0 & \text{if } v^i_j = v^0_j \text{ or } v^i_k = v^0_k \text{ (but not both)} \\
  \max\left\{ \dfrac{v^i_j - v^0_j}{\|v^i_j - v^0_j\|} \cdot
              \dfrac{v^i_k - v^0_k}{\|v^i_k - v^0_k\|},\ 0 \right\} & \text{otherwise}
\end{cases}

d^0_{jk} = \max\left\{ 1 - \frac{\|v^0_k - v^0_j\|}{D},\ 0 \right\}.
Here, N is the number of source key models, and v^i_j denotes vertex j's 3D position in
key model i. Key model 0 is the neutral face, which is regarded as the base model. w_l,
l = 1, 2, 3 is the weight for each multiplicative term. The user can adjust these parameters;
we empirically set w_1 = 2^{-1}, w_2 = 2^0, and w_3 = 2^2. We define D = \max\{D_S, D_T\}. D_S
is the minimum Euclidean distance such that the source vertices form a single connected
component when we connect every pair of vertices whose Euclidean distance is not greater
than D_S, and D_T is defined analogously for the target vertices. Each can be obtained by
binary search, starting from the maximum distance over all pairs of vertices.
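The binary search for this connection distance can be sketched as follows. This is only a minimal illustration, not the thesis implementation: it assumes vertex positions in a numpy array, uses a brute-force O(n^2) distance matrix, and stops after a fixed number of bisection steps. The function name `single_component_distance` is ours.

```python
import numpy as np

def single_component_distance(points, iters=40):
    """Binary-search the minimum distance D such that connecting every
    vertex pair within distance D yields one connected component
    (used for D_S on the source face and D_T on the target face)."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    n = len(points)

    def connected(th):
        # depth-first search over the threshold graph
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in np.nonzero(d[u] <= th)[0]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    lo, hi = 0.0, d.max()          # start from the maximum pairwise distance
    for _ in range(iters):
        mid = (lo + hi) / 2
        if connected(mid):
            hi = mid               # still connected: try a smaller distance
        else:
            lo = mid
    return hi
```

For three collinear points at x = 0, 1, 3, the search converges to 2, the gap that must be bridged to keep the vertices in one component.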
Intuitively, s^i_{jk} measures the similarity of the moving speeds of vertices v^i_j and v^i_k,
while \theta^i_{jk} gives the similarity of their moving directions. d^0_{jk} measures the
geometrical proximity of the pair of vertices v^0_j and v^0_k in the base face model (key
model 0) that correspond to v^i_j and v^i_k, respectively. Note that every term takes on a
value between zero and one, inclusive. Thus, the movement coherency c_{jk} also takes on a
value in the same range.
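Equation 3.1 multiplies the three terms after averaging the per-key-model speed and direction similarities. A minimal sketch of the computation, assuming each vertex's positions over the N key models are stored as an (N, 3) numpy array; the helper name and array layout are our assumptions, not the thesis implementation:

```python
import numpy as np

def movement_coherency(vj, vk, D, w=(0.5, 1.0, 4.0)):
    """Movement coherency c_jk between two vertex trajectories (Eq. 3.1).

    vj, vk: (N, 3) arrays of positions over the N key models; row 0 is
    the neutral (base) key model.  D is the proximity scale, and w holds
    the empirical weights w1 = 2^-1, w2 = 2^0, w3 = 2^2.
    """
    N = vj.shape[0]
    dj = vj - vj[0]                        # displacements from the base model
    dk = vk - vk[0]
    nj = np.linalg.norm(dj, axis=1)
    nk = np.linalg.norm(dk, axis=1)

    s = np.ones(N)                         # speed similarity, 1 when both still
    theta = np.ones(N)                     # direction similarity, 1 when both still
    for i in range(N):
        if nj[i] == 0 and nk[i] == 0:
            continue
        s[i] = 1 - abs(nj[i] - nk[i]) / max(nj[i], nk[i])
        if nj[i] == 0 or nk[i] == 0:
            theta[i] = 0.0                 # one vertex moves, the other doesn't
        else:
            theta[i] = max(np.dot(dj[i] / nj[i], dk[i] / nk[i]), 0.0)

    d0 = max(1 - np.linalg.norm(vk[0] - vj[0]) / D, 0.0)
    w1, w2, w3 = w
    return (s.mean() ** w1) * (theta.mean() ** w2) * (d0 ** w3)
```

Two coincident, identically moving trajectories score 1, while opposite motions are penalized by the clamped direction term.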
For every source (target) feature point, a preference list of target (source) feature points
is made by interpreting the movement coherency value c_{jk} as the preference that source
feature point j and target feature point k have for each other. Now the problem of finding
corresponding target feature points for every source feature point can be reduced to the
hospitals/residents problem with ties [7]. Here we consider every source feature point as
a hospital and every target feature point as a resident. We set the number of available
posts of every hospital to be 5% of the total number of residents. The algorithm [7] can
determine whether a given instance of the hospitals/residents problem with ties admits a
super-stable matching, and construct such a matching if it exists. Let m and n be the number
of hospitals and residents respectively. The algorithm runs in O(mn) time, linear in the size
of the problem instance. If the algorithm reports that there is no super-stable matching, we
break the ties according to the feature points' geometrical proximity d^0_{jk} and run the
algorithm again to find a weakly stable matching.
In the second step, we treat target feature points as hospitals and source feature points as
residents. After running the algorithm again, every target feature point gets its correspond-
ing source feature points.
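As a rough illustration of one matching direction, the sketch below uses the classic resident-proposing deferred-acceptance scheme with hospital quotas. It ignores ties and super-stability, so it only approximates the algorithm of [7] that the thesis actually uses; all names are ours.

```python
def hospitals_residents(pref, quota):
    """Resident-proposing deferred acceptance with hospital quotas.

    pref[r][h] is the preference score (here: movement coherency) that
    resident r and hospital h have for each other.  Returns a dict
    mapping each hospital to the set of residents it accepted.
    """
    n_res, n_hos = len(pref), len(pref[0])
    # each resident proposes to hospitals in decreasing preference order
    order = [sorted(range(n_hos), key=lambda h, r=r: -pref[r][h])
             for r in range(n_res)]
    next_choice = [0] * n_res
    matched = {h: set() for h in range(n_hos)}
    free = list(range(n_res))
    while free:
        r = free.pop()
        if next_choice[r] >= n_hos:
            continue                       # r has exhausted its list
        h = order[r][next_choice[r]]
        next_choice[r] += 1
        matched[h].add(r)
        if len(matched[h]) > quota:
            # the hospital rejects the resident it likes least
            worst = min(matched[h], key=lambda x: pref[x][h])
            matched[h].remove(worst)
            free.append(worst)
    return matched
```

Running the same routine with the roles swapped gives the second matching direction; the union of both edge sets forms the bipartite graph G described next.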
Figure 3.2: Feature point relationship graph. Each connected component in this graph has
a pair of corresponding source and target super nodes.

We construct an undirected bipartite graph G = (V_F, E_F). Vertex set V_F consists of
source and target feature points. A source and a target feature point are connected by an
edge if they are matched either in the first step or the second step. If a feature point is
not matched to any other feature points, we do not include it in the graph. We mark it as
unmatched and deal with it in a later processing stage. The bipartite graph in Figure 3.2
embodies the many-to-many correspondence between source and target feature points.
3.2 Super Node Extraction
From the definition of movement coherency in Equation 3.1, we can say that if several source
feature points move coherently with one target feature point, then these source feature points
must also have high movement coherencies with each other. We find all connected components
in the undirected bipartite graph G. This problem can be solved in O(|V| + |E|) time using a
standard graph algorithm [12]. Each connected component has a group of source feature points
and a group of target feature points. We define the two groups as corresponding super nodes
on the source and target face models, as shown in Figure 3.2. They have two properties: first, all the
feature points in a super node move coherently; second, the feature points in a source super
node move coherently with the feature points in the corresponding target super node. By
extracting super nodes from the relationship graph, we convert the many-to-many feature
point matching to the one-to-one super node matching. We compare our matching result
with that in Park’s previous work in Figure 3.3. In [16], many feature points are unmatched
on the source face. Hence a large amount of information about the source mesh detail is
lost.
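The super node extraction above amounts to finding connected components of the bipartite match graph. A sketch, where the edge-list layout and function name are our assumptions:

```python
from collections import deque, defaultdict

def extract_super_nodes(edges):
    """Connected components of the bipartite match graph.

    edges: iterable of (j, k) pairs meaning source feature point j was
    matched with target feature point k in either matching direction.
    Returns a list of (source_group, target_group) super-node pairs;
    unmatched feature points do not appear in any component.
    """
    adj = defaultdict(set)
    for j, k in edges:
        adj[('s', j)].add(('t', k))        # tag nodes by side to keep
        adj[('t', k)].add(('s', j))        # source/target indices apart
    seen, pairs = set(), []
    for start in list(adj):
        if start in seen:
            continue
        comp, queue = [], deque([start])   # breadth-first search
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        src = sorted(i for side, i in comp if side == 's')
        tgt = sorted(i for side, i in comp if side == 't')
        pairs.append((src, tgt))
    return pairs
```

Each returned pair is one source super node and its one-to-one corresponding target super node, even though the underlying feature point correspondence is many-to-many.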
3.3 Source Super Node Grouping
Similar to the idea of feature point grouping in [16], we group source super nodes into
regions. The underlying idea is to partition the face mesh into meaningful regions such that
each region contains a facial feature like an eye, the mouth, or a cheek. By using Equation
3.1 with slight modifications, we can compute the movement coherency c_{jk} of two vertices
v_j and v_k on the same face.
Then we define the movement coherency between two source super nodes s_p and s_q as
below:

C_{pq} = \frac{1}{|I_p| \times |I_q|} \sum_{j \in I_p} \sum_{k \in I_q} c_{jk}    (3.2)

Here I_p and I_q are the index sets of feature points in source super nodes s_p and s_q:
I_p = {j | feature point v_j belongs to super node s_p} and I_q = {k | feature point v_k
belongs to super node s_q}. |I_p| and |I_q| are the numbers of feature points in super
nodes s_p and s_q.

Figure 3.3: Feature point matching results comparison. Black dots are unmatched feature
points. (a) one-to-one matching result in [16], (b) our many-to-many matching result.
Our assumption is that if two super nodes have a high movement coherency, they belong
to the same region. We construct an undirected graph G_S = (V_S, E_S), where V_S is the
source super node set. A pair of super nodes are connected by an edge in E_S if their
movement coherency is greater than or equal to a given threshold γ. The problem of source
super node grouping is reduced to finding connected components in graph G_S. The user can
change the γ value to control the number of connected components until she thinks the
grouping result is reasonable. There might be some regions which have only one or two
feature points. They are not sampled adequately and will cause artifacts in the output
animation. We remove these outliers by merging their super nodes into other surviving
regions.
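The grouping step can be sketched as a union-find over the thresholded C_pq graph. A minimal illustration assuming a precomputed pairwise feature-point coherency matrix; the names and data layout are ours:

```python
import itertools
import numpy as np

def group_super_nodes(c, index_sets, gamma):
    """Group source super nodes into regions (Eq. 3.2 plus threshold γ).

    c: (n, n) array of pairwise feature-point movement coherencies on
    the source face; index_sets[p] lists the feature points of super
    node p.  Returns the regions as lists of super-node indices
    (connected components of the C_pq graph)."""
    m = len(index_sets)
    parent = list(range(m))                # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p, q in itertools.combinations(range(m), 2):
        Ip, Iq = index_sets[p], index_sets[q]
        C_pq = c[np.ix_(Ip, Iq)].mean()    # (1/|Ip||Iq|) * sum of c_jk
        if C_pq >= gamma:
            parent[find(p)] = find(q)      # same region

    regions = {}
    for p in range(m):
        regions.setdefault(find(p), []).append(p)
    return list(regions.values())
```

Raising γ splits the graph into more components and hence more regions, which matches the trial-and-error control described above.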
3.4 Region Transfer
Given the one-to-one correspondence of source and target super nodes, transferring the
super node grouping result from the source face to the target face is trivial. Suppose source
region SR_i (i is the region index) has source super nodes s_k, k ∈ IR_i, where IR_i is the
index set of super nodes that belong to SR_i. Then its counterpart target region TR_i has
target super nodes s_j, j ∈ IR_i. Here, we want to emphasize several points again. First,
source and target faces have the same number of regions. Second, each source region has its
corresponding target region. Third, a pair of corresponding source and target regions have
the same number of source and target super nodes respectively. Last, a pair of corresponding
source and target regions do not necessarily have the same number of source and target
feature points, because corresponding source and target super nodes don't necessarily have
the same number of feature points. This is the key strength of the segmentation method
presented in this thesis. Since each region is sampled adaptively by a varying number of
feature points, the characteristics of the region are preserved as much as possible. Recall
that a few feature points remain unmatched after running the hospitals/residents algorithm.
Now we classify each of them into the region which has the largest coherency with it. Figure
3.4 compares our region segmentation result with that in the previous work [16]. In the
previous work, corresponding regions are sampled with the same number of feature points.
3.5 Vertex Classification
Now we are ready to classify all vertices of the face mesh into (possibly overlapping) regions
by exploiting the movement coherency of one vertex with respect to one region. Specifically,
we choose the vertex-region coherency c_{jF_l} as the maximum of coherencies between
the vertex v_j and the feature points v_k contained in the region F_l (see Figure 3.5). That is,

c_{jF_l} = \max_{k \in I_{F_l}} \{ c_{jk} \},    (3.3)

where I_{F_l} is the index set for the feature points in F_l.
A vertex is classified into a region if their coherency is greater than or equal to a threshold
value γ. Note that each vertex can be classified into two or more regions. It is necessary
to have regions overlap on the boundary to get a seamless output animation on the target
face. The user can tune the γ value to control how much regions overlap. Figure 3.6 gives
the vertex classification result.
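The classification rule of Equation 3.3 with the overlap threshold can be sketched as follows, assuming a precomputed vertex-to-feature-point coherency matrix; the function name and layout are ours:

```python
import numpy as np

def classify_vertices(c, region_feature_idx, gamma):
    """Assign every mesh vertex to one or more regions (Eq. 3.3).

    c: (n_vertices, n_feature_points) array of vertex-to-feature-point
    movement coherencies; region_feature_idx[l] lists the feature
    points of region F_l.  A vertex joins region l when its
    vertex-region coherency max_k c[j, k] reaches gamma, so boundary
    vertices can belong to several overlapping regions."""
    labels = [[] for _ in range(c.shape[0])]
    for l, idx in enumerate(region_feature_idx):
        c_jF = c[:, idx].max(axis=1)       # Eq. 3.3: max over the region's points
        for j in np.nonzero(c_jF >= gamma)[0]:
            labels[j].append(l)
    return labels
```

A vertex equally coherent with two regions ends up in both, which produces the overlapping boundaries needed for seamless composition.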
Figure 3.4: Region segmentation result compared with [16]. (a) Park's result (b) Our result

Figure 3.5: Vertex-region coherency is defined as the maximum of vertex-feature point
coherencies.

Figure 3.6: (a) Partition of feature points (b) Vertex classification result
4. Experimental Results
We carried out several sets of experiments to verify the new region segmentation method
proposed in this thesis. The facial expression cloning system was implemented with C++
and OpenGL. We performed our experiments on an Intel Pentium® PC (P4 3.0GHz
processor, 2GB RAM, and NVIDIA GeForce FX 5950 Ultra™). All the computation was done on
CPU. Experimental details and data are shown and analyzed in this chapter.
4.1 Key Model Specification
To show the versatility of our region-based approach, we use various face models with
different geometric properties, topologies, and dimensions. Figure 4.1 shows the four face
models we used. The numbers of vertices and polygons in each face model, together with the
number of key models used, are listed in Table 4.1. Note that the Cartoon face model is a
2D mesh while the others are 3D meshes.
As illustrated in Figure 4.2, we used fourteen key models for expression cloning: one
face model with a neutral expression, six key models for emotional expressions, and seven
key models for verbal expressions. Each source key model has a corresponding target key
model. The neutral source and target face models are also called the source and target base
(a) (b) (c) (d)
Figure 4.1: Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon
Table 4.1: Key model specification

             appearance       # vertices   # polygons   # key models
  Man        Figure 4.1 (a)   1839         3534         14
  Roney      Figure 4.1 (b)   5614         10728        14
  Gorilla    Figure 4.1 (c)   4160         8266         7*
  Cartoon    Figure 4.1 (d)   902          1728         7*

  * neutral, joy, surprise, anger, sadness, disgust, and sleepiness.

Figure 4.2: 14 key models.
face models respectively. The source face models share the same mesh configuration, and
so do the target face models. However, the source and target face models may in general
have different mesh configurations and even different topologies. For Gorilla and Cartoon,
we use even fewer key models and still obtain satisfactory results.
4.2 Key Model Analysis
In this section, we show our key model analysis results. We used the same γ value for feature point correspondence establishment, super node grouping, and vertex classification. For movement coherency, we set w_1 = 2^{-1}, w_2 = 2^0, and w_3 = 2^2 in Equation 3.1.
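Equation 3.1 itself is defined in Chapter 3 and is not reproduced in this chapter. As an illustration only, the sketch below shows how such a weighted combination of three movement-coherency terms could be evaluated with the weights above; the terms t1, t2, t3 and the function name are hypothetical placeholders, not the thesis's actual definitions.

```cpp
// Illustrative sketch only: combines three per-feature-point
// movement-coherency terms with the weights used in the experiments
// (w1 = 2^-1, w2 = 2^0, w3 = 2^2). The terms t1..t3 are hypothetical
// placeholders, not the actual definitions from Equation 3.1.
double coherencyScore(double t1, double t2, double t3) {
    const double w1 = 0.5;  // 2^-1
    const double w2 = 1.0;  // 2^0
    const double w3 = 4.0;  // 2^2
    return w1 * t1 + w2 * t2 + w3 * t3;
}
```

The power-of-two weights make the third term dominate, so differences in that term drive the grouping most strongly.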
Figure 4.3 compares our feature point matching results with Park et al.'s [16]. Our method makes use of a larger percentage of the original feature points. Region segmentation results on more face models are shown in Figure 4.4. By using different thresholds, the user can control the number of regions. We obtained reasonable region segmentations by trial and error.
4.3 Cloning Errors
We performed two sets of experiments and compared the results with those of previous work [16] to verify that our segmentation method achieves higher accuracy. To measure the accuracy, self-cloning was done for the face model Man in two ways: direct self-cloning (from Man to Man) and indirect self-cloning (first from Man to X and then from X back to Man). The cloning error ε is measured as follows:

ε = \frac{\sum_{j=1}^{n} \| x_j - x'_j \|}{\sum_{j=1}^{n} \| x_j \|},    (4.1)

where x_j and x'_j are the original and cloned 3D positions of a vertex v_j of the face model Man, and n is the number of vertices in the model. For comparison with Park et al.'s work [16],
we use the same sets of models. Roney, Gorilla, and Cartoon were used as the intermediate
models. The input animation for Man consists of 1389 frames. The results were collected
in Table 4.2 and Table 4.3.
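Equation 4.1 can be computed directly from the original and cloned vertex positions. A minimal C++ sketch (the function name and container types are our choices, not taken from the thesis implementation):

```cpp
#include <array>
#include <cmath>
#include <vector>

// Cloning error (Equation 4.1): ratio of the summed magnitudes of the
// per-vertex displacements between original and cloned positions to
// the summed magnitudes of the original positions.
double cloningError(const std::vector<std::array<double, 3>>& original,
                    const std::vector<std::array<double, 3>>& cloned) {
    double num = 0.0, den = 0.0;
    for (std::size_t j = 0; j < original.size(); ++j) {
        double d2 = 0.0, o2 = 0.0;
        for (int k = 0; k < 3; ++k) {
            const double diff = original[j][k] - cloned[j][k];
            d2 += diff * diff;                    // ||x_j - x'_j||^2
            o2 += original[j][k] * original[j][k];  // ||x_j||^2
        }
        num += std::sqrt(d2);
        den += std::sqrt(o2);
    }
    return num / den;
}
```

A perfect clone yields ε = 0; normalizing by the summed position magnitudes makes the error comparable across models of different scales.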
The next set of experiments was conducted to demonstrate the visual quality of cloned
animations, where the transfer of asymmetric facial gestures was emphasized. The results
21
Man to Roney
  # feature points     many-to-many          one-to-one
                       Roney      Man        Roney      Man
  original             123        127        123        127
  matched              122        126        99         99
  percentage           99.187%    99.213%    80.488%    77.593%

Man to Gorilla
  # feature points     many-to-many          one-to-one
                       Gorilla    Man        Gorilla    Man
  original             185        127        185        127
  matched              143        113        78         78
  percentage           77.297%    88.976%    42.163%    61.417%

Man to Cartoon
  # feature points     many-to-many          one-to-one
                       Cartoon    Man        Cartoon    Man
  original             63         127        63         127
  matched              60         93         47         47
  percentage           95.238%    73.228%    74.603%    37.008%

Figure 4.3: Comparison of feature point matching results.
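Each percentage in Figure 4.3 is simply the ratio of matched to original feature points; for example, 122 matched out of 123 original feature points yields 99.187%. A trivial helper (the naming is ours):

```cpp
// Matching percentage as reported in Figure 4.3:
// 100 * (matched feature points) / (original feature points).
double matchPercentage(int matched, int original) {
    return 100.0 * matched / original;
}
```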
22
Man to Gorilla
Man to Cartoon
Figure 4.4: Region segmentation results on other face models.
23
Table 4.2: Self-cloning errors for Man

                           Intermediate    Errors (%)
  Types                    face model      Park et al.'s approach [16]   Our method
  direct self-cloning      –               0.080                         0.060
  indirect self-cloning    Roney           0.223                         0.195
                           Gorilla         0.250                         0.214
                           Cartoon         0.167                         0.150
Table 4.3: Performance comparison with Park et al.'s method

                                                   Time
  Types                    Approaches      Errors   Key model analysis   Run-time transfer
  direct cloning           Ours            0.060%   3.02 sec.            0.98 msec. (1020)
  (Man to Man)             Park et al.'s   0.080%   3.11 sec.            0.96 msec. (1041)
  indirect cloning         Ours            0.195%   6.86 sec.            1.38 msec. (725)
  (Man to Roney to Man)    Park et al.'s   0.223%   7.06 sec.            1.36 msec. (735)
  ( ) : average frames per second
24
Figure 4.5: Expression cloning results
are given in the accompanying movie file. Figure 4.5 shows snapshots from the animation.
25
5. Conclusion
In this thesis, we present a new region segmentation method that can be integrated into Park et al.'s region-based facial expression cloning system [16]. We first establish a natural many-to-many feature point correspondence across the source and target face models. Then we extract coherently moving groups of feature points as super nodes to convert the many-to-many feature point correspondence into a one-to-one super node correspondence. We segment the source face model by grouping strongly related source super nodes and reflect the result onto the target face. Our region segmentation method adaptively samples facial regions with varying numbers of feature points. Hence the region segmentation result can preserve more mesh deformation information.

One limitation of this work is that users have to adjust γ for different face models. In the future, we also want to incorporate physics to synthesize more realistic facial details. The facial expression cloning system only focuses on the geometry of face models. There are many other issues to consider in facial expression cloning, such as head motions, gaze directions, and different textures for each key model.
26
References
[1] C. Bregler, L. Loeb, E. Chuang, and H. Deshpande. Turning to the masters: motion
capturing animations. In Proc. of ACM SIGGRAPH, pages 399–407, 2002.
[2] I. Buck, A. Finkelstein, and C. Jacobs. Performance-driven hand-drawn animation. In
Symposium on Non-Photorealistic Animation and Rendering, 2000.
[3] Jinxiang Chai, Jing Xiao, and Jessica Hodgins. Vision-based control of 3D facial animation.
In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 193–
206, 2003.
[4] Byoungwon Choe, Hanook Lee, and Hyeongseok Ko. Performance-driven muscle-
based facial animation. Journal of Visualization and Computer Animation, 12(2):67–
79, 2001.
[5] Erika Chuang and Chris Bregler. Performance driven facial animation using blend-
shape interpolation. Stanford University Computer Science Technical Report CS-TR-
2002-02, 2002.
[6] Douglas Fidaleo, Junyong Noh, Taeyong Kim, Reyes Enciso, and Ulrich Neumann.
Classification and volume morphing for performance-driven facial animation. In In-
ternational Workshop on Digital and Computational Video, 2000.
[7] Robert W. Irving, David Manlove, and Sandy Scott. The hospitals/residents problem
with ties. In SWAT ’00: Proceedings of the 7th Scandinavian Workshop on Algorithm
Theory, pages 259–271, London, UK, 2000. Springer-Verlag.
[8] Pushkar Joshi, Wen C. Tien, Mathieu Desbrun, and F. Pighin. Learning controls for
blend shape based realistic facial animation. In ACM SIGGRAPH/Eurographics Sym-
posium on Computer Animation, pages 187–192, 2003.
[9] J. Kleiser. A fast, efficient, accurate way to represent the human face. ACM SIG-
GRAPH 89 Course #22 Notes, 1989.
28
[10] Cyriaque Kouadio, Pierre Poulin, and Pierre Lachapelle. Real-time facial animation
based upon a bank of 3d facial expressions. In Computer Animation, pages 128–136,
1998.
[11] I-Chen Lin, Jeng-Sheng Yeh, and Ming Ouhyoung. Realistic 3d facial animation pa-
rameters from mirror-reflected multi-view video. In IEEE Computer Animation, pages
241–250, 2001.
[12] K. Mehlhorn. Data Structures and Algorithms, volume 1-3. Springer Publishing
Company, 1984.
[13] K. Na and M. Jung. Hierarchical retargetting of fine facial motions. Computer Graph-
ics Forum, 23(3):687–695, 2004.
[14] J. Noh and U. Neumann. Expression cloning. In ACM SIGGRAPH, pages 277–288,
2001.
[15] Bongcheol Park, Heejin Chung, Tomoyuki Nishita, and Sung Yong Shin. A feature-
based approach to facial expression cloning. Computer Animation and Virtual Worlds,
16(3-4):291–303, 2005.
[16] Bongcheol Park and Sung Yong Shin. A region-based facial expression cloning. Tech-
nical Report CS/TR-2006-256, Korea Advanced Institute of Science and Technology,
2006.
[17] F. I. Parke. Computer generated animation of faces. In ACM National Conference,
pages 451–457, 1972.
[18] Frederic I. Parke and Keith Waters. Computer facial animation. A. K. Peters, Ltd.,
Natick, MA, USA, 1996.
[19] F. Pighin, R. Szeliski, and D. H. Salesin. Resynthesizing facial animation through 3d
model-based tracking. In IEEE International Conference on Computer Vision, pages
143–150, 1999.
29
[20] H. Pyun, Y. Kim, W. Chae, H. Y. Kang, and S. Y. Shin. An example-based approach
for facial expression cloning. In ACM SIGGRAPH/Eurographics Symposium on Com-
puter Animation, pages 167–176, 2003.
[21] Robert W. Sumner and Jovan Popović. Deformation transfer for triangle meshes. In
ACM SIGGRAPH, pages 399–405, 2004.
[22] Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popović. Face transfer
with multilinear models. In ACM SIGGRAPH, pages 426–433, 2005.
[23] L. Williams. Performance-driven facial animation. Computer Graphics(Proceedings
of SIGGRAPH 90), 24:235–242, 1990.
30
Acknowledgements
I am grateful to numerous people who have helped in many ways throughout this research.
I would like to thank my advisor, Professor Sung Yong Shin, for his valuable support, feedback, and guidance throughout every stage of this research. I appreciate his encouragement of independence while simultaneously providing valuable guidance.
I would also like to thank the master's thesis committee (Dr. Shin, Dr. Cheong, and Dr. Cordier) for their valuable insights and helpful reviews.
Special thanks go to my mentor, Bongcheol Park for his time, direct guidance, and
consistent patience as I floundered my way through this process.
I would like to express my gratitude to the Institute of Information Technology Assessment for generously sponsoring this research through the Korean Government IT Scholarship Program.
I am thankful to all members of TC Lab and all my friends for their help, friendship,
and the comfortable working atmosphere.
Last, but far from least, I must thank my parents for their unending support, unconditional love, and constant blessing. Although I couldn't see them in the last two years, chatting with them over the phone was the best way to cheer me up.
