Analysis of facial motion patterns during speech using a matrix factorization algorithm
Jorge C. Lucero
Department of Mathematics, University of Brasilia, Brasilia DF 70910-900, Brazil
Kevin G. Munhall
Departments of Psychology and Otolaryngology, Queen's University, Kingston, Ontario K7L 3N6, Canada
(Received 13 January 2008; revised 19 July 2008; accepted 21 July 2008)
This paper presents an analysis of facial motion during speech to identify linearly independent
kinematic regions. The data consist of three-dimensional displacement records of a set of markers
located on a subject’s face while producing speech. A QR factorization with column pivoting
algorithm selects a subset of markers with independent motion patterns. The subset is used as a basis
to fit the motion of the other facial markers, which determines facial regions of influence of each of
the linearly independent markers. Those regions constitute kinematic “eigenregions” whose
combined motion produces the total motion of the face. Facial animations may be generated by
driving the independent markers with collected displacement records.
© 2008 Acoustical Society of America. [DOI: 10.1121/1.2973196]
PACS number(s): 43.70.Aj, 43.70.Bk, 43.70.Jt [BHS] Pages: 2283–2290
J. Acoust. Soc. Am. 124 (4), October 2008 0001-4966/2008/124(4)/2283/8/$23.00 © 2008 Acoustical Society of America 2283
Empirical modeling strategies using independent component analysis (ICA) have also been proposed (Müller et al., 2005).

In a previous work (Lucero et al., 2005), an empirical model was introduced that was based on decomposing the face surface into a finite set of regions. The total motion of the face was computed as the linear combination of the movement of those regions. The model was built by analyzing the recorded 3D position of a set of markers placed on a subject's face while producing a sequence of sentences. An algorithm grouped the markers into a set of clusters, each of which had one primary marker and a number of secondary markers with associated weights. The displacement of each secondary marker was next expressed as the linear combination of the displacements of the primary markers of the clusters to which it belonged. The model was next used to generate facial animations, by driving the primary markers and associated clusters with collected data.

It was argued that the computed cluster structure represented the degrees of freedom (DOF) of the system. The DOF are the independent modes of variation and thus the independent sources of information. Subjects differ in the information transmission from their faces, and this may be due in part to differences in their DOF. In the model, the facial clusters define facial eigenregions, whose combined motion forms the total motion of the facial surface.

This empirical model and the PCA (or ICA) approach are based on linear modeling of the data, which can result in similar levels of accuracy in reconstruction of the data. However, the resulting solutions are quite different and serve different purposes. The PCA approach extracts global structure and will produce a set of primary gestures. Its DOF are the functional modes of deformation of the face. In the present case, the aim is to identify a set of spatially distinct regions that move independently as the DOF. There presumably is a mapping between these two representations, and it could be hypothesized that the nervous system must know what to control in its anatomical DOF to produce its gestural basis set.

There are a number of advantages to focusing on the spatial DOF. This approach focuses the data analysis on the generating mechanism for gestures: the facial musculature. The action of muscles is obviously spatially concentrated, and thus facial regions can be found that are associated with individual muscles or synergies of muscles that are in close proximity or whose actions are spatially localized. This muscle-based approach is consistent with a productive tradition in the analysis of facial expression and the study of perception of expressions (Ekman et al., 2002), and it has also been a powerful tool in facial animation (e.g., Terzopoulos and Waters, 1990). Another advantage of finding regional DOF in a data set is that these regions can be animated in arbitrary facial configurations. There is considerable interest in speech perception research on the role of individual talker characteristics in speech perception (e.g., Goldinger, 1996). Studies that involve the animation of a generic face or the animation of one talker's morphology with another talker's motion must solve a registration and morphing problem (Knappmeyer et al., 2003). The identification of key features and spatial regions is one form of solution to this correspondence problem.

This empirical modeling approach is close to the articulatory modeling work of Badin, Bailly, et al. (Badin et al., 2002; Beautemps et al., 2001; Engwall and Beskow, 2003). In their work, PCA is used to determine articulatory parameters to control the shape of a 3D vocal tract and face model. For better correspondence to the underlying biomechanics, some of the parameters (e.g., jaw height, lip protrusion, etc.) are defined a priori, and their contributions are subtracted from the data before computing the remaining components. The present approach proposes to rely entirely on the data to build the model, with as few prior assumptions as possible. Instead of setting a model by defining the biomechanical properties of skin tissue and muscle structure based on a priori theoretical reasons, a possible model is inferred just by looking at the measured motion patterns of the facial surface.

The algorithm presented by Lucero et al. (2005) had a preliminary nature and had some drawbacks. For example, it was necessary to define an initial facial mesh, linking nodes corresponding to the initial position of the markers and leaving "holes" for the eyes and mouth. Also, a more solid foundation for the criteria for grouping the markers was desired. In the present paper, an improved version is introduced, which uses a QR factorization technique (Golub and Loan, 1996) to identify a linearly independent subset of facial markers. This subset is next used as a basis to predict the displacement of arbitrary facial points.

II. DATA

The data consist of the 3D position of 57 markers distributed on a subject's face, recorded with Vicon equipment (Vicon Motion Systems, Inc., Lake Forest, CA) at a 120 Hz sampling frequency. According to calibration data, the position measures were accurate within 0.5 mm. An additional six markers on a head band were used to determine a head coordinate system, with the positive directions of its axes defined as follows: the x axis in the horizontal direction from right to left, the y axis in the vertical direction to the top, and the z axis in the protrusion direction to the front. The position of the 57 facial markers was expressed in those coordinates, with the approximate locations shown in Fig. 1. Neither the morphology of subjects' faces nor the precise placement of the markers is exactly symmetrical, and thus the markers may seem to be offset from the facial features in the schematic and symmetrical face in the figure.

Data were recorded from two subjects (S1 and S2) while they were producing selected sentences from the Central Institute for the Deaf Everyday sentences (Davis and Silverman, 1970), listed at http://www.mat.unb.br/lucero/facial/qr2.html. A large portion of the data for the marker at the left upper eyelid (marker 13) of subject S1 was missing, due to recording errors. Since the motion patterns of the right and left upper eyelids seemed very close (by visual assessment), the missing data of the left upper eyelid were copied from the marker at the other eyelid. It will be shown later that the
algorithm detected this artifact within the data, which serves as a proof of its ability to analyze kinematic patterns.

FIG. 1. Spatial distribution of the marker positions superimposed on a schematic face.

A total of 40 sentences was recorded from subject S1, and 50 sentences from subject S2. In the case of S2, the set of 50 sentences was recorded twice, forming two datasets which will be denoted as S2a and S2b. In the recording sessions, the subjects were asked to adopt a consistent rest position at the beginning of each sentence. The initial positions of the markers were taken as representative of a rest (neutral) configuration.

III. THE SUBSET SELECTION PROBLEM

This approach for building an empirical facial model is based on the so-called subset selection problem of linear algebra (Golub and Loan, 1996). Assume a given data matrix A ∈ R^(m×n) and an observation vector b ∈ R^(m×1), with m ≥ n, and that a predictor vector x is sought in the least squares sense, which minimizes ||Ax − b||₂². Assume also that the data matrix A derives from observations of redundant factors. Therefore, instead of using the whole data matrix A to predict b, it may be desirable to use only a subset of its columns, so as to filter out the data redundancy. The problem is, then, how to pick the nonredundant columns. In the present case of facial modeling, the data matrix will contain the displacements of the 57 facial markers, arranged in columns. A small subset of markers (columns of the data matrix) must be selected which may be used to predict the motion of any other arbitrary facial point.

The idea of reducing the data set is consistent with the long-held view that the speech production system is itself low-dimensional. As Bernstein (1967) suggested about motor control in general, the nervous system acts to reduce the potential DOF. In speech, the muscles and articulators are coupled synergistically during articulation in order to produce particular sounds.

To solve the subset selection problem, the most linearly independent columns of matrix A must be identified. Let A_k denote a subset of k columns of A. A measure of "independency" of the subset is provided by the smallest singular value of A_k, σ_k, which measures the distance of A_k to the set of rank-deficient (singular) matrices in the 2-norm. Thus, it indicates how far A_k is from being a singular matrix. Consequently, the larger σ_k, the more independent the subset A_k. In principle, the subset selection problem could be solved by testing all possible combinations of k columns from the total of n columns of A. The number of possible combinations is n!/[k!(n − k)!], which could be prohibitively large. In the present case, with n = 57 and adopting k = 10 as an example, the number of combinations is 4.32 × 10^10. So far, it appears that an exhaustive search is the only means to compute the optimal solution to the problem (Lawson and Hanson, 1987; Björck, 2004).

A possible solution to the subset selection problem is provided by the algorithm of QR factorization with column pivoting (Golub and Loan, 1996; Chan and Hansen, 1992). That algorithm decomposes A in the form AΠ = QR, where Π ∈ R^(n×n) is a column permutation matrix, Q ∈ R^(m×n) is an orthogonal matrix, and R ∈ R^(n×n) is an upper triangular matrix with positive diagonal elements.¹ The first column of the permuted matrix AΠ is just the column of A that has the largest 2-norm (Euclidean norm). The second column of AΠ is the column of A that has the largest component in a direction orthogonal to the direction of the first column. In general, the kth column of AΠ is the column of A with the largest component in a direction orthogonal to the directions of the first k − 1 columns [or, equivalently, the column of A which has maximum distance from the subspace spanned by the first k − 1 columns (Björck, 2004)]. The diagonal elements of R (r_kk), also called the R values, measure the size of those orthogonal components; they appear in decreasing order for k = 1, ..., n, and tend to track the singular values of matrix A. Thus, the algorithm reorders the columns of A to make its first columns as well conditioned as possible. The first k columns of AΠ may then be adopted as the sought subset of k least dependent columns.

Another useful property of the above algorithm is that it reveals the rank of matrix A. Let us define the following block partition of R,

    R = [ R11  R12 ]
        [  0   R22 ],                                                  (1)

where R11 ∈ R^(k×k), and the dimensions of the other blocks match accordingly. If rank(A) = k < n, then R22 = 0. Letting σ_i, for i = 1, ..., n, be the singular values of A, with σ_1 ≥ σ_2 ≥ ... ≥ σ_n, it may be shown that σ_(k+1) ≤ ||R22||₂. Therefore, a small value of ||R22||₂ implies that A has at least n − k small singular values, and thus A is close to being of rank k (Chan, 1987). As a tolerance value ε, it is usual to consider the level of uncertainty or precision of the data. Hence, a value of ||R22||₂ ≤ ε implies that A has ε-rank² k.

A number of other algorithms have been proposed to find solutions to the subset selection problem, based, for instance, on singular value decomposition (Golub and Loan, 1996), search techniques exploiting partial ordering of the variables, stepwise regression algorithms (Lawson and Hanson, 1987), and backward greedy algorithms (de Hoog and Mattheij, 2007). However, the QR factorization offers a number of advantages (e.g., Golub and Loan, 1996; Björck, 2004; Setnes and Babuska, 2001; Chan and Hansen, 1992; Bischof and Quintana-Ortí, 1998): It is computationally simpler and numerically robust. It detects the rank of the data and provides a good technique for solving least squares problems, as shown below. It has been used in numerous technical applications of the subset selection problem, including fuzzy control systems (Setnes and Babuska, 2001), neural network design (Kanjilal et al., 1993), signal bearing estimation (Prasad and Chandna, 1991), noise control systems (Ruckman and Fuller, 1995), and very large scale integrated array processors (Lorenzelli et al., 1994).

Assume next that the QR factorization with column pivoting of the data matrix has been computed, so that AΠ = QR, and let Π = [Π1 Π2], where Π1 ∈ R^(n×k) and Π2 ∈ R^(n×(n−k)). The first k columns of AΠ are therefore given by AΠ1 and are selected as a subset of k independent columns. An observation vector b may be predicted by minimizing ||AΠ1 x − b||₂². Letting Q^T b = [c d]^T, where c ∈ R^(k×1) and d ∈ R^((n−k)×1), the minimizer may be easily computed as the solution of the upper triangular system R11 x = c (Golub and Loan, 1996).

In the present case, the k columns of AΠ1 will be used to predict the remaining n − k columns of A, given by AΠ2, in the least squares sense. The problem may be expressed as the minimization of

    E = Σ_i ||AΠ1 x_i − (AΠ2)_i||₂²,                                   (2)

where the subindex i represents each of the n − k columns of AΠ2. Using the Frobenius norm (Euclidean matrix norm), this produces

    E = ||AΠ1 X − AΠ2||_F²,                                            (3)

where X is a k × (n − k) matrix. Since the norm is invariant under orthogonal transformations,

    E = ||Q^T AΠ1 X − Q^T AΠ2||_F² = ||R11 X − R12||_F² + ||R22||_F².   (4)

Therefore, the least squares minimizer is the solution of the upper triangular system R11 X = R12, and the residual is ||R22||_F.

IV. ANALYSIS OF FACIAL DATA

The displacement of each marker was computed relative to the initial neutral position. For each subject, the markers' displacements for all sentences were concatenated and arranged in a displacement matrix A ∈ R^(3M×N), where N is the number of markers (57) and M is the total number of time samples of all the concatenated sentences. QR factorization with column pivoting was then applied to data matrix A, using a standard MATLAB implementation.

FIG. 2. R values (diagonal elements of matrix R) normalized to the size of R, for subjects S1 (circles), S2a (stars), and S2b (triangles).

In the case of subject S2 (datasets S2a and S2b), there is no gap in the data, and rank(A) = 57. Thus, any model built from a subset of less than 56 or 57 markers will necessarily result in a loss of information.

Figure 3 shows the first 12 R values normalized to the size of matrix R, for subject S1, when varying the number of sentences in the dataset. The values stabilize for sets with more than approximately 25 sentences. Any larger data set is therefore reliable enough for building a model. The datasets for subject S2 produce similar results and are therefore not shown.

Table I shows the indices of the first 16 columns (or markers) selected by the algorithm, for the various datasets. A set of 30 sentences was used in all analyses.

In all cases, the first selected marker is the 40th, at the center of the lower lip (see Fig. 1), which has therefore the largest displacement (largest 2-norm of the associated column). The second is marker 34, at the lip's left corner. From the third marker on, subject S1 shows a different pattern than subject S2. In the case of subject S1, the next four markers are the lip's right corner (38), right eyebrow (2), upper lip center (36), and left eyebrow (6). Subject S2, on the other hand, besides the lip's right corner (38), incorporates the eyelids (13 or 11) and markers at the lower-right portion of the face (42, 43, 47, 48, or 56), depending on the dataset. The upper lip marker (36) appears later, in positions 8–10. The four columns of results for subject S2 show similar markers, although some differences in the selected markers and the order
TABLE I. Selected columns (markers) of data matrix A. For S1, S2a, and S2b, the first 30 sentences of each dataset were used. In case of S2a* and S2b*, the last 30 sentences (from a total of 50) of the respective datasets were used.

Position   S1   S2a   S2b   S2a*   S2b*
   1       40    40    40    40     40
   2       34    34    34    34     34
   3       38    13    13    13     11
   4        2    38    38    42     38
   5       36    47    48    38     56
   6        6    42    47    43     42
   7       20     6    11    47      6
   8       49    56    36    11     47
   9       11    11    42    36     13
  10       52    36    49     6     36
  11       54    48     6    49     12
  12       47    39    53    12     48
  13       22    52    20    48     39
  14       32    20    39    16     16
  15       39    49    56    20     53
  16       48    14    14    52     20
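The selection procedure behind Table I can be sketched with SciPy's pivoted QR on synthetic data. This is an illustrative sketch only: the matrix below is random (a stand-in for the 3M × N displacement matrix), and the paper itself used a standard MATLAB implementation.

```python
import math

import numpy as np
from scipy.linalg import qr, solve_triangular

rng = np.random.default_rng(1)

# Random stand-in for the displacement matrix: k independent "source"
# columns mixed into n nearly redundant ones, plus small noise.
m, n, k = 600, 57, 10
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
A += 1e-6 * rng.normal(size=(m, n))  # data are now close to rank k

# Exhaustive search over all C(57, 10) = 43,183,019,880 column subsets
# (about 4.32e10) would be infeasible.
assert math.comb(57, 10) == 43_183_019_880

# QR factorization with column pivoting: A[:, piv] = Q @ R.
Q, R, piv = qr(A, mode="economic", pivoting=True)

# The magnitudes of R's diagonal (the "R values") are non-increasing and
# tend to track the singular values; a sharp drop after k reveals the
# numerical (epsilon-)rank of the data.
r_values = np.abs(np.diag(R))
assert np.all(r_values[:-1] >= r_values[1:] - 1e-9)

# Block partition of R as in Eq. (1): the first k pivots are the selected
# least dependent columns, and a small ||R22|| means A is close to rank k.
R11, R12, R22 = R[:k, :k], R[:k, k:], R[k:, k:]
subset = piv[:k]

# Least squares fit of the remaining columns on the selected subset:
# the minimizer of Eq. (3) solves R11 X = R12, with residual ||R22||_F.
X = solve_triangular(R11, R12)
residual = np.linalg.norm(R22)
```

On real marker data, `subset` would play the role of the marker indices listed in Table I, and `r_values` that of the curves in Figs. 2 and 3.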
FIG. 5. (Color online) Four independent kinematic regions for subject S2a, and for a basis of ten markers. The darker the region, the larger the least squares fit coefficient of each point relative to the main marker. A minus sign indicates a subregion with negative weight.

FIG. 6. (Color online) Four independent kinematic regions for subject S2b, and for a basis of ten markers. The darker the region, the larger the least squares fit coefficient of each point relative to the main marker. A minus sign indicates a subregion with negative weight.

FIG. 7. (Color online) First four orthogonal regions for subject S1. Each plot shows facial points with patterns of motion in the first four orthogonal directions. The darker the region, the larger the motion. A minus sign indicates a subregion with negative weight.

lower lip and the left lip corner. The lip corner regions reflect the mouth widening-narrowing under the combined action of the orbicularis oris, zygomatic major, and other perioral muscles at each side of the face. Naturally, even though motion of both lip corners might be strongly correlated, they are associated with different regions because their motion is controlled by different groups of muscles and happens in opposite directions. This is precisely one of the intended objectives: rather than identifying particular facial gestures such as a mouth widening/narrowing action, the algorithm identifies spatial regions associated with different muscle groups.

V. COMPUTER GENERATION OF FACIAL ANIMATIONS

After the main markers and fitting matrix X have been computed, facial animations of arbitrary speech utterances may be produced by driving the selected independent markers with collected signals. Letting P1 ∈ R^(n×k) be the displacement matrix of the k main markers (relative to the initial neutral position), the displacement P2 of the secondary markers is just P2 = P1 X. The neutral position of all markers is next added back, to obtain their positions in head coordinates. Finally, the positions of other arbitrary facial points may be generated by using, e.g., cubic interpolation. Using this technique and the results of Table I, animations were produced for sentences 31–40 for subject S1, and 31–50 for subject S2. A virtual facial surface was generated by cubic interpolation of the recorded markers, built from a grid of 30 × 30 points. The animations look visually realistic, without any noticeable distortion in the motion pattern. They are available at http://www.mat.unb.br/lucero/facial/qr2.html in AVI format. Figure 8 shows an example of an animation frame.

Figure 9 shows the trajectory of marker 56 at the jaw for subject S1 in one sentence (31), when using a basis of ten

FIG. 8. (Color online) Example of a facial animation frame.
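The reconstruction step described in this section is a single matrix product. A minimal numpy sketch follows, with random stand-ins for the fitting matrix X and the collected main-marker displacements (all sizes and values below are illustrative, not the recorded data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sizes: T time samples, k main markers, n - k secondary markers.
T, k, n_sec = 240, 10, 47

X = rng.normal(size=(k, n_sec))        # fitting matrix from the least squares fit
neutral = rng.normal(size=(1, n_sec))  # neutral positions of the secondary markers

# Collected displacement records of the k main (independent) markers.
P1 = rng.normal(size=(T, k))

# Displacement of the secondary markers is just P2 = P1 X; adding the
# neutral position back gives their trajectories in head coordinates.
P2 = P1 @ X
positions = P2 + neutral
```

Positions of arbitrary facial points between the markers could then be generated by interpolation, e.g., with `scipy.interpolate.griddata` using its cubic method over the marker layout.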
FIG. 10. Mean error of computed trajectories for subject S1 (sentences 31 to 40) vs the number of markers in the selected basis. Top: absolute error. Bottom: relative error.

VI. CONCLUSION

The paper has shown that the QR factorization with column pivoting algorithm provides a convenient technique for facial motion analysis and animation. It identifies a subset of independent facial points (markers), which may be used to build an individualized linear model of facial kinematics. Each of the independent points defines a kinematic region, and the combined motion of those regions produces the total motion of the face.

…structural information by revealing the rigid body segments and their dimensions, as well as showing where the segment joints are. The face is a unique biomechanical system, and the analysis of its soft tissue deformations is different from studying articulated motion like locomotion. However, motion patterns of the face can reveal its structural form. In this sense, the proposed technique is carrying out biomechanical analyses of the face. For each individual, it is letting the motions define what regions of the face move as an independent unit, what the boundaries of the surface regions are, and where the spatial peak of motion is located. This can be seen as a lumped representation of the muscular actions and their influence on the facial tissue biophysics.

Many aspects of the technique require further improvement; for example, a criterion to determine the appropriate number of independent markers to be selected is needed. An important issue that must also be considered is that this modeling approach is dependent on the data captured by the finite set of facial markers. Therefore, building a successful facial model would require researchers to cover a subject's face densely enough to capture all details of its kinematic behavior, or to place a smaller number of markers at optimal positions. Those and related issues are currently being considered as next research steps.

ACKNOWLEDGMENTS

This work has been supported by MCT/CNPq (Brazil), the National Institute of Deafness and Other Communications Disorders (Grant DC-05774), and NSERC (Canada). Some results were presented in partial form at the 7th International Seminar on Speech Production held in Ubatuba, Brazil, in December 2006, and are available in the online proceedings at http://www.cefala.org/issp2006/cdrom/main_index.html. We are grateful to Dr. A. R. Baigorri for useful discussions on statistical aspects of the algorithms and to graduate student E. V. Fortes for his help with some of the numerical computations.

¹That is the "thin" version of the factorization. In the full version, Q ∈ R^(m×m) and R ∈ R^(m×n).
²The ε-rank of a matrix A is defined as min over ||A − B||₂ ≤ ε of rank(B), and may be interpreted as the number of columns of A that are guaranteed to remain linearly independent for any perturbation of A with norm less than or equal to ε (Chan and Hansen, 1992).

Badin, P., Bailly, G., and Revéret, L. (2002). "Three-dimensional linear articulatory modeling of tongue, lips, and face, based on MRI and video images," J. Phonetics 30, 533–553.
Beautemps, D., Badin, P., and Bailly, G. (2001). "Linear degrees of freedom in speech production: Analysis of cineradio- and labio-film data and articulatory-acoustic modeling," J. Acoust. Soc. Am. 109, 2165–2180.
Bernstein, N. (1967). The Co-ordination and Regulation of Movements (Pergamon, Oxford).
Bischof, C. H., and Quintana-Ortí, G. (1998). "Computing rank-revealing QR factorizations of dense matrices," ACM Trans. Math. Softw. 24, 226–253.
Björck, A. (2004). "The calculation of linear least squares problems," Acta Numerica 13, 1–53.
Chan, T. F. (1987). "Rank revealing QR factorizations," Linear Algebr. Appl. 88/89, 67–82.
Chan, T. F., and Hansen, P. C. (1992). "Some applications of the rank revealing QR factorization," SIAM J. Sci. Stat. Comput. 13, 727–741.
Davis, H., and Silverman, S. R., eds. (1970). Hearing and Deafness, 3rd ed. (Holt, Rinehart and Winston, New York).
de Hoog, F. R., and Mattheij, R. M. M. (2007). "Subset selection for matrices," Linear Algebr. Appl. 422, 349–359.
Ekman, P., Friesen, W. V., and Hager, J. C. (2002). The Facial Action Coding System, 2nd ed. (Research Nexus eBook, Salt Lake City).
Engwall, O., and Beskow, J. (2003). "Effects of corpus choice on statistical articulatory modeling," in Proceedings of the 6th International Seminar on Speech Production, edited by S. Palethorpe and M. Tabain, pp. 1–6.
Goldinger, S. D. (1996). "Words and voices: Episodic traces in spoken word identification and recognition memory," J. Exp. Psychol. Learn. Mem. Cogn. 22, 1166–1183.
Golub, G. H., and Loan, C. F. V. (1996). Matrix Computations, 3rd ed. (The Johns Hopkins University Press, Baltimore).
Kalra, P., Mangili, A., Thalmann, N. M., and Thalmann, D. (1992). "Simulation of facial muscle actions based on rational free form deformations," Comput. Graph. Forum 11, 59–69.
Kanjilal, P., Dey, P. K., and Banerjee, D. N. (1993). "Reduced-size neural networks through singular value decomposition and subset selection," Electron. Lett. 29, 1516–1518.
Knappmeyer, B., Thornton, I. M., and Bülthoff, H. H. (2003). "The use of facial motion and facial form during the processing of identity," Vision Res. 43, 1921–1936.
Kuratate, T., Munhall, K. G., Rubin, P. E., Vatikiotis-Bateson, E., and Yehia, H. (1999). "Audio-visual synthesis of talking faces from speech production correlates," in Proceedings of the 6th European Conference on Speech Communication and Technology (EuroSpeech99), Vol. 3, pp. 1279–1282.
Kuratate, T., Yehia, H., and Vatikiotis-Bateson, E. (1998). "Kinematics-based synthesis of realistic talking faces," in International Conference on Auditory-Visual Speech Processing (AVSP'98), edited by D. Burnham, J. Robert-Ribes, and E. Vatikiotis-Bateson (Causal Productions, Terrigal-Sydney, Australia), pp. 185–190.
Lawson, C. L., and Hanson, R. J. (1987). Solving Least Squares Problems (SIAM, Philadelphia).
Lee, Y., Terzopoulos, D., and Waters, K. (1995). "Realistic modeling for facial animation," Comput. Graph. 29, 55–62.
Lorenzelli, F., Hansen, P. C., Chan, T. F., and Yao, K. (1994). "A systolic implementation of the Chan/Foster RRQR algorithm," IEEE Trans. Signal Process. 42, 2205–2208.
Lucero, J. C., and Munhall, K. G. (1999). "A model of facial biomechanics for speech production," J. Acoust. Soc. Am. 106, 2834–2842.
Lucero, J. C., Maciel, S. T. R., Johns, D. A., and Munhall, K. G. (2005). "Empirical modeling of human face kinematics during speech using motion clustering," J. Acoust. Soc. Am. 118, 405–409.
Migliore, M. D. (2006). "On the role of the number of degrees of freedom of the field in MIMO channels," IEEE Trans. Antennas Propag. 54, 620–628.
Müller, P., Kalberer, G. A., Proesmans, M., and Van Gool, L. (2005). "Realistic speech animation based on observed 3-D face dynamics," IEE Proc. Vision Image Signal Process. 152, 491–500.
Munhall, K. G., and Vatikiotis-Bateson, E. (1998). "The moving face during speech communication," in Hearing by Eye, Part 2: The Psychology of Speechreading and Audiovisual Speech, edited by R. Campbell, B. Dodd, and D. Burnham (Taylor & Francis Psychology, London).
Parke, F., and Waters, K. (1996). Computer Facial Animation (AK Peters, Wellesley).
Pitermann, M., and Munhall, K. G. (2001). "An inverse dynamics approach to face animation," J. Acoust. Soc. Am. 110, 1570–1580.
Prasad, S., and Chandna, B. (1991). "Direction-of-arrival estimation using rank revealing QR factorization," IEEE Trans. Signal Process. 39, 1224–1229.
Ruckman, C. E., and Fuller, C. R. (1995). "Optimizing actuator locations in active noise control systems using subset selection," J. Sound Vib. 186, 395–406.
Setnes, M., and Babuska, R. (2001). "Rule base reduction: Some comments on the use of orthogonal transforms," IEEE Trans. Syst. Man Cybern., Part C Appl. Rev. 31, 199–206.
Terzopoulos, D., and Waters, K. (1990). "Physically-based facial modeling, analysis, and animation," J. Visual. Comput. Animat. 1, 73–80.
Troje, N. F. (2002). "Decomposing biological motion: A framework for analysis and synthesis of human gait patterns," J. Vision 2, 371–387.
Vatikiotis-Bateson, E., and Yehia, H. (1996). "Physiological modeling of facial motion during speech," Trans. Tech. Com. Psycho. Physio. Acoust. H-96-65, 1–8.