
Using 2-D and 3-D Ellipsoid Fitting for Head and Body Segmentation and Head Tracking


Nikos Grammalidis, Student Member, IEEE, and Michael G. Strintzis, Senior Member, IEEE
Department of Electrical Engineering, University of Thessaloniki
Thessaloniki 540 06, GREECE
Email: ngramm@panorama.ee.auth.gr, strintzi@eng.auth.gr

ABSTRACT

In this paper, a novel procedure is presented for segmenting a general 3-D wireframe model obtained from a head and shoulders multiview sequence. The procedure consists of two steps. In the first step, two ellipses corresponding to the head and the body of the person are identified based on ellipse fitting of the outline of the person in each image. The fitting is based on a direct least squares method using the constraint that forces a general conic to be an ellipse. In order to achieve head/body segmentation, a K-means algorithm is used to minimize the fitting error between the points and the two ellipses. In the second step, a 3-D ellipsoid model corresponding to the head of the person is identified using an extension of the above method. Thus a coarse approximation of the initial wireframe requiring minimal bitrate can be obtained. The proposed technique is fast and yields very accurate results.

I Introduction

Estimation of the position, shape and motion of the head from image sequences has been the focus of a significant amount of recent research [1, 2, 3].

In [1], a face segmentation and identification algorithm utilizing the elliptical structure of the human head is presented. An ellipse is fitted to a properly preprocessed edge map in order to mark the boundary between the head and the background regions. A similar approach is used in [4], where a 2-D ellipse is fitted onto binary edge data and post-processing techniques are used to eliminate detection errors. However, such approaches work well only when the background is homogeneous or lightly to moderately cluttered.

In [2], a model-based coding technique is proposed for facial image coding. The detection of the face is based on a parametric image model obtained using a Karhunen-Loeve decomposition. The detected face images are then normalized and coded by projecting them to a set of "eigenfaces", i.e. a set of eigenvectors obtained using a large database of normalized face images.

In [3], a 3-D ellipsoidal model is used for robust tracking of the rigid motion of the human head from a video sequence. This approach is based on the interpretation of the optical flow in terms of 3-D motion of the model. The method is seen to be robust to large head movements and to provide better results than those obtained using a simpler planar head model.

In the present paper we shall assume a typical video-conferencing scene, with one person in front of the cameras. An initial 3-D wireframe may be produced using depth estimation from stereoscopic or multiple views, provided that the camera geometry and calibration parameters are known.

Furthermore, we assume that the outline of the person is available using some preprocessing technique. In the test sequences used for our experimental results, this outline is easily available since the background is homogeneous. A K-means algorithm, which is able to identify ellipses in the image, was then used for simultaneous ellipse fitting and segmentation of the outline. Specifically, a least squares fit of the data on a conic is performed, using an additional constraint forcing the conic to be an ellipse. The algorithm is applied to estimate two ellipses, corresponding to the head and the body of the person.

The basic ellipse fitting technique was also extended to the three-dimensional case. This 3-D ellipsoid fitting technique was applied so as to estimate a 3-D ellipsoid for the head of a person. Specifically, the nodes of the 3-D wireframe that are projected inside each of the ellipses estimated in the previous step are used to estimate the 3-D ellipsoid model.

Because of its high computational efficiency, this technique can be used for fast tracking of the head and body position and orientation. Since the segmentation of the nodes is only performed at the first time instant, only the estimation of the 3-D ellipsoid from the available data points has to be performed at subsequent instants.

This work was supported by the European project ACTS 092: PANORAMA and the PENED project of the Greek Secretariat of Science and Technology.

II Estimating the initial 3-D wireframe model

In order to estimate the initial 3-D wireframe model, a multiocular system with $N$ fully calibrated views is assumed. Each camera $c_k$, $k = 1, \ldots, N$, is modeled using a pinhole camera model based on perspective projection, and we assume that accurate calibration information is available.

A number of methods have been developed for depth estimation in stereoscopic [5, 6] and multiview [7, 8] image sequences.

In this paper, we present results using the depth estimation and wireframe estimation method from trinocular image sequences developed in the ACTS 092 PANORAMA project. More specifically, depth is estimated from a fully calibrated trinocular image triple using the depth estimation algorithm in [8]. Furthermore, a reliability value is also estimated for each depth estimate. A set of 3-D points whose depth is estimated with high reliability is then identified, and a triangular 3-D wireframe model $R$ is fitted using these points as control points [9].

III Approximation of a 3-D data set using a 3-D Ellipsoid Model

In [10], a new efficient method was presented for fitting ellipses to scattered 2-D data. The method is based on fitting the general equation of a conic to the given data, subject to a constraint which forces the conic to be an ellipse. In this paper, we develop a segmentation and estimation scheme using this technique in a K-means algorithm. Furthermore, we extend this technique to 3-D data, so that it can be used for approximating a general set of 3-D points using a 3-D ellipsoid.

The general equation of a conic in the $N$-D space is:

$$F(A, \mathbf{b}, c; \mathbf{x}) = \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c = 0 \tag{1}$$

where $A$ is a symmetric matrix. In order for this general conic to be an ellipsoid in $N$ dimensions, the matrix $A$ must be either positive or negative definite. As a result, the problem of fitting an ellipsoid to $N$ data points $\mathbf{x}_i$ can be solved by minimizing the sum of squares of the "algebraic distances"

$$d(A, \mathbf{b}, c) = \sum_{i=1}^{N} F(A, \mathbf{b}, c; \mathbf{x}_i)^2 \tag{2}$$

subject to the constraint that $A$ is either positive or negative definite. In general, this constrained problem is very difficult to solve since the Kuhn-Tucker conditions [11] do not guarantee a solution [10].

For the 2-D case, a direct solution to the problem was presented in [10]. The equation of the 2-D conic is

$$F(\mathbf{a}; \mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a x^2 + b x y + c y^2 + d x + e y + f = 0 \tag{3}$$

where $\mathbf{a} = [a \; b \; c \; d \; e \; f]^T$, $\mathbf{x} = [x^2 \; xy \; y^2 \; x \; y \; 1]^T$, and the constraint leading to an ellipse is $4ac - b^2 > 0$. Since the scale of $\mathbf{a}$ is a free parameter, the equality constraint $4ac - b^2 = 1$ may be imposed instead. This in turn may be written as

$$\mathbf{a}^T C \mathbf{a} = \mathbf{a}^T \begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \mathbf{a} = 1 \tag{4}$$

Using the above constraint, the following constrained fitting problem is formulated [10]:

Problem 1. Minimize $E = \sum_{i=1}^{N} F(\mathbf{a}; \mathbf{x}_i)^2 = \|D\mathbf{a}\|^2$ subject to the constraint $\mathbf{a}^T C \mathbf{a} = 1$,

where $D = [\mathbf{x}_1 \; \mathbf{x}_2 \; \cdots \; \mathbf{x}_N]^T$. This problem can be solved directly using a Lagrange multiplier, providing the following solution: $\mathbf{a}$ is the generalized eigenvector of

$$D^T D \mathbf{a} = \lambda C \mathbf{a} \tag{5}$$

corresponding to the smallest positive eigenvalue.

In [10], it is proved that exactly one eigenvalue of eq. (5) is positive, thus a unique solution eigenvector is always found using the above method, which is non-iterative and thus extremely efficient.

Even though this approach is not easily extended to the general $N$-D problem, a generalization suitable for the 3-D case will now be formulated. The equation of a 3-D conic is:

$$F(\mathbf{a}; \mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a x^2 + b x y + c y^2 + d x z + e y z + f z^2 + g x + h y + k z + l = 0 \tag{6}$$

Eq. (6), where $\mathbf{a}$ and $\mathbf{x}$ are 10-dimensional vectors, can be written in the form of eq. (1), where

$$A = \begin{bmatrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{bmatrix} = \begin{bmatrix} U & \mathbf{v} \\ \mathbf{v}^T & f \end{bmatrix}, \qquad U = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}, \qquad \mathbf{v} = [d/2 \;\; e/2]^T.$$

The following Lemma, proved in the Appendix, provides the constraint forcing a general 3-D conic to be an ellipsoid.

Lemma 1. $A$ is positive definite or negative definite, provided that
i. $\det(U) > 0 \iff 4ac - b^2 > 0$
ii. $(a + c) \cdot \det(A) > 0$

The first condition of Lemma 1 is very similar to the constraint $4ac - b^2 > 0$ used in the 2-D case. Therefore, we can set up the problem of minimizing the error $E$ as before, using the constraint $\mathbf{a}^T C \mathbf{a} = 1$, where:

$$C = \left[\begin{array}{cccccccccc} 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \tag{7}$$

The solution is again obtained by solving Problem 1 using eq. (5) with the new $C$ of (7). However, in this case, after obtaining the solution, we have to check whether the second condition of Lemma 1 is satisfied. If we select the sign of the solution eigenvector so that $a + c > 0$, then the second condition simplifies to $\det(A) > 0$.

IV Ellipse estimation and segmentation using K-means algorithm

A significant advantage of the method described in the previous section is its low computational requirements. This feature may be exploited by designing an algorithm for simultaneous ellipse estimation and segmentation of a set of 2-D points into ellipses.

A simple algorithm that can be used for this purpose is a modified K-means algorithm, which is described below. In this version, we consider an algorithm using $K = 2$, which is suitable for identifying the head and the body of a person from the person's outline in one of the available views.

i. Initialization: Divide the available points into $K = 2$ sets, based on subsidiary information (e.g. relative position of head and body).
ii. Fit an ellipse to each set, using the procedure described in the previous section.
iii. Reassign each point to the ellipse for which the distance $F^2(\mathbf{a}; \mathbf{x}_i)$ in eq. (3) between the point $\mathbf{x}_i$ and the ellipse is minimized. Points with distances larger than a given threshold are not assigned to any set.
iv. End the estimation procedure if the change in the estimated ellipse parameters is below a threshold; otherwise return to step ii.

V Experimental Results

Results were obtained using the 3-view sequence "Ludo-3" (see footnote 1). The initial 3-D wireframe is illustrated in Figure 1. The approximation of the outline of the person with two ellipses corresponding to the head and the body area is shown in Figure 2. In this case a very simple initialisation was used: points lying higher (lower) than the mean height of the outline are attributed to the head (body) region. Convergence is achieved within a small number of iterations, typically 3, as shown in Figure 3, each requiring less than 0.5 sec on an SGI workstation. Then, the nodes of the 3-D wireframe of the head that are projected inside the head outline are used to estimate a 3-D ellipsoid model for the head. This model is illustrated in Figure 4.

Figure 1: The initial 3-D wireframe

Figure 2: The outline of the person (left view) and the two estimated ellipses using the K-means algorithm

Footnote 1: This sequence was provided by THOMPSON BROADCAST SYSTEMS for the ACTS 092 project PANORAMA.
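
The modified K-means procedure of Section IV (steps i to iv), together with the direct 2-D fit of eq. (5), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, the function names (`design`, `fit_ellipse`, `ellipse_center`, `segment_two_ellipses`) and the default tolerances are hypothetical, and the generalized eigenproblem is solved through the eigenvalues of $(D^T D)^{-1} C$, which is one possible numerical route rather than a choice stated in the paper. It also assumes every set keeps at least six noisy points, so that $D^T D$ is invertible.

```python
import numpy as np

# Constraint matrix C of eq. (4): a^T C a = 4ac - b^2
C = np.zeros((6, 6))
C[0, 2] = C[2, 0] = 2.0
C[1, 1] = -1.0

def design(points):
    """Rows [x^2, xy, y^2, x, y, 1] of the matrix D in Problem 1."""
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])

def fit_ellipse(points):
    """Direct least-squares ellipse fit: solve D^T D a = lambda C a (eq. 5)."""
    D = design(points)
    S = D.T @ D
    # S a = lambda C a  <=>  inv(S) C a = (1/lambda) a; the single positive
    # eigenvalue of inv(S) C is its largest one (assumes S is invertible).
    w, V = np.linalg.eig(np.linalg.solve(S, C))
    a = V[:, np.argmax(w.real)].real
    a = a / np.sqrt(a @ C @ a)          # enforce 4ac - b^2 = 1
    return a if a[0] + a[2] > 0 else -a  # fix the sign so that a + c > 0

def ellipse_center(a):
    """Centre of the conic: the point where the gradient of F vanishes."""
    A2 = np.array([[2 * a[0], a[1]], [a[1], 2 * a[2]]])
    return np.linalg.solve(A2, -a[3:5])

def segment_two_ellipses(points, labels, max_iter=20,
                         dist_thresh=np.inf, tol=1e-6):
    """Modified K-means with K = 2; labels: 0 or 1 (-1 means unassigned)."""
    labels = np.asarray(labels).copy()   # step i: initial split given by caller
    params = None
    for _ in range(max_iter):
        # step ii: fit one ellipse to each set
        new_params = np.array([fit_ellipse(points[labels == k])
                               for k in range(2)])
        # step iii: reassign by squared algebraic distance F^2(a; x_i)
        dist = (design(points) @ new_params.T) ** 2
        labels = np.argmin(dist, axis=1)
        labels[dist.min(axis=1) > dist_thresh] = -1
        # step iv: stop when the ellipse parameters no longer change
        if params is not None and np.abs(new_params - params).max() < tol:
            break
        params = new_params
    return new_params, labels
```

A typical call would initialise `labels` as in Section V, for example `np.where(points[:, 1] > points[:, 1].mean(), 0, 1)` when larger coordinates mean higher points, and then read the head position from `ellipse_center(params[0])`.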
Figure 3: Convergence of mean error E (vertical axis: error E; horizontal axis: iterations)

Figure 4: Estimated 3-D ellipsoid model superimposed on the head of "Ludo" (left view)

VI Conclusions

An efficient technique to segment the head and body parts obtained from a head and shoulders multiview sequence and to estimate a 3-D ellipsoid model corresponding to the head was presented. A coarse approximation of an initial wireframe requiring minimal bitrate was thus obtained.

Appendix

Let $\mathbf{x} = [x \; y \; z]^T$, $\mathbf{u} = [x \; y]^T$ and $\mathbf{u}_c = -U^{-1} \mathbf{v} z$. Then,

$$\begin{aligned}
\mathbf{x}^T A \mathbf{x} &= \mathbf{u}^T U \mathbf{u} + 2 \mathbf{u}^T \mathbf{v} z + f z^2 \\
&= (\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) + z^2 \left( f - \mathbf{v}^T U^{-1} \mathbf{v} \right) \\
&= (\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) + \frac{z^2}{\det(U)} \left( f \det(U) - \mathbf{v}^T \begin{bmatrix} c & -b/2 \\ -b/2 & a \end{bmatrix} \mathbf{v} \right) \\
&= (\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) + \frac{z^2}{\det(U)} \left( acf - \frac{f b^2}{4} - \frac{c d^2}{4} - \frac{a e^2}{4} + \frac{bde}{4} \right) \\
&= (\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) + z^2 \, \frac{\det(A)}{\det(U)}
\end{aligned} \tag{8}$$

Assuming that $U$ is positive definite, $(\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) \ge 0$, with equality only when $\mathbf{u} = \mathbf{u}_c$. Furthermore, $\det(U) > 0$ and $\mathrm{tr}(U) = a + c > 0$. Thus, if $\det(A) > 0$, the second term of eq. (8) is positive unless $z = 0$, in which case $\mathbf{u}_c = 0$ and the first term is positive for $\mathbf{u} \neq 0$. Hence eq. (8) yields $\mathbf{x}^T A \mathbf{x} > 0$ for $\mathbf{x} \neq 0$, thus $A$ is positive definite.

Similarly, if $U$ is negative definite, $(\mathbf{u} - \mathbf{u}_c)^T U (\mathbf{u} - \mathbf{u}_c) \le 0$, $\det(U) > 0$ and $\mathrm{tr}(U) = a + c < 0$. Therefore, if $\det(A) < 0$ then eq. (8) yields $\mathbf{x}^T A \mathbf{x} < 0$ for $\mathbf{x} \neq 0$, thus $A$ is negative definite, q.e.d.

References

[1] S. Sirohey, "Human Face Segmentation and Identification," Master's thesis, CV Laboratory, University of Maryland, College Park, MD, November 1993.
[2] B. Moghaddam and A. Pentland, "An Automatic System for Model-Based Coding of Faces," in IEEE Data Compression Conference, (Snowbird, Utah), March 1995.
[3] S. Basu, I. Essa, and A. Pentland, "Motion Regularization for Model-based Head Tracking," in Proceedings, International Conference on Pattern Recognition, (Vienna, Austria), August 1996.
[4] A. Eleftheriadis and A. Jacquin, "Automatic face location detection for model-assisted rate control in H.261 compatible coding of video," Signal Processing: Image Communication, vol. 7, pp. 435-455, November 1995.
[5] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Disparity field and Depth Map Coding for Multiview 3D Image Generation," to appear in Signal Processing: Image Communication, 1998.
[6] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Object-Based Coding of Stereo Image Sequences Using Joint 3D motion/disparity Compensation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, pp. 312-328, April 1997.
[7] N. Grammalidis and M. G. Strintzis, "Disparity and Occlusion Estimation in Multiocular Systems and their Coding for the Communication of Multiview Image Sequences," to appear in IEEE Trans. on Circuits and Systems for Video Technology, 1998.
[8] L. Falkenhagen, "Block-Based Depth Estimation from Image Triples with Unrestricted Camera Setup," in IEEE Workshop on Multimedia Image Processing, (Princeton, NJ), June 1997.
[9] T. Riegel, R. Manzotti, and F. Pedersini, "3-D shape approximation for objects in multiview image sequences," in Proceedings of International Workshop on Synthetic-Natural Hybrid Coding and Three-Dimensional (3D) Imaging, (Rhodes, Greece), pp. 159-162, September 1997.
[10] M. Pilu, A. W. Fitzgibbon, and R. Fisher, "Direct Least Squares Fitting of Ellipses," in Proc. International Conference on Pattern Recognition, (Vienna, Austria), August 1996.
[11] S. Rao, Optimization: Theory and Applications. Wiley Eastern, 1984.
