
The 11th IEEE International Workshop on

Advanced Motion Control


March 21-24, 2010, Nagaoka, Japan

Age and Gender Estimation by using Facial Image


Hironobu Fukai∗, Hironori Takimoto†, Yasue Mitsukura∗ and Minoru Fukumi‡
∗ Tokyo University of Agriculture and Technology
Naka-cho 2-24-16, Koganei, Tokyo, Japan
Email: h fukai@cc.tuat.ac.jp, mitsu e@cc.tuat.ac.jp
† Okayama Prefectural University
Kuboki 111, Souja, Okayama, Japan
Email: takimoto@c.oka-pu.ac.jp
‡ The University of Tokushima
Minami-josanjima 1-1, Tokushima, Tokushima, Japan
Email: fukumi@is.tokushima-u.ac.jp

Abstract—In this paper, we propose an age and gender estimation system based on various features. Age and gender have many characteristics, and recognizing them is one of the difficult cognitive processes in human interaction. If we can extract the important features of this cognitive process, age and gender estimation by machine becomes possible. Therefore, we propose a method for age and gender feature extraction and estimation using the face image. In this paper, the age of a face means the apparent-age, which is based on human perception of age. A person's aging and gender differences appear in the face, for example as pigmented spots, wrinkles, sagging skin, shape, and skin color. Thus, we extract these several features for age and gender estimation. Furthermore, we estimate a continuous age and the gender using a neural network (NN).

I. INTRODUCTION

In daily life, we use age and gender information in various situations. In particular, we meet many people every day, and we can respond smoothly and flexibly by estimating their age and gender. For example, if we see an elderly person, we behave politely. This sense is unique to humans, and it is not fully understood how humans roughly estimate age and gender. Therefore, we pay attention to this human ability and aim to give the same ability to a computer so that age and gender can be estimated automatically. Age estimation methods based on face images have been widely studied [1]-[9].

In a study on the change in the physical shape of the face with age, Todd et al. [1], [2] indicate that the contour of the skull can be approximated by a cardioid transform. Yamaguchi et al. [3] confirm that the differences between the features of an adult's face and a child's face include the length of the face and the ratio of each part. Age estimation by computer has also been performed. Kanno et al. [4] show that a male can be identified by neural networks representing four ages (12 years, 15 years, 18 years, and 22 years). Kwon and Lobo [5] reported a method that classifies input images into one of three age-groups (babies, young adults, and senior adults) by using placement information and texture information. However, almost all of these studies were based on cranio-facial development and skin wrinkle analysis. Burt and Perrett [6] studied age perception using averaged faces of people from 25 to 60 years old with a method focusing on face texture and shape. Ueki et al. [7] reported a method of age-group classification by linear discriminant analysis (LDA). Takimoto et al. [8] proposed a gender and age estimation technique that is not influenced by posture changes by training a NN on several features including the face texture.

However, there are some problems with almost all conventional methods. The most important problem is that these methods only roughly classify a person's age. For example, when the estimated age is classified into groups of ten-year width using conventional techniques such as those mentioned above, 20 years old and 29 years old are classified into the same class, but 19 years old and 20 years old are classified into different classes, even though 19 and 20 differ by only one year. This difference is an important problem in the area of age estimation, and it makes the reliability of the reported recognition accuracy doubtful. Moreover, it is difficult to extract a suitable feature in techniques for which feature extraction is necessary. Furthermore, most conventional methods need to detect many feature points exactly; if they cannot, the recognition accuracy falls. In addition, the above studies are based on estimating the actual age, and there has been little research on human age perception.

We previously proposed a novel age estimation system [9], in which we reported that the apparent-age and gender can be estimated from a frequency feature of the face. However, it is considered that various other features are used when humans estimate age, and another study has reported the effectiveness of a variety of features [8].

Gender information is also considered an important characteristic of a person, and gender estimation methods based on face images have therefore been widely studied as well [3], [8]. Moreover, these methods estimate gender by using features similar to those used for age.

Therefore, we propose an apparent-age estimation system based on several features. In addition, we propose a gender estimation system that uses the age features. In this study, not only the texture feature of the full face but also local texture, color, and face shape features are used.



Moreover, the age features are characterized by combining these features. Furthermore, we estimate a continuous age and the gender using a neural network (NN), because our previous study [9] and a conventional method [8] suggest that age and gender estimation using a NN is effective. For the purpose of showing the effectiveness of the proposed method, computer simulations are performed using actual data.

TABLE I
FEATURE CHANGE OF FACE BY AGING

| Part     | Shape                                               | Color, texture                     |
|----------|-----------------------------------------------------|------------------------------------|
| Forehead | Hollow of temple, retreat of borders of the hair    | Wrinkle, dullness                  |
| Eye      | Hollow of the upper eyelid, fall of tail of the eye | Dullness, thinning                 |
| Eyebrow  | Sparsity                                            | Undertint                          |
| Nose     | High                                                | -                                  |
| Lip      | Blur of the outline, fall of corners of the mouth   | Dullness, thickness                |
| Skin     | Sagging                                             | Wrinkle, dullness, pigmented spot  |
| Cheek    | Sagging                                             | Pigmented spot, dullness, age spot |
| Jaw      | Sagging, choller                                    | -                                  |

Fig. 1. Outline of the normalization method
II. PROPOSED METHOD

In this section, we describe the procedure of the proposed method. The proposed method consists of three processing steps. The first step is normalization of the facial image. The second step is apparent-age and gender feature extraction. The third step is apparent-age and gender estimation. We explain these three steps below.

A. Normalization

It is necessary to normalize the face images for age and gender estimation because the original images contain unnecessary features such as background, clothes, and hair. In this study, we carry out normalization to extract only the face area. The face image is normalized on the basis of both eyes, because normalization by rotation and size adjustment based on the eyes is easier than normalization based on other facial features. The normalization method is as follows. First, the original image is converted into an 8-bit gray scale image, and a median filter is applied to remove noise. Next, the midpoint between both eyes is extracted. Then, the line segment joining both eyes is rotated so that it is horizontal. Furthermore, the distance between both eyes is adjusted to 80 pixels by a scale change (Fig. 1). Moreover, in order to reduce the influence of hair and clothes, most of the background is removed. That is, from the midpoint of the segment joining both eyes, we define the facial region to have a width of 180 pixels (90 pixels to the left and right) and a height of 225 pixels (45 pixels above and 180 pixels below the midpoint of the eyes).
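As a rough illustration only (not the authors' code), the eye-based normalization above could be sketched with OpenCV as follows. The eye coordinates are assumed to be given (in the paper, feature points are extracted manually), and the median-filter kernel size is an assumption since the paper does not specify it.

```python
import cv2
import numpy as np

def normalize_face(image_bgr, left_eye, right_eye):
    """Sketch of the eye-based normalization described above.

    left_eye / right_eye are (x, y) pixel coordinates; here they are simply
    given as inputs. Assumes the face is not too close to the image border.
    """
    # 1) 8-bit gray scale + median filter for noise removal
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)              # kernel size is an assumption

    # 2) rotate so that the line joining the eyes is horizontal,
    #    and scale so that the inter-eye distance becomes 80 pixels
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))
    scale = 80.0 / np.hypot(rx - lx, ry - ly)
    mid = ((lx + rx) / 2.0, (ly + ry) / 2.0)    # midpoint between the eyes
    M = cv2.getRotationMatrix2D(mid, angle, scale)
    aligned = cv2.warpAffine(gray, M, (gray.shape[1], gray.shape[0]))

    # 3) crop a 180x225 facial region: 90 px left/right of the midpoint,
    #    45 px above and 180 px below it
    cx, cy = int(round(mid[0])), int(round(mid[1]))
    return aligned[cy - 45:cy + 180, cx - 90:cx + 90]
```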
B. Apparent-age and Gender Feature Extraction

It is known that facial features change with aging. Table I shows the main changes caused by aging.

There are many feature extraction methods in the computer vision field. We adopt the following four features for age estimation, because these techniques do not require many feature points and their effectiveness has been reported [8]. Moreover, the same features are used as the gender features.

1) Shape feature
2) Frequency feature
3) Texture feature
4) Color feature

Fig. 2. Feature points

1) Shape Feature: In a study on the change in the physical shape of the face with age, Todd and coworkers [1], [2] show that the contour of the skull can be approximated by a cardioid transform. Yamaguchi et al. [3] show that the differences between the features of an adult's face and a child's face include the length of the face and the ratio of each part. Therefore, the shape of the face parts and the ratio of each part are used as a feature. In this study, we use the 28 feature points defined in Fig. 2. Then, the shape features are extracted from the 28 feature points. Table II shows the 14 shape features; a toy sketch of this kind of computation is given after the table.

TABLE II
SHAPE FEATURE

| Part     | Feature                                                                         |
|----------|---------------------------------------------------------------------------------|
| Eyebrow  | Thickness, length                                                               |
| Eye      | Area, angle                                                                     |
| Nose     | Width                                                                           |
| Mouth    | Length, thickness, corner                                                       |
| Shape    | Facial contour                                                                  |
| Distance | Jaw from eyes, nose from eyes, mouth from eyes, mouth from nose, jaw from mouth |
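For illustration only (the paper does not publish code, and its 28-point definition is given only in Fig. 2), shape features of this kind can be computed from 2D landmark coordinates. The landmark indices below are hypothetical placeholders.

```python
import numpy as np

def shape_features(pts):
    """Toy examples of eye angle, nose width, and jaw-to-eye distance.

    pts: (28, 2) array of manually extracted landmarks (x, y).
    The index assignments below are hypothetical placeholders.
    """
    L_EYE_OUT, L_EYE_IN = 0, 1   # outer / inner corner of the left eye
    NOSE_L, NOSE_R = 10, 11      # left / right edge of the nose
    JAW_TIP = 27                 # tip of the jaw
    eye_mid = (pts[L_EYE_OUT] + pts[L_EYE_IN]) / 2.0

    dx, dy = pts[L_EYE_IN] - pts[L_EYE_OUT]
    eye_angle = np.degrees(np.arctan2(dy, dx))                # "Eye: angle"
    nose_width = np.linalg.norm(pts[NOSE_R] - pts[NOSE_L])    # "Nose: width"
    jaw_from_eyes = np.linalg.norm(pts[JAW_TIP] - eye_mid)    # "Distance: jaw from eyes"

    # After the normalization step the inter-eye distance is 80 px, so raw
    # pixel distances are already comparable across subjects.
    return np.array([eye_angle, nose_width, jaw_from_eyes])
```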

2) Frequency Feature: Our previous study [9] reported that a frequency feature is effective for age estimation, so we also use a frequency feature here. We obtain the frequency feature of the full face by empirical mode decomposition (EMD) [11], as in our previous study. In this study, we focus on the change in the gray value of the face image in the horizontal direction.

Fig. 3. Sample of the frequency feature

EMD was proposed by Huang et al. [11] for processing non-stationary functions. The technique decomposes signals into components called intrinsic mode functions (IMFs) satisfying the following two conditions.

• The numbers of extrema and zero crossings must either be equal or differ by at most one.
• At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

The first condition is imposed to satisfy the narrow-band requirement, and the second condition is necessary to ensure that the instantaneous frequency does not have redundant fluctuations induced by asymmetric waveforms. IMFs are extracted by the sifting algorithm from the original signal x(n) as follows:

x(n) = \sum_{i=1}^{N} c_i(n) + r(n).   (1)

In this equation, c_i(n) is the i-th IMF and r(n) is the residual signal. The principle of EMD can be described as follows:

1) The set of IMFs is initially defined as I = \emptyset (the empty set).
2) Extract an IMF as follows:
   a) h = x - \sum_{i \in I} c_i.
   b) Compute the k-th IMF by sifting:
      i) Identify all local extrema of h, and obtain the upper and lower envelopes by connecting the maximum and minimum points with cubic spline interpolation (the upper envelope is u and the lower envelope is l).
      ii) Compute the mean of these envelopes.
      iii) Subtract this mean from h to get a new h.
      iv) Check whether h is an IMF.
      v) If not, repeat steps i) to iv).
   c) h is added to the set I.

The sifting process is continued until the final residue is a constant, a monotonic function, or a function with only one maximum and one minimum from which no more IMFs can be derived. IMFs are obtained sequentially, starting from the highest-frequency component; therefore, the residual signal r(t) has the lowest frequency.

In step 2.(b) of this algorithm, the following criterion is used to judge whether the signal h is an IMF:

SD = \sum_{n} \frac{|h_{old}(n) - h_{new}(n)|^2}{h_{old}^2(n)}.   (2)

A typical value for SD can be set to 0.2-0.3, which is a very rigorous limitation for the sifting procedure.

Then, we use a Hilbert transform to calculate the imaginary part. The Hilbert transform is used to obtain y(t) from x(t) as follows:

y(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x(t_0)}{t - t_0} \, dt_0.   (3)

The analytic signal z(t) is

z(t) = x(t) + i y(t) = a(t) e^{i\theta(t)},   (4)

where a(t) is the amplitude of z(t) and \theta(t) is its phase, given as follows:

a(t) = \sqrt{x^2(t) + y^2(t)}   (5)

\theta(t) = \arctan\left(\frac{y(t)}{x(t)}\right).   (6)

The instantaneous frequency \omega(t) is derived from \theta(t):

\omega(t) = \frac{d\theta(t)}{dt}.   (7)

In the Hilbert transform, the signal should have a single frequency or a narrow band of frequencies. An IMF is a narrow-band signal, so each IMF can be analyzed by the Hilbert transform. A signal that has a wider range of frequencies can be analyzed by applying the Hilbert transform to each IMF obtained by the EMD. A spectrum calculated from the IMFs in this way is called a Hilbert-Huang spectrum, and it is defined by the following equation:

H(t, \omega) = \begin{cases} a(t) & (\omega(t) = \omega) \\ 0 & (\text{otherwise}) \end{cases}   (8)

This spectrum does not have a trade-off between time and frequency resolutions. Therefore, it is considered that this spectrum can be used effectively in the analysis. Fig. 3 shows a sample of the frequency feature of a face image; a minimal code sketch of the sifting and Hilbert steps follows.
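The following is a minimal, self-contained sketch (not the authors' implementation) of the sifting procedure of Eqs. (1)-(2) and the Hilbert step of Eqs. (3)-(7), using NumPy/SciPy. The stopping threshold, the small epsilon guard in Eq. (2), and the use of one horizontal image row as the input signal are assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema, hilbert

def _envelope_mean(h):
    """Mean of upper/lower envelopes via cubic splines through extrema."""
    n = np.arange(len(h))
    maxima = argrelextrema(h, np.greater)[0]
    minima = argrelextrema(h, np.less)[0]
    if len(maxima) < 3 or len(minima) < 3:
        return None                              # too few extrema: stop sifting
    upper = CubicSpline(maxima, h[maxima])(n)    # upper envelope u
    lower = CubicSpline(minima, h[minima])(n)    # lower envelope l
    return (upper + lower) / 2.0

def emd(x, max_imfs=5, sd_thresh=0.25):
    """Tiny EMD sketch following Eqs. (1)-(2): x = sum(IMFs) + residual."""
    imfs, residual = [], x.astype(float).copy()
    for _ in range(max_imfs):
        h = residual.copy()
        while True:                              # sifting loop
            m = _envelope_mean(h)
            if m is None:
                return imfs, residual
            h_new = h - m                        # subtract the envelope mean from h
            sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))  # Eq. (2), eps guard assumed
            h = h_new
            if sd < sd_thresh:
                break                            # h is accepted as an IMF
        imfs.append(h)
        residual = residual - h
    return imfs, residual

def hilbert_feature(imf, fs=1.0):
    """Instantaneous amplitude/frequency of one IMF, as in Eqs. (3)-(7)."""
    z = hilbert(imf)                             # analytic signal z(t) = x + iy
    amplitude = np.abs(z)                        # a(t)
    phase = np.unwrap(np.angle(z))               # theta(t)
    inst_freq = np.diff(phase) * fs / (2 * np.pi)  # omega(t) / (2*pi)
    return amplitude, inst_freq

# Example: one horizontal gray-value profile of the normalized face as input.
row = np.random.default_rng(0).random(180) * 255
imfs, res = emd(row)
```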

3) Texture Feature: Pigmented spots and wrinkles on the skin area can be regarded as small, noise-like changes. Therefore, we use the ε-filter [12] to extract this feature. The ε-filter is defined as follows:

y(n) = x(n) + \sum_{k=-N}^{N} a_k F(x(n-k) - x(n))   (9)

|F(x)| \le \varepsilon, \quad -\infty \le x \le \infty.   (10)

When F(x) is a nonlinear function such as the one shown in Fig. 4, the difference between the input and output signals is kept below ε. In this study, we use an ε-filter with a window size of 7, and we set the ε value to 20. An example result of the ε-filter is shown in Fig. 5. We subtract the ε-filtered image from the original image and compute the histogram of the difference. We adopt this histogram as the texture feature; a small code sketch is given below.

Fig. 4. Example of nonlinear function

Fig. 5. Result of ε-filter
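As an illustration only (the filter coefficients a_k and the exact nonlinear function F are not specified in the text), an ε-filter of the form of Eqs. (9)-(10) and the histogram-based texture feature might be sketched as follows; uniform weights, a hard-threshold F, circular boundaries, and the histogram binning are assumptions, while window = 7 and ε = 20 come from the paper.

```python
import numpy as np

def epsilon_filter_1d(x, window=7, eps=20.0):
    """1-D epsilon-filter sketch following Eqs. (9)-(10).

    F(d) passes a neighbour difference only if |d| <= eps, so strong edges
    are preserved while small texture (spots, wrinkles) is smoothed out.
    F(d) = d for |d| <= eps (0 otherwise) and a_k = 1/window are assumptions.
    """
    N = window // 2
    x = x.astype(float)
    y = x.copy()
    for k in range(-N, N + 1):
        if k == 0:
            continue
        d = np.roll(x, k) - x                  # x(n - k) - x(n), circular boundary
        d[np.abs(d) > eps] = 0.0               # nonlinear function F
        y += d / window
    return y

def texture_feature(gray_face, window=7, eps=20.0, bins=32):
    """Histogram of (original - epsilon-filtered) image, used as texture feature."""
    smoothed = np.apply_along_axis(epsilon_filter_1d, 1, gray_face, window, eps)
    diff = gray_face.astype(float) - smoothed
    hist, _ = np.histogram(diff, bins=bins, range=(-eps, eps))
    return hist / hist.sum()                   # normalization is an assumption
```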
4) Color Feature: It is considered that color information is correlated with age, so we select a color feature as an age feature. In this study, we use the L*a*b* color system, which is a color system based on human perception. The color feature is extracted from the cheek area and the lip area, respectively (Fig. 6). Furthermore, we define the mean and variance of each area as the age feature.

Fig. 6. Extraction area of the color feature
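A minimal sketch of this color feature, assuming rectangular cheek and lip regions (in the paper the color areas are extracted manually), could look as follows.

```python
import cv2
import numpy as np

def color_feature(image_bgr, cheek_box, lip_box):
    """Mean and variance of L*a*b* channels over the cheek and lip regions.

    cheek_box / lip_box are (x, y, w, h) rectangles; rectangular regions are
    an assumption. Note: OpenCV scales L*a*b* to 0-255 for 8-bit images.
    """
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    feats = []
    for (x, y, w, h) in (cheek_box, lip_box):
        region = lab[y:y + h, x:x + w].reshape(-1, 3).astype(float)
        feats.extend(region.mean(axis=0))   # mean of L*, a*, b*
        feats.extend(region.var(axis=0))    # variance of L*, a*, b*
    return np.array(feats)                  # 12-dimensional color feature
```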
C. Apparent-Age and Gender Estimation

We estimate the apparent-age and the gender using a NN. We use a three-layered NN with one hidden layer, and the back-propagation method is used for learning. In this paper, the output layer has a single unit to enable continuous estimation of the age, and the estimated age is calculated from the output value of this unit. Moreover, the output layer is likewise reduced to a single unit when estimating the gender. A minimal sketch of such a network is shown below.
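The following sketch, using scikit-learn, shows one plausible way to set up such a regressor; the hidden-layer size, the feature scaling, and the training settings are assumptions, since the paper only states a three-layered NN with one output unit trained by back propagation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: stacked feature vectors (shape, frequency, texture, color) per subject.
# y_age: apparent-age labels from the questionnaire (teacher data).
def train_age_estimator(X, y_age, n_hidden=30):
    model = make_pipeline(
        StandardScaler(),                              # features have very different scales
        MLPRegressor(hidden_layer_sizes=(n_hidden,),   # single hidden layer
                     activation="logistic",
                     solver="sgd",                     # gradient-descent back propagation
                     max_iter=5000, random_state=0),
    )
    model.fit(X, y_age)
    return model

# A gender estimator with the same structure could use targets 0/1 and a 0.5
# threshold at test time; that encoding is an assumption, not from the paper.
```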

III. COMPUTER SIMULATIONS

A. Face Image Database

The face database was provided by the Human and Object Interaction Processing (HOIP) organization in Japan [10]. The subject images comprised people with a wide range of ages, and we selected subjects who did not wear glasses. The background was the same for all subjects. Subjects were directed to face the lens of the camera, and a picture was taken with a neutral expression (Table III). 252 people whose images were preprocessed beforehand were used as subjects. In this paper, the face database was used with permission from Softopia corporation, Japan. It is prohibited to copy, use, or distribute the images without the authorization of the copyright holder.

TABLE III
DETAIL OF THE FACE IMAGE DATABASE

| Size    | 640×480 [pix.], 24-bit color |
| Gender  | 150 images for each          |
| Age     | 30 images per 5 years        |
| Emotion | Neutral                      |

B. Apparent-age Database Creation

In this study, we focus on the human apparent-age, which is defined as the age that many people perceive. In this paper, the apparent-age is obtained by conducting a questionnaire with 58 respondents. It is considered that the objectivity of the apparent-age, which people evaluate subjectively, increases with the number of respondents. Moreover, the error of age estimation for each age can be reduced by using respondents of various ages and both genders.

In the questionnaire, each respondent sees a face image and gives an apparent-age. The face images are presented at random. The apparent-age is assumed to be the median value of the ages given by the respondents, and this age is adopted as the apparent-age for the teacher data in the proposed method.

C. Conditions of Apparent-Age Estimation

In order to show the effectiveness of the proposed method, we perform a simulation. In this paper, we use the actual data provided by the HOIP organization in Japan [10], and we use a sample of only 113 men. Moreover, the feature points and color areas are extracted manually. Furthermore, we selected subjects who did not wear glasses. In this simulation, we use the leave-one-out cross-validation method, as sketched below.
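A sketch of the leave-one-out evaluation is shown here, reusing the hypothetical train_age_estimator helper from the earlier sketch; the absolute-error metric matches the average age estimation error reported in Table IV.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def loo_age_error(X, y_age):
    """Hold out each subject once, train on the rest, and average |error|.

    X, y_age: NumPy arrays of feature vectors and apparent-age labels.
    train_age_estimator is the hypothetical helper defined above.
    """
    errors = []
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = train_age_estimator(X[train_idx], y_age[train_idx])
        pred = model.predict(X[test_idx])[0]
        errors.append(abs(pred - y_age[test_idx][0]))
    return np.mean(errors)    # average age estimation error in years
```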
D. Apparent-Age Estimation Results and Discussions

Table IV shows the average estimated age error for each feature, measured against the apparent-age obtained by the questionnaire. The human age estimation ability is given as the average standard deviation of the apparent-age of each face obtained by the questionnaire. The estimation error for human perception is 4.94 years. Moreover, the age estimation error using all features is 4.65 years. This result is better than our previous study, which used only the frequency feature [9]. Furthermore, it is close to that of human beings.

TABLE IV
AGE ESTIMATION ERROR

| Frequency feature    | 5.08 years  |
| Texture feature      | 5.69 years  |
| Color feature        | 12.27 years |
| Shape feature        | 5.79 years  |
| All features         | 4.65 years  |
| Human age perception | 4.94 years  |

Fig. 7 shows the apparent-age and the estimated age of each subject. The difference between the apparent-age and the estimated age is small for most subjects. However, the age estimation error differs between generations. Therefore, we show the age estimation error of each generation (Fig. 8), and we compare the all-feature result with the results of the other features. From this figure, the all-feature age estimation errors for subjects in their 10s, early 20s, and early 70s are smaller than those of the frequency feature used in our previous study [9]. This result shows that large errors are reduced by using all features, which is important for age estimation. Moreover, Fig. 9 shows the age estimation error of each generation when using all features. From this result, age estimation is performed with high accuracy in every generation.

Fig. 7. Apparent-age and estimated age of all subjects

Fig. 8. Age estimation error of each generation (each feature)

Furthermore, Fig. 10 shows the recognition accuracy when the age estimation error is used as a threshold (see the sketch below). When we use a threshold of 5 years, the recognition accuracy is 70.8%, and this result is improved by 18% compared with the case using only the frequency feature. This threshold corresponds to the average human age perception ability. Moreover, the recognition accuracy using all features is close to that of human beings. From this result as well, we consider the proposed method to be effective for age estimation.
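The threshold-based accuracy in Fig. 10 can be computed as sketched below; the function name and the threshold sweep are illustrative, with the 5-year default mirroring the human perception level discussed above.

```python
import numpy as np

def accuracy_within_threshold(pred_ages, true_ages, threshold=5.0):
    """Fraction of subjects whose |estimated - apparent| age error is within
    the threshold, i.e. the quantity plotted against the threshold in Fig. 10."""
    errors = np.abs(np.asarray(pred_ages) - np.asarray(true_ages))
    return float(np.mean(errors <= threshold))   # e.g. 0.708 -> 70.8%

# Sweeping the threshold as in Fig. 10:
# accs = [accuracy_within_threshold(pred, true, t) for t in range(0, 21)]
```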
E. Conditions of Gender Estimation

In order to show the effectiveness of the proposed method, we perform a further simulation. In this paper, we selected 252 subjects who did not wear glasses; 113 are male and 139 are female. In this simulation, we use the leave-one-out cross-validation method, and we use all features except the frequency feature.

F. Gender Estimation Results and Discussions

Table V shows the recognition accuracy of gender estimation. This result shows that almost all features are effective for gender estimation. However, the result using the shape feature alone is better than the result using all features, presumably because combining all features introduces contradictory information.

TABLE V
GENDER ESTIMATION RECOGNITION ACCURACY (EACH FEATURE)

| Texture feature | 86.9%  |
| Color feature   | 84.13% |
| Shape feature   | 94.05% |
| All features    | 89.68% |

In addition, Table VI shows the details of the shape feature result as a confusion matrix (a sketch of this computation follows). From this result, both genders are estimated well. Fig. 11 shows the recognition accuracy of gender estimation for each generation. As a result, young persons are difficult to estimate; it is considered that adult male faces are rugged, whereas female faces and young male faces are not.

TABLE VI
GENDER ESTIMATION RECOGNITION ACCURACY (SHAPE FEATURE)

|        | Male   | Female |
| Male   | 94.69% | 5.31%  |
| Female | 6.47%  | 93.53% |
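A row-normalized confusion matrix like Table VI can be built as in the following sketch; the label names and helper function are illustrative only.

```python
import numpy as np

def gender_confusion(pred_labels, true_labels, classes=("male", "female")):
    """Row-normalized confusion matrix like Table VI: each row shows how
    subjects of one true gender are classified (percentages sum to 100)."""
    pred = np.asarray(pred_labels)
    true = np.asarray(true_labels)
    matrix = np.zeros((len(classes), len(classes)))
    for i, t in enumerate(classes):
        for j, p in enumerate(classes):
            matrix[i, j] = np.sum((true == t) & (pred == p))
        matrix[i] = 100.0 * matrix[i] / matrix[i].sum()
    return matrix   # e.g. [[94.69, 5.31], [6.47, 93.53]] as in Table VI
```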

Fig. 9. Age estimation error of each generation (proposed method)

Fig. 11. Recognition accuracy of gender estimation (shape feature, each generation)

Fig. 10. Recognition accuracy of age estimation

IV. CONCLUSIONS

In this paper, we proposed an apparent-age and gender estimation system based on several features. Many facial features are available for age estimation; among them, we determined the important features for age and gender estimation by using several feature extraction methods. The simulations on actual data showed that the proposed method obtains efficient features for age and gender estimation. Some of the individual features yielded average age estimation errors close to that of human beings. It was confirmed that the proposed method works well.

In future work, it is considered that improvement of the feature extraction method is necessary. Moreover, we think that feature selection and the analysis of human age perception are important. Furthermore, we will analyze the causes of the estimation errors and verify the robustness against lighting conditions. In addition, we will design an estimation system that can cope with changes in the direction of the face.

REFERENCES

[1] J. T. Todd, L. S. Mark, R. E. Shaw and J. B. Pittenger: The perception of human growth, Scientific American, Vol. 242, pp. 106-114, 1980.
[2] L. S. Mark, J. B. Pittenger, H. Hines, C. Carello, R. E. Shaw and J. T. Todd: Wrinkling and head shape as coordinated sources of age-level information, Perception & Psychophysics, Vol. 27, pp. 117-124, 1980.
[3] M. K. Yamaguchi, T. Kato, and S. Akamatsu: Relationship between physical traits and subjective impressions of the face - Age and sex information, IEICE Trans., Vol. J79-A, No. 2, pp. 279-287, 1996.
[4] T. Kanno, M. Akiba, Y. Teramachi, H. Nagahashi and T. Agui: Classification of age group based on facial images of young males by using neural networks, IEICE Trans. Inf. Syst., Vol. E84-D, No. 8, pp. 1094-1101, 2001.
[5] Y. H. Kwon and N. D. V. Lobo: Age classification from facial images, CVPR'94, pp. 762-767, Seattle, US, June 1994.
[6] D. M. Burt and D. I. Perrett: Perception of age in adult Caucasian male faces: computer graphic manipulation of shape and colour information, Perception, Vol. 259, No. 1355, pp. 137-143, 1995.
[7] K. Ueki, T. Hayashida, and T. Kobayashi: Subspace-based age-group classification using facial images under various lighting conditions, Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 43-48, 2006.
[8] H. Takimoto, Y. Mitsukura, M. Fukumi and N. Akamatsu: A robust gender and age estimation under varying facial pose, IEEJ Trans., Vol. 127, No. 7, pp. 1022-1029, 2007.
[9] H. Fukai, H. Takimoto, Y. Mitsukura, T. Tanaka and M. Fukumi: Apparent-age Feature Extraction by Empirical Mode Decomposition, Journal of Signal Processing, Vol. 12, No. 6, pp. 457-463, 2008.
[10] http://www.hoip.jp
[11] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung and H. H. Liu: The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis, Proc. R. Soc. London, A454, pp. 903-995, 1998.
[12] H. Watabe, K. Arakawa and Y. Arakawa: A Nonlinear Digital Filter for Beautifying Facial Images, Journal of Three Dimensional Images, Vol. 13, No. 3, pp. 41-46, 2003.

