You are on page 1of 5

978-1-4244-5023-7/09/$25.

00 ©2009 IEEE
September 14-16, 2009
METU
Northern Cyprus Campus
646
Co-Occurrence based Statistical Approach for Face
Recognition
Alaa Eleyan, Hasan Demirel
Department of Electrical and Electronic Engineering, Eastern Mediterranean University
{alaa.eleyan, hasan.demirel}@emu.edu.tr
Abstract— this paper introduces a new face recognition
method based on the gray-level co-occurrence matrix
(GLCM). Both distributions of the intensities and information
about relative position of neighbourhood pixels are carried by
GLCM. Two methods have been used to extract feature vec-
tors from the GLCM for face classification. The first, method
extracts the well-known Haralick features to form the feature
vector, where the second method directly uses GLCM by con-
verting the matrix into a vector which can be used as a feature
vector for the classification process. The results demonstrate
that using the GLCM directly as the feature vector in the rec-
ognition process outperforms the feature vector containing the
statistical Haralick features. Additionally, the proposed
GLCM based face recognition system outperforms the well-
known techniques such as principal component analysis and
linear discriminant analysis.
Index Terms— gray level co-occurrence matrix, Haralick
features, face recognition.
I. INTRODUCTION
Face recognition has always been attracting the attention of
the researchers as one of the most important techniques for
human identification. One of the limitations of real-time
recognition systems is the computational complexity of the
existing approaches. Many systems and algorithms have
been introduced in the last couple of decades with high
recognition rates. The general problem of these systems is
their computational cost in data pre-preparation and trans-
formation to another domains such as eigenspace [3][4],
fisherspace[5][6], wavelet transform [10][11] and cosine
transform [12].
Many researchers have used the gray-level co-occurrence
matrix [13] for the extraction of texture features to be used
in texture classification. Gelzinis et al. [14] presented a
new approach to exploiting information available from the
co-occurrence matrices computed for different distance
parameter values. Attempt to extend the approach was in-
troduced in [16], where a new matrix called motif co-
occurrence matrix was proposed. Walker et al. [15] have
proposed to form co-occurrence matrix-based features by
weighted summation of co-occurrence matrix elements
from localized areas of high discrimination. The colour co-
occurrence histograms was used in [17] for object recogni-
tion, where they used false alarm rate to adjust some algo-
rithm’s parameters for optimizing the performance.
The idea behind this paper is to use this co-occurrence ma-
trix and its extracted features in face recognition. To the
best of our knowledge, no one has attempted to implement
this method for face recognition before. The idea is simple
and it is straight forward. For each face image, a feature
vector is formed directly by converting the generated
GLCM to a vector and used for classification. Additionally,
Haralick features [13] containing 14 statistical features can
be extracted from the GLCM to form a new feature vector
with 14 coefficients.
The GLCM method is compared with two well-known face
recognition techniques. The first one is the principal com-
ponent analysis (PCA) [1][2], which is the standard tech-
nique used in statistical pattern recognition and signal
processing for dimensionality reduction and feature extrac-
tion. As patterns often contain redundant information,
mapping them to a feature vector helps to get rid of redun-
dancies and yet preserve most of the intrinsic information.
These extracted features have great role in distinguishing
input patterns. The proposed GLCM based method is also
compared with the other well-known technique, linear dis-
criminant analysis (LDA), which overcomes the limitations
of PCA method by applying the Fisher’s linear discrimi-
nant criterion. This criterion tries to maximize the ratio of
the determinant of the between-class scatter matrix of the
projected samples to the determinant of the within-class
scatter matrix of the projected samples. Fisher discriminant
analysis method groups images of the same class and sepa-
rates images of different classes. Images are projected from
N dimensional space (N is the number of pixels) to C di-
mensional space (where C is the number of classes).
Unlike the PCA method that extracts features to best repre-
sent face images; the LDA method tries to find the sub-
space that best discriminates different face classes and en-
codes discriminatory information in a linear separable
space, where the bases are not necessarily orthogonal.
The paper is organized as follows: second section discusses
the gray level co-occurrence matrix. Third section explains
14 well-know GLCM Haralick features. The databases are
given in section four, while the experimental results, com-
parison with other algorithms and discussions are given in
fifth section followed by the conclusions in the final sec-
tion.
II. GRAY-LEVEL CO-OCCURRENCE MATRIX
One of the simplest approaches for describing texture is to
use statistical moments of the intensity histogram of an
image or region [9]. Using only histograms in calculation,
will result in measures of texture that only carry informa-
611
647
tion about distribution of intensities, but not about the
relative position of pixels with respect to each other in that
texture. Using a statistical approach such as co-occurrence
matrix will help to provide valuable information about the
relative position of the neighbouring pixels in an image.
Given an image I, of size N×N, the co-occurrence, matrix
P can be defined as:
1 1
1, ( , ) & ( , )
( , )
0,
N N
x y
x y
if I x y i I x y j
i j
otherwise
= =
= + A + A = ¦
=
´
¹
__
P
(1)
where the offset (¨
x
, ¨
y
), is specifying the distance be-
tween the pixel-of-interest and its neighbour. Note that the
offset (¨
x

y
) parameterization makes the co-occurrence
matrix sensitive to rotation. Choosing the offset vector, so
a rotation of the image not equal to 180 degrees, will
result in a different co-occurrence matrix for the same
(rotated) image. This can be avoided by forming the co-
occurrence matrix using a set of offsets sweeping through
180 degrees at the same distance parameter ¨ to achieve a
degree of rotational invariance (i.e. [0 ¨] for 0
o
: P hori-
zontal, [-¨ ¨] for 45
o
: P right diagonal, [-¨ 0] for 90
o
: P
vertical, and [-¨ -¨] for 135
o
: P left diagonal).
Figure 1 shows how to generate the four cooccurrence
matrices using N
g
=5 levels and offsets {[0 1], [-1 1], [-1
0], [-1 -1]} defined as one neighboring pixel in the
possible four directions. We can see that two neighboring
pixels (2,1) of the input image is reflected in P
H
concurrence matrix as 3, because there are 3 occurrences
of the pixel intensity value 2 and pixel intensity value 1
adjacent to each other in the input image. The neighboring
pixels (1, 2) will occur again 3 times in P
H
, which makes
these matrices symmetric. In the same manner, the other
three matrices P
V
, P
LD
, P
RD
are calculated.

0 3 4 2
2 1 3 4
0 3 2 1
2 1 0 3
0 1 2 3
0
1
2
3
Input image Intensity
0 1 0 2 0
1 0 3 1 0
0 3 0 1 1
2 1 1 0 2
0 0 1 2 0
0 1 2 3 4
0
1
2
3
4
0 0 4 0 0
0 0 0 3 0
4 0 0 1 1
0 3 1 0 1
0 1 1 1 0
0 1 2 3 4
0
1
2
3
4
0 2 0 0 0
2 0 1 0 1
0 1 0 3 1
0 0 3 2 0
0 0 1 0 0
0 1 2 3 4
0
1
2
3
4
0 1 0 1 0
1 0 1 1 0
0 1 0 2 0
0 1 2 1 0
0 0 0 0 1
0 1 2 3 4
0
1
2
3
4
P
H
(0
o
) P
V
(90
o
) P
RD
(45
o
) P
LD
(135
o
)
Figure 1 – Cooccurrence matrix generation for N
g
=5 levels and
four different offsets: P
H
(0
o
), P
V
(90
o
), P
RD
(45
o
), and P
LD
(135
o
).
The averaged GLCM matrix can be calculated by:
4
H V RD LD
+ + +
=
P P P P
P (2)
For normalization, we then divide each entry in the
GLCM matrix P by normalization constant R, that
represents the number of neighboring pixel pairs used in
calculating GLCM, to form a new normalized matrix p
which will be used later on in the classification process or
in generation of Haralick features.
(a) (b) (c)
Figure 2 – (a) Example from ORL Database (112×92),
(b) corresponding CLGM with 64 gray levels (64×64) and
(c) corresponding CLGM with 256 gray levels (256×256).
III. HARALICK FEATURES
In 1973, Haralick [13] introduced 14 statistical features.
These features are generated by calculating the features for
each one of the co-occurrence matrices obtained by using
the directions 0
o
, 45
o
, 90
o
, and 135
o
, then averaging these
four values. The symbol ¨ representing the offset parame-
ter can be selected as 1 or higher value. In general ¨ value
is set to 1 as the offset parameter. A vector of these 14 sta-
tistical features is used for characterizing the co-occurrence
matrix contents. Calculation of these features can be done
using the following equations.
Notations:
p(i, j) is the (i, j)’th entry in normalized GLCM.
N
g
, dimension of GLCM (number of gray levels)
p
x
(i) and p
y
(j) are the marginal probabilities:
1 1
( ) ( , ), ( ) ( , ).
g g
N N
x y
j i
p i p i j p j p i j
= =
= =
_ _
ȝ is the mean of ȝ
x
and ȝ
y
;
1 1
( ), ( ).
g g
N N
x x y y
i i
ip i ip i µ µ
= =
= =
_ _
ı
x
and ı
y
are the standard deviations of p
x
and p
y
, respec-
tively.
1/ 2 1/ 2
2 2
1 1
( )( ) , ( )( )
g g
N N
x x x y y y
i i
p i i p i i o µ o µ
= =
= ÷ = ÷
| | | |
| |
\ . \ .
_ _
1 1
( ) ( , ); 2, 3, , 2
g g
N N
x y
i j k
i j
p k p i j k N
+
+ =
= =
= =
__
!
1 1
( ) ( , ); 0,1, , 1
g g
N N
x y
i j k
i j
p k p i j k N
÷
÷ =
= =
= = ÷
__
!
612
648
1 1
1 ( , ) log{ ( ) ( )}
g g
N N
x y
i j
HXY p i j p i p j
= =
= ÷
__
1 1
2 ( ) ( ) log{ ( ) ( )}
g g
N N
x y x y
i j
HXY p i p j p i p j
= =
= ÷
__
1
( , ) ( , )
( , )
( ) ( )
g
N
k
x y
p i k p j k
i j
p i p j
=
=
_
Q
HX and HY are the entropies of p
x
and p
y
, respectively.
1
( ) log{ ( )}
g
N
x x
i
HX p i p i
=
= ÷
_
1
( ) log{ ( )}
g
N
y y
j
HY p j p j
=
= ÷
_
Haralick Features:
2
1
1 1
( , ) .
g g
N N
i j
f p i j
= =
=
__
(3)
( )
1
2
2
| | 0 1 1
, .
g g g
N N N
i j n n i j
f n p i j
÷
÷ = = = =
=
¦ ¹
´ `
¹ )
_ __
(4)
( )( )
3
1 1
( , )
.
g g
N N
x y
i j
x y
i i p i j
f
µ µ
o o
= =
÷ ÷
=
__
(5)
2
4
1 1
( ) ( , )
g g
N N
i j
f i p i j µ
= =
= ÷
__
(6)
( )
5 2
1 1
1
( , )
1
g g
N N
i j
f p i j
i j = =
=
+ ÷
__
(7)
2
6
2
( )
g
N
x y
k
f kp k
+
=
=
_
(8)
( )
2
2
7 6
2
( )
g
N
x y
k
f k f p k
+
=
= ÷
_
(9)
2
8
2
( ) log{ ( )}
g
N
x y x y
k
f p k p k
+ +
=
= ÷
_
(10)
9
1 1
( , ) log{ ( , )}
g g
N N
i j
f p i j p i j
= =
= ÷
__
(11)
1 1
2
10
0 0
[ ( )] ( )
g g
N N
x y x y
k l
f k lp l p k
÷ ÷
÷ ÷
= =
= ÷
_ _
(12)
1
11
0
( ) log{ ( )}
g
N
x y x y
k
f p k p k
÷
÷ ÷
=
= ÷
_
(13)
9
12
1
max( , )
f HXY
f
HX HY
÷
= (14)
( )
1/ 2
13 9
1 exp[ 2( 2 )] f HXY f = ÷ ÷ ÷ (15)
( )
1/ 2
14
second largest eignvalue of f = Q (16)
IV. PROPOSED SYSTEM
We adopted two scenarios for the implementation of
GLCM based face recognition. In the first scenario, we
used the co-occurrence matrix directly by converting it into
a column vector for each image in the recognition process.
In the second scenario, we used the vector of Haralick fea-
tures extracted from the co-occurrence matrix for recogni-
tion. Figure 3 shows these proposed scenarios.
Face
Database
GLCM
GLCM
Features
Classification
Classification
1
st
Scenario
2
nd
Scenario
Figure 3 – Proposed scenarios for using co-occurrence matrix and
its features for face recognition.
In the first scenario, we are using the GLCM directly after
converting it to a column vector. However, for 8-bit gray
level representation the size of this vector is 256×256,
which corresponds to a 65536 dimensional vector. The di-
mension of this vector can be reduced by reducing the
number of quantization/gray levels, N
g
, of the image. For
example, if N
g
is reduced to 16, the GLCM will be 256
(16×16) dimensional vector. The effects of dimensionality
reduction on the recognition performance are studied in the
following section.
In the second scenario, a vector of 14 Haralick features is
used to represent the given image. The size of this vector,
which is 14, is independent of the size of the GLCM.
V. SIMULATION RESULTS AND DISCUSSIONS
The simulations have been conducted on the two data sets:
FERET and ORL face databases. The standard test bed
adopted in similar studies for the FERET database [7], has
been used to test our algorithm for identification. The re-
sults are reported in terms of the standard recognition
rates (%). 600 frontal face images out of 200 subjects are
selected, where all the subjects are in the upright and fron-
tal position. Each face image is cropped to the size of
128×128 to extract the facial region and normalized to
zero mean and unit variance. In order to test the algo-
rithms, two images of each subject are randomly chosen
for training, while the remaining one is used for testing
(i.e. 400 training and 200 test images). In Figure 4 the first
two rows are the sample training images while the third
row shows samples from test images. It can be observed
from this figure that the test images contain variations in
illumination and facial expressions.
The second database used, was the face database from Oli-
vetti Research Lab [8]. This database is widely used for
evaluating face recognition algorithms. The database con-
sists of 400 images acquired from 40 persons. All images
were taken under a dark background. The images are grey
scale with a 92×112 pixels resolution.
613
649
Out of the 10 images per subject of the ORL face data-
base, first n were selected for training and the remaining
10-n were used for testing. Hence, no overlap exists be-
tween the training and test face images. Figures 4 and 5
show examples of both databases.
Figure 4 – Example FERET images used in our experiments. The
top two rows the examples of training images used and the bottom
row shows the examples of test images.
Figure 5 – Example ORL images used in our experiments.
Table 1 shows the result of using GLCM column vector
directly (direct GLCM) on ORL database using L
1
norm
distance, į
L1
. Reduced number of gray levels has been
used for GLCM dimensionality reduction. The training set
has been formed by using n=5 samples of each individual,
where the test set has been formed by using the remaining
5 samples. The results indicate that the direct GLCM ap-
proach outperforms the Haralick features based approach.
It is also, important to note that, the dimensionality reduc-
tion based on the reduction of the number of gray levels
does not change the recognition performance significantly
as we reduce the number of gray levels from 256 down to
16 levels. Highest performance is achieved by using 64
gray levels.
Table 2 shows the results on FERET database using L
1
norm distance. Different number of gray levels has been
used for GLCM dimensionality reduction. The training set
has been formed by using 2 samples of each subject,
where the test set has been formed by using the remaining
1 sample per subject. Similarly, the results indicate that
the direct GLCM approach outperforms the Haralick fea-
tures based approach. Highest performance is achieved
by using 32 gray levels.
The results in both tables indicate that, the proposed direct
GLCM method is much more suitable method than the
method using well-known Haralick features. The GLCM
scalability, provided by the scalable number of gray levels
N
g
, is an important flexibility.
An important result, which can be observed in Tables 1 and
2, is that in some cases when the co-occurrence matrix be-
comes smaller due to the use of reduced number of gray
levels, N
g
, the recognition performance of direct GLCM
approach slightly increases. This can be due to the quantiza-
tion process which suppresses the noise in the input image
that helps to increase the recognition performance.
1 2 3 4 5 6 7 8 9
55
60
65
70
75
80
85
90
95
100
Number of training images
R
e
c
o
g
n
i
t
i
o
n

R
a
t
e
NumLevels 256
NumLevels 128
NumLevels 64
NumLevels 32
NumLevels 16
NumLevels 8
Figure 6 – Face recognition performance of the Direct GLCM for
ORL database, using different number of training images and
different number of levels for generation of GLCM matrix, with
the į
L1
distance measure.
Figures 6 shows the performance of the direct GLCM
method for changing number of training samples in the
training set for different number of gray levels. The results
show that the recognition performance of the proposed sys-
tem changes slightly as we reduce the number of gray levels
from 256 to 16. Although, this observation is specific to
these two datasets and cannot be generalized, the results
indicate the scalability of the proposed direct GLCM ap-
proach.
Table 3 compares the performance of the proposed direct
GLCM based recognition system with the state of the art
techniques PCA and LDA by using the ORL face data-
base. The results demonstrate the superiority of the pro-
Table 2 – Face recognition performance (%) for different number of
levels using FERET database and similarity measures: į
L1
, with 2
training Images / 1 testing Images
Number of Gray Levels
Approach 256 128 64 32 16 8 4
Haralick features 29.33 29.67 26.17 26.67 34.50 36.83 13.16
Direct GLCM 88.67 90.67 91.67 92.33 88.67 74.83 14.33
Table 1 – Face recognition performance (%) for different number
of levels using ORL database and similarity measures: į
L1
, with 5
training images / 5 testing images
Number of Gray Levels
Approach 256 128 64 32 16 8 4
Haralick features 71.90 70.20 70.50 74.90 79.50 78.30 69.60
Direct GLCM 95.10 96.00 96.30 95.90 95.20 91.20 70.40
614
650
posed direct GLCM based face recognition system over
the well know face recognition techniques PCA and LDA.
VI. CONCLUSIONS
In this paper, we introduced a new face recognition method,
direct GLCM, which performs recognition by using gray-
level co-occurrence matrix. The direct GLCM method is
very competitive and outperforms the state of the art face
recognition techniques such as PCA and LDA. Using
smaller number of gray levels made the algorithm faster
and, at the same time, preserved the high recognition rate.
This can be due to the process of quantization which helps
in suppressing the noise of the images existing in higher
gray-levels. Recognition in Table 1 for ORL database gen-
erates the highest performance at N
g
=64 which corresponds
to a moderate number of gray levels in image representa-
tion. Same argument holds for FERET database as the rec-
ognition was the highest at N
g
=32 in Table 2. In general, it
is obvious from the results that the direct GLCM is a robust
method for face recognition with performance higher than
PCA and LDA techniques.
REFERENCES
[1] M. Kirby, & L. Sirovich, Application of the Karhunen-Loeve Proce-
dure for the Characterization of Human Faces, IEEE PAMI, vol. 12, no. 1,
pp. 103-108, 0162-8828, January 1990.
[2] L. Sirovich, & M. Kirby, Low-Dimensional Procedure for the Char-
acterization of Human Faces, JOSA, vol. 4, no. 3, pp. 519-524, 1084-
7529, March 1987.
[3] M. Turk, & A. Pentland, Eigenfaces for Recognition, Journal of
Cognitive Neuroscience, vol. 3, pp. 71-86. 0898-929X, 1991.
[4] A. Pentland, B. Moghaddam, & T. Starner, View based and modular
eigenspaces for face recognition, In Proceedings of Computer Vision and
Pattern Recognition, pp. 84–91, 0-8186-5825-8, IEEE Computer Society,
Seattle, USA, June 1994.
[5] P. Belhumeur, J. Hespanha, & D. Kriegman, Eigenfaces vs. Fisher-
faces: Recognition Using Class Specific Linear Projection. IEEE PAMI,
vol. 19, no. 7, pp. 771-720, 0162-8828, July 1997.
[6] W. Zhao, R. Chellappa, & N. Nandhakumarm, Empirical perform-
ance analysis of linear discriminant classifiers. In Proceedings of Com-
puter Vision and Pattern Recognition, pp. 164-169, 0-8186-5825-8, IEEE
Computer Society, Santa Barbara, Canada, June 1998.
[7] P. J. Philipps, H. Moon, S. Rivzi, & P. Ross, The Feret evaluation
methodology fro face-recognition algorithms, IEEE PAMI, vol 22, pp.
1090–1104, 2000.
[8] The Olivetti Database; http://www.cam-orl.co.uk/facedatabase.html.
[9] R. C. Gonzalez, & R. E. Woods, Digital Image Processing, 3rd Ed.
Prentice Hall, 2008.
[10] L. Wiskott, J.-M. Fellous, N. Kruger, and C. von der Malsburg,
“Face recognition by elastic bunch graph matching,” IEEE PAMI., vol.
19, no. 7, pp. 775–779, July 1997.
[11] C. Liu and H. Wechsler, “Independent component analysis of
Gabor features for face recognition,” IEEE Trans. Neural Networks, vol.
14, no. 4, pp. 919–928, July 2003.
[12] B. Manjunath, R. Chellappa, C. von der Malsburg, A feature
based approach to face recognition. In: Proc. IEEE Conf. on Computer
Vision and Pattern Recognition, pp. 373–378, 1992.
[13] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for
image classification, IEEE Trans. Syst. Man Cybern. vol. 3, no. 6, pp.
610–621, 1973.
[14] A. Gelzinisa,A. Verikasa,b,, M. Bacauskienea, Increasing the dis-
crimination power of the co-occurrence matrix-based features, Pattern
Recognition, vol. 40, pp. 2367-2372, 2004.
[15] R.F. Walker, P.T. Jackway, D. Longstaff, Genetic algorithm opti-
mization of adaptive multi-scale GLCM features, Int. J. Pattern Recogni-
tion Artif. Intell. vol. 17, no. 1, pp.17-39, 2003.
[16] N. Jhanwar, S. Chaudhuri, G. Seetharaman, B. Zavidovique, Con-
tent based image retrieval using motif cooccurrence matrix, Image Vision
Comput. vol. 22, pp. 1211-1220, 2004
[17] P. Chang, J. Krumm. Object recognition with color cooccurrence
histograms. IEEE conference on computer vision and pattern recognition,
Fort Collins, co, June 1999.
Table 3 – Face recognition performance (%) of PCA, LDA,
direct GLCM and Haralick features for different number of
training images using ORL database with į
L1
and N
g
=256.
# Train
Images
Haralick
Features
Direct
GLCM
PCA LDA
1 42.33 60.11 61.44 35.83
2 56.50 78.69 74.63 76.88
3 62.86 87.43 82.86 87.57
4 68.33 92.33 88.42 93.25
5 71.90 95.10 90.00 95.10
6 74.50 96.38 91.13 96.35
7 74.67 96.83 94.00 96.67
8 76.00 98.25 94.50 97.25
9 76.00 99.50 97.50 98.50
615