
Human Shape Recognition

Using the Method of Moments and Artificial Neural Networks

Christodoulos A. Nicolaou, Allan L. Egbert, Jr., R.C. Lacher, and Susan I. Bassett
Computer Science Department, Florida State University
Tallahassee, FL USA

©1999 IEEE 0-7803-5529-6/99/$10.00

ABSTRACT
The research described in this paper explores the feasibility of an automated adaptive system capable of recognizing that a moving object in a sequence of two-dimensional images is a human or, at the very least, has a human shape. To address this problem, a method employing the method of standard moments and artificial neural networks has been designed and implemented. The results of the method implementation and testing indicate the validity of the system design and the techniques used. The automated human recognition system developed recognizes left and right human profiles with a success rate of 87.5% and 92.5%, respectively. The system is also successful in distinguishing images of human profiles from images of tailgating persons, crouching persons, objects, persons with objects, and plain image noise.

INTRODUCTION
The ability of a man-made computer program to recognize the existence of a human figure from video images is a long-term goal, which could contribute to the quest for a machine able to duplicate human capabilities. This research explores the feasibility of an automated adaptive system capable of recognizing that a moving object in a sequence of video images has a human shape. The method designed and implemented to achieve this goal employs the method of standard moments [6], a variant of the method of moments [2], and two types of artificial neural networks, the self organizing map (SOM) [3] and the Back Propagation Neural Network (BPNN).

The method of moments is used to compute efficiently and effectively a descriptive vector for each 2-dimensional shape segmented from an image. SOM is used to cluster the moment vectors for a large number of 2-dimensional shapes that represent a set of objects. The output of SOM, i.e. the suggested groupings of objects, is then labeled with respect to the shape of the objects that the moment vectors represent. These labeled classes are then supplied to a BPNN that is trained to recognize them. During testing, a 2-dimensional shape is segmented from a video image, its moment vector is computed, and it is supplied to the trained BPNN, which classifies it to one of its known classes. The system then looks at the classification of the unknown moment vector and signals whether it was decided to be of a human shape or not.

METHODS EMPLOYED
The system has two main components: a moving object detection component that uses image-differencing to segment the shapes of interest in an image frame, and a reasoning component that decides whether each shape presented to it represents a human shape or not. The focus of this paper is the reasoning component of the system. In this section we present the methods employed by the reasoning component.

The Method of Standard Moments
The method of moments, first suggested by Hu [2], was introduced as a possible solution to pattern recognition from two-dimensional images. Hu's theory was one of two-dimensional moment invariants for planar geometric figures, based on the theory of algebraic forms. Since Hu introduced them in 1962, many variants of the method of moments have been suggested and a number of applications utilizing moments have appeared in the literature [1,6,7], including Dudani et al. [1] and Reeves et al. [6], whose work focused on automatic aircraft recognition.

Dudani et al. used moment invariants as they were described in [2], while Reeves et al. [6] devised and used their own variation of the method of moments, which they call "standard moments." This variation, which is also used in this research, produces invariant moment values that are normalized with respect to size, location, rotation, and optionally, reflection [6].

The conventional definition given by Hu [2] of the two-dimensional (p+q)th order moment, (p,q = 0,1,2,...), of a density distribution function f(x,y) is:

$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x,y)\,dx\,dy$$

These moments are often named general, geometric, or Cartesian moments.
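
For a sampled image, a moment of a density f(x,y) is commonly computed as a sum of x^p y^q f(x,y) over the pixels. The following is a minimal Python sketch of that computation, not code from the paper: the function name `raw_moment` and the representation of an image segment as a list of pixel rows are assumptions made for illustration.

```python
def raw_moment(image, p, q):
    """Discrete (p+q)th order moment: sum of x**p * y**q * f(x, y).

    `image` is assumed to be a list of rows; x is the column index and
    y is the row index of each pixel.
    """
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += (x ** p) * (y ** q) * value
    return total


# A 2x3 binary segment: m00 is its area, and m10/m00, m01/m00 its centroid.
segment = [
    [1, 1, 1],
    [1, 1, 1],
]
area = raw_moment(segment, 0, 0)
centroid = (raw_moment(segment, 1, 0) / area, raw_moment(segment, 0, 1) / area)
```

For this segment the area is 6 and the centroid falls at (1.0, 0.5), the middle of the rectangle in pixel coordinates.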
Transformations of image segments can easily be defined in terms of moments. Scale change, translation, and rotation transformations for conventional moments are described below.

The scale change transformation aims at making the feature vector invariant to size variations of the same object. The moments of the distribution f(x,y) after a scale change by a factor $\lambda$ are given by the formula:

$$m'_{pq} = \lambda^{p+q+2}\, m_{pq}$$

where p,q = 0,1,2,...

The translation transformation aims at making the feature vector invariant to the position of an object in a scene. The moments of the distribution f(x,y) after a translation by $(x_a, y_b)$ in the image plane are defined as:

$$m'_{pq} = \sum_{r=0}^{p}\sum_{s=0}^{q} \binom{p}{r}\binom{q}{s}\, x_a^{\,p-r}\, y_b^{\,q-s}\, m_{rs}$$

where p,q = 0,1,2,...

The rotation transformation makes the feature vector invariant to the angle of rotation of an object in the image. The moments of f(x,y) after a rotation by an angle $\theta$ about the origin are:

$$m'_{pq} = \sum_{r=0}^{p}\sum_{s=0}^{q} (-1)^{q-s}\binom{p}{r}\binom{q}{s}\,(\cos\theta)^{p-r+s}\,(\sin\theta)^{q+r-s}\, m_{p+q-r-s,\;r+s}$$

where p,q = 0,1,2,...

Standard moments, denoted by M, are a variation of the conventional moments that have been normalized with respect to scale, translation, and rotation [6].

The zeroth-order moment, representing the area of the object, is normalized to one:

$$M_{00} = 1$$

The central moments $M_{10}$ and $M_{01}$ are both set to zero. This way the center of mass, COM, of the object is set to the origin.

$$M_{10} = M_{01} = 0.0$$

For rotation normalization, $M_{11}$ is set to zero. By setting it to zero, the principal axis of the object (image segment) is aligned with the reference axis.

$$M_{11} = 0.0$$

To achieve scale normalization in the case of standard moments, the scale factor $\lambda$ is set as follows:

$$\lambda = (m_{00})^{-1/2}$$

For translation normalization, the position of the center of mass is defined as:

$$\bar{x} = m_{10}/m_{00}, \qquad \bar{y} = m_{01}/m_{00}$$

Finally, for rotation normalization, the rotation angle is calculated from:

$$\tan 2\theta = 2 M_{11} / (M_{20} - M_{02})$$

Once $\theta$ is found, rotation normalization of $M_{11}$ using the above equation will set the standardized $M_{11} = 0$. For more details on the derivations of the standard moments' formulas see [6].

Object Recognition Using Standard Moments
Moments can be used to derive a concise, highly descriptive feature vector of an object in a two-dimensional image. A possible feature vector that efficiently characterizes an object using conventional moments is:

Using the simple linear transformation properties of moments described previously, the individual moment values can be refined to achieve shape, translation, and rotation invariance. Furthermore, computing the moments in the case of a discretized image is simple.
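
The normalization conditions that define standard moments (unit area, center of mass at the origin, and zero M11) can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the implementation of [6] or of this system: the function name `standard_moments` is hypothetical, each pixel is treated as a point mass at its integer coordinates, and `math.atan2` is used to pick one solution of the rotation-angle equation.

```python
import math


def standard_moments(image, max_order=3):
    """Return {(p, q): M_pq} normalized for translation, scale and rotation."""
    pixels = [(x, y, v) for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    xbar = sum(x * v for x, _, v in pixels) / m00   # center of mass
    ybar = sum(y * v for _, y, v in pixels) / m00

    # Second-order central moments determine the principal-axis angle.
    mu11 = sum((x - xbar) * (y - ybar) * v for x, y, v in pixels)
    mu20 = sum((x - xbar) ** 2 * v for x, y, v in pixels)
    mu02 = sum((y - ybar) ** 2 * v for x, y, v in pixels)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

    lam = m00 ** -0.5                                # scale factor
    c, s = math.cos(theta), math.sin(theta)
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            total = 0.0
            for x, y, v in pixels:
                dx, dy = x - xbar, y - ybar
                xr = lam * (dx * c + dy * s)         # translate, rotate, scale
                yr = lam * (-dx * s + dy * c)
                total += xr ** p * yr ** q * v * lam ** 2
            moments[(p, q)] = total
    return moments


# An L-shaped binary segment used as a toy example.
shape = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
]
M = standard_moments(shape)
```

In this sketch M[(0, 0)] comes out as 1 and M[(1, 0)], M[(0, 1)], and M[(1, 1)] vanish up to rounding, so the remaining entries are the ones that carry shape information.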

For an f(x,y) image segment, the two-dimensional moment is defined with the formula:

$$m_{pq} = \sum_{y=y_{min}}^{y_{max}} \sum_{x=x_{min}}^{x_{max}} x^p y^q f(x,y)$$

where xmin, xmax are the minimum and maximum horizontal values of the image segment of f(x,y) and ymin, ymax are the minimum and maximum vertical values of f(x,y).

Neural Networks
Due to the complexity of the human shape, a researcher could not predefine the characteristics or classes of the human body. Detecting the multiple classes of a human shape was achieved with a neural network algorithm that employs unsupervised learning, namely the Self Organizing Map (SOM). Distinguishing the features of the classes was achieved with a feedforward back-propagation neural network (BPNN).

The Self Organizing Map (SOM), first proposed by Kohonen of the Helsinki University of Technology [3], was used to cluster the feature moment vectors representing the human shape. This research created SOM networks with the Nenet software package [4]. The SOM is a neural network that uses an unsupervised learning rule to place a number of reference vectors on an "elastic surface" [3]. The main component is the Kohonen layer of neurons that compete with each other on a winner-take-all basis. Each neuron computes its output according to a weighted output formula. The winner neuron is the one that has the highest output and forces changes on the surface of the elastic map [5].

Figure 1: A Self Organizing Map

A multi-layer feedforward neural network is composed of an input layer, an output layer, and one or more hidden layers. With the exception of the input layer, all layers compute their output with a weighted output formula, an optional bias, and an activation function. Supervised learning is achieved with the back-propagation rule [5].

The architecture of the back-propagation network would be closely related to the results of the SOM implementation. Rao and Rao [5] developed the back-propagation neural network software program used by this research. The back-propagation (BP) algorithm was used to classify moment vector data into pre-determined classes, previously defined by the SOM network.

Figure 2: A Back-Propagation Neural Network

PROCEDURE
Figure 3 shows how our system utilizes feature vectors derived by moments for object recognition. Our process was influenced by Reeves et al. [6]. During the training phase, a sufficiently large and diverse set of input images is provided to the pattern recognition system. The system initially generates the moment vectors of the objects of interest in the images and compiles a training set. Then, with the help of SOM, the clustering method used in this research, a classification of the moment vectors in the training set is created. After the clustering process is complete, the user labels each of the clusters of moment vectors formed by SOM according to the shape represented by the moment vectors it contains, i.e. "normal human: left profiles", "non-normal human: objects", "non-normal human: tailgating persons", etc. A BPNN is then trained to recognize the labeled clusters suggested by SOM. This is achieved by training the BPNN so that it places all the moment vectors that were placed in a cluster by SOM in the same output node, labeled after the corresponding SOM cluster.
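
The clustering and labeling steps of the training phase can be sketched in a few lines of Python. This is a toy illustration and not the Nenet package or the software used in this research: the function names, the one-dimensional layer, the two-element "moment vectors", and the choice of the closest reference vector as the winner (a common SOM formulation) are all assumptions. The supervised BPNN stage is omitted; the sketch stops at the labeled training pairs such a network would receive.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def winner(nodes, v):
    """Index of the reference vector closest to v (winner-take-all)."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], v))


def train_som(vectors, n_nodes, epochs=50, lr=0.5, seed=0):
    """Train a 1-D Kohonen layer and return its reference vectors."""
    rng = random.Random(seed)
    # Spread the initial reference vectors over the training list.
    step = (len(vectors) - 1) / max(1, n_nodes - 1)
    nodes = [list(vectors[round(i * step)]) for i in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs              # shrinks rate and radius
        rate = lr * decay
        radius = max(1.0, n_nodes / 2) * decay
        for v in rng.sample(vectors, len(vectors)):   # shuffled pass
            win = winner(nodes, v)
            for i, node in enumerate(nodes):
                # Gaussian neighborhood on the node index.
                h = math.exp(-((i - win) ** 2) / (2 * radius ** 2))
                for k in range(len(node)):
                    node[k] += rate * h * (v[k] - node[k])
    return nodes


# Hypothetical 2-D vectors forming two well-separated groups.
group_a = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
group_b = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
nodes = train_som(group_a + group_b, n_nodes=2)

# The user names each cluster; the names become supervised training targets.
cluster_labels = {
    winner(nodes, group_a[0]): "normal human: left profiles",
    winner(nodes, group_b[0]): "non-normal human: objects",
}
training_set = [(v, cluster_labels[winner(nodes, v)])
                for v in group_a + group_b]
```

Keeping the labeling step outside the clustering code mirrors the procedure described above, where a person inspects the clusters before any supervised training takes place.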

During testing, the pattern recognition system is supplied with a new image from which it generates the moment vectors of the objects in the same way as in the training phase. The vectors generated represent unknown objects, which the system then "matches" against the vector classes learned during the training phase. This matching process is performed by sending each unknown vector through the trained BPNN and recording the output node where it gets placed. The object that the unknown moment vector represents is then decided to be of the same type as the label of the output node of the BPNN that it was placed in.

Figure 3: Moment Vector Based Object Recognition

RESULTS
Two fully connected BP networks with three layers were constructed: one to recognize left profiles and one to recognize right profiles. The two networks were trained with different data sets and produced similar results. In addition, both networks had similar failure rates of false positives and negatives.

The percentage of recognition of left human profiles was 87.5%. Right human profiles were correctly recognized as such in 92.5% of the test cases. When presented with non-human-profile-shaped objects, the system was successful in classifying them as non-human in between 87.5% and 96.5% of the test cases, depending on the kind of shapes of objects it was presented.

Table 1: Left Profile Network Results: Human Shaped Object

          Number  Percent
Correct   37      92.5
Wrong     3       7.5
Total     40      100

Table 2: Right Profile Network Results: Human Shaped Object

Object Type   Correct            Wrong              Total
              Number  Percent    Number  Percent    Number
Tailgating    21      87.5       3       12.5       24
Crouching     10      100        0       0          10
Pers.+Obj.    31      96.87      1       3.13       32
Objects       43      93.5       3       6.5        46
Noise         70      100        0       0          70

Table 3: Left Profile Network Results: Non Human Shaped Objects

Object Type   Correct            Wrong              Total
              Number  Percent    Number  Percent    Number
Crouching
                                         6.26       32
Objects       43      93.5               6.5        46
Noise         70      100                           70

Table 4: Right Profile Network Results: Non Human Shaped Objects

Other network architectures produced better results for identifying human profiles. However, this resulted in a lower success rate for identifying non-human profiles. The left network had a probability of

$$P_i = 12.5/100 = 0.125$$

of an incorrect classification. However, the probability of incorrectly classifying 2 images in a row is 0.016.
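
The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes that classification errors on successive images are independent, which the text does not state explicitly; the function name `miss_probability` is illustrative.

```python
def miss_probability(single_image_error, n_images):
    """Chance that all n_images are misclassified, assuming independent errors."""
    return single_image_error ** n_images


left_error = 12.5 / 100   # left-profile network error rate reported above
misses = {n: miss_probability(left_error, n) for n in (1, 2, 3)}
```

Under that assumption the values are 0.125 for one image, about 0.016 for two in a row, and about 0.002 for three in a row.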

Since the system was designed to minimize false identifications of non-human profiles as human profiles, an automated system can use these probabilities to its advantage. If three images are accepted at a time, the profile is labeled human if at least one of the images is identified as a human profile. The resulting probability is:

$$P = 12.5/100 \times 12.5/100 \times 12.5/100 \approx 0.002$$

The identification of a human profile will fail only 2 times in 1000, with only a small increase of false positive errors. The right network will have a probability of error of less than 0.002, due to its higher experimental success rate.

CONCLUSIONS
The human shape recognition system is a hybrid system that combines the use of the traditional statistical method of moments for characterizing the shape of objects with artificial neural networks, such as the feedforward back propagation network. The performance of the system verifies the validity of the approach used to solve the problem of human shape recognition. The neural networks used to cluster the moment vectors representing human profiles and to classify the moment vectors derived by various object shapes during testing proved to be able to capture characteristics unique to human profiles and enable the system to produce good results consistently and under various testing conditions.

The use of the method of moments and artificial neural networks enables the developed system to function in real-time and to adapt easily after a brief retraining period to recognize new variations of human shapes in a variety of operating environments.

BIBLIOGRAPHY
[1] S. A. Dudani, K. J. Breeding and R. B. McGhee, Aircraft Identification by Moment Invariants, IEEE Trans. Comput., January 1977, C-26, pp. 39-45.

[2] M. Hu, Visual Pattern Recognition by Moment Invariants, IRE Trans. Info. Theory, 1962, vol. 8, pp. 179-187.

[3] T. Kohonen, Self Organizing Maps, Springer Series in Information Sciences, Springer, 1995.

[4] J. Pronkko, Nenet: Neural Networks Education Tool, http://wmv.hut.fi/jpronkko/nenet.html, 1997.

[5] V. B. Rao and H. V. Rao, Neural Networks and Fuzzy Logic, MIS Press, second edition, 1995.

[6] A. P. Reeves, R. J. Prokop, S. E. Andrews and F. P. Kuhl, Three Dimensional Shape Analysis Using Moments and Fourier Descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, November 1988, vol. 10, num. 6, pp. 937-943.

[7] A. P. Reeves and R. W. Taylor, Identification of Three Dimensional Objects Using Range Information, IEEE Transactions on Pattern Analysis and Machine Intelligence, April 1989, vol. 11, num. 4, pp. 403-410.

ACKNOWLEDGEMENTS
This work was supported by Department of Energy contract #DE-FG08-97NV13138 and done in collaboration with Los Alamos National Laboratory.
