ICIC Object Detection and Tracking

2015 International Conference on Industrial Instrumentation and Control (ICIC)
College of Engineering Pune, India. May 28-30, 2015

Object Detection and Tracking using Statistical and Stochastic Techniques

S. Vasuhi, B. Haripriya, V. Vaidehi
Department of Electronics Engineering, Madras Institute of Technology, Anna University, Chennai, India.
lvasuhisrinivasan@yahoo.com, haripriyab.32@gmail.com, vaidehi@annauniv.edu

Abstract - This paper proposes a multilevel structure for object detection and tracking in simple and complex environments. The foreground object is obtained using a self-adaptive Gaussian Mixture Model (GMM) that deals with illumination changes, repetitive motion of the targets and clutter in the scene. To obtain robust and flexible target tracking, a synergistic combination of two stochastic modeling techniques is used. The first is the Pseudo-2D Hidden Markov Model (P2DHMM), which models the outline of the object and detects the human. The second is the Kalman Filter, which uses the P2DHMM output to track the detected human.

Keywords - Gaussian Mixture Model (GMM); Pseudo-2D Hidden Markov Model (P2DHMM); Kalman Filter (KF)

I. INTRODUCTION

One of the difficult problems in video surveillance is the tracking of different objects in arbitrary, complex environments. Image-based multi-target tracking has been described extensively in the literature [1]. Background subtraction is widely used for foreground pixel detection where a fixed camera observes dynamic scenes (Ansuman et al., 2014) [2]. A background estimation procedure is performed to separate motion from the background. Many standard methods exist for background modelling and segmentation of foreground objects. In these methods, a background scene model is learned statistically in a training stage using the redundancy of the pixel intensities. In interactive settings, a real-time human tracking system [3] uses depth-based background subtraction. The offline method of Darrell et al. [4] also uses depth-based background subtraction but additionally performs delayed segmentation. The techniques presented by Harville (2005) and Zhao (2005) use the "plan view" concept on dense stereo data to track objects, but do not use a multi-sensor fusion method. All of the above techniques differ from the proposed one because they require that the objects in the scene have enough texture for dense stereo reconstruction, and they rely on background models that assume a static background.

The M2Tracker system developed by Mittal and Davis (2002) was constructed to work in congested environments. Its region-based stereo technique avoids many problems by matching regions instead of individual points. The Kalman Filter was used for object tracking by Ayache and Faugeras (1989). Zhao and Nevatia (2004), Comaniciu and Ramesh (2003), Piater and Crowley (2001), and Vasuhi and Vaidehi (2014) also use the Kalman Filter (KF) for object tracking. If the state space is discrete and consists of a finite number of states, Hidden Markov Model (HMM) filters (Gong and Letaief, 2001) can be applied for tracking. This method is also implemented by Yunqiang et al. (2001) for visual tracking. Bootstrap filters, which are based on Monte Carlo integration, belong to the family of particle filters, the most general class of filters. Particle filters allow a state-space representation of arbitrary distributions, including nonlinear and non-Gaussian dynamical and observation models as well as process and observation noise. Osawa et al. (2006) discussed object tracking with a particle filter that uses an environment model and an ellipsoid human model. Rendered images of the ellipsoid shape model are used to estimate the likelihood of each hypothesized state. They demonstrated tracking in cluttered surroundings, but did not discuss the cost of rendering an image from the model per particle per time step. To segment the target, this method requires a static scene. Saad and Mubarak (2006) and Lopez et al. (2007) described visual-hull-based tracking methods. In Saad and Mubarak (2006), a slice of the visual hull around people's feet is calculated and tracking is done offline. In Lopez et al. (2007), particle filters operating on a voxel representation of the visual hull are used to track people. Visual hull techniques are sensitive to errors in foreground segmentation and are not suited to environments with many occlusions.

The proposed system adopts some of the distinctive advantages of the above systems and adds adaptations to them. The key feature of the proposed algorithm is that it makes use of two powerful stochastic modeling techniques, namely Pseudo-2D HMMs and Kalman Filters. The input to the Kalman Filter is obtained from a complex shape model of the object, whose structure is learned automatically by the P2DHMM. The Kalman Filter obtains its input from the P2DHMM and feeds its output back to the P2DHMM. This feedback between the two modules enables flexible tracking and is a further reason for the robust and reliable performance of the system.

The paper is arranged as follows. Section II explains the system overview. Section III provides results and discussion. Section IV concludes the paper.
978-1-4799-7165-7/15/$31.00 ©2015 IEEE 1115

II. SYSTEM OVERVIEW

The system is built from five modules, namely Image Acquisition, Background Modeling and Foreground Extraction, Feature Extraction, Object Modeling and Object Tracking. Figure 1 illustrates the flow diagram of the proposed system.

[Figure 1: Image Acquisition -> Background Modeling and Foreground Extraction -> Feature Extraction -> Object Modeling -> Object Tracking]
Fig. 1. System flow diagram

A. Image Acquisition

The proposed system operates in offline mode and tracks persons in pre-recorded videos. Videos are sampled at 30 frames/second. The use of fixed cameras allows foreground and background segmentation to be performed at relatively low computational cost. This justifies the choice of a simple detection algorithm, summarized in the next section.

B. Background Modeling and Foreground Extraction

A stationary camera is used in the video observation and monitoring system. The main objective of background modeling is to model the background and to detect the moving objects in the scene. The Mixture of Gaussians (MoG) is a widely used approach for background modeling and for detecting moving foreground objects. A robust system must be independent of lighting changes and should deal with problems such as sudden changes in weather conditions, clutter, geometric deformation and repetitive motion of objects in the scene.

The Mixture of Gaussians is a pixel-based process. The algorithm is a statistical model that describes the state of each background pixel. In a complex environment a pure background is not available and the background may change continuously. The Gaussian Mixture Model (GMM) for background modeling was first developed by Stauffer and Grimson [16]. This algorithm assumes that the background is visible more frequently than the foreground and that its model has a relatively narrow variance. Zivkovic and Heijden [17] presented an improved GMM that constantly updates the parameters through a recursive computation and adaptively chooses the appropriate number of Gaussians to model each pixel. When background subtraction is applied in real-time scenarios, sudden illumination changes constitute a particular difficulty. The Multi-Dimensional Gaussian Kernel density Transform (MDGKT) has been used to deal with irrelevant motion such as swaying trees [18].

To deal with sudden illumination changes, a self-adaptive Gaussian mixture model uses a global illumination change factor between the current image $i_c$ and the reference image $i_r$, defined over all pixels $x$ in the set $X$ as

$$h = \operatorname{median}_{x \in X} \frac{i_{c,x}}{i_{r,x}} \qquad (1)$$

Each pixel value is categorized based on its intensity value in RGB color space. The probability of observing the current pixel value $X_t$ at time $t$ is then given by

$$p(X_t) = \sum_{j=1}^{M} w_{j,t} \, N(X_t, \mu_{j,t}, \Sigma_{j,t}) \qquad (2)$$

where $M$ is the number of distributions, determined by the multimodality of the background and by the available memory and computational power. Stauffer and Grimson [16] proposed to set $M$ from 3 to 5. $w_{j,t}$ is the weight associated with the $j$-th Gaussian and must satisfy the condition

$$\sum_{j=1}^{M} w_{j,t} = 1 \qquad (3)$$

$\mu_{j,t}$ is the mean and $\Sigma_{j,t}$ is the covariance matrix associated with the $j$-th Gaussian. $N(X_t, \mu_{j,t}, \Sigma_{j,t})$ is the Gaussian probability density function, given by

$$N(X_t, \mu_{j,t}, \Sigma_{j,t}) = \frac{1}{(2\pi)^{n/2} \, |\Sigma_{j,t}|^{1/2}} \exp\left(-\frac{1}{2} (X_t - \mu_{j,t})^T \Sigma_{j,t}^{-1} (X_t - \mu_{j,t})\right) \qquad (4)$$

where $n$ is the dimension of $X_t$. The RGB color components are assumed to be independent and to have the same variance, so the covariance matrix takes the form $\Sigma_{j,t} = \sigma_{j,t}^2 I$. The parameters of the Gaussian Mixture Model must be initialized.
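
As a minimal sketch of the per-pixel mixture model described in this section, the following Python/NumPy code evaluates the illumination change factor as a median intensity ratio and the mixture probability as a weighted sum of Gaussian densities. It is illustrative only and is not the authors' MATLAB implementation. The function names, the small `eps` guard against division by zero, and the use of a single variance per Gaussian (independent color channels with equal variance) are assumptions made for this example.

```python
import numpy as np


def illumination_factor(current, reference, eps=1e-6):
    """Median per-pixel intensity ratio between current and reference image.

    eps is an assumed guard against division by zero in dark pixels.
    """
    current = np.asarray(current, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.median(current / (reference + eps)))


def gaussian_pdf(x, mean, var):
    """Gaussian density with covariance var * I for an n-dimensional pixel value."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    n = x.size
    diff = x - mean
    norm = (2.0 * np.pi * var) ** (-n / 2.0)
    return norm * np.exp(-0.5 * np.dot(diff, diff) / var)


def mixture_probability(x, weights, means, variances):
    """Weighted sum of M Gaussian densities for one pixel value."""
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

A call such as `mixture_probability(pixel_rgb, weights, means, variances)` with three to five components per pixel corresponds to the range of M mentioned in the text; a production system would vectorize this over the whole frame rather than loop per pixel.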
For every new incoming frame, a match test is made for each pixel against the existing M Gaussian distributions:

$$\frac{\lVert X_{t+1} - \mu_{j,t} \rVert}{\sigma_{j,t}} < D \quad \text{for some } j \in [1, \ldots, M] \qquad (5)$$

where $D$ is the deviation threshold, approximately equal to 2.5. The parameters of the matched component are then updated as follows:

$$w_{j,t+1} = (1 - \alpha) \, w_{j,t} + \alpha$$
$$\mu_{j,t+1} = (1 - \beta) \, \mu_{j,t} + \beta X_{t+1} \qquad (6)$$

The learning rate $\alpha$ is constant, and $\beta$ is defined, following Stauffer and Grimson [16], as

$$\beta = \alpha \, N(X_{t+1}, \mu_{j,t}, \Sigma_{j,t}) \qquad (7)$$

If there is no match, the component $k$ with the lowest weight is re-initialized, following [16], with

$$w_{k,t+1} = \text{low prior weight}, \quad \mu_{k,t+1} = X_{t+1}, \quad \sigma_{k,t+1}^2 = \text{large initial variance} \qquad (8)$$

The online learning algorithm incorporates all pixel observations, so the mixture model does not by itself distinguish components that correspond to the background from those related to foreground objects. The Gaussians are therefore sorted by decreasing weight-to-standard-deviation ratio $w_j / \sigma_j$. This ordering assumes that a background pixel has a high weight and a low variance, because the background is present in the scene more often than moving objects and its value is practically constant. The first $B$ distributions in this ordering are taken as the background model by applying a threshold $T$:

$$B = \arg\min_b \left( \sum_{j=1}^{b} w_{j,t} > T \right) \qquad (9)$$

where $B$ is the resulting number of background components. All pixels $X(t)$ that do not match any of these components are marked as foreground.

C. Feature Extraction

Principal Component Analysis (PCA) features are obtained from the detected image. PCA analyzes n-dimensional data and examines the correlation between the different dimensions. It determines the principal dimensions along which the variance of the data is high. It is a way of identifying patterns in data and of expressing the data so as to highlight their similarities and differences [19].

First, the mean of the detected image is computed and subtracted from the image. The covariance is then found by multiplying the mean-subtracted image data by its transpose. From the covariance matrix the eigenvalues and eigenvectors are calculated. The next step is to choose the components that form the feature vector. The eigenvalues obtained in the previous step differ considerably in magnitude. The principal component of the image is the eigenvector with the highest eigenvalue. This eigenvector points along the middle of the data and represents the most significant relationship between the data dimensions.

The eigenvectors are ordered by decreasing eigenvalue. This gives the components in order of significance, so the components with lesser significance can be ignored. If the image originally has n dimensions, n eigenvectors and eigenvalues are calculated. Only the first p eigenvectors are then chosen, which finally yields feature vectors of p dimensions.

D. Object Modeling

Pseudo 2-D HMMs (P2DHMMs) are an extension of the one-dimensional HMM for modeling 2-dimensional data. They are called pseudo because they are not fully connected in 2D: they do not connect all possible states, and the state alignments of consecutive columns are calculated independently of each other. The model describes the occurrence of a feature vector sequence derived from the pre-processed object, as described in [20]. The parameters of the P2DHMM consist of the output probabilities of the various HMM states and the transition probabilities, which can be learned in order to model different objects. Object shapes are learned in the following way: several suitably pre-processed images of the object are presented to the P2DHMM, which learns the structure of the object by applying parameter estimation methods such as the Forward-Backward algorithm.

It is important to note that this learning procedure is a basic step that has to be carried out only once. It is not part of the actual tracking procedure and is required only if the type of object to be tracked changes (e.g. from a person to a car). If the system is reused, e.g. for surveillance of a traffic intersection, the background states can easily be adapted to the new situation with the Forward-Backward algorithm. This stochastic model is capable of modeling persons in front of a complex background. The actual tracking procedure starts by presenting the first frame of the tracking video sequence to the trained P2DHMM. If an image containing a person is presented to the trained P2DHMM, the Viterbi algorithm can be used to compute the segmentation of the image into blocks. In the next step, the Center of Gravity (COG) of the person is calculated by computing the appropriate moments of the blocks segmented by the Viterbi algorithm.
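
The Center of Gravity and bounding box that the object modeling step passes on to the tracker can be computed from image moments. The following Python/NumPy sketch assumes that the blocks assigned to the person have already been converted into a binary mask; how that mask is produced from the Viterbi segmentation is not shown, and the function name `center_of_gravity` is hypothetical.

```python
import numpy as np


def center_of_gravity(mask):
    """Return (x_s, y_s, w, h) for the nonzero pixels of a 2-D mask.

    x_s, y_s are the first-order moments divided by the zeroth-order
    moment (the area); w and h are the bounding-box width and height.
    Returns None if the mask contains no object pixels.
    """
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)        # row (y) and column (x) indices
    if xs.size == 0:
        return None
    m00 = xs.size                    # zeroth-order moment: object area
    x_s = xs.sum() / m00             # m10 / m00
    y_s = ys.sum() / m00             # m01 / m00
    w = xs.max() - xs.min() + 1      # bounding-box width in pixels
    h = ys.max() - ys.min() + 1      # bounding-box height in pixels
    return float(x_s), float(y_s), int(w), int(h)
```

The four returned values correspond to the quantities the paper denotes as the COG coordinates and the bounding-box size, which form the measurement supplied to the Kalman Filter.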
E. Object Tracking

The output of the object modeling step consists of the coordinates of the COG, denoted $x_s$ and $y_s$, and the size of the bounding box of the segmentation, denoted $w$ and $h$. It serves as the measurement input to the Kalman Filter. The filter vector, which holds the centroid position $(x_p, y_p)$, the velocity $(x_v, y_v)$ and the bounding-box size $(w, h)$, is given by

$$M = [x_p \;\; y_p \;\; x_v \;\; y_v \;\; w \;\; h]^T \qquad (10)$$

Tracking employs the centroid of each detected blob, using a constant-velocity Kalman Filter model [16]. The measurement vector, which holds the centroid location and the bounding-box size, is given by

$$z = [x_p \;\; y_p \;\; w \;\; h]^T \qquad (11)$$

From the inputs obtained from frame $k-1$ and the measurement $Z_{k-1}$, the system predicts the state vector $x_k$. The state vector $x_{k-1}$ is used to mark the position of the object in the images of the sequence. The predicted vector $x_k$ is fed back as input to improve the estimation of the vector $Z_k$ in the next frame. That is, the bounding box predicted by the KF is enlarged.

The data association problem between multiple blobs is addressed by comparing the predicted centroid location with the centroids of the detections in the current frame. The blob whose centroid is closest to the predicted location is chosen as the best match.

III. RESULTS AND DISCUSSIONS

Code development and simulation were carried out in MATLAB on a personal computer with an Intel Core i5 processor and 4 GB RAM, running the Windows 7 64-bit operating system.

A. Tracking Results of Single Object

Tracking of a single object in an outdoor environment with few background variations is considered. Tracking results are shown in Table I. The proposed GMM, P2DHMM and KF algorithm is compared with GMM + KF and Frame Difference (FD) + KF.

From Table I, it is observed that the proposed algorithm gives an accurate result by placing a proper bounding box around the detected human. The outputs of the other two methods are not accurate, because the bounding box around the detected human does not have the exact size.

TABLE I. TRACKING OF SINGLE PERSON
Columns: Input Video Frame | Tracking Output using Proposed System | Tracking Output using GMM and KF | Tracking Output using FD+KF

B. Tracking Results of Multiple Objects

The tracking results for multiple objects, compared with the GMM + KF and Frame Difference (FD) + KF methods, are given in Table II.

TABLE II. TRACKING OF MULTIPLE OBJECTS
Columns: Input Video Frame | Tracking Output using Proposed System | Tracking Output using GMM and KF | Tracking Output using FD+KF

From Table II, it is observed that the proposed method effectively tracks the multiple moving objects in successive frames, compared to the other methods. As the number of moving objects in the video increases, the bounding box construction around the detected objects becomes erroneous in the other methods.

C. Tracking Errors

The tracking error rate of the proposed method is compared with GMM + KF and FD + KF and is shown in Figure 2.
[Figure 2: tracking error (vertical axis, 0 to 0.2) versus Number of Objects (horizontal axis, 0 to 6) for the proposed method, GMM+KF and FD+KF]
Fig. 2. Tracking error comparison

Figure 2 shows that the proposed algorithm produces less error than the other methods for an increased number of objects in the scene.

IV. CONCLUSION

In this paper, object tracking is investigated using a combination of methods: a self-adaptive Gaussian Mixture Model together with the synergistic combination of the P2DHMM and the Kalman Filter. Simulations showed improved tracking and a lower error rate for the proposed scheme compared with the other tracking schemes. The proposed tracking model therefore performs well in situations such as the presence of clutter and swaying trees.

REFERENCES

[1] Wei Qu, Dan Schonfeld, and Magdi Mohamed, "Distributed Bayesian multiple-target tracking in crowded environments using multiple collaborative cameras", Journal on Advances in Signal Processing, Hindawi Publishing Corporation, pp. 253-263, 2007.
[2] Ansuman, M., Tusar, K. M., Pankaj, K. S., Banshidhar, M., "Human recognition system for outdoor videos using Hidden Markov model", International Journal of Electronics and Communications, Elsevier, Vol. 68, No. 3, pp. 227-236, 2014.
[3] Krumm, J., Harris, S., Meyers, B., Brumitt, B., Hale, M. and Shafer, S., "Multi camera Multi-person Tracking for Easy Living", in Proc. on Visual Surveillance, pp. 1-8, 2000.
[4] T. Darrell, D. Demirdjian, N. Checka, and P. Felzenszwalb, "Plan-view trajectory estimation with dense stereo background models", in Proc. on Computer Vision, pp. 132-140, 2001.
[5] Harville, M., "Stereo person tracking with short and long term plan-view appearance models of shape and colour", in IEEE Proc. on Advanced Video and Signal based Surveillance, pp. 382-295, 2005.
[6] Zhao, T., Aggarwal, M., Kumar, R. and Sawhney, H., "Real-time wide area multi-camera stereo tracking", in IEEE Proc. on Computer Vision and Pattern Recognition, Vol. 1, pp. 976-983, 2005.
[7] Anurag Mittal and Larry S. Davis, "M2tracker: A multiview approach to segmenting and tracking people in a cluttered scene using region-based stereo", in European Conference on Computer Vision, pp. 18-36, 2002.
[8] T. Zhao, R. Nevatia, and F. Lv, "Segmentation and tracking of multiple humans in complex situations", in Proc. on Computer Vision and Pattern Recognition, 2001.
[9] Lopez, C., Canton, F. and Casas, J. R., "Multiperson 3D tracking with particle filters on voxels", in IEEE Proc. on Acoustics, Speech and Signal Processing, Vol. I, pp. 913-916, 2007.
[10] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking", IEEE Trans. Pattern Analysis Machine Intelligence, pp. 564-575, 2003.
[11] J. H. Piater and J. L. Crowley, "Multi-modal tracking of interacting targets using Gaussian approximations", IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, pp. 1-8, 2001.
[12] Gong, Y. and Letaief, K. B., "Space frequency time coded OFDM for broadband wireless communications", in IEEE Proc. on Global Telecommunications (GLOBECOM), San Antonio, pp. 519-523, 2001.
[13] Yunqiang, C., Yong, R. and Huang, T., "JPDAF based HMM for real-time contour tracking", in IEEE Proc. on Computer Vision and Pattern Recognition, Vol. 1, pp. 232-245, 2001.
[14] Osawa, T., Xiaojun, W., Wakabayashi, K. and Takayuki, Y., "Human tracking by particle filtering using full 3D model of both target and environment", in 18th International Conference on Pattern Recognition, 2006.
[15] Saad, M. K. and Mubarak, S., "A multiview approach to tracking people in crowded scenes uses a planar homography constraint", in European Conference on Computer Vision, pp. 782-79, 2006.
[16] Stauffer, Grimson, "Adaptive background mixture models for real-time tracking", in Proc. on IEEE CVPR, Vol. 2, pp. 246-252, 1999.
[17] Z. Zivkovic, Heijden, "Efficient adaptive density estimation per image pixel for the task of background subtraction", Pattern Recognition Letters, Vol. 27, No. 7, pp. 773-780, 2006.
[18] Chen, N. Pears, M. Freeman and J. Austin, "Background subtraction in video using recursive mixture models, spatio-temporal filtering and shadow removal", in Proc. of 5th ISVC, Lecture Notes in Computer Science, Vol. 5876, pp. 1141-1150, 2009.
[19] Lindsay Smith, "A tutorial on Principal Components Analysis", February 26, 2002.
[20] S. Kuo, O. E. Agazzi, "Keyword spotting in poorly printed documents using pseudo 2-D hidden Markov models", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 842-848, 1994.
[21] S. Marchand-Maillet, "1D and Pseudo-2D Hidden Markov Models for Image Analysis - Applications and Results", Technical Report MMWP-99xx, Department of Multimedia Communications, EURECOM Institute, Sophia Antipolis, 552-559, 1999.
[22] T. Zhao and R. Nevatia, "Tracking Multiple Humans in Complex Situations", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 9, pp. 1208-1221, 2004.
[23] Dalal, N., Triggs, B., "Histograms of oriented gradients for human detection", in IEEE Proc. on Computer Vision and Pattern Recognition, 2005, pp. 886-893.
[24] Mirabi, M. and Javadi, S., "People Tracking in Outdoor Environment Using Kalman Filter", in Proc. on Intelligent Systems Modelling and Simulation, 2012, pp. 303-307.
[25] Ayache, N. and Faugeras, O. D., "Building, Registrating and Fusing Noisy Visual Maps", in Multisensor Integration and Fusion for Intelligent Machines and Systems, book chapter, edited by Ren C. Luo and Michael G. Kay, pp. 495-540, 1989.
[26] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", IEEE International Workshop on Performance Evaluation of Tracking and Surveillance, Vol. 7.
[27] Vasuhi, S., V. Vaidehi, "Target Detection and Tracking for Video Surveillance", WSEAS Transactions on Signal Processing, Vol. 10, pp. 179-188, 2014.