
Object tracking using Radial basis function networks


A. Prem Kumar [a], T. N. Rickesh [b], R. Venkatesh Babu [c], R. Hariharan [d]

Abstract: The applications of visual tracking are broad in scope, ranging from surveillance and monitoring to smart rooms. A robust object-tracking algorithm using Radial Basis Function (RBF) networks has been implemented using OpenCV libraries. Pixel-based color features are used to develop the classifiers. The algorithm has been tested on various video samples under different conditions, and the results are analyzed.

  • 1. Introduction

The objective of tracking is to follow the target object in successive video frames. The major utility of such an algorithm is in the design of video surveillance systems to tackle terrorism. For instance, large-scale surveillance might have played a crucial role in preventing (or tracing the trails of) the 26/11 terrorist attacks in Mumbai and many bomb blasts in Kashmir, the North-east Indian region, and other parts of India. It is therefore important to have a robust object-tracking algorithm. Since the neural network framework does not require any assumptions about the structure of the input data, it has been used in the fields of pattern recognition, image analysis, etc. The Radial Basis Function (RBF) based neural network is one of many ways to build classifiers. A robust algorithm for object tracking using RBF networks was described in [1]. We have implemented that algorithm using OpenCV libraries so that this module can be integrated into a larger surveillance system.

  • 2. Object Tracking

Object tracking is an important task within the field of computer vision. The growth of high-performance computers, the availability of high-quality yet inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object-tracking algorithms. There are three key steps in video analysis: detection of interesting moving objects, tracking of such objects from frame to frame, and analysis of tracks to recognize their behavior. Object tracking is pertinent in the tasks of:

  • Motion-based recognition, that is, human identification based on gait, automatic object detection, etc.

  • Automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events

  • Video indexing, that is, automatic annotation and retrieval of videos in multimedia databases

  • Human-computer interaction, that is, gesture recognition, eye gaze tracking for data input to computers, etc.

  • Traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow

  • Vehicle navigation, that is, video-based path planning and obstacle avoidance capabilities

[a] Indian Institute of Technology Bombay; [b] National Institute of Technology Karnataka, Surathkal; [c] Video analytics consultant; [d] Junior scientist, Flosolver


In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. A tracker assigns consistent labels to the tracked objects in different frames of a video. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as the orientation, area, or shape of an object. Tracking objects can be complex due to:

  • Loss of depth information,

  • Noise in images,

  • Complex object motion,

  • Non-rigid or articulated nature of objects,

  • Partial and full object occlusions,

  • Complex object shapes,

  • Scene illumination changes, and

  • Real-time processing requirements.

One can simplify tracking by imposing constraints on the motion and/or appearance of objects. For example, almost all tracking algorithms assume that the object motion is smooth with no abrupt changes. One can further constrain the object motion to be of constant velocity or constant acceleration based on a priori information. Prior knowledge about the number and size of objects, or about the object appearance and shape, can also be incorporated. The foremost factor is the object, its representation, and its modeling.

  • 3. Object Representation

Objects can be represented using their shapes and appearances. Here we describe the object shape representations commonly employed for tracking.

  • Points. The object is represented by a point, that is, the centroid, or by a set of points. In general, the point representation is suitable for tracking objects that occupy small regions in an image.

  • Primitive geometric shapes. Object shape is represented by a rectangle, ellipse, etc. Object motion for such representations is usually modeled by translation, affine, or projective transformation. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking non-rigid objects.

  • Object silhouette and contour. Contour representation defines the boundary of an object. The region inside the contour is called the silhouette of the object. Silhouette and contour representations are suitable for tracking complex non-rigid shapes.

  • 4. Object Modeling

The purpose of modeling is to classify whether a chosen pixel belongs to the object or not. Some of the prominent features used for modeling are:

  • Templates: Templates are formed using simple geometric shapes or silhouettes. An advantage of a template is that it carries both spatial and appearance information. Templates, however, only encode the object appearance generated from a single view. Thus, they are only suitable for tracking objects whose poses do not vary considerably during the course of tracking.

  • Probabilistic densities of object appearance: The probability density estimates of the object appearance can either be parametric, such as Gaussian and a mixture of Gaussians (for instance RBF networks), or nonparametric, such as histograms. The probability densities of object appearance features (color, texture) can be computed from the image regions specified by the shape models (interior region of an ellipse or a contour).

  • Histogram: It uses the color features of the image. Based on the developed histogram, it can be decided whether a pixel belongs to the object or not. Under conditions in which the background has a color similar to that of the object, classification can be based on a color component that differentiates object from non-object.

  • 5. Radial Basis Function Networks

A radial basis function network [2] is an artificial neural network that uses radial basis functions as activation functions; its output is a linear combination of radial basis functions. RBF networks consist of three layers. The first, input layer feeds data to a hidden intermediate layer. The hidden layer processes the data and transports it to the output layer. Only the tap weights between the hidden layer and the output layer are modified during training. Each hidden-layer neuron represents a basis function of the output space with respect to a particular center in the input space. The activation function is commonly a Gaussian kernel, centered at the point in the input space specified by the neuron's weight vector. The closer the input signal is to the current weight vector, the higher the output of the neuron. Radial basis function networks are commonly used in function approximation and series prediction.
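To make this three-layer structure concrete, here is a minimal sketch of the forward pass of such a network. It is illustrative only: the struct name RBFNetwork and its layout are ours, not from the paper's implementation, and per-dimension Gaussian widths are assumed, consistent with Section 6.3.

#include <cmath>
#include <vector>

// Minimal sketch of an RBF network forward pass (illustrative, not the
// paper's code). Each hidden neuron i has a d-dimensional centre mu[i]
// and width sigma[i]; the output is a weighted sum of Gaussian activations.
struct RBFNetwork {
    std::vector<std::vector<double>> mu;     // k centres, each d-dimensional
    std::vector<std::vector<double>> sigma;  // k widths, each d-dimensional
    std::vector<double> alpha;               // k output weights

    double evaluate(const std::vector<double>& u) const {
        double y = 0.0;
        for (size_t i = 0; i < alpha.size(); ++i) {
            double dist = 0.0;
            for (size_t j = 0; j < u.size(); ++j) {
                double diff = (u[j] - mu[i][j]) / sigma[i][j];
                dist += diff * diff;
            }
            // Gaussian kernel: response is highest when u is near centre i.
            y += alpha[i] * std::exp(-dist);
        }
        return y;
    }
};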

  • 6. Description of Algorithm

  • 6.1 Object Background Separation

The object is selected, and a white rectangle then marks the object domain. Another box is marked around the first one such that the surrounding region has an equal number of pixels; this region is used as the object background.

The object and background are separated from each other. The R-G-B based joint probability density functions (pdfs) of the object region and of the background region are obtained: the region within the inner rectangle gives the object pdf, and the marked background region gives the background pdf.

The log-likelihood of a pixel belonging to the object rather than to the background is obtained as

$$ L_i = \ln\left( \frac{h_o(i) + \epsilon}{h_b(i) + \epsilon} \right) $$

where $h_o(i)$ and $h_b(i)$ are the probabilities of the $i$-th pixel belonging to the object and to the background respectively, and $\epsilon$ is a small non-zero value that avoids numerical instability. A binary image is then constructed by thresholding the likelihood, which decides whether a particular pixel is on the object or in the background:

$$ B_i = \begin{cases} 1, & L_i \ge \tau_0 \\ 0, & \text{otherwise} \end{cases} $$

where $\tau_0$ is the threshold.
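As an illustration of this step, the sketch below computes the likelihood-based binary image in C++ with OpenCV. The 8-bins-per-channel quantization, the default epsilon and tau0 values, and the function name likelihoodMask are assumptions of this example, not values from the paper; the background rectangle is also treated as a plain box here, whereas strictly the object pixels should be excluded from it.

#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>

// Sketch of the object-background separation of Section 6.1 (illustrative).
// The joint R-G-B pdfs are approximated with 8 bins per channel.
cv::Mat likelihoodMask(const cv::Mat& frame,            // CV_8UC3 (BGR)
                       cv::Rect object, cv::Rect background,
                       double epsilon = 1e-6, double tau0 = 0.0)
{
    const int BINS = 8, SHIFT = 5;                      // 256 / 8 = 1 << 5
    std::vector<double> hObj(BINS * BINS * BINS, 0.0), hBkg(hObj);

    auto binOf = [&](const cv::Vec3b& p) {              // joint R-G-B bin
        return (p[2] >> SHIFT) * BINS * BINS + (p[1] >> SHIFT) * BINS + (p[0] >> SHIFT);
    };
    auto accumulate = [&](cv::Rect r, std::vector<double>& h) {
        for (int y = r.y; y < r.y + r.height; ++y)
            for (int x = r.x; x < r.x + r.width; ++x)
                h[binOf(frame.at<cv::Vec3b>(y, x))] += 1.0;
        for (double& v : h) v /= r.area();              // normalize to a pdf
    };
    accumulate(object, hObj);
    accumulate(background, hBkg);

    // Binary image: a pixel is "object" when its log-likelihood exceeds tau0.
    cv::Mat mask(frame.size(), CV_8U, cv::Scalar(0));
    for (int y = 0; y < frame.rows; ++y)
        for (int x = 0; x < frame.cols; ++x) {
            int b = binOf(frame.at<cv::Vec3b>(y, x));
            double L = std::log((hObj[b] + epsilon) / (hBkg[b] + epsilon));
            if (L >= tau0) mask.at<uchar>(y, x) = 255;
        }
    return mask;
}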

  • 6.2 Feature Extraction

We use the color features of pixels to develop the RBF based classifier. Applying the classifier to a pixel gives a value of −1 or +1: if the selected pixel belongs to the object it is assigned +1, and if it belongs to the background, −1.

  • 6.3 Developing Object Model

The object model is developed using a radial basis function (RBF) classifier, called the 'Object classifier' or 'Object model'. The object classifier classifies the pixels into object or background based on the output it produces. With a sufficient number of neurons in the second (hidden) layer, any function can be approximated to any required level of accuracy. Let µ_i, a d-dimensional real vector, and σ_i, a d-dimensional positive real vector, be the centre and the width of the i-th Gaussian hidden neuron respectively; let α be the output weights and N the number of pixels.

The output with k neurons has the following form [1]:

$$ \hat{y}(u) = \sum_{i=1}^{k} \alpha_i \exp\left( -\sum_{j=1}^{d} \frac{(u_j - \mu_{i,j})^2}{\sigma_{i,j}^2} \right) $$

The above equation can be rewritten in matrix form,

$$ Y = Y_H \, \alpha $$

where $Y_H$ is the matrix representation of the hidden neurons; each row of $Y_H$ contains the coefficients for one of the inputs $U_1, U_2, U_3, \ldots, U_n$. The µ and σ values are selected randomly, and the output weights are estimated analytically as

$$ \alpha = (Y_H)^{\dagger} \, Y $$

where $(Y_H)^{\dagger}$ is the pseudo-inverse of $Y_H$.
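A possible implementation of this estimation step is sketched below. The matrix layout (rows of Y_H indexed by training pixels, columns by hidden neurons) and the function name trainOutputWeights are our assumptions; cv::invert with cv::DECOMP_SVD returns the Moore-Penrose pseudo-inverse, which matches the analytic solution above.

#include <cmath>
#include <opencv2/opencv.hpp>

// Sketch of the analytic weight estimation of Section 6.3 (illustrative).
// X: N x d feature matrix, labels: N x 1 vector of +1/-1 (all CV_64F);
// mu, sigma: k x d matrices of randomly selected centres and widths.
cv::Mat trainOutputWeights(const cv::Mat& X, const cv::Mat& labels,
                           const cv::Mat& mu, const cv::Mat& sigma)
{
    const int N = X.rows, k = mu.rows, d = X.cols;

    // Build Y_H: each row holds the k hidden-neuron responses for one input.
    cv::Mat YH(N, k, CV_64F);
    for (int n = 0; n < N; ++n)
        for (int i = 0; i < k; ++i) {
            double dist = 0.0;
            for (int j = 0; j < d; ++j) {
                double diff = (X.at<double>(n, j) - mu.at<double>(i, j))
                              / sigma.at<double>(i, j);
                dist += diff * diff;
            }
            YH.at<double>(n, i) = std::exp(-dist);
        }

    // alpha = pinv(Y_H) * Y: DECOMP_SVD yields the Moore-Penrose
    // pseudo-inverse, matching the analytic solution in the text.
    cv::Mat YHpinv;
    cv::invert(YH, YHpinv, cv::DECOMP_SVD);
    return YHpinv * labels;                    // k x 1 weight vector alpha
}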

  • 6.4 Object Tracking

Object tracking is the process of tracing the path of an object from one frame to another in a video sequence. The centroid of the object is calculated from the output of the classifier. In the first frame, where we select the object, we calculate the centroid of the object. We then proceed to the next frame, where the new centroid of the object is calculated. If the newly calculated centroid is within an ε range (i.e. tolerance) of the previous frame's centroid, it is assigned as the current object centroid and we proceed to the next frame. Otherwise, the new centroid is recomputed recursively until it is within the ε range of the previous centroid.
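The iteration just described might look as follows. The window re-centering strategy, the default tolerance eps, and the maxIter safeguard are assumptions of this sketch; the paper does not specify them.

#include <cmath>
#include <opencv2/opencv.hpp>

// Sketch of the centroid iteration of Section 6.4 (illustrative).
// mask is the classifier's binary output; window is the current object box.
cv::Point2d trackCentroid(const cv::Mat& mask, cv::Rect window,
                          double eps = 1.0, int maxIter = 20)
{
    cv::Point2d c(window.x + window.width / 2.0, window.y + window.height / 2.0);
    for (int it = 0; it < maxIter; ++it) {
        window &= cv::Rect(0, 0, mask.cols, mask.rows);  // clamp to image
        double sx = 0.0, sy = 0.0, n = 0.0;
        for (int y = window.y; y < window.y + window.height; ++y)
            for (int x = window.x; x < window.x + window.width; ++x)
                if (mask.at<uchar>(y, x)) { sx += x; sy += y; n += 1.0; }
        if (n == 0.0) break;                   // no object pixels: give up
        cv::Point2d cNew(sx / n, sy / n);
        double dx = cNew.x - c.x, dy = cNew.y - c.y;
        c = cNew;
        // Re-centre the window on the new centroid for the next pass.
        window.x = cvRound(c.x - window.width / 2.0);
        window.y = cvRound(c.y - window.height / 2.0);
        if (std::sqrt(dx * dx + dy * dy) < eps) break;   // within tolerance
    }
    return c;
}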


  • 7. Implementation

This algorithm was implemented in C++ using OpenCV libraries [3]. The code flow is given in Fig. 1.


Fig. 1: Code Flow

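Since Fig. 1 is not reproduced here, the following skeleton suggests one plausible reading of the code flow, assuming the helper functions sketched in Section 6; the video path, window names, and the commented-out helper calls (including classifyPixels and surroundingBox) are placeholders.

#include <opencv2/opencv.hpp>

// Skeleton of the overall code flow (Fig. 1) as we read it: select the
// object in the first frame, build the model once, then classify and
// re-centre the tracking window in every subsequent frame.
int main()
{
    cv::VideoCapture cap("input.avi");         // placeholder video path
    if (!cap.isOpened()) return 1;

    cv::Mat frame;
    cap >> frame;
    if (frame.empty()) return 1;

    // Object selection: the user draws the rectangle (Section 6.1).
    cv::Rect window = cv::selectROI("select object", frame);

    // One-time model building on the first frame (Sections 6.1-6.3):
    // cv::Mat mask  = likelihoodMask(frame, window, surroundingBox(window));
    // cv::Mat alpha = trainOutputWeights(features, labels, mu, sigma);

    while (cap.read(frame)) {
        // Per-frame classification and centroid update (Section 6.4):
        // cv::Mat mask  = classifyPixels(frame, window, alpha);
        // cv::Point2d c = trackCentroid(mask, window);
        // window.x = cvRound(c.x - window.width / 2.0);
        // window.y = cvRound(c.y - window.height / 2.0);

        cv::rectangle(frame, window, cv::Scalar(255, 255, 255), 2);
        cv::imshow("tracking", frame);
        if (cv::waitKey(30) == 27) break;      // Esc quits
    }
    return 0;
}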

  • 8. Results

The algorithm is tested on various video samples. The results are given below, and the problems encountered during experiments are also noted.

  • 8.1 Likelihood Results

The following figures show the source frames (Fig. 2(a), 3(a)) and their binary images (Fig. 2(b), 3(b)) based on likelihoods.

Fig. 2(a)   Fig. 2(b)

Fig. 3(a)   Fig. 3(b)

  • 8.2 Classifier Results

The following figures show the results of the classifier. The first column (Fig. 4(a), 5(a)) shows the object selection, the second column (Fig. 4(b), 5(b)) shows the corresponding binary images based on likelihoods, and the third column (Fig. 4(c), 5(c)) shows the binary images obtained from the classifier.

Fig. 4(a)   Fig. 4(b)   Fig. 4(c)

Fig. 5(a)   Fig. 5(b)   Fig. 5(c)

  • 8.3 Tracking Results

The following figures show the tracking rectangle around the object and the respective binary images from the classifier.

Video frame:  Fig. 6(a)   Fig. 8(a)   Fig. 10(a)
Binary image: Fig. 6(b)   Fig. 8(b)   Fig. 10(b)

Video frame:  Fig. 7(a)   Fig. 9(a)   Fig. 11(a)
Binary image: Fig. 7(b)   Fig. 9(b)   Fig. 11(b)

Fig. 6(a), 7(a), 8(a), 9(a), 10(a), and 11(a) correspond to frame numbers 89, 172, 265, 316, 394 and 404 respectively.

  • 8.4 Issues

The problems encountered in the tracking experiment are discussed below.

1) Similar background color: When the neighborhood of the object has a color very close to that of the object, the algorithm gives false detections. In Fig. 12, the white mark on the floor has misled the tracking.

Fig. 12(a)   Fig. 12(b)

Fig. 12(c)   Fig. 12(d)

2) Occlusion: When the tracked object (the car in Fig. 13) is completely covered by another part of the scene (the tree in Fig. 13), the object information is lost, leading to tracking failure.

Fig. 13(a)   Fig. 13(b)

Fig. 13(c)   Fig. 13(d)

3) Intensity change: When the intensity of the light changes (i.e. a change in lighting), the color of the object changes. Fig. 14(a), 15(a) are video frames, and 14(b), 15(b) are their respective binary images. The performance of the classifier, which was designed for the original lighting conditions, degrades under the changed lighting. This can be clearly seen in the corresponding binary images.

Fig. 14(a)   Fig. 15(a)

Fig. 14(b)   Fig. 15(b)

  • 9. Conclusions and Future Enhancements

A robust object-tracking algorithm using Radial Basis Function (RBF) networks has been implemented using OpenCV libraries. Pixel-based color features are used to develop the classifiers. The algorithm has been tested on various video samples under different conditions, and the results are analyzed. The cases where the tracking algorithm fails are also shown, along with possible reasons. The RBF networks could be redesigned to incorporate adaptive mechanisms for light variations, a varying object domain, thresholds, scale changes, and multiple camera feeds.

Acknowledgement: We thank Dr. U. N. Sinha (Head, Flosolver) for his constant encouragement and inspiration. Without his support and guidance, this work would not have been carried out.

References

[1] R. Venkatesh Babu, S. Suresh, and Anamitra Makur, "Robust Object Tracking with Radial Basis Function Networks", ICASSP 2007, Volume I, pages 937-940.

[2] Simon Haykin, Neural Networks, 2nd Edition, Prentice Hall International, 1999.

[3] Gary Bradski and Adrian Kaehler, Learning OpenCV, 1st Edition, O'Reilly, 2008.
