
Object tracking using Radial basis function networks

A. Prem Kumar[a], T. N. Rickesh[b], R. Venkatesh Babu[c], R. Hariharan[d]

Abstract: The applications of visual tracking are broad in scope, ranging from surveillance and monitoring to smart rooms. A robust object-tracking algorithm using Radial Basis Function (RBF) networks has been implemented using OpenCV libraries. Pixel-based color features are used to develop the classifiers. The algorithm has been tested on various video samples under different conditions, and the results are analyzed.

1. Introduction
The objective of tracking is to follow a target object through successive video frames. A major use of such an algorithm is in the design of video surveillance systems to tackle terrorism. For instance, large-scale surveillance might have played a crucial role in preventing, or in tracing the trail of, the 26/11 terrorist attacks in Mumbai and the many bomb blasts in Kashmir, the North-east Indian region, and other parts of India. A robust object-tracking algorithm is therefore important. Because a neural network framework requires no assumptions about the structure of the input data, neural networks have been used in pattern recognition, image analysis, and related fields. The Radial Basis Function (RBF) based neural network is one of many ways to build classifiers. A robust algorithm for object tracking using RBF networks was described in [1]. We have implemented that algorithm using OpenCV libraries so that the module can be integrated into a large surveillance system.

2. Object Tracking
Object tracking is an important task within the field of computer vision. The growth of high-performance computers, the availability of high-quality yet inexpensive video cameras, and the increasing need for automated video analysis have generated a great deal of interest in object tracking algorithms. There are three key steps in video analysis: detection of interesting moving objects, tracking of such objects from frame to frame, and analysis of the tracks to recognize their behavior. Object tracking is pertinent to the following tasks:
• Motion-based recognition, that is, human identification based on gait, automatic object detection, etc.
• Automated surveillance, that is, monitoring a scene to detect suspicious activities or unlikely events
• Video indexing, that is, automatic annotation and retrieval of videos in multimedia databases
• Human-computer interaction, that is, gesture recognition, eye gaze tracking for data input to computers, etc.
• Traffic monitoring, that is, real-time gathering of traffic statistics to direct traffic flow
• Vehicle navigation, that is, video-based path planning and obstacle avoidance capabilities

[a] - Indian Institute of Technology Bombay
[b] - National Institute of Technology Karnataka, Surathkal
[c] - Video analytics consultant
[d] - Junior scientist, Flosolver


In its simplest form, tracking can be defined as the problem of estimating the trajectory of an object in the image plane as it moves around a scene. A tracker assigns consistent labels to the tracked objects in different frames of a video. Additionally, depending on the tracking domain, a tracker can also provide object-centric information, such as the orientation, area, or shape of an object. Tracking objects can be complex due to:
• Loss of depth information,
• Noise in images,
• Complex object motion,
• Non-rigid or articulated nature of objects,
• Partial and full object occlusions,
• Complex object shapes,
• Scene illumination changes, and
• Real-time processing requirements.

One can simplify tracking by imposing constraints on the motion and/or appearance of objects. For example, almost all tracking algorithms assume that the object motion is smooth with no abrupt changes. One can further constrain the object motion to be of constant velocity or constant acceleration based on a priori information. Prior knowledge about the number and the size of objects, or the object appearance and shape, can also be incorporated. The foremost factor is the object, its representation, and modeling.
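As an illustration of the constant-velocity constraint, the sketch below extrapolates the next centroid from the last two and checks whether a newly measured centroid is consistent with smooth motion. The function names and the `max_jump` tolerance are hypothetical and are not part of the implementation described in this paper.

```python
def predict_constant_velocity(prev, curr):
    """Extrapolate the next (x, y) centroid, assuming the last displacement repeats."""
    vx = curr[0] - prev[0]
    vy = curr[1] - prev[1]
    return (curr[0] + vx, curr[1] + vy)


def is_smooth(prev, curr, new, max_jump):
    """True when `new` lies within `max_jump` pixels of the constant-velocity prediction."""
    px, py = predict_constant_velocity(prev, curr)
    distance = ((new[0] - px) ** 2 + (new[1] - py) ** 2) ** 0.5
    return distance <= max_jump


if __name__ == "__main__":
    # Hypothetical centroids from two consecutive frames.
    print(predict_constant_velocity((10, 10), (12, 13)))
    print(is_smooth((10, 10), (12, 13), (15, 16), max_jump=2.0))
```

A tracker could use such a check to reject a measurement that jumps far from the predicted position; how large a jump to tolerate depends on the frame rate and the scene.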

3. Object Representation
Objects can be represented using their shapes and appearances. Here we describe the object shape representations commonly employed for tracking.
• Points. The object is represented by a point, that is, the centroid, or by a set of points. In general, the point representation is suitable for tracking objects that occupy small regions in an image.
• Primitive geometric shapes. Object shape is represented by a rectangle, ellipse, etc. Object motion for such representations is usually modeled by a translation, affine, or projective transformation. Though primitive geometric shapes are more suitable for representing simple rigid objects, they are also used for tracking non-rigid objects.
• Object silhouette and contour. Contour representation defines the boundary of an object. The region inside the contour is called the silhouette of the object. Silhouette and contour representations are suitable for tracking complex non-rigid shapes.

4. Object modeling
The purpose of modeling is to classify whether a chosen pixel belongs to the object or not. Some of the prominent features used for modeling are:
• Templates: Templates are formed using simple geometric shapes or silhouettes. An advantage of a template is that it carries both spatial and appearance information. Templates, however, only encode the object appearance generated from a single view. Thus, they are only suitable for tracking objects whose poses do not vary considerably during the course of tracking.


• Probabilistic densities of object appearance: The probability density estimates of the object appearance can be either parametric, such as a Gaussian or a mixture of Gaussians (for instance, RBF networks), or nonparametric, such as histograms. The probability densities of object appearance features (color, texture) can be computed from the image regions specified by the shape models (the interior region of an ellipse or a contour).
• Histogram: A histogram uses the color features of the image. Based on the histogram, each pixel can be classified as belonging to the object or not. When the background has a color similar to that of the object, classification can instead be based on a single color component that differentiates object from non-object.
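A nonparametric color density of the kind described above can be sketched as a normalised histogram over quantised R-G-B bins. This is a minimal illustration, not the code used in this work: the function names and the choice of 8 bins per channel are assumptions.

```python
from collections import Counter


def rgb_bin(pixel, bins_per_channel=8):
    """Map an (r, g, b) pixel with 0-255 channels to a quantised colour bin."""
    r, g, b = pixel
    return (r * bins_per_channel // 256,
            g * bins_per_channel // 256,
            b * bins_per_channel // 256)


def colour_pdf(pixels, bins_per_channel=8):
    """Return {bin: probability} estimated from a list of (r, g, b) pixels."""
    counts = Counter(rgb_bin(p, bins_per_channel) for p in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}  # no pixels, so no density can be estimated
    return {colour: n / total for colour, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical region: three reddish pixels and one bluish pixel.
    region = [(250, 10, 10)] * 3 + [(10, 10, 250)]
    print(colour_pdf(region))
```

Coarser bins make the estimate more stable for small regions, while finer bins separate similar colors better; the right trade-off depends on the video.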

5. Radial Basis Function Networks
A radial basis function network [2] is an artificial neural network that uses radial basis functions as activation functions; its output is a linear combination of radial basis functions. RBF networks consist of three layers. The input layer feeds data to a hidden intermediate layer. The hidden layer processes the data and passes it to the output layer. Only the tap weights between the hidden layer and the output layer are modified during training. Each hidden-layer neuron represents a basis function of the output space with respect to a particular center in the input space. The activation function is commonly a Gaussian kernel, centered at the point in the input space specified by the neuron's weight vector. The closer the input signal is to that weight vector, the higher the output of the neuron. Radial basis function networks are commonly used in function approximation and series prediction.
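The behaviour of a Gaussian hidden neuron and of the linear output layer can be sketched as follows. The isotropic kernel exp(-||x - c||^2 / (2 s^2)) with a scalar width s is one common convention and is assumed here; the function names are illustrative.

```python
import math


def gaussian_rbf(x, centre, width):
    """Response of one Gaussian hidden neuron; equals 1 when x coincides with centre."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_output(x, centres, widths, weights):
    """Network output: weighted sum (linear combination) of the hidden responses."""
    return sum(w * gaussian_rbf(x, c, s)
               for c, s, w in zip(centres, widths, weights))


if __name__ == "__main__":
    centre = [0.5, 0.5]
    # The response falls off as the input moves away from the centre.
    print(gaussian_rbf([0.5, 0.5], centre, 0.2))
    print(gaussian_rbf([0.6, 0.5], centre, 0.2))
    print(gaussian_rbf([0.9, 0.9], centre, 0.2))
```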

6. Description of Algorithm
6.1 Object background separation
The object is selected, and a white rectangle marks the object domain. A second, larger box is marked around the first so that the surrounding region contains an equal number of pixels; this surrounding region is used as the object background. The object and background are thus separated from each other. The R-G-B based joint probability density function (pdf) is obtained for each: the region inside the inner rectangle gives the object pdf, and the surrounding region gives the background pdf. The log-likelihood of each pixel in the object and background regions is then obtained from h_o and h_b, the probabilities of the i-th pixel belonging to the object and to the background respectively, together with ε, a small non-zero value that avoids numerical instability. A binary image is then constructed by applying a threshold to the log-likelihood, which decides whether a particular pixel is considered to be on the object or in the background.
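The separation step can be sketched as below. Several details are assumptions rather than statements of the original method: the outer box is obtained by scaling both sides of the object box by the square root of 2 (so the surrounding ring has roughly as many pixels as the object), each probability is clamped from below by ε before the ratio is taken, and the function names and the default threshold `tau0 = 0.0` are illustrative.

```python
import math


def background_box(x, y, w, h):
    """Concentric outer box (x, y, w, h) whose area is about twice the object box.

    The result is not clipped to the image, so coordinates may fall outside it.
    """
    scale = math.sqrt(2.0)
    outer_w, outer_h = round(w * scale), round(h * scale)
    return (x - (outer_w - w) // 2, y - (outer_h - h) // 2, outer_w, outer_h)


def log_likelihood(h_o, h_b, eps=1e-6):
    """Log-likelihood ratio of object versus background, with eps guarding zeros."""
    return math.log(max(h_o, eps) / max(h_b, eps))


def binary_mask(pixel_bins, obj_pdf, bg_pdf, tau0=0.0, eps=1e-6):
    """Return 1 for pixels whose log-likelihood exceeds tau0, else 0."""
    mask = []
    for colour in pixel_bins:
        score = log_likelihood(obj_pdf.get(colour, 0.0),
                               bg_pdf.get(colour, 0.0), eps)
        mask.append(1 if score > tau0 else 0)
    return mask


if __name__ == "__main__":
    # Hypothetical colour-bin probabilities for object and background.
    obj_pdf = {"red": 0.9, "grey": 0.1}
    bg_pdf = {"grey": 0.8, "blue": 0.2}
    print(background_box(100, 100, 40, 20))
    print(binary_mask(["red", "grey", "blue"], obj_pdf, bg_pdf))
```

Here the pdfs are dictionaries keyed by color bin, so a color absent from a region has probability zero, and ε keeps the logarithm finite in that case.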


The threshold is denoted τ_0.

6.2 Feature Extraction
We use the color features of pixels to develop the RBF-based classifier. Applying the classifier to a pixel yields a value of -1 or +1: a pixel belonging to the object is assigned +1, and a pixel belonging to the background is assigned -1.

6.3 Developing Object Model
The object model is developed using a radial basis function (RBF) classifier called the 'object classifier' or 'object model'. The object classifier classifies pixels as object or background based on its output. With a sufficient number of hidden (second-layer) neurons, any function can be approximated to any required level of accuracy. Let µ_i, a d-dimensional real vector, and σ_i, a d-dimensional positive real vector, be the centre and the width of the i-th Gaussian hidden neuron respectively, let α be the output weights, and let N be the number of pixels. The form of the output with k neurons is given in [1].
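A classifier of this kind can be sketched with NumPy as follows. The kernel form exp(-Σ_m (u_m - µ_m)^2 / (2 σ_m^2)), with one width per color dimension, is an assumption chosen to match the d-dimensional centre and width vectors; the exact expression in [1] may differ. The function names, the sampling ranges for the random centres and widths, and the use of `numpy.linalg.pinv` to fit the output weights by least squares are likewise illustrative choices.

```python
import numpy as np


def hidden_matrix(U, centres, widths):
    """N x k matrix of Gaussian hidden responses for N inputs and k neurons."""
    U = np.asarray(U, dtype=float)[:, None, :]          # N x 1 x d
    centres = np.asarray(centres, dtype=float)[None]    # 1 x k x d
    widths = np.asarray(widths, dtype=float)[None]      # 1 x k x d
    return np.exp(-np.sum((U - centres) ** 2 / (2.0 * widths ** 2), axis=2))


def random_hidden_layer(k, d, seed=0):
    """Draw k random centres in [0, 1]^d and positive widths (assumed ranges)."""
    rng = np.random.default_rng(seed)
    centres = rng.uniform(0.0, 1.0, size=(k, d))
    widths = rng.uniform(0.2, 1.0, size=(k, d))
    return centres, widths


def train_output_weights(U, labels, centres, widths):
    """Least-squares output weights via the pseudo-inverse of the hidden matrix."""
    H = hidden_matrix(U, centres, widths)
    return np.linalg.pinv(H) @ np.asarray(labels, dtype=float)


def classify(U, centres, widths, alpha):
    """Label each input +1 (object) or -1 (background) by the sign of the output."""
    scores = hidden_matrix(U, centres, widths) @ alpha
    return np.where(scores >= 0.0, 1, -1)


if __name__ == "__main__":
    # Hypothetical training pixels, colours scaled to [0, 1]:
    # reddish pixels are object (+1), bluish pixels are background (-1).
    U = [[0.9, 0.1, 0.1], [1.0, 0.0, 0.0], [0.1, 0.1, 0.9], [0.0, 0.0, 1.0]]
    labels = [1, 1, -1, -1]
    centres, widths = random_hidden_layer(k=6, d=3)
    alpha = train_output_weights(U, labels, centres, widths)
    print(classify(U, centres, widths, alpha))
```

Because only the output weights are fitted, training reduces to a single linear solve once the hidden layer has been fixed.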

That output can be written in matrix form as Ỳ = Y_H α, where Y_H is the matrix representation of the hidden neurons. Each row of Y_H contains the coefficients for one of the inputs U_1, U_2, U_3, …, U_n. The µ and σ values are selected randomly. The output weights are estimated analytically as α = (Y_H)† Ỳ, where (Y_H)† is the pseudo-inverse of Y_H.

6.4 Object Tracking
Tracking is the process of tracing the path of an object from one frame to the next in a video sequence. The centroid of the object is calculated from the output of the classifier. In the first frame, where the object is selected, we calculate the centroid of the object. In each subsequent frame, a new centroid is calculated. If the new centroid is within ε (the tolerance) of the centroid from the previous frame, it is assigned as the current object centroid and the algorithm proceeds to the next frame. Otherwise, the centroid is recomputed repeatedly until it is within ε of the previously computed centroid.
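The centroid update can be sketched as an iteration over a binary classifier output. This is a simplified illustration under stated assumptions: the mask is a list of rows of 0/1 values, the search window has a fixed half-width and half-height around the current centroid, the ε test is applied per axis in pixels, and `max_iter` bounds the number of re-centring steps. All names are hypothetical.

```python
def mask_centroid(mask, cx, cy, half_w, half_h):
    """Centroid of the 1-pixels of `mask` inside a window centred on (cx, cy)."""
    rows, cols = len(mask), len(mask[0])
    sum_x = sum_y = count = 0
    for y in range(max(0, cy - half_h), min(rows, cy + half_h + 1)):
        for x in range(max(0, cx - half_w), min(cols, cx + half_w + 1)):
            if mask[y][x]:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # no object pixels in the window
    return (round(sum_x / count), round(sum_y / count))


def track_centroid(mask, start, half_w, half_h, eps=1, max_iter=20):
    """Re-centre the window until the centroid moves by at most eps pixels per axis."""
    cx, cy = start
    for _ in range(max_iter):
        new = mask_centroid(mask, cx, cy, half_w, half_h)
        if new is None:
            return None  # object lost, for example under full occlusion
        if abs(new[0] - cx) <= eps and abs(new[1] - cy) <= eps:
            return new
        cx, cy = new
    return (cx, cy)


if __name__ == "__main__":
    # Hypothetical 10 x 10 mask with a 3 x 3 object at x = 5..7, y = 4..6.
    mask = [[1 if 5 <= x <= 7 and 4 <= y <= 6 else 0 for x in range(10)]
            for y in range(10)]
    print(track_centroid(mask, start=(3, 4), half_w=2, half_h=2))
```

Starting from the previous frame's centroid, the window drifts toward the object pixels and stops once successive centroids agree to within the tolerance.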


7. Implementation
This algorithm was implemented in C++ using OpenCV libraries [3]. The code flow is shown in Fig. 1.

Fig 1: Code Flow


8. Results
The algorithm was tested on various video samples. The results are given below, together with the problems encountered during the experiments.

8.1 Likelihood Results
The following figures show source frames (Fig. 2(a), 3(a)) and their binary images (Fig. 2(b), 3(b)) based on likelihoods.

Fig. 2(a)

Fig. 2(b)

Fig. 3(a)

Fig. 3(b)

8.2 Classifier Results
The following figures show the results of the classifier. The first column (Fig. 4(a), 5(a)) shows the object selection. The second column (Fig. 4(b), 5(b)) shows the corresponding binary images based on likelihoods, and the third column (Fig. 4(c), 5(c)) shows the binary images obtained from the classifier.

Fig. 4(a)

Fig. 4(b)

Fig. 4(c)

Fig. 5(a)

Fig. 5(b)

Fig. 5(c)


8.3 Tracking results
The following figures show the tracking rectangle around the object and the corresponding binary images from the classifier. In each pair, (a) is the video frame and (b) is the binary image.

Fig. 6(a)

Fig. 6(b)

Fig. 7(a)

Fig. 7(b)

Fig. 8(a)

Fig. 8(b)

Fig. 9(a)

Fig. 9(b)

Fig. 10(a)

Fig. 10(b)

Fig. 11(a)

Fig. 11(b)

Fig. 6(a), 7(a), 8(a), 9(a), 10(a), and 11(a) correspond to frame numbers 89, 172, 265, 316, 394, and 404 respectively.

8.4 Issues
The problems encountered in the tracking experiments are discussed below.

1) Similar background color: When the neighborhood of the object has a color very close to that of the object, the algorithm gives false detections. In Fig. 12, a white mark on the floor has misled the tracker.

Fig. 12(a)

Fig. 12(b)

Fig. 12(c)

Fig. 12(d)

2) Occlusion: When the tracked object (the car in Fig. 13) is completely covered by something in the surrounding environment (the tree in Fig. 13), the object information is lost, and tracking fails.

Fig. 13(a)

Fig. 13(b)

Fig. 13(c)

Fig. 13(d)

3) Intensity change: When the intensity of the light changes (that is, when the lighting conditions change), the apparent color of the object changes. Fig. 14(a) and 15(a) are video frames, and Fig. 14(b) and 15(b) are their respective binary images. The performance of a classifier designed under different lighting conditions degrades, as can be seen clearly in the corresponding binary images.


Fig. 14(a)

Fig. 14(b)

Fig. 15(a)

Fig. 15(b)

9. Conclusions and future enhancements
A robust object-tracking algorithm using Radial Basis Function (RBF) networks has been implemented using OpenCV libraries. Pixel-based color features are used to develop the classifiers. The algorithm has been tested on various video samples under different conditions, and the results are analyzed. The cases where the tracking algorithm fails are also shown, along with possible reasons. The RBF networks could be redesigned to incorporate adaptive mechanisms for light variations, varying object domains, thresholds, scale changes, and multiple camera feeds.

Acknowledgement: We thank Dr. U. N. Sinha (Head, Flosolver) for his constant encouragement and inspiration. Without his support and guidance, this work would not have been carried out.

[1] R. Venkatesh Babu, S. Suresh, and Anamitra Makur, "Robust Object Tracking with Radial Basis Function Networks", ICASSP, volume I, pages 937-940, 2007.
[2] Simon Haykin, Neural Networks, 2nd Edition, Prentice Hall International, 1999.
[3] Gary Bradski and Adrian Kaehler, Learning OpenCV, 1st Edition, O'Reilly, 2008.
