
Proceedings of the 2008 IEEE International Conference on Information and Automation June 20 -23, 2008, Zhangjiajie, China

Multiple Moving Target Detection, Tracking, and Recognition from a Moving Observer
Fenghui Yao, Ali Sekmen, and Mohan J. Malkani
Department of Computer Science, Department of Electrical and Computer Engineering
Tennessee State University, 3500 John A Merritt Blvd, Nashville, TN 37215, USA
{fyao, asekmen, mmalkani}@tnstate.edu

Abstract - This paper describes an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. When the camera is placed on a moving observer, the whole background of the scene appears to be moving, and the actual motion of the targets must be distinguished from the background motion. To do this, an affine motion model between consecutive frames is estimated, and then moving targets can be extracted. Next, target tracking employs a similarity measure based on the joint feature-spatial space. Finally, target recognition is performed by matching moving targets against a target database. The average processing time is 680 ms per frame, which corresponds to a processing rate of 1.5 frames per second. The algorithm was tested on the Vivid datasets provided by the Air Force Research Laboratory, and experimental results show that this method is efficient and fast enough for real-time application.

I. INTRODUCTION

Detection and tracking of moving objects in an image sequence is one of the basic tasks in computer vision. The detected moving object trajectory can either be of interest in its own right or be used as the input for a higher-level analysis such as motion pattern understanding or moving behavior recognition. Applications include surveillance, homeland security, protection of vital infrastructure, and advanced human-machine communication. Therefore, moving object detection and tracking has received more and more attention, and many algorithms have been proposed. Among these, one interesting approach is the particle filter [1], which has been used and extended many times [2] [3] [4]. The particle filter was developed to track objects in clutter, where the posterior density and observation density are often non-Gaussian. The key idea of particle filtering is to approximate the probability distribution by a weighted sample set. Each sample consists of an element which represents the hypothetical state of an object and a corresponding probability. The state of an object may be the control points of a contour [1]; the position, shape, and motion of an elliptical region [2]; or specific model parameters [3]. That is, the methods of [2] [3] are model-based. Ross's approach [4] is a model-free, statistical detection method which uses both edge and color information. The common assumption of these methods [1] [2] [3] is that the background does not move, and that the image sequences come from a stationary camera. Tian et al [5] developed a real-time algorithm to detect salient motion in complex environments by combining

temporal difference imaging and temporally filtered optical flow. The image sequence used in this method is also from a stationary camera. The works of Smith and Brady [7] and Kang et al [6] employ image sequences from moving platforms. Kang et al developed an approach for tracking moving objects observed by both stationary and Pan-Tilt-Zoom cameras. Smith and Brady's approach employed the image sequence from a camera mounted on a vehicle to detect other moving vehicles; this method used special-purpose hardware to implement real-time target detection and tracking. The COMETS system detects targets from a moving observer (an autonomous helicopter) but does not perform tracking [8]. Yang et al's tracker works for image sequences from both stationary and moving platforms, but it detects and tracks only a single target [9]. Literature [10] proposes detection-based multiple object tracking, literature [11] shows a multiple object tracking method based on a multiple hypotheses graph representation, and literature [12] demonstrates a distributed Bayesian multiple target tracker. However, they all employ image sequences from stationary observers. As shown above, there are few works that discuss multiple moving target detection and tracking from a moving observer, and few works deal with target recognition at the same time. This paper introduces a method for moving target detection, tracking, and recognition from a moving observer.

II. MOVING TARGET DETECTION FROM A MOVING OBSERVER

The entire configuration is shown in Fig. 1. The output of the moving target detection is sent to target tracking, and the tracked targets are sent to target recognition. This section describes moving target detection; target tracking and target recognition are discussed in Sections 3 and 4, respectively. The moving observer usually means a camera mounted on a ground vehicle or on an airborne platform such as a helicopter or an unmanned aerial vehicle (UAV).

In this work, the video sequences are generated by an airborne camera. In airborne video, everything (target and background) appears to be moving over time due to the camera motion. Before employing frame differencing (a simple motion detection method for stationary platforms) to detect motion images, it is necessary to conduct motion compensation first. Two-frame background motion

978-1-4244-2184-8/08/$25.00 © 2008 IEEE.


estimation is achieved by fitting a global parametric motion model (affine or projective) to sparse optic flow. Here, we use the affine transformation model.

will generate an accurate and reliable model if the moving targets constitute a small area of the frame (i.e., less than 50%). For airborne video, this requirement is easily satisfied.

C. Target Detection


The frame difference is generated according to Fdiff = Fi − ωs × Fi−1, where Fi−1 and Fi are the previous and

Fig. 1 System Configuration

A. Optic Flow Detection
Sparse optic flow is obtained by applying the Lucas-Kanade algorithm [13]. The number of optic flow vectors is controlled in the range of 200 to 1000. Other methods, such as matching Harris corners, Moravec features, or SUSAN corners between frames, or matching SIFT features, are also applicable here. The main factors to consider are computational cost and robustness. Experimental results show that the Lucas-Kanade method is the most reliable and reasonably fast.

B. Affine Parameter Estimation
The 2-D affine transformation is described as follows:
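To give an idea of how sparse optic flow is computed, below is a minimal single-window Lucas-Kanade sketch on a synthetic quadratic image shifted by one pixel; the image model, the 3x3 window, and the single-iteration solve are illustrative assumptions, not the paper's OpenCV-based implementation over hundreds of feature points.

```python
def intensity(x, y):
    # smooth synthetic image with non-degenerate, uncorrelated gradients
    return x * x + 3 * y * y

def lk_flow(cx, cy):
    """One Lucas-Kanade step at (cx, cy): solve the 2x2 normal equations
    over a 3x3 window, where frame2(x, y) = frame1(x - 1, y), i.e. the
    pattern moves one pixel in +x between frames."""
    sxx = sxy = syy = sxt = syt = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            x, y = cx + dx, cy + dy
            ix = (intensity(x + 1, y) - intensity(x - 1, y)) / 2.0  # central diff
            iy = (intensity(x, y + 1) - intensity(x, y - 1)) / 2.0
            it = intensity(x - 1, y) - intensity(x, y)              # frame2 - frame1
            sxx += ix * ix; sxy += ix * iy; syy += iy * iy
            sxt += ix * it; syt += iy * it
    det = sxx * syy - sxy * sxy
    u = (-sxt * syy + syt * sxy) / det   # Cramer's rule on [A][u v]^T = -b
    v = (-sxx * syt + sxy * sxt) / det
    return u, v

u, v = lk_flow(0, 0)   # recovers the (1, 0) pixel displacement
```

Because the synthetic image is quadratic, the central differences are exact and a single iteration recovers the flow; real imagery needs the pyramidal, iterative form.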
( X_i )   ( a1  a2 ) ( x_i )   ( a5 )
( Y_i ) = ( a3  a4 ) ( y_i ) + ( a6 ),   (1)

current frame, respectively. After binarization, a set of morphological operations including dilation, white-blob detection, blob filtering, and blob merging is applied to the binary difference image Fdiff. After these processing steps, the remaining blobs are target candidates. For each blob, its contour Ck, center Pck(xk, yk), hull Hk, affine transformation model ωs, and minimal circumscribed rectangle Rk are passed to the next stage for tracking and recognition, where k = 1, 2, …, K, and K is the number of target candidates (see Fig. 4 for target detection results).
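The detection step, motion-compensated frame differencing followed by thresholding, can be sketched on toy data as follows; the tiny frames, the translation-only "affine" camera model, and the fixed threshold are made-up stand-ins for the estimated 6-parameter model and the morphological pipeline.

```python
W, H = 8, 8

def background(x, y):
    return (3 * x + 5 * y) % 7       # textured static scene

cam_dx = 1                           # camera pans right one pixel between frames
tgt_prev, tgt_cur = (2, 3), (5, 3)   # bright target moves in scene coordinates

def render(tgt, shift):
    frame = [[background(x + shift, y) for x in range(W)] for y in range(H)]
    tx, ty = tgt[0] - shift, tgt[1]
    if 0 <= tx < W:
        frame[ty][tx] = 99           # bright target pixel
    return frame

f_prev = render(tgt_prev, 0)
f_cur = render(tgt_cur, cam_dx)

def warp(frame, dx):
    # apply the (translation-only) motion model to the previous frame,
    # clamping at the border
    return [[frame[y][min(max(x + dx, 0), W - 1)] for x in range(W)]
            for y in range(H)]

compensated = warp(f_prev, cam_dx)
diff = [[abs(f_cur[y][x] - compensated[y][x]) for x in range(W)] for y in range(H)]
# thresholding; a real pipeline follows with dilation and blob filtering
blobs = [(x, y) for y in range(H) for x in range(W) if diff[y][x] > 10]
# note: (1, 3) is the "ghost" of the target's previous position, a classic
# frame-differencing artifact that the blob-filtering stage must handle
```

The background cancels exactly after warping, so only the target's new position (4, 3) and its ghost survive the threshold.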
Fig. 2 Graph structure in multiple object tracking.


III. MULTIPLE TARGET TRACKING
The multiple target tracking algorithm accepts the target candidates from the target detection subsystem and keeps multiple target trajectories in a graph structure, as shown in Fig. 2. The tracker employs a similarity measure based on the joint feature-spatial space. It consists of (i) similarity measure generation, and (ii) tracking history management.

A. Joint Feature-Spatial Spaces and Similarity Measure
Let u_i be a d-dimensional feature vector at image location x_i, and S = {y_i = (x_i, u_i)} (i = 1, …, N) be samples from an image region. The estimate of the probability at (x, u) in the joint space is:

where (x_i, y_i) are the locations of feature points in the previous frame, and (X_i, Y_i) are the locations of the corresponding feature points in the current frame. Theoretically, three pairs of matched feature points are enough to determine the six affine parameters, but the selection of these three pairs affects the precision of the estimation. To reduce this estimation error, the parameters can be solved by the least-squares method based on all matched feature points. However, the computational cost of the least-squares method is high. To reduce both the computation time and the estimation error, this work uses an algorithm similar to LMedS (Least Median of Squares) [14]. Details are as follows. (i) Randomly select N pairs of matched feature points from the previous and current frames. Then randomly select M triplets from the N pairs, where M << N. Each triplet determines an affine transformation (six parameters). Let ω_k represent the k-th affine transform, where k = 1, 2, …, M. (ii) For each ω_k, all feature points in the previous frame are transformed to the current frame. The affine transform error is defined as

P̂(x, u) = (1/N) Σ_{i=1}^{N} K_σ(x − x_i) G_h(u − u_i),   (2)

ε_k = Σ_{i=1}^{N} || P̂_i − ω_k × P_i ||,

where P_i is a feature point in the previous frame, and P̂_i is the corresponding feature point in the current frame. The s-th affine transform ω_s, corresponding to ε_s = min{ε_1, ε_2, …, ε_M}, is taken as the global parametric motion model. Literature [14] has shown that the above method
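The triplet-based estimation of steps (i)-(ii) might be sketched as below. The point set, the outlier offsets (standing in for flow vectors on moving targets), and the fixed triplet list are illustrative assumptions; the paper draws M random triplets, and the error is the sum of transform residuals as in the definition of ε_k.

```python
import math

def apply_affine(a, p):
    # (X, Y) = (a1 x + a2 y + a5, a3 x + a4 y + a6)
    return (a[0] * p[0] + a[1] * p[1] + a[4],
            a[2] * p[0] + a[3] * p[1] + a[5])

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    # Cramer's rule for a 3x3 linear system
    d = det3(m)
    out = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = b[i]
        out.append(det3(mj) / d)
    return out

def fit_triplet(src, dst, idx):
    # a triplet determines the six parameters exactly (two 3x3 systems)
    rows = [[src[i][0], src[i][1], 1.0] for i in idx]
    a1, a2, a5 = solve3(rows, [dst[i][0] for i in idx])
    a3, a4, a6 = solve3(rows, [dst[i][1] for i in idx])
    return (a1, a2, a3, a4, a5, a6)

def total_error(a, src, dst):
    # epsilon_k = sum_i || P_hat_i - w_k x P_i ||
    return sum(math.hypot(dst[i][0] - X, dst[i][1] - Y)
               for i, (X, Y) in enumerate(apply_affine(a, p) for p in src))

truth = (1.2, 0.3, -0.2, 1.1, 5.0, -3.0)
src = [(0, 0), (4, 1), (1, 5), (6, 3), (2, 8), (7, 7), (5, 2), (3, 6)]
dst = [apply_affine(truth, p) for p in src]
dst[6] = (dst[6][0] + 3, dst[6][1] - 2)   # outlier flow (a moving target)
dst[7] = (dst[7][0] - 2, dst[7][1] + 4)   # outlier flow

triplets = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 6, 7), (2, 5, 6)]
candidates = [fit_triplet(src, dst, t) for t in triplets]
best = min(candidates, key=lambda a: total_error(a, src, dst))
```

Triplets containing an outlier fit the corrupted correspondences exactly but accumulate large residuals on the inliers, so an outlier-free triplet wins and the true parameters are recovered.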

where K_σ is a 2-dimensional kernel with bandwidth σ, and G_h is a d-dimensional kernel with bandwidth h. The bandwidth in the spatial dimensions represents the variability in feature location due to local deformation or measurement uncertainty, while the bandwidth in the feature dimensions represents the variability in feature values. Given two distributions with samples I_x = {(x_i, u_i)} (i = 1, …, N) and I_y = {(y_j, v_j)} (j = 1, …, M), the similarity measure between I_x and I_y is defined as:


J(I_x, I_y) = (1/M) Σ_{j=1}^{M} P̂(y_j, v_j) = (1/(MN)) Σ_{i=1}^{N} Σ_{j=1}^{M} K_σ(x_i − y_j) G_h(u_i − v_j).   (3)
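A direct transcription of Eqs. (2)-(3) with Gaussian kernels might look like the following; the kernel shapes, the bandwidths, and the small sample sets are assumptions chosen only for illustration.

```python
import math

def kernel(d2):
    # Gaussian profile; both K_sigma and G_h use this shape in the sketch
    return math.exp(-0.5 * d2)

def similarity(Ix, Iy, sigma=10.0, h=0.2):
    # Eq. (3): J = 1/(MN) sum_i sum_j K_sigma(x_i - y_j) G_h(u_i - v_j)
    total = 0.0
    for (x, u) in Ix:
        for (y, v) in Iy:
            d_spatial = sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2
            d_feature = sum((a - b) ** 2 for a, b in zip(u, v)) / h ** 2
            total += kernel(d_spatial) * kernel(d_feature)
    return total / (len(Ix) * len(Iy))

# samples are (location, feature) pairs; in the paper the features are HSV values
target = [((i, j), (0.5, 0.5, 0.5)) for i in range(3) for j in range(3)]
same   = [((i + 1, j), (0.5, 0.5, 0.5)) for i in range(3) for j in range(3)]  # shifted copy
other  = [((i, j), (0.9, 0.1, 0.2)) for i in range(3) for j in range(3)]      # different color

j_same = similarity(target, same)
j_other = similarity(target, other)
```

With even kernels the measure is symmetric in its two arguments, and since each kernel product is at most one, J stays in (0, 1], matching the boundedness claim below.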

(a) Four targets extracted from aerial image (from left to right: gray truck, red sedan, blue sedan, and white sedan)


J(I_x, I_y) is symmetric and bounded by zero and one. This similarity is based on the average separation criterion in cluster analysis [15], except that it replaces the distance with a kernelized one. This similarity measure has been applied to single target tracking [16] [17].

B. Modified Similarity Measure for Multiple Target Tracking
In multiple target tracking, the similarity between the target T_k, represented by the hull H_k in the (t−1)-th frame, and the target T_l in the t-th frame depends not only on the joint feature-spatial space but also on the distance between them. Therefore, the similarity measure in Eq. (3) is modified as follows.
S_kl^{t−1,t}(T_k, T_l) = J(I_x^{t−1,k}, I_y^{t,l}) / || P_c^{t,l} − ω_s × P_c^{t−1,k} ||,   (4)

(b) Similarity measures (GT-RS, GT-GS, GT-BS, GT-GT) plotted against rotation angle θ
Fig. 3 Robustness of similarity measure.


where I_x^{t−1,k} and P_c^{t−1,k} are the distribution of target samples inside the hull H_k and the target center in the (t−1)-th frame, I_y^{t,l} and P_c^{t,l} are the distribution of target samples inside H_l and the target center in the t-th frame, respectively, and ω_s is the affine transformation model from the (t−1)-th frame to the t-th frame. To verify the robustness of this similarity measure, four targets extracted from aerial images, as shown in Fig. 3 (a), which are gray truck (GT), red sedan (RS), blue sedan (BS), and gray sedan (GS) from left to right, are employed for similarity testing. These four targets are rotated in the range of 0° to 180°, with a 5° increment in each rotation. The similarity measures between these generated images and the gray truck in Fig. 3 (a) are calculated using 500 random sample points from each image. The similarity measures for GT-RS, GT-BS, GT-GS, and GT-GT are shown in Fig. 3 (b). The similarity measure variances for GT-RS, GT-BS, GT-GS, and GT-GT matching are 4.34 × 10^-6, 1.04 × 10^-5, 1.29 × 10^-5, and 5.04 × 10^-5, respectively. These results show that the similarity in Eq. (4) is robust to rotation and scaling. To reduce the computation time, there is no need to use all points inside the target hull; the sample points can be chosen randomly from the samples inside the target hull.

C. Tracking Graph Management
The multiple target tracker needs to handle all of the cases listed in Fig. 2. The algorithms to deal with these cases are as follows.
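The effect of Eq. (4) can be seen in a tiny sketch: the joint-space similarity is divided by the distance between the current center and the motion-compensated previous center. The warp, the J values, and the unit floor on the distance (added here to avoid division by zero) are our assumptions, not from the paper.

```python
import math

def modified_similarity(j_value, pc_prev, pc_cur, warp):
    # Eq. (4): joint feature-spatial similarity divided by the distance
    # between the current center and the motion-compensated previous center
    px, py = warp(pc_prev)
    dist = math.hypot(pc_cur[0] - px, pc_cur[1] - py)
    return j_value / max(dist, 1.0)   # floor of 1 px is our choice, not the paper's

# camera motion ~ translate right by 4 px (stand-in for the affine model w_s)
warp = lambda p: (p[0] + 4, p[1])
near = modified_similarity(0.08, (10, 10), (15, 11), warp)  # plausible continuation
far = modified_similarity(0.08, (10, 10), (60, 40), warp)   # distant candidate
```

Two candidates with the same appearance score are now ranked by how well their positions agree with the compensated motion, which is what disambiguates identical-looking vehicles.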

1) Missing detection prediction: Targets that are under tracking up to the frame right before the current frame may be missed at the current frame because of a detector failure. Missing detections at the i-th frame are estimated from the detection results obtained in the image frames prior to the current frame, by applying estimators. From the position and velocity of the target in previous image frames, its new position and velocity in the new frame can be estimated by a Kalman filter, a recursive Bayesian estimator, or a particle filter. In this work, the Kalman filter is employed. From the previous state (x_c^{i−1,k}, y_c^{i−1,k}, v̂_Δ^{i−1,k}, θ̂_Δ^{i−1,k}), the next state (x_c^{i,k}, y_c^{i,k}, v̂^{i,k}, θ̂^{i,k}) is estimated, where (x_c^{i−1,k}, y_c^{i−1,k}) is the center of the target T_k (which is missing at the i-th frame) at the (i−1)-th frame, and (v̂_Δ^{i−1,k}, θ̂_Δ^{i−1,k}) are the average velocity and direction over the past Δ frames. Fig. 4 (c) shows a missing detection at frame 21, which will be estimated.
2) New target detection: New targets usually appear at the four borders of the image, not in the interior area. If a target is detected and tracked over Δ2 frames, it is considered a new target. Currently, the four borders with a width of 20 pixels are cleared to zero, to remove the pixels that are not involved in generating the frame difference. Inside this, a 40-pixel-wide band along the four borders is the area where new targets may emerge. Fig. 4 (a) shows 3 newly detected targets at frame 6.
3) False detection filtering: Targets that emerge in the interior area of the image, and are not linked to targets in the previous or next frame, are false detections. They are filtered out. Fig. 4 (b) shows a false detection at frame 9, which will be filtered out.
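The prediction above extrapolates from the average velocity and direction over the past Δ frames; the sketch below does exactly that as a lightweight stand-in for the paper's Kalman filter, with invented track coordinates.

```python
import math

def predict_center(history, delta=3):
    # average velocity/direction over the past `delta` steps, then one-step
    # extrapolation; a simple substitute for the Kalman prediction
    pts = history[-(delta + 1):]
    vx = sum(b[0] - a[0] for a, b in zip(pts, pts[1:])) / (len(pts) - 1)
    vy = sum(b[1] - a[1] for a, b in zip(pts, pts[1:])) / (len(pts) - 1)
    speed = math.hypot(vx, vy)            # v_hat
    theta = math.atan2(vy, vx)            # theta_hat
    x, y = pts[-1]
    return (x + speed * math.cos(theta), y + speed * math.sin(theta))

# target moving ~2 px/frame in x and 1 px/frame in y, then missed one frame
track = [(10, 5), (12, 6), (14, 7), (16, 8)]
pred = predict_center(track, delta=3)     # fills the gap at the missed frame
```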


Fig. 4 Target detection results. (a) Three targets (grey truck, grey sedan, and red sedan) are detected at frame 6; (b) Three targets and a false detection (lower red ellipse) at frame 9; (c) Missing detection (red sedan) at frame 21.

Fig. 5 Target merging and trajectory results. (a) Merging detection at frame 162; (b) Mask image showing target merging at frame 162; (c) Trajectory for six targets from frame 1 to frame 162.


4) Disappearing target detection: For targets close to the four borders, if they are not detected and tracked for Δ2 frames, they are considered to have left the monitored range of the camera.
5) Merge detection: If two or more targets in the previous frame are linked to the same target in the current frame, target merging has occurred. In this case the graph manager separates them. Fig. 5 (a) shows target merging and (b) shows its mask image (the merging detection is marked by the red circle in the middle) for another input image sequence. The principal axis of the mask image for the merging targets is calculated and used as the boundary to separate the merged targets.
6) Split detection: If a target in the previous frame is linked to two targets in the current frame, and the split targets remain tracked for Δ2 frames, a split has occurred. The target graph manager maintains the trajectory of each target. Fig. 5 (c) shows the target trajectories from frame 1 to frame 162 for the six targets.

IV. TARGET RECOGNITION
As indicated in Fig. 1, the moving target recognition subsystem accepts the tracked targets from the tracker. For each target, it performs matching with the target patterns in the database. The target database stores the target name, the target region represented by its hull, and the image data. For the image data
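The merge and split cases reduce to degree counts on the frame-to-frame link graph of Fig. 2; the sketch below uses hypothetical target IDs and assumes the links have already been chosen by the similarity measure.

```python
from collections import defaultdict

def classify_events(links):
    """links: (prev_id, curr_id) edges between two consecutive frames."""
    succ = defaultdict(list)   # previous target -> linked current targets
    pred = defaultdict(list)   # current target  -> linked previous targets
    for p, c in links:
        succ[p].append(c)
        pred[c].append(p)
    merges = [c for c, ps in pred.items() if len(ps) >= 2]   # many-to-one
    splits = [p for p, cs in succ.items() if len(cs) >= 2]   # one-to-many
    return merges, splits

# hypothetical IDs: frame t-1 has targets A, B, C; frame t has 1, 2, 3
links = [("A", "1"), ("B", "1"), ("C", "2"), ("C", "3")]
merges, splits = classify_events(links)
```

Here A and B merging into 1 is a merge event, and C linking to both 2 and 3 is a split; the real manager additionally requires split tracks to persist for Δ2 frames before confirming the event.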

of a target, the pixels beyond the hull region are cleared to zero (refer to Fig. 3 (a)). The similarity measure for target recognition is based on Eq. (3). For a recognized target, this subsystem outputs the target name; for an unknown target, it registers the target in the database. For a recognized target, its model image data is also updated.

V. EXPERIMENTAL RESULTS
The above algorithm is implemented using MS-Visual C++ 6.0 and Intel OpenCV, running on the Windows platform. The Δ used in missing detection prediction is set to 3, and the Δ2 for new target detection and disappearing target detection is set to 5. The calculation of the modified similarity measure employs 500 randomly selected pixels inside the target hull, and the HSV feature is used in Eq. (4). The test video sequences are from the AFRL Vivid database. Fig. 6 shows some target detection and tracking results. The first column from the left shows the detected and tracked targets (shown by global number and circled by green ellipses) up to frame 48. The second column shows that tracking is lost because of the dynamic observer movement (red ellipses show the detected targets). The third column shows the five targets under tracking and a newly detected target (shown by the yellow ellipse). The fourth column shows target merging, which is split into two targets. Fig. 7 shows some target tracking and
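The recognize-or-register logic of Section IV might be sketched as follows; the feature vectors, the exp(−d²) matching score (a crude stand-in for the Eq. (3) similarity), and the threshold are all assumptions made for illustration.

```python
import math

def score(feat_a, feat_b):
    # toy appearance score in (0, 1]; the paper uses the kernel similarity
    d2 = sum((a - b) ** 2 for a, b in zip(feat_a, feat_b))
    return math.exp(-d2)

def recognize(target_feat, database, threshold=0.5):
    best_name, best_score = None, 0.0
    for name, feat in database.items():
        s = score(target_feat, feat)
        if s > best_score:
            best_name, best_score = name, s
    if best_score < threshold:
        name = "target_%d" % (len(database) + 1)
        database[name] = target_feat     # register the unknown target
        return name, False
    database[best_name] = target_feat    # update the model data of a known target
    return best_name, True

db = {"gray_truck": (0.1, 0.2, 0.3), "red_sedan": (0.9, 0.1, 0.1)}
name1, known1 = recognize((0.12, 0.2, 0.31), db)   # close to gray_truck
name2, known2 = recognize((0.5, 0.9, 0.8), db)     # matches nothing well
```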

Fig. 6 Target detection and tracking results at frame 48, 108, 198, and 342, respectively.

Fig. 7 Target tracking and recognition results at frame 30, 144, 244, and 636, respectively.


recognition results. From left to right: (i) blue sedan, (ii) gray pick-up truck, (iii) white sedan and gray pick-up truck, and (iv) white sedan and gray pick-up truck. In (iii), the gray pick-up truck is wrongly recognized as a blue sedan because it is partially hidden by trees, and in (iv) the white sedan and gray pick-up truck are both wrongly recognized as blue sedans because they are both partially hidden by trees. The average execution times for target detection, tracking, and recognition, on a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB memory, are shown in Table 1.
TABLE 1 AVERAGE PROCESSING TIME FOR TARGET DETECTION, TRACKING AND RECOGNITION

Processing task                    Time (ms)
Target detection                   316.1
Target tracking and recognition    363.7

VI. CONCLUSIONS
This paper proposed an algorithm for multiple moving target detection, tracking, and recognition from a moving observer. The moving observer is a manned/unmanned aerial vehicle with a mounted camera. The proposed algorithm first estimates the motion model between two consecutive image frames, which is used to remove the moving background. It then employs a similarity measure for target tracking based on the joint feature-spatial space, which combines the HSV feature and geometry information. The similarity calculation employs 500 randomly selected pixels. On a Windows Vista machine with a 2.33 GHz Intel Core2 CPU and 2 GB memory, the average processing time is 680 ms per frame, which corresponds to a processing rate of about 1.5 frames/s. The experimental results show that the proposed algorithm is efficient and fast.

ACKNOWLEDGEMENT
This work was partially supported by a grant from AFRL under the Minority Leaders Program, contract No. TENN 06-S567-07C2. The authors would also like to thank AFRL for providing the datasets used in this research.

REFERENCES
[1] M. Isard and A. Blake, "CONDENSATION - Conditional Density Propagation for Visual Tracking," International Journal of Computer Vision, vol. 29, no. 1, pp. 5-28, 1998.
[2] K. Nummiaro, E. Koller-Meier, and L. Van Gool, "An Adaptive Color-based Particle Filter," Image and Vision Computing, vol. 21, pp. 99-110, 2002.
[3] D. Tweed and A. Calway, "Tracking Many Objects Using Subordinated Condensation," in Proc. 13th British Machine Vision Conference (BMVC 2002), 2002.
[4] M. Ross, "Model-free, Statistical Detection and Tracking of Moving Objects," in Proc. 13th International Conference on Image Processing (ICIP 2006), Atlanta, GA, Oct. 8-11, 2006.
[5] Y. L. Tian and A. Hampapur, "Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance," in Proc. IEEE Computer Society Workshop on Motion and Video Computing, Breckenridge, Colorado, Jan. 5-6, 2005.
[6] J. Kang, I. Cohen, G. Medioni, and C. Yuan, "Detection and Tracking of Moving Objects from a Moving Platform in Presence of Strong Parallax," in Proc. IEEE International Conference on Computer Vision (ICCV), Beijing, China, Oct. 2005.
[7] S. M. Smith and J. M. Brady, "ASSET-2: Real-time Motion Segmentation and Shape Tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, Aug. 1995.
[8] A. Ollero, J. Ferruz, et al., "Motion Compensation and Object Detection for Autonomous Helicopter Visual Navigation in the COMETS System," in Proc. IEEE International Conference on Robotics and Automation, New Orleans, LA, USA, April 26 - May 1, 2004.
[9] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, June 20-25, 2005, pp. 176-183.
[10] M. Han, A. Sethi, and Y. Gong, "A Detection-based Multiple Object Tracking Method," in Proc. 2004 IEEE International Conference on Image Processing (ICIP 2004), Singapore, October 24-27, 2004.
[11] A. Chia, W. Huang, and L. Li, "Multiple Objects Tracking with Multiple Hypotheses Graph Representation," in Proc. 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, August 20-24.
[12] W. Qu, D. Schonfeld, and M. Mohamed, "Distributed Bayesian Multiple-Target Tracking in Crowded Environments Using Multiple Collaborative Cameras," EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 38373.
[13] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," in Proc. 7th International Joint Conference on Artificial Intelligence, 1981, pp. 674-679.
[14] S. Araki, T. Matsuoka, et al., "Real-time Tracking of Multiple Moving Object Contours in a Moving Camera Image Sequence," IEICE Transactions on Information and Systems, vol. E83-D, no. 7, July 2000.
[15] A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, UK, 2002.
[16] A. Elgammal, R. Duraiswami, and L. S. Davis, "Probabilistic Tracking in Joint Feature-Spatial Spaces," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Wisconsin, USA, June 16-22, 2003.
[17] C. Yang, R. Duraiswami, and L. Davis, "Efficient Mean-Shift Tracking via a New Similarity Measure," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 20-25, 2005.
