You are on page 1of 6

Proceedings of the 2001 IEEE Inteinationd Conference 00 Robotics &Automation Taipei, Taiwan, September 14-19, 2001

Robust Model-based 3D Object Recognition by Combining Feature Matching with Tracking


Sungho Kim, Inso Kweon
Dept. of Electrical Engineering & Computer Science Korea Advanced Institute of Science and Technology 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 Korea shkim@rcv.kaist.ac.kr, iskweon@kaist.ac.kr

..

Incheol Kim

L .

Agency for Defense Development Yuseong P.O. BOX 35, Daejon 305-152 Korea cheol@add.re.kr such as local differential invariants, SIFT (scale invariant feature transform) and eigen window have been system suggested [6-91. But these approaches have shown some limited success to some problems like illumination change. We propose a unified framework of object recognition and tracking for a service robot to work in indoor environments. To solve the object variations, we propose modified local Zemike moments robust to various environmental changes such as illumination changes and pose changes. The recognized object can he tracked using Lie group methodologies to simplify representation and computation of a motion. The initial 3D CAD model alignment is performed using the homography computed in the process of object recognition.
11. RECOGNITION AND TRACKING

Abstrac: - We propose a vision based 3D object recognition and tracking system, which provides high level scene descriptions such as object identification and 3D pose information. The system is composed of object recognition part and real-time tracking part. In object recognition, we propose a feature which is robust to scale, rotation, illumination change and background clutter. A probabilistic voting scheme maximizes the conditional probability defined by the features in correspondence to recognize an object of interest. As a result of object recognition, we obtain the homography between the model image and the input scene. In tracking, a Lie group formalism is used to east the motion computation problem into simple geometric terms so that tracking becomes a simple optimization problem. An initial object pose is estimated using correspondences between the . model image and the 3D CAD model which are predefined and the homography which relates the model image to the input scene. Results from the experiments show the robustness of the proposed system. I. INTRODUCTION
One of the most important tasks for an intelligent service robot is to identify objects of interest and to track them in indoor environment. If a mobile robot is commanded to bring a certain object, it has to recognize what objects are in the scene in advance, then track the recognized object for further actions such as grasping and manipulation. There have been many methods to solve object recognition problems and tracking problems separately, but seldom to solve both problems simultaneously. Object recognition is defined as the process of extracting information such as name, size, position, pose, and functions related to the object. In this paper, we restrict the definition to extracting objects identification and pose information only. Recently, there has been much development in object recognition, hut it still remains on the level of recognizing object in a well controlled environment. This is originated from the object variations such as view angle changes and illumination changes. Further more, it becomes a difficult task when an object is occluded by others or placed in a cluttered environment [ l 21. To solve these problems, various approaches have been proposed using invariant, 3D CAD model and appearance [3-51.Recently, reflecting the characteristic of the human visual object recognition methods based on local image

Figure 1 shows the proposed system of 3 0 object recognition by combining feature matching with tracking. The system is consisted of three components: the model DB generation, object recognition and tracking. In model generation, four points of an image model are pre-selected corresponding to the 3D CAD model points. By performing object recognition using local Zernike moments, we can obtain a homography between the image model and the input scene. By transforming the image model points using this homograpby, we can establish correspondences between the input scene feature point and 3D CAD feature points. Using these correspondences, we can estimate the initial pose of the recognized object. Then we perform object tracking using Lie group SE(3) in real-time.

Sensor lmut

Fig. I . Proposed recognition and tracking system

0-7803-7736-2/03/$17.00 02003 IEEE

2123

I lGENERAL OBJECT RECOGNITION l.

We propose a new object recognition method which is robust to scale, planar rotation, illumination change, occlusion and background clutter.
A . Proposed object recognilion system

The object recognition system is composed of robust feature extraction part and feature matching part. Figure 2 shows the overall system of the object recognition part.
~

_ _ --~___ -

with the conditions ( n -1ml) : even, /mi 5 n . As the Zemike moments are calculated using radial polynomials shown in figure 3, they have inherent rotation invariant property. Especially, Zemike moments have superior properties in terms of image representation, information redundancy and noise characteristics [14].

Fig. 2. The proposed object recognition system

In off-line process, Zemike moments are calculated around interest points detected from the scale space of image model and are stored in a database. We can recognize an object by probabilistic voting of these Zemike moments in on-line process. The locality of the Zemike moment provides some robustness to occlusion and background clutter. We verify the recognition by aligning model features to input scene. In this process, the homography between image model and input scene is calculated. We determine the success of the recognition by the percentage of the outlier. The outlier isdetermined from the distance between scene feature position and transformed model feature position by homography.

Fig. 3. Radial polynomials of oder n=O, 1, . .., 9

But they are sensitive to scale and illumination changes. We reduced the scale problem by applying scale space theory to image model [15]. This method makes the comer extraction process more effective than using the image pyramid. The problem of illumination change is simply solved by normalizing the moments by the ZOO moment which is equivalent to the average intensity. Since local illumination change can be modeled as
f(xj,yi) = a J ( x i , r j )

B. Robust localfeature: Zernike moments


The proposed object recognition method is based on the fact that the human visual system concentrates on a certain object dwing recognition [lo]. Interesting points in image model are placed on the same position in input scene. This is called the repeatability of interesting points. We use the Hams comer detector which has superior repeatability [ l l 121. We use Zemike moments to represent local characteristics of an image segment. Zemike moments are defmed over a set of complex polynomials which form a complete orthogonal set over the unit disk x2 + y 2 < I [13]. Zemike moments are the projections of the image intensity f ( x ,y ) onto the orthogonal basis fnnctions V A x ,Y ) .

(4)

we can make illumination invariant feature as


Z ( f ( X . Y ) )- ~ L Z ( J ( X , Y )- Z ( f ( X , Y ) ) )
~

(5)

my

aLmf

m/

where f ( x , y ) represents the intensity at ( x , y ) , f ( x , y ) is the new intensity after illumination change, a, means the rate of illumination change, m, denotes average intensity and Z means Zemike moment operator. Figure 4 shows the robustness of the modified Zemike moments to illumination changes. We can observe that the normalized Zemike moments are almost constant even under illumination changes.

2124

If all the objects are equally probable and independent mutually, equation (IO) becomes
2 P ( H h I Mj)P(M,)
(1 1)
h=l

By the theorem of total probability, the denominator can be written as


NM
05,
I,

>

>J

I,

.I

P ( H h ) = C P ( H h 1 M,)p(M,)
- *M .e

(12)

m . u -

,=I

(b) With illmination invariant (a) Without illumit&tiooinvariant Fig. 4. Zemike moments and illumination changes

C. Probabilistic voting

In this paper, we propose a probabilistic voting based recognition method considering the stability of Zemike moments of the image model and the similarity of Zemike moments between the image model and the input scene. Object recognition using probabilistic voting means finding a model M i that maximizes
argmax P ( M i I S)
M,

The term P ( H h I M,) is computed by considering the stability and similarity measure of matching pairs. Basically, P ( H h IM,) has to be large when both the stability and the similarity measures are high. The stability measure reflects the incompleteness of repeatability of interest points and the similarity measure reflects the closeness of feature vectors in Euclidean space.

( I ) Stability ( os ): As equation (13), the stability of


Zernike moments is inversely proportional to the sensitivity which is the standard deviation of Zemike moments calculated at 5 positions around an interesting point, The smaller the sensitivity is, the more stable Zernike moments of the image model are.

(6)

where S represents the input scene. In this paper, we assumed only one model M i for each object in frontal view. But the model images can be extended to represent full 3D views. For each input feature, we form a set of matching pairs consisting of the corresponding model features whose Zernike moments are similar to that of the input feature. For k-th input feature, a set of matching pairs is

(2) Similarity ( C, ): As defined equation (14), the O similarity of a feature pair is inversely proportional to the Euclidean distance between the Zemike moments of input scene and that of the corresponding image model.

,wk ( z k , i k j ) 1 z kE ={

S , E~ , i = l , Z ; . . , ~ , } M~ ~

(7)

where Z,.denotes the Zemike moments of the k - th input feature, Zy means the Zernike moments of the model feature and N , is the number of corresponding model features. The hypothesis is

Therefore, P(Hh 1 Mi) is defined as

Hh = Z , j l Z 3 ?i = ~ @ N,} k j i ) ( i . j l,2,..., )h

(8)
where
P ( ( Z j , i , ) I Mi) =

where Ns is the number of interest points of input scene. The set of total feature pairs can be written as

H = { H IUH 2 . . . uHN, ]

(9)

k[-+)

j=i

if Z , E f ( M O
else

(16)

where N , is the size of product space as MFk x Hh . In each hypothesis, a model point can be assigned to multiple input points because corresponding pairs are made using Euclidean distance measure. Since H is composed of the model Zemike moments corresponding to the input scene S , we can substitute H for S . By the Bayes theorem, equation (6) becomes

a is the normalization factor and E is assigned if the corresponding model feature doesnt belong to a certain model. We use the approximate nearest neighbor search algorithm to find matching pairs [16]. It takes log time for linear search space.

D. Recognition verification
Recognition results are verified using feature pairs. We find optimal feature pairs by rejecting outliers using the area ratio which is preserved under the affine

2125

transformation. For given four points (4,P2,4, P,) shown in figure 5, we calculate the area ratio S , / S , = A P 2 ~ P 4 1 A P , P 2 P 3in the image model and S,'I S,'= AP', P',/ AP, P', in the input scene. If the P', P', two ratios differ with by a predetermined threshold, we reject the fourth feature point. We assume the first three points are matched.

Second, we experimented under complex environment. Figure7-(a) shows some examples of recognition results under view changed in cluttered background. Figure 7-(b) shows experimental results when cluttered background, illumination change and occlusion exist.

(a) Local Ceahlre of model (b) Local feahlre of scene Fig. 5 . Rejection of outliers using area ratio

Then we calculate an initial homography based on these optimal feature pairs randomly selected. A LMedS based method selects an optima1 homography from feature pairs. This homography is used as the initial pose estimation of 3D CAD model. We can also determine the percentage of outliers by applying this homography to the remaining image model points.

E. Object Recognition results


Figure 6 shows the image models consisting of twenty objects such as hooks, CDs, telephones, boxes, dolls, photos and etc. which can he found in sumounding us.
model 1 model 2 model 3 model 1 model 6

model 6

model 11

m
model 12 model 17

model 1

model 8

model 9

model 10
I , "

L.

Fig. 7. Sample object recognition results Cor images taken under various conditions

model 13

model 14

model 15

model 16

model 18

model 1S

model 20

Table 1 shows the statistical results of )the object recognition to model-IO. It is failed to recognize some scenes where sever specular, blurring, background clutter and low illumination exist.

Model #
Fig. 6. Image models used for object recognition

Iteration
118

Success
105

Recognition rate 1%1


88.9

IO First, we evaluated the recognition system performance for independent environments, such as scale change, illumination change, background clutter and occlusion. The system can recognize objects under scale changed up to factor 3, illumination intensity changed up to factor 2; somewhat complex background and half occlusion. For these cased, success rate is above 95%.

IV. REAL-TIME ALIGNMENT AND TRACKING


By aligning the projected 3D CAD model of the recognized object and computing the motion, we can track

2 126

the recognized object continuously. We utilize the general paradigm of Lie group formalism [ 171. Conventional approaches often estimate an initial pose of 3D CAD model manually. In this paper, we initialize the us,ng the pose of 3~ CAD homography computed in the recognition stage. If we have n correspondences between the recognizd image model points X = (um,vm,l) and the input Scene , T the points X, =(u,,v,,~)~, relations are written as

X, = Hx,
where H i s the 3 by 3 homography. From equation (17), we obtain

(17)

Figure &(a) shows the input image which contains model-10. Interesting points computed by the Harris comer detector are shown in figure 8-(b). By the probabilistic voting scheme proposed in this paper, we correctly recognized model-IO as shown in figure 8-(c). Finally, figure X-(d) shows the transformed image model features to the input scene using the homography computed from the feature correspondencesSince we have the initial pose matrix, we can project the 3D CAD model to the input scene . . . . a- shown in figure 9. . ~. "
~ ~

H = X,X,r(X,X,T)-'

(18)
I

The steps to compute the initial pose of 3D CAD model include Specify 4 corresponding points between a 3D CAD model and the corresponding image model. Compute the homography through object recognition. Transform the pre-selected model image points to th e input scene using the homography obtained in step2. The image feature points corresponding to the 3D CAD model points are obtained and transformed to the camera coordinate system using pre-determine camera intrinsic parameters [IS]. Calculate the rotation and translation matrices from these feature pairs.

initial a l i m e n t

frame#lO

4.

frame #20

hame #30

5.

v.:,

V. EXPERIMENTAL RESULTS We have tested the proposed system using various models shown in figure 6. Figure 8 shows the results of object recognition for each stage.

frame #50

frame #60

frame #70

frame #SO

frame #90

Fig. 9. Object tracking results for model 10.

The tracking system shows fairly good performance for various object motions through a long sequence of images. It takes 1-2sec for object recognition. The tracking
Fig. 8. Object recognition process for model-IO.

2127

performance shows 10-15 framesisec under AMD 1.4 GHz. VI. CONCLUSIONS

on Pattern Recognition and Machine Intelligence, vol.

In this paper we have proposed a practical object recognition and tracking system. Main contributions of the proposed system are: First, we proposed an object recognition system which is robust to view position, illumination change, occlusion and background clutter based on modified local Zemike moments. Second, we proposed an automatic pose initialization technique to track the recognized object in real-time by using the bomograpby computed from the feature correspondences. The experimental results demonstrate that robust object recognition is feasible by combining feature matching with 3D object tracking. For future work, we will develop a recognition system that can recognize and track full 3D object laid in random poses.
VII. ACKNOWLEDGMENTS This research has been partly supported by ADD and HWRS-ERC grant. VIII. REFERENCES [I] [2] [3]

[4] [5]

S . Ullman, High-level Vision: Object Recognition and Visual Cognition, f i e M T Press, 1997. C. Olson, Fast Object Recognition by Selectively Examining Hypothesis. Doctors Thesis. Universify of California, 1994. K. Rob and I. Kweon, 2-D Occluded and Curved Object Recognition using Invariant Descriptor and Projective Refinement, IElCE Transactions on Information andSystems, vol. E81-D, No. 8 , 1998. M. Stevens, J. Beveridge, Precise Matching of 3-D Target Models to Multisensor Data,CSU, Jan. 20, 1997. D. Keren, M. Osadchy, C. Gotsman, Antifaces: A Novel, Fast Method for Image Detection, IEEE Tras.

23, No.7, pp.747-761, 2001. [ 6 ] C. Schmid, A. Zesserman, R. Mohr, Integrating Geometric and Photometric Information for Image Retrieval, In International Workshop on Shape, Contour and Grouping in Computer Vision, 1998. [7] D. Lowe, Object Recognition from Local ScaleInvariant Features, Proc. International Conference on Computer Vision, IEEE Press, 1999. [E] K. Ohha, K. Ikeuchi, Detectability, Uniqueness, and Reliability of Eigen-Windows for Robust Recognition of Partially Occluded Objects, IEEE Pattern Analysis and Machinelntelligence 19(1997), 110.9, 1043-1048, 1997. [9] D. Jugessur, Robust Object Recognition using Local Appearance based Methods, Master Thesis, McGill Univ., 2000. [IO] A. Treisman, Features and Objects in Visual Processing, Scientific Aamerican, November 1986. [I I] C. Scbmid, U. Mohr, C. Bauckhage, Comparing and Evaluating Interest Evaluation of nterests, In ICCV, pp. 230-235, 1998. [ 121 C. Hams and M. Stehens. A Combined Comer and Edge Detector, In Nvey Vision Conference, pages 147-151, 1988. 1131 S . Abdallah, B. MengSc, Object Recognition via Invariance, DoctorS Thesis, The Univ. of Sydney, 2000. [14] C. The, R. Chin, On Image Analysis by the Methods of Moments, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 4, 1988. [IS] T. Lindeberg, Scale-space thery: A Basic Tool for f Analysing Structures at Different Scales, Journal o AppliedStatistics, 20,2, pp. 224-270, 1994 [I61 S. Arya, et al., An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions, J. ACM45(6):891-923, Nov 1998. [17] T. Drummond, U. Cipolla, Real-time visual tracking of complex structures, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume: 24 Issue: 7 ,pp. 932 -946, July 2002. [I81 Z. Zhang, Flexible Camera Calibration by Viewing a Plane from Unknown Orientations, ICCV, 1999.

2128

You might also like