
Journal of the Brazilian Society of Mechanical Sciences

Print version ISSN 0100-7386


J. Braz. Soc. Mech. Sci. vol.24 no.3 Rio de Janeiro July 2002

http://dx.doi.org/10.1590/S0100-73862002000300013

Performance evaluation of 3D computer vision techniques

A. M. Kabayama1; L. G. Trabasso

Instituto Tecnológico de Aeronáutica, Divisão de Engenharia Mecânica-Aeronáutica


12.228-900 São José dos Campos, SP, Brazil. E-mail: gonzaga@mec.ita.cta.br

ABSTRACT

This work presents the implementation and comparison of three different techniques of
three-dimensional computer vision, as follows:

• Stereo vision: correlation between two 2D images;
• Sensorial fusion: use of different sensors, a 2D camera and a 1D ultrasound sensor;
• Structured light.

The computer vision techniques herein presented took into consideration the following
characteristics:

• Computational effort (elapsed time to obtain the 3D information);
• Influence of environmental conditions (noise due to non-uniform lighting, overlighting and shadows);
• The cost of the infrastructure required by each technique;
• Analysis of uncertainties, precision and accuracy.

The Matlab software, version 5.1, was chosen for the algorithm implementation of the
three techniques because of the simplicity of its commands, programming and debugging.
Besides, this software is well known and widely used by the academic community,
allowing the results of this work to be reproduced and verified. Examples of three-
dimensional vision applied to robotic assembly ("pick-and-place") tasks are presented.

Keywords: computer vision, range finders, robotics, mechatronics

Introduction

Because of the increasing use of robots in industry, robotics has become an area of
engineering with its own identity. The advance of sensor technologies and their
decreasing prices have allowed the construction of robots with more feedback
capabilities about their workspace. As a consequence, their positioning accuracy,
operation speed and functional flexibility have increased. It is normally accepted that
computer vision is the most powerful and flexible way to provide robots with feedback
about the environment they interact with, and considerable efforts and resources have
been spent on its research and development, according to Ruocco (1987).

The determination of three-dimensional data from two-dimensional images is very
important in the computer vision field, and one way to perform it is by using stereo
vision techniques. To recover a three-dimensional scene from a pair of stereo images,
two main problems must be solved. The first and more complex one is called 'stereo
matching', that is, the establishment of a correlation between two images of the same
object taken from two different points of view: a projected point in the first image
must be matched to the point in the second image that is the projection of the same
point of the three-dimensional world. The second problem is the geometric calculation
of the three-dimensional co-ordinates from the pairs of correlated points in both
images using triangulation techniques, as shown in Fu (1987). Most of the current
stereo vision techniques use area-based algorithms. These algorithms split both images
into a number of subregions, and a degree of photometric similarity is used to
establish the matching between the respective subregions.
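As an illustration of the area-based idea, the following MATLAB sketch shows one assumed form such a routine could take (it is not the implementation used in this work; window size and search range are arbitrary choices): each subregion of the first image is correlated with candidate subregions along the same row of the second image, and the best normalised score defines the disparity.

    function disp_map = block_match(imL, imR, win, max_d)
    % Area-based matching sketch: for each (2*win+1)-square block of the left
    % image, search along the same row of the right image (0..max_d pixels to
    % the right) for the block with the highest normalised correlation.
    [rows, cols] = size(imL);
    disp_map = zeros(rows, cols);
    for r = 1+win : rows-win
      for c = 1+win : cols-win-max_d
        ref = double(imL(r-win:r+win, c-win:c+win));
        ref = ref - mean(ref(:));
        best = -Inf; best_d = 0;
        for d = 0:max_d
          cand = double(imR(r-win:r+win, c+d-win:c+d+win));
          cand = cand - mean(cand(:));
          score = sum(sum(ref .* cand)) / (norm(ref(:)) * norm(cand(:)) + eps);
          if score > best, best = score; best_d = d; end
        end
        disp_map(r, c) = best_d;   % disparity, in pixels, for pixel (r, c)
      end
    end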

The sensorial fusion technique uses two types of sensors: a camera (two-dimensional
sensor) and a range finder (one-dimensional sensor). This technique uses the computer
vision capabilities to perform two-dimensional measurements such as the co-ordinates of
the centre of area of an object, as well as its length, width and orientation. Then, the
range finder completes the missing information by measuring the height at the object's
centre of area. The length, width and orientation information is used to calculate the
attitude of a robotic manipulator and allow it to grasp an object in a typical robotic
pick-and-place task.
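The two-dimensional part of this technique can be sketched with image moments. The MATLAB function below is a minimal, assumed example (not the code of this work) that extracts the centre of area and the orientation of a segmented object from a binary image; the range finder measurement then complements these quantities with the height.

    function [xc, yc, theta] = object_pose_2d(bw)
    % bw: binary image in which the pixels of the segmented object are 1
    [r, c] = find(bw);              % row/column co-ordinates of object pixels
    xc = mean(c);                   % centre of area, x (columns)
    yc = mean(r);                   % centre of area, y (rows)
    mu20 = mean((c - xc).^2);       % central second-order moments
    mu02 = mean((r - yc).^2);
    mu11 = mean((c - xc) .* (r - yc));
    theta = 0.5 * atan2(2*mu11, mu20 - mu02);   % principal orientation [rad]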

Structured lighting is used mainly in inspection and quality control applications. This
method is based on the extraction of geometric information encoded in a two-dimensional
image. The object height information is extracted by measuring the deformation of a
light pattern projected onto its surface. As the model of this deformation behaviour is
known, it is possible to recover the object's height. The three 3D computer vision
techniques are described in detail in the following sections.
Stereo Vision

The stereo vision system herein presented was designed for a robotic 'pick-and-place'
application, as shown schematically in Fig. 1.

The stereo image grabbing process takes place in two steps, with the camera filming
the scene from a top view. After the first image is grabbed, the camera is moved away
(in the 0.5 to 1.5 cm range) by a displacement mechanism driven by a step motor. After
this displacement, the second image is grabbed.

Stereo Vision Implementation

The development and testing of the algorithm were carried out in four steps.

The first step was the development of the image correlation routines using MatLab®
version 5.1 and synthetic images generated by 3D Studio, version 4.0. At the beginning,
the work was conducted with lower resolution images (200x300 pixels) to make the
routine development process faster, due to the huge computational effort involved in
this technique. As soon as the correlation algorithm parameters were optimised and
tuned to provide a good performance, the second step took place, in which real images,
grabbed by the camera with 512x512 pixels and 64 grey levels of resolution, replaced
the synthetic ones.

The third step was the development of a process for calibrating the intrinsic camera
lens parameters. The most important parameter is the focal distance, which is used to
recover the object's height information.

The fourth step was the recovery of three-dimensional data about the scene from the 2D
images. From the first image, which is always grabbed at an initial position, the
information about the objects in the scene, such as length and width, is obtained in
pixels. To recover metric information about the objects, it is necessary to determine
the relationship between the metric and pixel scales.

The information about object height is calculated through a simple triangulation
technique based on the geometric optics model of the stereo image configuration, as
shown in Fu (1987).

Image Correlation Process Using Relaxation Labelling

'Relaxation labelling processes are iterative procedures heuristically developed to
solve certain constraint satisfaction problems, which have become a standard technique
in computer vision and pattern recognition', according to Pelillo (1994). The search for
the result is a coarse-to-fine process, which ends when the iterative procedure reaches
the required error. These algorithms are associated with an 'energy' function,
quantifying the degree of violation of the constraints, which is minimised as the
process evolves.
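A minimal MATLAB sketch of such an update rule is given below. It follows the standard non-linear relaxation labelling scheme (label probabilities weighted by a support term and renormalised) and is only an assumed illustration, not the algorithm of Feris (1998); the compatibility array R and the stopping parameters are placeholders.

    function P = relax_label(P, R, tol, max_iter)
    % P: n x m matrix, P(i,k) = current probability of label k for point i
    % R: n x m x n x m compatibility coefficients r(i,k;j,l)
    [n, m] = size(P);
    for it = 1:max_iter
      Q = zeros(n, m);
      for i = 1:n
        for k = 1:m
          Q(i,k) = sum(sum(squeeze(R(i,k,:,:)) .* P));   % support for label k at point i
        end
      end
      Pnew = (P .* Q) ./ (sum(P .* Q, 2) * ones(1, m) + eps);  % update and renormalise rows
      if max(max(abs(Pnew - P))) < tol, P = Pnew; break; end
      P = Pnew;
    end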

The stereo matching algorithm implemented in this work was proposed by Feris (1998).

Camera model and camera calibration

The camera model adopted is the 'pinhole' model, as shown in Fu (1987) and in Nalwa
(1993): the projection of a three-dimensional object onto a two-dimensional surface is
achieved through straight lines that pass through one single point called the
'projection centre'. Camera calibration is the determination of all its inner geometric
and optical features. These inner features are called intrinsic parameters. Camera
calibration also means the determination of the camera position and orientation
relative to the world co-ordinate system. These features are called extrinsic
parameters. Laudares (1997) presents in detail an extrinsic camera calibration process,
which is quite suitable for the robotic application proposed in this work. The most
important camera intrinsic parameter is the focal distance λ, which is the distance
between the centre of the lens and the image sensor plane.
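The projection performed by this model reduces to two divisions. The lines below are a minimal numerical sketch of the pinhole projection of a point expressed in the camera frame; the focal distance and the point co-ordinates are illustrative values, not calibration results of this work.

    lambda = 0.016;                      % focal distance [m] (example value)
    P3d    = [0.10; 0.05; 0.80];         % 3D point in the camera frame [m] (example)
    x_img  = lambda * P3d(1) / P3d(3);   % projection onto the image plane, x [m]
    y_img  = lambda * P3d(2) / P3d(3);   % projection onto the image plane, y [m]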

3D Object Height Recovery (Triangulation)

The following conditions must be met for the model shown in Fig. 2:
• The cameras are identical;
• Both image co-ordinate systems are perfectly aligned, differing only in the location of their origins;
• The Z co-ordinate is the same for both images.

According to Fu (1987), the depth information recovery (Z co-ordinate) is achieved by
the following expression:

Z = λ - λB / (x2 - x1)

where:

• λ is the focal distance, estimated by an experimental calibration process;

• x2 and x1 are co-ordinates in pixels, which must be converted to metric scale by
multiplying them by a scale factor that relates the sensor size in the x direction, in
metres, to the sensor resolution, in pixels, in the same direction;

• B is the baseline, that is, the displacement between the two optical centres.
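A numerical sketch of this computation is shown below; the focal distance, sensor size and point co-ordinates are illustrative assumptions (only the 30 mm baseline reproduces a value reported in the text).

    lambda   = 0.016;             % focal distance [m] (example value)
    B        = 0.030;             % baseline displacement [m] (30 mm, as used for Table 1)
    sensor_w = 0.0064;            % sensor width along x [m] (assumed)
    res_x    = 512;               % sensor resolution along x [pixels]
    sf       = sensor_w / res_x;  % scale factor [m/pixel]
    x1 = 300 * sf;                % correlated point in the first image [m] (example)
    x2 = 260 * sf;                % corresponding point in the second image [m] (example)
    Z  = lambda - lambda * B / (x2 - x1)   % depth Z [m] (about 0.98 m with these values)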

Some improvements to the Feris (1998) technique were included in order to increase the
algorithm performance and ease the correlation process, as shown in Kabayama (1999).

Further information about the focal distance and scale factor calibration procedures
and results can also be found in Kabayama (1999).

Table 1 shows the results of some object height measurements using a 30 mm baseline
displacement. Disparity is the difference between the corresponding x co-ordinates in
the two images, and 'matches established' is the number of correlated points.

Sensorial Fusion

The conception of the sensorial fusion technique for a 3D vision machine is shown in
Fig. 3.

The sensor used was a Honeywell® 946-A4V-2D-2CO-175E ultrasound sensor, which has the
following features:

Minimum range: 200 mm (programmable)

Maximum range: 2000 mm (programmable)

Weight: 0.210 kg

The sensor provides an analogue voltage output proportional to the distance to be
measured. This proportionality can be direct or inverse, depending on how the sensor is
programmed (rising or falling modes). The curves relating the output voltage of this
sensor to the measured distance were determined using both proportional modes for
different range programs. The results showed that the ultrasound sensor has a linear
behaviour in all modes, which is an important and desirable feature. The static
calibration coefficients for each curve were calculated; they are necessary to
establish the relationship between the output voltage and the measured distance, and to
evaluate the sensitivity of the programmed mode to noise, as well as its resolution.
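A minimal sketch of how such a static calibration could be computed in MATLAB is given below; the voltage/distance pairs are illustrative placeholders, not the measured curves of this work.

    dist_mm = [200 400 600 800 1000 1200];     % reference distances [mm] (placeholder data)
    volt    = [1.0 2.1 3.0 4.1 5.0 6.1];       % sensor output [V], rising mode (placeholder data)
    p = polyfit(dist_mm, volt, 1);             % p(1) = sensitivity [V/mm], p(2) = offset [V]
    v_read   = 3.55;                           % a new voltage reading [V]
    d_est_mm = (v_read - p(2)) / p(1)          % distance estimated from the calibration line [mm]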

As for the ultrasound beam characteristics, shown in Fig. 4, it was necessary to study
the radial profile; the results are shown in Table 2. The distances shown in Table 2
refer to the object top.

The determination of the ultrasound beam diameter at a given level was performed
experimentally: a grid paper was fixed on the surface of the testing table and the
sensor was positioned at a certain range from it, pointing towards it. An object was
moved on this surface towards the point the sensor was aimed at. As soon as the object
was detected, the position where that happened was marked. This procedure was repeated
until a complete ultrasound profile at this level was determined. The entire process
was then repeated for other levels, as shown in Table 2.

From the knowledge of the sensor features, it is possible to estimate the minimum size
of object that can be manipulated using this technique. For example, at a 40 cm range,
the object must be at least 16 cm in size: the object cannot be smaller than the
diameter of the ultrasound beam at the level of its top. Besides, the object material
should not absorb ultrasound waves and the object's top must be perpendicular to the
direction from which the ultrasound beam reaches it.
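A small sketch of this check is given below; the beam-diameter profile is a placeholder standing in for the measured Table 2 data, except for the 40 cm / 16 cm pair quoted above.

    range_mm = [200 300 400 500];   % sensor-to-object-top distance [mm] (placeholder profile)
    beam_mm  = [ 90 120 160 200];   % beam diameter at that distance [mm] (placeholder; 400 -> 160 from the text)
    min_size_mm = interp1(range_mm, beam_mm, 400)   % minimum object size at a 40 cm range [mm]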

Structured Lighting

Two different lighting patterns were studied to evaluate accuracy and check if this
technique is suitable for a pick-and-place robotic application.

The first pattern studied was a laser light source from a presentation pointer device.
An external DC power source was adapted to the device in order to avoid the decrease in
light intensity as the batteries went flat. Fig. 5 shows a general view of the
experiment using the laser light source and its respective model [Galbiati (1990)] used
to recover the object height h.
The full line and the dotted line shown in Fig. 5 represent two distinct situations in
the data acquisition process for recovering the object height h.

The scene is filmed twice. In the first shot, represented by the dotted line, the object
is out of the scene and P1 is the position of the centre of area of the laser spot where
the beam touches the ground. In the second shot, represented by the full line, the
object is in the scene and P2 is the position of the centre of area of the laser spot
where the beam touches the top of the object. The centre of area of the laser spot is
determined by the computer vision system in both situations. P3 is the projection of P2
onto the horizontal plane. The laser beam reaches the ground at an angle α and d is the
distance, in pixels, between the centres P1 and P2. The object height h is determined
by a simple trigonometric relation (see Fig. 5):

h = s · d · tan(α)    (2)

where:

• d ∈ Z+;

• s ∈ R+ is an experimental conversion factor between centimetres and pixels;

• α is the angle between the P1P2 and P1P3 line segments.
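A numerical sketch of this relation is given below; the angle and the pixel distance are illustrative values, while s is the conversion factor reported in the next section.

    s     = 21.1 / 387;         % conversion factor [cm/pixel] (value given below)
    alpha = 35 * pi / 180;      % beam angle with the ground [rad] (example value)
    d     = 120;                % distance between the P1 and P2 centres [pixels] (example)
    h     = s * d * tan(alpha)  % object height [cm], from Eq. (2)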

Implementation

The first step was the determination of the conversion factor s. An A4 sheet of paper
was used as a standard object and shot five times. Then, the object size in pixels was
measured in each image. The conversion factor s is the average ratio between the
standard object size measured in centimetres and in pixels. It is important to note
that this conversion factor must be determined along the same direction as the distance
variation, because the camera pixels are not square.

The s factor determined was:

s = 21.1 / 387 (cm/pixel)
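A minimal sketch of this averaging is shown below; the reference size is the one quoted above and the five pixel measurements are illustrative placeholders chosen to reproduce the reported ratio.

    size_cm  = 21.1;                     % size of the standard along the measured direction [cm]
    size_pix = [385 388 387 386 389];    % measured sizes in the five images [pixels] (placeholder)
    s = mean(size_cm ./ size_pix)        % conversion factor [cm/pixel]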

The second step was the α angle calibration procedure. Five different objects of known
height h were used as calibration standards and shot five times. Then, using Eq. (3),
the respective angles α were calculated for each measured distance d.

The α angle calibration results are shown in Table 3.
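The sketch below illustrates this calibration step, computing one angle per standard from the inverse of Eq. (2) and averaging them; the standard heights and pixel distances are illustrative placeholders, not the data behind Table 3.

    s      = 21.1 / 387;                  % conversion factor [cm/pixel]
    h_std  = [2.0 4.0 6.0 8.0 10.0];      % known heights of the calibration standards [cm] (placeholder)
    d_std  = [ 66 131 197 262  328];      % measured P1-P2 distances [pixels] (placeholder)
    alphas = atan(h_std ./ (s * d_std));  % one angle estimate per standard [rad]
    alpha  = mean(alphas)                 % calibrated angle [rad]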

After completing the s and α calibration processes, the 3D structured lighting computer
vision system using the laser light source was ready to be operated. Different object
heights were measured and the results are shown in Table 4.

The second pattern used in this experiment was a single stripe projected by an overhead
projector. Fig. 6 shows a general view of the experiment and Fig. 7 shows a detail of
the projected pattern.
In this case, the object height information recovery is similar to the laser case,
using the same principles and equations. The difference is that applying the stripe
pattern causes the computer vision system to recognise three objects, as shown in
Fig. 7. Due to digitalisation errors, the alignment of objects O2 and O3 cannot always
be obtained. Because of this, the distance d is taken as the average of the distance
between O2 and O1 and the distance between O3 and O1.
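A short sketch of this averaging, assuming Euclidean distances between the centres of area (the co-ordinates are illustrative), is:

    O1 = [210 180];  O2 = [330 176];  O3 = [331 244];   % centres of area [pixels] (example values)
    d  = (norm(O2 - O1) + norm(O3 - O1)) / 2            % averaged distance d [pixels]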
The conversion factor s used is the same as in the previous experiment. The α angle
calibration process was repeated, using five standard objects. The results are shown in
Table 5.

After completing the s and α calibration processes, the 3D structured lighting computer
vision system using the single stripe pattern was ready to be operated. Different object
heights were measured using Eq. (2). Some results are shown in Table 6.

Analysis and Conclusions

Computer vision is a field with great potential to be exploited, and this work shows
that there are still many subjects to be tested and improved. Despite all the literature
available about the techniques presented in this work, only the practical part shows
the researcher the difficulties involved in their implementation and allows an
evaluation of which assumptions and measurements can be made.

The errors obtained in the object measurements in the stereo vision and structured
lighting experiments are acceptable for typical pick-and-place applications, owing to
the compliance compensation of the robot end effector, even considering the reported
worst case of a 1 cm error.

Table 7 presents the analysis and conclusions compiled from the experimental data and
from the difficulties faced during the implementation of each technique.

References

Feris, R.S. & Lages, W.F., 1998, Casamento de Imagens utilizando correlação e rotulação
relaxada, Anais do IV ENCITA - Encontro de Iniciação Científica e Pós-Graduação do ITA.

Fu, K.S., Gonzalez, R.C. & Lee, C.S.G., 1987, Robotics, McGraw-Hill.

Galbiati, L.J., 1990, Machine Vision and Digital Image Processing Fundamentals,
Prentice Hall.

Kabayama, A.M., 1999, Implementação e análise de técnicas de fusão sensorial aplicadas
à robótica, Tese de Mestrado, ITA.

Laudares, D., 1997, Procedimento automático para calibração de sistemas de visão
robótica para operações pick-and-place, Tese de Mestrado, ITA.

Nalwa, V.S., 1993, A Guided Tour of Computer Vision, Addison-Wesley Publishing Company.

Pelillo, M., 1994, On the Dynamics of Relaxation Labeling Processes, IEEE Transactions
on Pattern Analysis and Machine Intelligence.

Ruocco, S.R., 1987, Robot Sensors and Transducers, Open University Press Robotics
Series, edited by P.G. Davey.

1 E-mail: alfred@mec.ita.cta.br
