http://dx.doi.org/10.1590/S0100-73862002000300013
A. M. Kabayama1; L. G. Trabasso
ABSTRACT
This work presents the implementation and comparison of three different techniques of three-dimensional computer vision: stereo vision, sensorial fusion and structured lighting.
The computer vision techniques presented herein took the following characteristics into consideration:
The Matlab software, version 5.1, was chosen to implement the algorithms of the three techniques because of the simplicity of its commands, programming and debugging. Moreover, this software is well known and widely used by the academic community, allowing the results of this work to be reproduced and verified. Examples of three-dimensional vision applied to robotic assembly ("pick-and-place") tasks are presented.
Introduction
Because of the increasing use of robots in industry, robotics has become an engineering area with its own identity. The advance of sensor technologies and their decreasing prices have allowed the construction of robots with richer feedback about their workspace. As a consequence, their positioning accuracy, operating speed and functional flexibility have increased. It is commonly accepted that computer vision is the most powerful and flexible way to provide robots with feedback about the environment they interact with, and considerable effort and resources have been spent on its research and development, according to Ruocco (1987).
The sensorial fusion technique uses two types of sensors: a camera (two-dimensional sensor) and a range finder (one-dimensional sensor). This technique uses the computer vision capabilities to perform two-dimensional measurements, such as the co-ordinates of the centre of area of an object, as well as its length, width and orientation. The range finder then completes the missing information by measuring the height of the object at its centre of area. The length, width, orientation and height information is used to calculate the attitude of a robotic manipulator and allow it to grab an object in a typical robotic pick-and-place task, as sketched below.
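As an illustration, the sketch below (in Python, with hypothetical function and variable names; the paper's original implementation was written in Matlab) shows how the two-dimensional measurements above can be obtained from a binary image of the object.

import numpy as np

def measure_object(mask):
    # mask: 2D boolean array, True where the object is.
    rows, cols = np.nonzero(mask)
    # Centre of area (centroid) in pixel co-ordinates.
    cy, cx = rows.mean(), cols.mean()
    # Central second moments give the principal axes of the object.
    y, x = rows - cy, cols - cx
    mxx, myy, mxy = (x * x).mean(), (y * y).mean(), (x * y).mean()
    # Orientation of the major axis, in radians.
    theta = 0.5 * np.arctan2(2.0 * mxy, mxx - myy)
    # Length and width of the equivalent ellipse (4*sqrt of the
    # eigenvalues of the second-moment matrix).
    common = np.sqrt(((mxx - myy) / 2.0) ** 2 + mxy ** 2)
    length = 4.0 * np.sqrt((mxx + myy) / 2.0 + common)
    width = 4.0 * np.sqrt(max((mxx + myy) / 2.0 - common, 0.0))
    return (cx, cy), theta, length, width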
Structured lighting is used mainly in inspection and quality control applications. This method is based on the extraction of geometric information encoded in a two-dimensional image. The object height information is extracted by measuring the deformation of a light pattern projected onto its surface. As the model of this deformation behaviour is known, it is possible to recover the object's height. All three three-dimensional computer vision techniques are described in detail in what follows.
Stereo Vision
The stereo vision system presented herein was designed for a robotic 'pick-and-place' application, as shown schematically in Fig. 1.
The stereo image grabbing process takes place in two steps, with the camera filming the scene from a top view. After the first image is grabbed, the camera is moved sideways (in the 0.5 to 1.5 cm range) through a displacement mechanism driven by a stepper motor. After this displacement, the second image is grabbed.
The development and testing of the algorithm were carried out in four steps.
The first step was the development of the image correlation routines using MatLab® version 5.1 and synthetic images generated by 3D Studio, version 4.0. At the beginning, the work was conducted with lower resolution images (200x300 pixels) to speed up the development of the routines, given the huge computational effort involved in this technique. As soon as the correlation algorithm parameters were optimised and settled to provide good performance, the second step took place, in which real images, grabbed by the camera at 512x512 pixels and 64 grey levels, replaced the synthetic ones.
The third step was the development of a process for calibrating the intrinsic camera lens parameters. The most important parameter is the focal distance, which is used to recover the object's height information.
The fourth step was the recovery of three-dimensional data about the scene from the 2D images. From the first image, which is always grabbed at an initial position, information about the objects in the scene, such as length and width, is obtained in pixels. To recover metric information about the objects, it is necessary to find the relationship between the metric and pixel scales.
The stereo matching algorithm implemented in this work was proposed by Feris (1998); a simplified sketch of the correlation-based matching step is given below.
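The sketch below illustrates the general idea of correlation-based matching (block matching with normalised cross-correlation along the same image row). It is not the exact Feris (1998) algorithm; the window size and search range are illustrative.

import numpy as np

def match_point(img1, img2, x, y, win=7, max_disp=40):
    # Returns the disparity of the interior pixel (x, y) of img1 in img2.
    r = win // 2
    t = img1[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    t = (t - t.mean()) / (t.std() + 1e-9)     # normalised template
    best_score, best_d = -np.inf, 0
    for d in range(max_disp + 1):             # search along the same row
        if x - d - r < 0:
            break
        w = img2[y - r:y + r + 1, x - d - r:x - d + r + 1].astype(float)
        w = (w - w.mean()) / (w.std() + 1e-9)
        score = (t * w).mean()                # normalised cross-correlation
        if score > best_score:
            best_score, best_d = score, d
    return best_d                             # disparity, in pixels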
The camera model adopted is the 'pinhole' model, as shown in Fu (1987) and in Nalwa (1993): the projection of a three-dimensional object onto a two-dimensional surface is achieved through straight lines that pass through one single point called the 'projection centre'. Camera calibration is the determination of all its inner geometric and optical features. These inner features are called intrinsic parameters. Camera calibration also means the determination of its position and orientation relative to the world co-ordinate system. These features are called extrinsic parameters. Laudares (1997) presents in detail an extrinsic camera calibration process, which is quite suitable for the robotic application proposed in this work. The most important camera intrinsic parameter is the focal distance, which is the distance between the centre of the lens and the image sensor plane.
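A minimal sketch of the pinhole projection just described, with the focal distance f as the only intrinsic parameter (illustrative names; camera-centred co-ordinates assumed):

def pinhole_project(X, Y, Z, f):
    # Perspective projection of the world point (X, Y, Z), Z > 0, onto
    # the image plane located at distance f from the projection centre.
    x = f * X / Z   # image abscissa
    y = f * Y / Z   # image ordinate
    return x, y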
The following conditions must be met for the model shown in Fig. 2:
• The cameras are identical;
• Both image co-ordinate systems are perfectly aligned, differing only in their origin location.
Under these conditions, the depth z of a scene point follows from the disparity d between its projections in the two images:

$z = \frac{f\,B}{d}$ (1)

where:
z is the distance between the camera and the object point;
f is the focal distance;
B is the baseline displacement between the two camera positions;
d is the disparity, in pixels.
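A minimal sketch of the depth recovery implied by Eq. (1), assuming the focal distance is expressed in pixels so that the result carries the units of the baseline:

def depth_from_disparity(f_px, B, d):
    # f_px: focal distance, in pixels; B: baseline displacement;
    # d: non-zero disparity, in pixels. Returns depth z in the units of B.
    return f_px * B / d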
Some improvements to the Feris (1998) technique were included in order to increase the algorithm performance and ease the correlation process, as shown in Kabayama (1999).
Further information about the focal distance and scale factor calibration procedures and results can be found in Kabayama (1999).
Table 1 shows the results of some object height measurements using a 30 mm baseline displacement. Disparity is the difference between corresponding x co-ordinates in the two images, and 'matches established' is the number of correlated points.
Sensorial Fusion
The conception of the sensorial fusion technique for 3D machine vision is shown in Fig. 3.
The sensor used was the Honeywell® 946-A4V-2D-2CO-175E ultrasound sensor, which has the following features:
The determination of the ultrasound beam diameter at a given level was performed experimentally: a sheet of grid paper was fixed on the surface of the testing table and the sensor was positioned at a certain range from it, aimed towards it. An object was moved on this surface towards the place the sensor was pointing at. As soon as the object was detected, the place where that happened was marked. This procedure was repeated until a complete ultrasound beam profile at this level was determined. The entire process was then repeated for other levels, as shown in Table 2; a sketch of the diameter computation is given below.
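A minimal sketch of how the beam diameter at one level could be computed from the marked detection points of the grid paper experiment (the point data is hypothetical, not the paper's measurements):

import numpy as np

def beam_diameter(points):
    # points: (N, 2) array of (x, y) positions, in cm, where the object
    # was first detected. The diameter is the largest pairwise distance.
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).max()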
From the knowledge of the sensor features, it is possible to estimate the minimum size of the object that can be manipulated using this technique. For example, at a 40 cm range, the object size must be at least 16 cm, since the object cannot be smaller than the diameter of the ultrasound beam at the object's top. Besides, the object material should not absorb the ultrasound waves, and the object's top must be perpendicular to the direction along which the ultrasound beam reaches it.
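This feasibility check can be sketched as an interpolation over the Table 2 data; all (range, diameter) pairs below are placeholder values except the 40 cm / 16 cm point quoted in the text.

import numpy as np

range_cm = [20.0, 30.0, 40.0, 50.0]   # sensor-to-object range (assumed values)
beam_cm = [8.0, 12.0, 16.0, 20.0]     # beam diameter at each range (assumed values)

def min_object_size(r):
    # Smallest manipulable object size at range r, by linear interpolation.
    return float(np.interp(r, range_cm, beam_cm))

print(min_object_size(40.0))          # 16.0 cm, as in the example above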
Structured Lighting
Two different lighting patterns were studied to evaluate their accuracy and to check whether this technique is suitable for a pick-and-place robotic application.
The first pattern studied was a laser light source from a presentation pointer device. An external DC power source was adapted to the device in order to avoid the decreasing light intensity caused by battery discharge. Fig. 5 shows a general view of the experiment using the laser light source and its respective model [Galbiati (1990)] used to recover the object height h information.
The full line and the dotted line shown in Fig. 5 represent two distinct situations in the data acquisition process for recovering the object height h.
The scene is filmed twice. In the first shot, represented by the dotted line, the object is out of the scene and P1 is the position of the centre of area of the laser beam where it touches the ground. In the second shot, represented by the full line, the object is in the scene and P2 is the position of the centre of area of the laser beam where it touches the top of the object. The centre of area of the laser beam is determined by the computer vision system in both situations. P3 is the projection of P2 onto the horizontal plane. The laser beam reaches the ground at an angle α, and d is the distance, in pixels, between the area centres P1 and P2. The object height h is determined by a simple trigonometric relation (see Fig. 5):

$h = s\,d\,\tan\alpha$ (2)

where:
h is the object height, in cm;
s is the pixel-to-centimetre conversion factor;
d is the distance, in pixels, between P1 and P2, d ∈ Z+;
α is the angle between the laser beam and the ground plane.
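A minimal sketch of Eq. (2) as reconstructed above:

import math

def object_height(d_px, s, alpha):
    # d_px: distance between P1 and P2, in pixels; s: pixel-to-centimetre
    # conversion factor; alpha: laser beam angle with the ground, in radians.
    return s * d_px * math.tan(alpha)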
Implementation
The first step was the determination of the conversion factor s. An A4 sheet of paper was used as a standard object and shot five times. Then, the size of the object in pixels was measured in each image. The conversion factor s is the average ratio between the standard object size measured in centimetres and its size measured in pixels. It is important that this conversion factor be determined along the same direction as the distance variation, because the camera pixels are not square shaped.
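A minimal sketch of this step; the pixel measurements below are hypothetical, and 29.7 cm is the long side of an A4 sheet:

size_cm = 29.7                                   # known size of the standard object
size_px = [301.0, 299.5, 300.2, 298.8, 300.6]    # measured size in each of the five shots (hypothetical)

# Conversion factor: average ratio between metric and pixel sizes.
s = sum(size_cm / p for p in size_px) / len(size_px)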
The second step was the calibration procedure for the angle α. Five different objects with known heights h were used as calibration standards and shot five times. Then, using Eq. (3), obtained by inverting Eq. (2) as α = arctan(h/(s d)), the respective angles α were calculated for each distance d measured.
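A minimal sketch of the α calibration, with hypothetical standard heights and measured pixel distances:

import math

standards = [(2.0, 55.0), (4.0, 112.0), (6.0, 166.0)]  # (height h in cm, distance d in pixels)
s = 0.036                                              # conversion factor, in cm/pixel (hypothetical)

# Eq. (3): alpha = arctan(h / (s * d)) for each calibration standard.
alphas = [math.atan(h / (s * d)) for h, d in standards]
alpha = sum(alphas) / len(alphas)                      # calibrated angle, in radians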
After completing the s and α calibration processes, the 3D structured lighting computer vision system using the laser light source was ready to be operated. Different object heights were measured and the results are shown in Table 4.
The second pattern used in this experiment was a single stripe projected by an overhead projector. Fig. 6 shows a general view of the experiment and Fig. 7 shows a detail of the projected pattern.
In this case, the object height information recovery is similar to the laser case, using the same principles and equations. The difference is that applying the stripe pattern results in the computer vision system recognising three objects (O1, O2 and O3), as shown in Fig. 7. Due to digitalisation errors, the alignment of the objects O2 and O3 cannot always be obtained. Because of this, the distance d is taken as the average of the distance between O2 and O1 and the distance between O3 and O1, as sketched below.
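A minimal sketch of this averaging step (illustrative names; the area centres come from the vision system):

import math

def stripe_distance(o1, o2, o3):
    # o1, o2, o3: (x, y) area centres of the three recognised objects, in pixels.
    d12 = math.dist(o1, o2)   # distance O1-O2
    d13 = math.dist(o1, o3)   # distance O1-O3
    return 0.5 * (d12 + d13)  # averaged distance d, compensating misalignment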
The conversion factor s used is the same as in the previous experiment. The α angle calibration process was repeated, using five standard objects. The results are shown in Table 5.
After completing the s and α calibration processes, the 3D structured lighting computer vision system using the single stripe pattern was ready to use. Different object heights were measured using Eq. (2). Some results are shown in Table 6.
Conclusions
Computer vision is a field with great potential to be explored, and this work shows that there are still many subjects to be tested and improved. Despite all the literature available about the techniques presented in this work, only their practical implementation shows the researcher the difficulties involved and allows the evaluation of which assumptions and measurements can be made.
The errors obtained in the object measurements in the stereo vision and structured lighting experiments are acceptable for typical pick-and-place applications, thanks to the compliance compensation of the robot end effector, even in the reported worst case of 1 cm error.
Table 7 presents the analysis and conclusions compiled from the experimental data and from the difficulties faced during the implementation of each technique presented.
References
Feris, R.S. & Lages, W.F., 1998, "Casamento de Imagens utilizando correlação e rotulação relaxada", Anais do IV ENCITA - Encontro de Iniciação Científica e Pós-Graduação do ITA.
Fu, K.S., Gonzalez, R.C. & Lee, C.S.G., 1987, "Robotics", McGraw-Hill.
Galbiati, L.J., 1990, "Machine Vision and Digital Image Processing Fundamentals", Prentice Hall.
Ruocco, S.R., 1987, "Robot Sensors and Transducers", Open University Press Robotics Series, edited by P.G. Davey.
1 E-mail: alfred@mec.ita.cta.br