International Journal of Optomechatronics

ISSN: 1559-9612 (Print) 1559-9620 (Online) Journal homepage: https://www.tandfonline.com/loi/uopt20

3-D Vision Feedback for Nanohandling Monitoring in a Scanning Electron Microscope

Marco Jähnisch & Sergej Fatikow

To cite this article: Marco Jähnisch & Sergej Fatikow (2007) 3-D Vision Feedback for
Nanohandling Monitoring in a Scanning Electron Microscope, International Journal of
Optomechatronics, 1:1, 4-26, DOI: 10.1080/15599610701232630

To link to this article: https://doi.org/10.1080/15599610701232630

Published online: 30 Mar 2007.

International Journal of Optomechatronics, 1: 4–26, 2007
Copyright © Taylor & Francis Group, LLC
ISSN: 1559-9612 print / 1559-9620 online
DOI: 10.1080/15599610701232630

3-D VISION FEEDBACK FOR NANOHANDLING MONITORING
IN A SCANNING ELECTRON MICROSCOPE

Marco Jähnisch¹ and Sergej Fatikow²

¹ R&D-Division Microsystems Technology and Nanohandling, Oldenburg, Germany
² Division Microrobotics and Control Engineering, University of Oldenburg, Oldenburg, Germany

In this article, a new 3-D imaging system for 3-D vision feedback for nanohandling in a
scanning electron microscope (SEM) is presented. The stereo images are generated by
beam tilting followed by the processing of the image data. Additionally, the 3-D module
consisting of a vergence and a stereo system is described in more detail. The proposed stereo
algorithm is biologically motivated and utilizes a new coherence detection analysis
algorithm. The 3-D imaging system provides a sharp and high density disparity map, and 3-D
plots in sub-pixel accuracy. The system is therefore suitable for micro- and nanohandling
in SEMs.

1. INTRODUCTION
In micro- and nanotechnology, handling and manipulation processes are necessary.
Because the object size is in the range of µm, sub-µm, and even down to a few
nm, SEMs are increasingly employed for observation of these processes. Through
observation, information such as the position of the objects and of the tools is
determined. This information is needed by the user or by a robot control system.
It is therefore important to know the relative position of the objects and tools in
all three dimensions of space. In addition, the determination has to be carried out
precisely, robustly, fast, and without any impact on the handling process.
Commercial standard SEMs deliver 2-D images without depth information.
However, for precise handling and working without disturbing the object, 3-D
information is needed. There are some solutions in the literature which try to
overcome the problem.
In (Kasaya et al. 1998) the SEM is equipped with an additional light micro-
scope for automatic object manipulation. They handle spherical objects with a size
of 30 µm. The electron microscope is used to deliver the top view of the objects.

This research was funded by the European Community, FP 6, NMP, Contract No. STRP 013680
NANORAC (Nanorobotic for assembly characterization) and by the Federal Ministry of Economics and
Technology of Germany.
Address correspondence to Marco Jähnisch, R&D-Division Microsystems Technology and
Nanohandling, OFFIS, Escherweg 2, 26121 Oldenburg, Germany. E-mail: marco.jaehnisch@offis.de


NOMENCLATURE

ω        frequency of the Gabor filter
φ        phase of the Gabor filter
σ        width of the Gaussian
c+       complex cell which detects slightly positive disparities
c−       complex cell which detects slightly negative disparities
c0       complex cell which detects zero disparities
s        tuned disparity of one disparity estimation unit (DEU)
C_x,y    set of DEUs in the coherence cluster of the disparity stack at position (x,y)
S(i)     mapping function which maps i to the tuned disparities s of one DEU
D(i)     set of disparity values (i ∈ 1 … N)
N        size of the disparity stack
ε        threshold (coherence analysis)
t_thres  threshold (dynamic texture-based filter)

By means of an edge detection algorithm, the 2-D position of the objects is determ-
ined. The additional light microscope, which is mounted in the sidewall of the SEM,
delivers a side view of the objects. Therefore, it is possible to detect whether the
object has been lifted up or not. Due to the limited resolution of the light microscope
and the fixed magnification of 1000, only relatively large objects can be observed.
The authors in Bürkle et al. (2000) used a triangulation approach for the deter-
mination of the third dimension and proposed two methods. In the first method, a
line is projected onto the object surface by a laser. The laser line is observed by a
miniaturized light microscope. The relative depth can be calculated through the rela-
tive shift between the base line and the projection line. In the second method, the
SEM is employed to project a marker with the electron beam on the object surface.
For this purpose, the object has to be treated with a luminous coating beforehand.
When the beam hits the object surface, a luminous spot appears which is detected by
a miniaturized light microscope which is mounted inside the vacuum chamber. Due
to the resolution limits of the light microscope (CCD camera), the accuracy of this
approach is less than 5 µm.
In all of the approaches, it is necessary to employ a special light microscope
and the restricted space inside the vacuum chamber makes it difficult to install com-
plex additional hardware. Furthermore, it is known that the resolution and accuracy
of the microscopes used are very limited. Therefore, these approaches are not suit-
able for handling tasks down to the nanometer-scale, because these methods do
not fulfill the desired requirements in accuracy and flexibility needed for handling
stations today (Jähnisch et al. 2005; Fatikow et al. 2006).
An alternative is to use a stereoscopic image system. It is known that by means
of, for example, two cameras, the third dimension can be calculated by stereoscopic
image approaches. In this case, images from two different perspectives and a stereo
algorithm are needed to determine the relative depths of tools and objects. Because
an SEM uses a completely different working system compared to a normal camera,
the usual methods for stereoscopic image acquisition are not applicable. Also, the
construction conditions of a conventional SEM mean that it is not possible to
generate two stereo images at the same time.
One known method for the generation of stereoscopic images of a sample in an
SEM is to tilt the sample concentrically between the two images (Alicona 2006).
Depending on the height of the observed object, shifts between homologous points

in both stereo images are obtained. For a concentric tilting, it is important that the
tilt axis is in the middle of the object surface. In order to apply a concentric tilting, an
SEM must be equipped with a concentric or Cartesian motor table. In this case, an
arbitrary tilt angle depending on the specification of the table can be applied.
One disadvantage of this approach is that the handling station, which consists
of the handling object, robot, and tools, has to be tilted. This can take several min-
utes and the risk that the object changes its position is high. Therefore, this approach
is not suitable for the generation of stereo images for the observation of handling
processes. Another solution is the tilting of the electron beam (Abe et al. 2004). In
this case, the SEM is equipped with a special lens system which tilts the beam and
thus makes it possible to generate two images from different perspectives. The
advantage of this concept for handling processes is the fast image acquisition
without any impact on the process. Tilt angles of up to 8° are possible. However,
until now this technique has only been available in SEMs with a special electron
beam column and cannot be used in standard SEMs. Therefore, in most cases this
system is not available for generation of stereoscopic images. After the generation
of stereo images, suitable algorithms are needed for the determination of the depth
information by calculating the displacements (disparities) between the two images.
For this, the classical correspondence problem has to be solved. This means that for
every pixel in one image the corresponding pixel in the other image has to be found.
For the determination of 3-D information in SEMs, until now in most of the
cases, area-based approaches (Scherer et al. 1999; Hein et al. 1999; Bhat and Nayar
1998; Kuchler and Brendel 2003; Lacey et al. 1998) have been employed for surface
reconstruction. This approach directly compares intensity values or their rank values
within a small image patch of the two images (Scherer et al. 1999). It tries to minimize
the deviation or to maximize the correlation between the patches of the images to cal-
culate the disparities. A stable performance can be achieved if the patch size is large
enough. Sometimes a hierarchical approach is used for the improvement of the calcu-
lation speed. The calculation of the disparity begins at a coarse resolution of the images
and is carried on to the finer resolutions. However, there is no guarantee that the dis-
parity information which is calculated at the coarse level is valid. The disparity could be
wrong, might have a value differing from that at a fine resolution level, or may not be
detected at all. Because of the need for a large patch size and the assumption of con-
stant disparity within these patches, the disparity map is blurred, i.e., object contours
in the disparity map are blurred as well. This has a considerable negative effect on,
for example, the calculated position of objects and tools. The absolute accuracy of such
a stereoscopic approach depends on the resolution of the microscope and the accuracy
of the algorithm. Without special post-processing, the resulting disparity of an area-
based algorithm can only be calculated to pixel accuracy. In addition, heavy noise
arises in low-texture image regions as a result (disparity map) of a stereo algorithm.
Thus, misinterpretation of the objects and tool positions may result. Moreover, area-
based algorithms usually require stereo images with a relatively large perspective differ-
ence (i.e., a large tilt angle), so the demand on a stereo image acquisition system (e.g.,
beam tilt system) increases. The usage of this class of algorithms leads to difficulties in
the observation of handling processes in the SEM.
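The working principle of such an area-based approach can be sketched as follows (an illustrative SSD block-matching sketch in Python/NumPy; the 7×7 window and ±8 px search range are assumptions for illustration, not parameters taken from the cited works):

```python
import numpy as np

def block_match_row(left, right, y, patch=7, max_disp=8):
    # Area-based stereo for one scan line: for each pixel, pick the
    # horizontal shift d that minimizes the sum of squared differences
    # (SSD) between small patches. Without post-processing the result
    # is limited to pixel accuracy, as noted in the text.
    h, w = left.shape
    r = patch // 2
    disps = np.zeros(w, dtype=int)
    for x in range(r + max_disp, w - r - max_disp):
        ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(float)
        best, best_d = np.inf, 0
        for d in range(-max_disp, max_disp + 1):
            cand = right[y - r:y + r + 1, x + d - r:x + d + r + 1].astype(float)
            ssd = np.sum((ref - cand) ** 2)
            if ssd < best:
                best, best_d = ssd, d
        disps[x] = best_d
    return disps
```

The constant-disparity assumption within each patch is exactly what blurs object contours in the resulting disparity map.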
A more suitable class of algorithms for observing tasks are the energy models.
These models were developed by studying the visual cortex of mammals. There are cells
(simple and complex cells) in the cortex which are built up in a hierarchical network.

This network can be mathematically modelled by Gabor filters and quadrature pairs of
Gabor filters (see (Adelson and Bergen 1985; Daugman 1985; Fleet et al. 1991, 1996) for
further information). The disparity of a certain image region can be estimated by the
response of the cells (Fleet et al. 1991, 1996; Henkel 1997, 2000). The advantage of this
approach is that the calculated disparity map is sharp, i.e., object contours are clear and
in sub-pixel accuracy. Moreover, it is known that there are no difficulties with depth dis-
continuities, and small disparities can be detected quite well by this algorithm. This
means that the angle between the two images can be small. In addition, this approach
is relatively robust against noisy input images and because of the neural hierarchical
structure the algorithm is highly parallel. This means that to speed up the calculation,
a hardware implementation (e.g., on an FPGA), a multi-processor system, or a
computer cluster is useful. An application of an energy-model-based stereo algorithm
for SEM images is presented in Jähnisch and Schiffner (2006).
For the 3-D vision feedback monitoring of handling processes in the SEM, a
universally applicable and flexible 3-D imaging system is needed, which generates
stereo images without any impact on the handling process. For such tasks only
the precise relative depth is needed because only the relative position of the object,
in respect to the tool, is necessary. Therefore, algorithms are needed for determi-
nation of the relative displacements (disparity) and the 3-D information in disturbed
and texture-weak images. The algorithms have to be suitable for the observation and
support of handling processes.
In this article, a new 3-D imaging system for nanohandling is presented, which
fulfills these requirements.

2. STRUCTURE OF THE NEW 3-D IMAGING SYSTEM FOR SEM


In this section, the structure for the new 3-D imaging system which fulfills the
requirements of micro- and nanohandling is presented. The system is schematically
represented in Figure 1 and consists of five main components.
Since the 3-D imaging system should be suitable for the observation of
handling processes, the technique of beam tilting is employed. In order to overcome
the known disadvantages of handling in the beam tilting system, a new system has
been developed. This system does not need a special electron column and is univer-
sally and flexibly applicable in each standard SEM. The stereo images are acquired
by means of the beam tilting system, which is controlled by the beam-control unit.
These images are suitable for the following processing. Before the images are trans-
ferred into the 3-D module, they are pre-processed and filtered by standard methods,
e.g., the median filter. Thus, the noise level in the images is reduced and a contrast
enhancement is carried out. After pre-processing, the images are sent to the 3-D
module. The 3-D module consists of two sub-modules: the vergence system and
the stereo system. The vergence system is needed to align the images against each
other and to compensate unwanted shifts, rotations, and different zoom scales where
necessary. In this way, prepared images are passed on to the stereo system which
processes the image data and provides 3-D information (disparity maps). These
results are processed in the last module (see Figure 1), which provides the data for
the graphic user interface (GUI) or a robot control system. The image acquisition
and the 3-D module are described in more detail in the following sections.
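The pre-processing step mentioned above could look like the following minimal sketch (pure NumPy; the 3×3 median window and the linear contrast stretch are assumptions, since the article only names the filter types):

```python
import numpy as np

def median3(image):
    # 3x3 median filter via shifted copies (edge-padded); suppresses
    # impulse-like SEM noise while preserving edges.
    p = np.pad(image, 1, mode='edge')
    h, w = image.shape
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)

def contrast_stretch(image, out_max=255.0):
    # Linear contrast enhancement to the range [0, out_max]
    img = image.astype(float)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) * out_max if hi > lo else img
```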

Figure 1. Structure of the new 3-D imaging system.

3. IMAGE ACQUISITION AND BEAM CONTROL


Beam-tilting is a suitable stereo image generation technique for nanohandling
tasks in the SEM. However, until now this technique has only been available in SEMs
with a special electron beam column (Abe et al. 2004; Thong and Breton 1992; Lee
and Thong 1999) and cannot be used in standard SEMs. Therefore, the new system
provides an external and flexible means of deflection which can be installed in every
SEM. The construction of this deflection system was carried out by the company
Surface Concept GmbH, Germany in an ongoing cooperation project.
For the beam-tilting system, a magnetic deflection is used. This is fixed under
the electron gun with the help of a metal arm (see Figure 2 – the sample holder is not
shown here, but is moved toward the gun during the experiment). The arm can be
flexibly mounted on the side panel of the SEM and the position of the magnet can be
adjusted. Additionally, the deflection magnet is supplied with power via a vacuum-
suited cable entry. The task of the external deflection system is to deflect and tilt the
beam twice. With this procedure two images from different perspectives of a sample
are generated by deflected electron beams. By means of the integrated beam-shift
unit of the SEM, the beam is deflected in one direction, and with the additional

Figure 2. External deflection system; the sample holder is not shown.

external deflection system the beam is then deflected in the opposite direction. In
Figure 3, the principle is shown schematically. With this approach a tilt angle of
max. 3° was achieved.

Figure 3. Principle of the beam deflection.



Figure 4. Structure of the beam control.

For the generation of two different perspectives a beam control unit is needed.
This unit is schematically represented in Figure 4 and consists of four main
components.
With the integrated SEM beam-shift unit, the beam entry position can be changed
in reference to the center position of the deflection system. The image acquisition
unit acquires two images which have to be generated from different perspectives.
Therefore, the view direction has to be changed. Thus, the beam has to be shifted
in the right position and the polarity of the magnetic deflection system has to be
changed as well. This is carried out by the control system which sends the necessary
control commands to the electronic polarity unit and to the beam shift unit.

4. THE NEW 3-D MODULE


The 3-D module is an essential part of the 3-D imaging system. Its purpose is to
extract 3-D information from stereoscopic images. It consists of two components:
the vergence system and the stereo system (see Figure 5). The functionality and
the algorithms of these systems are described below.
The actual disparity calculations and therefore the acquisition of the 3-D infor-
mation are carried out in the stereo system. This receives a stereoscopic image pair
which is processed over two layers. The estimation layer calculates disparity esti-
mates and the coherence layer determines stable disparity values and outputs them
in the form of a disparity map. Besides this, other results are also calculated which

Figure 5. Flowchart of the 3-D module.

are described in the next section. Because of the additional special image acquisition
in an SEM, the stereo images are potentially distorted. Vertical shifts, counter
rotation, or different zoom scales might occur between the two stereo images. This
makes it difficult for the stereo system to extract the image disparities. However,
these distortions can be compensated within limits by a vergence system, so the
images can be processed properly by the stereo system.
Thus, the stereo system receives its input from the vergence system, which
aligns the stereo image pair. The vergence system may be passed through several
times. In each pass a fixation point is chosen and the image pair is aligned according
to the vergence signals (shift parameters). The aligned images are passed on to the
stereo system to calculate the disparities relative to the chosen fixation point. The
results of each pass are accumulated, which provides robust 3-D information.
In the next section, the stereo system is described first because the results of this
system are needed for the vergence system, which will be described afterwards.

4.1. Stereo System


The presented stereo algorithm consists of two processing layers: the esti-
mation layer and the coherence layer (see Figure 6).
In the estimation layer, several disparity estimations for every pixel are calcu-
lated. This leads to approximately correct disparity values, but also to random and
incorrect values. Therefore, the task of the coherence layer is to separate the correct
from the incorrect estimations and to calculate a stable result. The central data struc-
ture of the algorithm is the estimation cube. It is built up by the estimation layer and
analyzed by the coherence layer for disparity detection. Both the estimation and the
coherence layers are described in more detail in the following sections. The result of
the stereo algorithm is a sharp and high density disparity map which gives the relative

Figure 6. Structure of the stereo algorithm.

depth, a validation map which gives the reliability of the disparity values, and a texture
map which shows the regions with low or high texture in the images. The validation
and texture values are needed for the vergence system, which is explained below.
4.1.1. Estimation Layer The task of the estimation layer is to estimate
several disparities for all image pixels. The base unit of this approach is a disparity
estimation unit (DEU). This unit is capable of estimating the disparity of one pixel
position. The estimation cube consists of lots of such units.
Figure 7 schematically shows a DEU (bottom of the figure). Such a DEU is
built up of several neuronal layers which are based on the concept of the energy

Figure 7. Structure of a disparity estimation unit (DEU); the simple cells of c− and c+ are not shown.

model described in (Adelson and Bergen 1985; Henkel 1997, 2000). The two major
layers are the simple cell layer and the complex cell layer. The receptive field of a
simple cell is modeled by Gabor filters which have the basic form (Daugman
1985; Henkel 1997, 2000).
$$g(x, \varphi) = e^{-x^2/2\sigma^2} \cos(\omega x + \varphi). \qquad (1)$$

In Eq. (1), ω is the frequency, φ is the phase of the cosine function, and σ determines
the width of the Gaussian.
The discrete filter coefficients are normalized to eliminate the DC component.
The response of one simple cell is given by

$$s(x,y) = \sum_{i=-n}^{n} \tilde{g}(i,0)\, I_l(x+i,y) + \tilde{g}(i,0)\, I_r(x+i,y). \qquad (2)$$

This means that a convolution between an image scan line and the coefficients of the
Gabor function is carried out.
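As a sketch, the discrete Gabor coefficients of Eq. (1) and the simple cell response of Eq. (2) might be implemented as follows (Python/NumPy; the parameter values n, ω, and σ are illustrative assumptions, since the article does not specify them):

```python
import numpy as np

def gabor(i, omega, sigma, phi):
    # Discrete Gabor coefficients (Eq. 1); subtracting the mean
    # normalizes the coefficients and eliminates the DC component.
    g = np.exp(-i**2 / (2 * sigma**2)) * np.cos(omega * i + phi)
    return g - g.mean()

def simple_cell(left_row, right_row, x, n=8, omega=0.8, sigma=3.0):
    # Binocular simple cell response (Eq. 2): convolution of one image
    # scan line with the Gabor coefficients, summed over both images.
    i = np.arange(-n, n + 1)
    g = gabor(i, omega, sigma, 0.0)
    seg = slice(x - n, x + n + 1)
    return np.dot(g, left_row[seg]) + np.dot(g, right_row[seg])
```

Because the coefficients are zero-mean, a textureless (constant) scan line yields a response of zero, which is why low-texture regions carry no disparity signal (see Section 4.1.3).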
The complex cells are modeled as a quadratic filter and the receptive fields are
calculated by the squared input from at least two simple cells (see Figure 8).
With these complex cells, disparity estimations can be carried out. However, it
is known that the response of complex cells depends on the local contrast. To avoid
this undesirable effect, a solution which is described in (Adelson and Bergen 1985) is
employed. For the disparity estimation, three complex cells are used and the
responses are compared. The complex cells differ in their parameters. The first
complex cell (c+) detects slightly positive disparities, the second slightly negative (c−),
and the third zero disparities (c0). The difference between c+ and c− is normalized
by c0. This approach covers only a small disparity range.
Since the detection of larger disparities is desirable, a combination between the
phase shift and position shift model (Adelson and Bergen 1985; Fleet et al. 1991,

Figure 8. Structure of a binocular energy neuron (2-D version) (complex cell); the receptive fields are
phase-shifted by 90°.

1996; Ohzawa et al. 1990; Qian 1997) is used. This leads to the response of one
DEU by

$$d(x,y;s) = \frac{c_+(x,y;s,+\varphi) - c_-(x,y;s,-\varphi)}{c_0(x,y;s,0)} - s, \qquad (3)$$

whereby
$$c(x,y;s,\varphi) = \sqrt{s_a(x,y;s,\varphi)^2 + s_b(x,y;s,\varphi)^2}, \qquad (4)$$

$$s_a(x,y;s,\varphi) = \sum_{i=-n}^{n} \tilde{g}\!\left(i-\frac{s}{2}, +\varphi\right) I_l(x+i,y) + \tilde{g}\!\left(i+\frac{s}{2}, -\varphi\right) I_r(x+i,y), \qquad (5)$$

$$s_b(x,y;s,\varphi) = \sum_{i=-n}^{n} \tilde{g}\!\left(i-\frac{s}{2}, \frac{\pi}{2}+\varphi\right) I_l(x+i,y) + \tilde{g}\!\left(i+\frac{s}{2}, \frac{\pi}{2}-\varphi\right) I_r(x+i,y). \qquad (6)$$

The parameter s determines the tuned disparity of one DEU and φ is fixed to π/4.
Because (c+ − c−)/c0 gives the disparity relative to s, s has to be subtracted from the
result. A group of DEUs with different s parameters can cover a bigger disparity
space than a single DEU. If the number of DEUs increases, then the range of
detectable disparities increases as well. Values for s from −4 to 4 at a step size
of 1/3 are suitable for most cases.
The estimation cube (see Figure 6) is a three dimensional structure which con-
sists of numerous DEUs. The size of the cube in the x-y direction is the same size as
the input images. This is because the disparities for all pixels of the images are
estimated. The size of the third dimension depends on the disparity search space,
i.e., how many DEUs are needed to cover a desired disparity search space.
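A DEU (Eqs. 3–6) and one disparity stack of the estimation cube might be sketched as follows (the Gabor parameters are the same illustrative assumptions as above; the sketch follows Eq. (3) literally, including the subtraction of s):

```python
import numpy as np

def gabor(i, omega, sigma, phi):
    # Gabor coefficients (Eq. 1), normalized to zero mean
    g = np.exp(-i**2 / (2 * sigma**2)) * np.cos(omega * i + phi)
    return g - g.mean()

def complex_cell(left, right, x, s, phi, n=8, omega=0.8, sigma=3.0):
    # Binocular energy (Eqs. 4-6): quadrature pair s_a, s_b built from
    # position-shifted (by +/- s/2) and phase-shifted simple cells.
    i = np.arange(-n, n + 1)
    seg = slice(x - n, x + n + 1)
    sa = (np.dot(gabor(i - s / 2, omega, sigma, +phi), left[seg])
          + np.dot(gabor(i + s / 2, omega, sigma, -phi), right[seg]))
    sb = (np.dot(gabor(i - s / 2, omega, sigma, np.pi / 2 + phi), left[seg])
          + np.dot(gabor(i + s / 2, omega, sigma, np.pi / 2 - phi), right[seg]))
    return np.hypot(sa, sb)

def deu(left, right, x, s, phi=np.pi / 4):
    # Disparity estimation unit (Eq. 3), tuned to disparity s
    cp = complex_cell(left, right, x, s, +phi)
    cm = complex_cell(left, right, x, s, -phi)
    c0 = complex_cell(left, right, x, s, 0.0)
    return (cp - cm) / c0 - s

def disparity_stack(left, right, x):
    # One disparity stack: DEU estimates for s from -4 to 4, step 1/3
    return [deu(left, right, x, s) for s in np.linspace(-4, 4, 25)]
```

For an identical stereo pair (zero disparity), the DEU tuned to s = 0 responds with zero, since c+ and c− coincide.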

4.1.2. Coherence Layer The coherence layer analyzes the results of the esti-
mation layer, which are stored in the estimation cube. The result of one DEU can
either approximately represent the correct disparity or have a more or less random
value. The latter case occurs if the true disparity lies outside the working range of the
DEU. However, the estimation cube consists of several DEUs with overlapping
working ranges for one image position. Thus, the disparity search space can be
increased and a robust estimation can be carried out.
To find the correct disparity for one image position (x,y), all relevant DEUs
are grouped to a disparity stack, according to (Henkel 1997, 2000). Through coher-
ence detection, the correct disparity can be estimated, so the biggest cluster of DEUs
which has similar disparity estimations is determined. The final disparity of a

Figure 9. Search for the coherence cluster.

position (x,y) is calculated from the average of the estimated results of the coherence
cluster.

$$\mathrm{Dis}(x,y) = \frac{1}{|C_{x,y}|} \sum_{i \in C_{x,y}} d(x, y, S(i)), \qquad (7)$$

where Cx,y is the set of DEUs in the coherence cluster of the disparity stack at position
(x,y), and S(i) is a mapping function which maps i to the tuned disparities s of the DEU.
Since the known coherence detection technique (Henkel 1997) is not suited for
a software solution, a new approach is proposed in this article. In every disparity
stack, the biggest DEU group with similar disparity values is searched (see Figure 9).
In Figure 9a, a typical curve in a disparity stack is shown. The DEUs are
shown on the abscissa and the calculated disparities are shown on the ordinate.
In the first step, the DEUs are sorted by their disparity values in ascending
order. Let the disparity values of the sorted DEUs be D(i) (i ∈ 1 … N), with N
the size of the disparity stack. Then N groups of DEUs (G_1, …, G_N) are built up
and determined by

$$G_i = \{D(i), \ldots, D(k_i)\}, \qquad (8)$$

$$k_i = \max(\{k \mid i < k \le N \wedge D(k) - D(i) \le \varepsilon\}). \qquad (9)$$

Figure 9b shows the first group of sorted DEUs. Let d(G_i) = D(k_i) − D(i); then the
group consists of as many consecutive DEUs as possible, where d(G_i) is maximal
but not greater than ε. The biggest cluster which is found in this way is the coherence
cluster C_x,y of the disparity stack at the position (x,y). Figure 9c shows the biggest
cluster for the example. In the worst case, several groups with the same size are
found. In this case, the group is chosen in which d(G_i) is minimal, since a large group
with a small range of disparity values suggests a stable estimation.
The parameter ε should be chosen with care. If ε is too low, only small groups
are found, which increases the likelihood of clusters of the same size and in turn
leads to ambiguities. If ε is too high, the average of the disparity result is less precise
because noise-coding DEUs start to fall into the coherence cluster. In addition, the

calculation time increases. It was found experimentally that a value of ε = 0.5 is
a good compromise for SEM images.
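The coherence-cluster search of Eqs. (8) and (9), together with the averaging of Eq. (7), can be sketched in plain Python (the input is the list of estimates of one disparity stack; ε = 0.5 as suggested above):

```python
def coherence_disparity(estimates, eps=0.5):
    # Coherence detection for one disparity stack (Eqs. 7-9): sort the
    # DEU estimates, find the largest group of consecutive sorted values
    # whose spread D(k)-D(i) does not exceed eps, and return the mean of
    # that coherence cluster. Ties are broken in favor of the group with
    # the smaller spread, as described in the text.
    d = sorted(estimates)
    n = len(d)
    best_size, best_spread, best_start = 0, float('inf'), 0
    for i in range(n):
        k = i
        while k + 1 < n and d[k + 1] - d[i] <= eps:
            k += 1
        size, spread = k - i + 1, d[k] - d[i]
        if size > best_size or (size == best_size and spread < best_spread):
            best_size, best_spread, best_start = size, spread, i
    cluster = d[best_start:best_start + best_size]
    return sum(cluster) / len(cluster)
```

For example, for the stack estimates [1.9, 2.0, 2.1, −3.0, 7.5, 2.05], the coherence cluster is {1.9, 2.0, 2.05, 2.1} and the outliers −3.0 and 7.5 are rejected.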
4.1.3. Dynamic Texture-based Filter One major problem of the generated
images is that in most of the cases the likelihood of regions having low or no texture
is high. In these regions (e.g., a black background) only small changes in the intensity
exist. Thus, no disparity can be detected by simple cells. Therefore, the disparity map
is noisy in these regions. In this article, a new solution using a dynamic texture-based
filter for overcoming this problem is proposed. One important part of the filter is the
texture map. This shows regions with low and high texture of the images. With this
information, a dynamic texture-based filter can be built up, which reduces the noise
of the disparity map. For the creation of the texture map, the complex cells
c0(x,y;s,0) of all DEUs are used, since they detect the local texture in the stereo image
pair (Adelson and Bergen 1985). The final result is calculated by averaging all com-
plex cell c0 responses of a disparity stack. Therefore, the texture of an image position
(x,y) is given by

$$T(x,y) = \frac{1}{N} \sum_{i=1}^{N} c_0(x, y; S(i), 0), \qquad (10)$$

where S(i) as above is a mapping function for mapping the index i to the tuned
disparity s of the DEU and N is the size of the disparity stack. If the texture map
is calculated for every position (x,y) then each disparity value has a corresponding
texture value. In order to reduce the noise in the disparity map, all regions with a
texture value which is smaller than a threshold are removed.
Because of the fact that the range of texture values depends on the image data,
a fixed threshold is not suitable. Therefore, a dynamic adaptation of the threshold is
proposed. A solution can be found by analyzing the histogram of the texture map.
Figure 10a shows a typical histogram of an image pair with several textureless
regions. It can be seen that most of the values are located in the left part of the
texture domain, which is limited by the smallest value min({T(x,y)}) and the biggest
value max({T(x,y)}). In contrast, Figure 10b shows a histogram of an image pair
with few low-texture regions. Here, most texture values are located in the middle.
These histograms lead to the conclusion that a threshold close to the dotted line is

Figure 10. Typical texture histograms: I) many low-texture regions and II) few low-texture regions.

desirable. With such a threshold, it is possible to filter large regions of the disparity
map if the noise level is high; on the other hand, it is possible to filter only small
regions of the map if the noise level is low. The threshold can be calculated by

$$t_{\mathrm{thres}} = (\max(\{T(x,y)\}) - \min(\{T(x,y)\})) \cdot p + \min(\{T(x,y)\}), \qquad (11)$$

where all disparity values whose texture value T(x,y) < t_thres are ignored and filtered. The parameter p
determines the portion of the filtered texture values. Through experiments
with SEM images of handling processes showing technical structures, e.g., gripper,
cantilever, or nanotubes, p could be determined. A value of p ¼ 0.05 yields good
results in most cases for images showing handling processes.
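A minimal sketch of this filter (Python/NumPy; masking the filtered disparities with NaN is an implementation choice — the article only states that the values are ignored):

```python
import numpy as np

def dynamic_texture_filter(disparity_map, texture_map, p=0.05):
    # Dynamic texture-based filter (Eq. 11): the threshold lies a
    # fraction p above the minimum of the texture range. Disparities
    # whose texture value falls below it are masked out (set to NaN).
    t_min, t_max = texture_map.min(), texture_map.max()
    t_thres = (t_max - t_min) * p + t_min
    filtered = disparity_map.astype(float).copy()
    filtered[texture_map < t_thres] = np.nan
    return filtered, t_thres
```

Because the threshold adapts to the texture range of each image pair, large regions are filtered when many textureless areas are present, and only small ones otherwise.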

4.2. Vergence System


Because of the special acquisition of the stereo images in the SEM, the images
may vary in how well they can be used for processing. In a high quality stereo image
pair only horizontal disparities occur, which means that image regions are either
shifted to the left or to the right (or not shifted at all in the case of zero disparities).
However, inaccuracies during image acquisition may lead to additional vertical dis-
parities, which let image regions also be shifted up or down. Figure 11 shows three
examples.
If the SEM beam does not hit the specimen on the exact same spot after
changing the viewing angle, the resulting images will have a global vertical shift
(see Figure 11a). A counter rotation of the images leads to vertical disparities,
especially in the left and right parts of the images (see Figure 11b). If the images dif-
fer in a slight zoom scale, vertical disparities will occur in the top and bottom parts
of the images (see Figure 11c).
The first case is the easiest to compensate because the vertical disparities are
constant over all image regions. The other two cases are more difficult since the ver-
tical disparities differ from region to region. The occurrence of vertical shifts
strongly affects the results of the stereo system, since it is only able to detect horizon-
tal disparities. During image processing it is not known if and where vertical dispa-
rities are present. Therefore, a pre-processing of the images that compensates for
rotational differences, for example, is not possible.

Figure 11. Possible distortions in the images.


18 M. JÄHNISCH AND S. FATIKOW

A vergence system as described in (Henkel 1999) can be used to diminish these
problems by aligning the images not only horizontally, but also vertically. Thus, ver-
tical disparities are eliminated. Images that have different vertical disparities in dif-
ferent image regions can be processed in multiple passes using appropriate vertical
shifts in each pass. The validation values of the coherence layer indicate the image
regions where the disparity calculations have been successful. Therefore, after each
pass disparities that have higher validation values than the disparities calculated in
prior passes are accumulated.
The vergence system is used as an enhancement of the stereo system. The stereo
system is not changed in its functionality, but called multiple times with different
input parameters. Prior to each call a fixation point is chosen. Vergence signals
are generated accordingly, i.e., horizontal and vertical shift parameters, to fuse the
input images at the fixation point. The aligned images are passed on to the stereo
system for calculation and the results are accumulated.
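The multi-pass scheme just described can be summarized as a top-level loop. The following sketch is purely structural; all helper callables (pick_fixation, vergence, align, stereo, accumulate) are hypothetical placeholders for the components described in the next subsections, not interfaces of the actual system:

```python
def vergence_enhanced_stereo(left, right, stereo, pick_fixation,
                             vergence, align, accumulate, passes=5):
    """Call the unchanged stereo system several times with differently
    aligned inputs and keep, per pixel, the result with the highest
    validation value. All helper callables are hypothetical placeholders."""
    acc = None                                   # empty accumulator
    for _ in range(passes):
        fx, fy = pick_fixation(acc)              # image center on pass one
        dx, dy = vergence(left, right, fx, fy)   # vergence signals
        l, r = align(left, right, dx, dy)        # fuse at the fixation point
        acc = accumulate(acc, stereo(l, r), dx)  # accumulate the best results
    return acc
```

The stereo system itself is left untouched; only its inputs and the bookkeeping around it change from pass to pass.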
In the following, the vergence system is described in more detail.
4.2.1. Fixation Point Selection

For areas of the stereo images that bear verti-
cal shifts, the stereo system does not yield stable results. Low validation values are usually
generated. The selection of a suitable fixation point leads to an alignment of the input
images so that the stereo system can properly detect the (horizontal) image disparities.
The main goal for selecting a fixation point is to identify an image region where
no stable results have been calculated so far. For this, the accumulated validation
values serve as indicator (Henkel 1999) and a fixation point is chosen that has low
validation values in its neighborhood. However, textureless image regions also yield
low validation values. If a fixation point in a low texture region is chosen, the stereo
system will not be able to improve the overall results. Therefore, in this article a new
enhanced solution is presented, using the texture information of the texture map and
the dynamically determined texture threshold.
The best results of all passes of the stereo system are collected in an accumu-
lator. At the beginning of the calculations the accumulator is empty, i.e., the vali-
dation values for all image regions are 0. Therefore, the default fixation point is
chosen to be in the center of the stereo image for the first pass of the vergence system.
For all subsequent passes the sum of the validation values and the texture
values in the neighborhood of each pixel (excluding the edges of the image) are com-
puted. The pixel position with the lowest validation sum and sufficient texture
(above the texture threshold) is chosen as the next fixation point.
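A possible implementation of this selection step is sketched below. The neighborhood size, the border margin, and the integral-image box summation are illustrative choices of this sketch, not details taken from the original system:

```python
import numpy as np

def select_fixation_point(validation, texture, t_thres, window=15, margin=8):
    """Choose the next fixation point: the pixel (excluding the image
    edges) with the lowest summed validation in its neighborhood among
    all pixels with sufficient texture (above the texture threshold)."""
    h, w = validation.shape
    # integral image of the validation map for fast box sums
    ii = np.pad(validation, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    r = window // 2
    best, best_sum = None, np.inf
    for y in range(margin, h - margin):          # written for clarity, not speed
        for x in range(margin, w - margin):
            if texture[y, x] < t_thres:
                continue                         # low texture cannot improve results
            y0, y1 = max(y - r, 0), min(y + r + 1, h)
            x0, x1 = max(x - r, 0), min(x + r + 1, w)
            s = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
            if s < best_sum:
                best, best_sum = (x, y), s
    return best
```

On the first pass the accumulated validation values are all zero, so in the real system the image center is used as the default fixation point instead.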
4.2.2. Vergence Signals Generation

Given a fixation point, this processing
step computes the (horizontal and vertical) shift parameters of the stereo image pair
so that they are fused in the fixation point.
For this, the normalized correlation (Lewis 1995) is used. The images are ana-
lyzed in a coarse-to-fine manner. Starting at a coarse resolution the images are
aligned against each other. A square correlation window is used, the size of which
can be defined separately for each resolution.
The determined shift is used as the vergence signals for the current resolution.
Before continuing to analyze the images on the next finer resolution with the corre-
lation technique, the shift value is doubled, since the resolutions in these experiments
differ by a factor of two. This way, the vergence signals are refined from step to step.
3-D VISION FEEDBACK FOR NANOHANDLING MONITORING 19

Experiments have shown that using more than two resolutions is not beneficial. The
sizes of the correlation windows are 32 × 32 and 64 × 64 pixels.
The result of the computation is an integer shift Δx, Δy that specifies
how many pixels the right image must be shifted against the left image in order to
fuse the images at the fixation point.
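The coarse-to-fine shift estimation could be sketched as follows. This is a simplified illustration: it uses two resolutions differing by a factor of two, as described above, but a single window parameter scaled per resolution and an assumed search range, rather than the exact 32 × 32 / 64 × 64 window handling of the system:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def vergence_signals(left, right, fx, fy, win=32, search=4):
    """Estimate the integer shift (dx, dy) that fuses the images at the
    fixation point (fx, fy), coarse-to-fine over two resolutions that
    differ by a factor of two; the coarse shift is doubled before being
    refined on the full resolution."""
    dx = dy = 0
    for scale in (2, 1):                          # coarse pass, then fine pass
        L, R = left[::scale, ::scale], right[::scale, ::scale]
        cx, cy = fx // scale, fy // scale
        r = (win // scale) // 2
        ref = L[cy - r:cy + r, cx - r:cx + r]     # window around fixation point
        best, best_c = (0, 0), -np.inf
        for sy in range(-search, search + 1):
            for sx in range(-search, search + 1):
                y0, x0 = cy + dy + sy - r, cx + dx + sx - r
                if y0 < 0 or x0 < 0:
                    continue
                patch = R[y0:y0 + 2 * r, x0:x0 + 2 * r]
                if patch.shape != ref.shape:
                    continue
                c = ncc(ref, patch)
                if c > best_c:
                    best_c, best = c, (sx, sy)
        dx, dy = dx + best[0], dy + best[1]
        if scale == 2:                            # resolutions differ by two
            dx, dy = 2 * dx, 2 * dy
    return dx, dy
```

A production implementation would use a fast normalized correlation as in (Lewis 1995) rather than this brute-force search.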
4.2.3. Image Alignment

After the generation of the vergence signals the
images have to be aligned accordingly before the stereo system can start the disparity
calculations. The images are shifted against each other so that they are fused at the
fixation point, which lets the disparity in this point become zero. The shifted images
are passed on to the stereo system as input parameters.
4.2.4. Accumulation

The accumulator collects the best results of all passes
of the stereo system. The accumulator holds memory spaces for the disparity,
validation, and texture values.
Once the stereo system has computed the disparity, validation, and texture
maps, the values for each pixel coordinate are copied into the accumulator if the cur-
rent validation value is higher than the validation value in the accumulator gained
from previous passes. In the first pass, this is the case for all pixel coordinates, since
the accumulator is initially empty. In each pass a different fixation point is chosen.
Since the stereo system calculates the disparities relative to the current fixation point,
the disparity values of all passes are only comparable if they are adjusted
accordingly. Therefore, the horizontal shift of the current pass is added to the result-
ing disparity values before being copied into the accumulator. The vertical shift is
interpreted as an error during image acquisition and therefore ignored.
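The accumulation rule can be stated compactly in code. This is a hedged sketch; the map names and the in-place update are assumptions:

```python
import numpy as np

def accumulate(acc_disp, acc_val, acc_tex, disp, val, tex, dx):
    """Merge the result maps of one pass into the accumulator.

    A pixel is overwritten only where the new validation value exceeds
    the stored one. The horizontal shift dx of the pass is added to the
    disparities so that all passes refer to the same reference frame;
    the vertical shift is treated as an acquisition error and ignored.
    """
    better = val > acc_val            # pixels improved by this pass
    acc_disp[better] = disp[better] + dx
    acc_val[better] = val[better]
    acc_tex[better] = tex[better]
    return better
```

In the first pass all accumulator validation values are zero, so every pixel with a positive validation value is copied.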

5. APPLICATION OF THE 3-D MODULE


In this section, two setups are described. In the first setup the beam tilting sys-
tem is employed and thus the 3-D module receives images where the distortions are
small. In the second setup the mechanical tilting technique for the image generation
is employed with the intention to generate distorted images, for example, with ver-
tical shifts, rotations, or different zoom scales. These images are more suitable to test
the vergence system.

5.1. Setup I
Figures 12a and 12b show the input images and Figures 12c–12h the results of
the stereo algorithm. The tilt angle is small (about 2°) so only small disparities are
detectable. The input images are generated by a low scan speed and therefore the
level of noise is low. By increasing the scan speed and so decreasing the image acqui-
sition time, the noise will increase. For images with a high noise level, it is better to
use a noise filter. The images show a gripper (Mølhave and Hansen 2005) (produced
by the Dept. of Micro and Nanotechnology, Denmark – MIC) with a jaw size of about 2 μm
and a nanotube (produced by the Cambridge University Engineering Department) of
about 300 nm diameter and 12 μm length seen from the side with a magnification
of 3,000 times. The texture around the objects is low and there is a contrast difference
between the images as well. The calculation time for 256 × 192 pixel images on an Intel
P4 3.00 GHz is about 1 second.

Figure 12. Input images and the resulting images of the stereo algorithm for one nanotube and a gripper.

Figure 13. Another 3-D plot of the filtered disparity map from Figure 12g.

The calculated disparity map (see Figure 12e) shows
dark regions for high distances and bright regions for low distances with reference to
the observer. In Figure 12f, a 3-D plot of the disparity map is shown.
The proposed algorithm calculates a high density disparity map in sub-pixel
accuracy. The smallest step size of the gripping process in the direction of the
nanotube was 1 μm, which could be detected by the new approach.
It is also noteworthy that the nanotube in the disparity map is displayed wider
than in the input images. This is due to the size of the receptive fields of the DEUs.
This effect can be reduced by increasing the image resolution (and possibly by defin-
ing a region of interest) or reducing the size of receptive fields.
It should be noted that because of the low-texture regions around the gripper
and the nanotube, noise arises around it in the disparity map (see Figure 12e) and the
tool and the tube are not clearly seen in the 3-D plot in Figure 12f. This is a common
problem for all stereo algorithms, since the correspondence problem cannot be solved
in textureless regions. However, the proposed algorithm provides a texture map (see
Figure 12d) which shows regions of high and low texture in the input images. There-
fore, with this map, a dynamic threshold can be calculated and used to filter the
noisy disparity map. The result of the filtering is shown in Figure 12g and the 3-D
plot is shown in Figure 12h. In the result image and the 3-D plot, the noisy regions
of the disparity map are removed and the contours of the cantilever and the nano-
tube are shown. This result is more suitable for the further processing steps than
the noisy disparity map. Because of the textureless region inside the gripper, only
the contour is shown in the disparity map. In Figure 13, another 3-D plot of the fil-
tered disparity map is shown, in which the contours of the objects are thinned out.

5.2. Setup II
In the second setup, the stereo system is tested with disturbed input images
(same size as in setup I) and the vergence system is used for the improvement of

Figure 14. Results of the stereo system with 2° rotated input images of a close arrangement of nanotubes.

the results. Here, a pair of images with a high magnification (approx. 11,000 times) is
used which shows a very close arrangement of nanotubes. Figure 14 shows the dis-
parity map of the pair of images, whereby the left input image is rotated by 2° in the
clockwise direction around its center, and the validation map.
In the disparity map a global gradient of grey tone from top to bottom is seen.
This is due to the rotation, because additional horizontal disparities result, whereby
the disparities in the upper region are increased and in the lower region decreased. In
addition, the disparity values in the left and right regions are noisy. Because of
the rotation, vertical disparities arise which cannot be detected by the stereo system.
If they increase too much, as in this example, the horizontal disparities cannot be
determined correctly any more. The validation map (see Figure 14) therefore shows
small values in these regions.

Figure 15. Three intermediate results of the five passes of the vergence system with different vergence signals.

Figure 16. End result of the five passes of the vergence system.

This result can be improved by the use of the vergence system (see Figures 15
and 16), which performs five passes. Only three intermediate results (runs I, III, and IV)
are discussed as representative of the five passes. In the first run, a fixation point in
the top left region is selected and the images are vertically shifted by 1 pixel. The
determined disparities are stable in the top left and in the middle part and noisy
in the right and bottom part of the map. Thus, the validation map shows high values
in the top left and in the middle region. In the third run, a fixation point in the top
right region is selected and the image is horizontally and vertically shifted by 2 pixels.
Therefore, the disparity is stable in the top right and middle region and noisy in the
left and bottom part. In the fourth run, a fixation point in the bottom right region is
selected and the image is horizontally shifted by 2 pixels. The determined disparities
are stable in the left part and noisy in the right region of the map. Thus, the vali-
dation map shows high values in the left part.
After each run the disparity values with the highest validation values are trans-
ferred to the accumulator. The intermediate results of the accumulator are not presented.
Figure 16 shows the accumulated disparity map and validation map after the last run.
The vergence system stops if no significant improvement of the results can be achieved.
In comparison with Figure 14, a significant improvement of the results is seen.
The total computation time of the five runs is 5 seconds. Tests with vertically shifted
and differently zoomed images are also carried out. The vergence system leads to
improvements of the results in these cases as well (not shown). Because the vergence
system uses a correlation algorithm, it has difficulties when image areas with strong
disparity changes are processed. This can, however, be compensated by the stereo
system. The vergence system has an accuracy of one pixel, which is precise enough
for the adjustment of the input images, since the subsequent stereo system works
with sub-pixel accuracy.

6. CONCLUSION
In this article, a new universally applicable 3-D imaging system providing
vision feedback for nanohandling monitoring in an SEM is presented. In the new sys-
tem, stereo images are generated by beam tilting followed by processing of the image
data. This is suitable for micro- and nanohandling in an SEM. The 3-D module con-
sisting of a vergence and a stereo system is described. Furthermore, a stereo algor-
ithm based on a biologically motivated energy model by means of coherence
detection, suitable for depth determination for technical handling and manipulation
processes within an SEM, is proposed. Moreover, a new algorithm for coherence
detection analysis suitable for software simulation is presented. By means of the used
stereo approach, a sharp and high density disparity map in sub-pixel accuracy can be
calculated. The algorithm is able to deal with small disparities and can therefore
minimize the effort to generate a stereo image (i.e., only a small tilt angle is required).
The approach is invariant against contrast differences in the stereo images. Also, a
new dynamic filter is proposed which gives the stereo algorithm the capability to
detect regions of low texture in the input images and therefore remove the noise
of the disparity map in these regions.
Additionally, a vergence system with an enhanced solution for fixation point
selection is presented. This makes it possible for the 3-D module to supply accurate

and robust results, even with disturbed input images (rotated, vertically shifted, or
with different zoom scales).
The experiments show that the new system is accurate enough to determine a
distance between objects of less than 1 μm. In future research, a detailed verification
of the new system will be carried out. In addition, an improved software interface
will be available for the user. The implementation of the algorithm will be optimized
and implemented on hardware or a multiprocessor system. Thus, the computation
time will be decreased significantly. The system will be able to calculate only a cer-
tain region of interest, so that only the interesting parts (e.g., the region between
gripper and nanotube) can be observed in more detail.

REFERENCES

Abe, Kazuo, Kouji Kimura, Yasuko Tsuruga, Shin-ichi Okada, Hitoshi Suzuki, Nobuo
Kochi, Hirotami Koike, Akira Hamaguchi, and Y. Yamazaki. 2004. Three-dimensional
measurement by tilting and moving objective lens in CD-SEM(II). Proceedings of SPIE
5375:1112–1117.
Adelson, Edward H. and James R. Bergen. 1985. Spatiotemporal energy models for the per-
ception of motion. Journal of the Optical Society of America A 2(2):284–299.
Alicona. http://www.alicona.com/ (accessed November 2006).
Bhat, Dinker N. and Shree K. Nayar. 1998. Ordinal measures for image correspondence.
IEEE Transaction On Pattern Analysis and Machine Intelligence 20(4):415–423.
Bürkle, A. and F. Schmoeckel. 2000. Quantitative measuring system of a flexible microrobot-
based microassembly station. Proceedings of the 4th Seminar on Quantitative Microscopy,
Semmering, Austria:191–198.
Daugman, J. G. 1985. Uncertainty relation for resolution in space, spatial frequency, and
orientation optimized by two-dimensional visual cortical filters. Journal of the Optical
Society of America A 2(7):1160–1169.
Fatikow, Sergej, Thomas Wich, Torsten Sievers, Helge Hülsen, and Marco Jähnisch. 2006.
Microrobot system for automatic nanohandling inside a scanning electron microscope.
IEEE Int. Conf. on Robotics & Automation –ICRA Orlando, USA:1402–1407.
Fleet, David J., Allan D. Jepson, and Michael R.M. Jenkin. 1991. Phase-based disparity
measurement. Computer Vision, Graphics, and Image Processing: Image Understanding
53(2):198–210.
Fleet, David J., Hermann Wagner, and David J. Heeger. 1996. Neural encoding of binocular
disparity: energy model, position shifts and phase shifts. Vision Research 36(12):1839–
1857.
Hein, L. R. O., F. A. Silva, A. M. M. Nazar, and J. J. Ammann. 1999. Three-dimensional
reconstruction of fracture surfaces: area matching algorithms for automatic parallax
measurements. Scanning 21:253–263.
Henkel, R. D. 1997. Fast stereovision by coherence detection. In Proceedings of CAIP’97
LCNS 1296, ed. G. Sommer, K. Daniilidis, and J. Pauli. Springer, Berlin, Germany:
297–303.
Henkel, R. D. 1999. Locking onto 3D-structure by a combined vergence- and fusion system.
Proceedings of the 2nd Int. Conf. on 3-D Digital Imaging and Modeling, Ottawa, Canada:70–76.
Henkel, R. D. 2000. Synchronization, coherence-detection and three-dimensional vision.
Technical Report, Institute of Theoretical Physics, University of Bremen, Germany.
Jähnisch, M., H. Huelsen, T. Sievers, and S. Fatikow. 2005. Control system of a nanohandling
cell within a scanning electron microscope. 13th Mediterranean Conf. on Control and
Automation. Limassol Cyprus:964–969.

Jähnisch, Marco and Marc Schiffner. 2006. Stereoscopic depth-detection for handling and
manipulation tasks in a scanning electron microscope. IEEE Int. Conf. on Robotics &
Automation-ICRA. Orlando, USA:908–913.
Kasaya, Takeshi, Hideki Miyazaki, Shigeki Saito, and Tomomasa Sato. 1998. Micro object
handling under SEM by vision-based automatic control. Proceedings of SPIE 3519:
181–192.
Kuchler, Gregor and Rolf Brendel. 2003. Reconstruction of the surface of randomly textured
silicon. Progress in Photovoltaics: Research and Applications 11(2):89–95.
Lacey, A. J., N. A. Thacker, S. Crossley, and R. B. Yates. 1998. A multi-stage approach to the
dense estimation of disparity from stereo SEM images. Image and Vision Computing
16:373–383.
Lee, K. W. and J. T. L. Thong. 1999. Improving the speed of scanning electron microscope
deflection systems. Measurement Science and Technology 10:1070–1074.
Lewis, J. P. 1995. Fast normalized cross-correlation. Vision Interface 120–123.
Mølhave, K. and O. Hansen. 2005. Electro-thermally actuated microgrippers with integrated
force-feedback. Journal of Micromechanics and Microengineering 15:1265–1270.
Ohzawa, I., G. C. DeAngelis, and R. D. Freeman. 1990. Stereoscopic depth discrimination in
the visual cortex: neurons ideally suited as disparity detectors. Science 249:1037–1041.
Qian, N. 1997. Binocular disparity and the perception of depth. Neuron 18:359–368.
Scherer, S., P. Werth, A. Pinz, A. Tatschl, and O. Kolednik. 1999. Automatic surface recon-
struction using SEM images based on a new computer vision approach. Institute of
Physics Conference Series (Bristol, U.K.) 161(3):107–110.
Thong, J. T. L. and B. C. Breton. 1992. A topography measurement instrument based on the
scanning electron microscope. Review of Scientific Instruments 63(1):131–138.
