VIGILANTE: A Machine and Algorithms For Neural "Rich Pixel" Image Recognition at Video-Rate and Faster
Introduction
In spite of recent gains, image recognition still suffers from disappointing recognition performance and low
processing speed. Generally, system designers are forced onto the horns of a dilemma between algorithms that are
optimized for limited computational architectures and algorithms that might perform better, but which cannot be
effectively tested given the computing resources readily available today. Many automatic target recognition (ATR)
designers would prefer to develop recognition algorithms based upon various eye-brain theories, but these present
special hardware challenges, perhaps requiring a new architecture. The VIGILANTE neural processor was designed to
provide an experimental processor for such ATR research at video rates over a wide variety of algorithms. Its
architecture is based upon a rich-pixel processing paradigm.
2.
3.
4.
The importance of making this distinction is that the structure of these four tasks helps simplify hardware. Many
image processing systems suffer from trying to perform steps 2-4 on the same processing architecture, even though
the processes themselves have very different structures. To speed up recognition, designers often resort to
special-purpose circuits that implement a particular algorithm quickly, but which lack generality for other types of
problems. VIGILANTE's philosophy, however, is to map the above functions to a relatively small set of
special-purpose hardware that, when properly configured, can implement a wide variety of algorithms. For example, the
regularity of synthetic image generation tasks (spatial filtering, motion, correspondence) justifies special-purpose
hardware. Pixel-level fusion, although less structured, can be performed on regular parallel architectures such as
SIMD arrays. Semantic analysis involves a wide variety of algorithmic approaches. However, it seldom presents a
significant computational bottleneck compared to the other functions, so this task can be handled with
general-purpose hardware. This hardware mapping concept is illustrated in Figure 2.
[Figure: classification of image-processing operations by type of process, with examples and optimal hardware.
- $N^4$: convolution, gray-scale morphology, rotation/scale-invariant patterns
- $N^2$ point ops: summing, thresholding, masking
- Other: connected component, local histograms, semantic interpretation; optimal hardware is massively parallel
architectures (in some cases) or serial processors, based upon the algorithm
Diagram labels: Convolve(ap, Kn, 1, 1); Normalize(C, 0, 255); T(P, 180); kernels K0 through K6.]
- High-speed, large format convolutions (for $N^4$ operations) operating at about $2 \times 10^{12}$ OPS
- A SIMD point operations processor (for $N^2$ operations) operating at about 10 nominal GigaOPS
- A high-bandwidth link between the convolution processor and the point operations processor
- General purpose computer with adequate (PCI) bussing to remainder of system for program support
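As a rough check on the scale of the convolution workload, the arithmetic below counts the multiply-accumulates needed for 64 simultaneous 64x64 masks (the 3DANN-M configuration described in this paper) and relates that count to the quoted rate of about $2 \times 10^{12}$ OPS. Counting one multiply-accumulate as two operations is an assumption, and the function names are illustrative; the result is an order-of-magnitude estimate, not a measured figure.

```python
def macs_per_position(num_masks=64, mask_size=64):
    """Multiply-accumulates needed to evaluate every mask at one window position."""
    return num_masks * mask_size * mask_size


def positions_per_second(ops_per_second=2e12, ops_per_mac=2):
    """Window positions per second implied by a given operation rate.

    ops_per_mac=2 (one multiply plus one add) is an assumed counting convention.
    """
    return ops_per_second / (macs_per_position() * ops_per_mac)


if __name__ == "__main__":
    print("MACs per window position:", macs_per_position())
    print("Window positions per second: %.3g" % positions_per_second())
```

Under these assumptions, each window position costs 262,144 multiply-accumulates, so the quoted rate corresponds to a few million window positions per second.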
ANTE is capable of performing sophisticated feature-based, context-sensitive image recognition at video frame
rates. ANTE takes advantage of a general ATR process flow which is depicted in Figure 5:
1. The 3DANN-M network produces 64 simultaneous convolutions with 64x64 masks. This corresponds to the
$N^4$ operations shown above.
2. The 64 analog values generated by 3DANN-M are converted to 8-bit digital values and passed along to the
associated feedback memory and Point Operation Processor (POP).
3. POP takes the output from the 3DANN-M and performs those target recognition/tracking functions that can be
performed at a pixel level. This corresponds to the $N^2$ operations above.
4.
equivalent of as many as 64 video streams, or about 128 Mbytes/sec. To allow high-speed transfer of output, data is
staged into a memory and then sent over 16 custom I/O circuits.
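The digitization and pixel-level stages described above can be illustrated with a small sketch. It assumes the 64 analog values per pixel lie in a known range (here 0 to 1) and uses summing, thresholding and masking as representative point operations. The function names, the value range and the weights are hypothetical and do not describe the actual POP instruction set.

```python
import numpy as np


def quantize_8bit(analog, lo=0.0, hi=1.0):
    """Map analog values in [lo, hi] to 8-bit codes 0..255 (the range is an assumption)."""
    scaled = (np.asarray(analog, dtype=float) - lo) / (hi - lo)
    return np.clip(np.rint(scaled * 255), 0, 255).astype(np.uint8)


def point_ops(rich_pixels, weights, threshold, mask):
    """Apply per-pixel operations to a (rows, cols, 64) array of 8-bit rich pixels.

    Each output pixel depends only on the 64 values at that same pixel:
    a weighted sum over channels, then a threshold, then a binary mask.
    """
    summed = rich_pixels.astype(np.int32) @ weights   # summing across the 64 channels
    detected = summed > threshold                     # thresholding
    return detected & mask                            # masking


if __name__ == "__main__":
    analog = np.random.default_rng(0).random((4, 4, 64))
    pixels = quantize_8bit(analog)
    out = point_ops(pixels, np.ones(64, dtype=np.int32), 8000, np.ones((4, 4), dtype=bool))
    print(out.astype(int))
```

Because every output value is computed from the rich pixel at a single location, operations of this kind map naturally onto a SIMD array.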
A simple way to perform this task is to take convolutions between the base image and feature detectors (kernels)
such as the ones shown in Figure 6. The next step is to develop a zero-mean version of each kernel, i.e.
normalizing the kernel such that
$\sum_{i,j} K^n_{i,j} = 0$
[...] correlation. Where the kernel is a good match in the tested image, the output (convolved) image is bright, thus one
might be tempted to use thresholding of these convolution outputs as a simple way of detecting features.
Unfortunately, the performance of this approach for images other than the reference image is generally
disappointing. Figure 6 shows how the reference image produces several false alarms after the threshold was set
sufficiently low to detect all features. The same figure also shows how performance is far worse using the test
image. In fact, the simple method of convolution followed by thresholding generally fails whenever the system
looks for features in a new image.
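The simple method discussed above (zero-mean kernel, convolution, threshold) can be sketched as follows. This is a minimal illustration using NumPy, not the 3DANN-M implementation: the function names are hypothetical, the kernel is applied as a sliding correlation over valid positions only, and the threshold value is chosen by the caller.

```python
import numpy as np


def zero_mean(kernel):
    """Subtract the mean so that the kernel coefficients sum to zero."""
    k = np.asarray(kernel, dtype=float)
    return k - k.mean()


def correlate_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products at each position."""
    image = np.asarray(image, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def detect_features(image, kernel, threshold):
    """Boolean map of positions where the zero-mean response exceeds the threshold."""
    return correlate_valid(image, zero_mean(kernel)) > threshold


if __name__ == "__main__":
    kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
    image = np.zeros((8, 8))
    image[2:5, 3:6] = kernel                                   # plant one exact copy of the feature
    print(np.argwhere(detect_features(image, kernel, 2.0)))   # window origin(s) of detections
```

A zero-mean kernel gives no response to uniform regions, so a bright output indicates a match in shape rather than in overall brightness. As the text notes, a single fixed threshold on this output still tends to generalize poorly to new images.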
[Figure 6: Convolution filters $K_0$ through $K_6$ and features detected based upon those filters. Panels: Reference
Image, Test Image. Key: eyes (left, right), ears (left, right), nose, mouth, chin.]
[Figure: feature-fusion process flow for the features mouth, r. eye, l. eye, nose, r. ear, l. ear and chin, with
intermediate images E0 through E6 and H0 through H6.]
1. Blur: $B^n = C^n * k_{disk}$
2. Shift: $S^n_{i,j} = B^n_{i+I_n,\, j+J_n}$
3. $M_{i,j} = \tanh^{-1}[a(T_{i,j} - t)]$, $Z = M * k_{gaussian}$
4. Shift centroid out for masking: $E^n_{i,j} = Z_{i-I_n,\, j-J_n}$
5. Mask: $H^n_{i,j} = C^n_{i,j} E^n_{i,j}$
6. Threshold: $F^n_{i,j} = (E^n_{i,j} > t)$
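The six-step flow can be illustrated with the loose sketch below. It is not the authors' implementation, and several choices are assumptions made only so that the example runs: the shifted maps are combined into T by averaging (the source does not define T), a box blur stands in for the disk and Gaussian kernels, a plain tanh is used as the soft threshold, shifts wrap around at the image borders, and the final threshold is applied to the masked output H. All function and parameter names are hypothetical.

```python
import numpy as np


def box_blur(img, radius=1):
    """Mean filter with wrap-around borders (stand-in for the disk/Gaussian kernels)."""
    out = np.zeros(img.shape, dtype=float)
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            out += np.roll(img, (di, dj), axis=(0, 1))
    return out / (2 * radius + 1) ** 2


def fuse_features(C, offsets, a, t, t_final):
    """C: (n, rows, cols) convolution outputs; offsets: (I_n, J_n) of each feature
    relative to the object centroid. Returns a boolean detection map per feature."""
    C = np.asarray(C, dtype=float)
    B = np.stack([box_blur(c) for c in C])                         # 1. blur
    S = np.stack([np.roll(b, (-I, -J), axis=(0, 1))                # 2. S[i,j] = B[i+I, j+J]
                  for b, (I, J) in zip(B, offsets)])
    T = S.mean(axis=0)                                             # assumed fusion rule
    M = np.tanh(a * (T - t))                                       # 3. soft threshold (assumed tanh)
    Z = box_blur(M)
    E = np.stack([np.roll(Z, (I, J), axis=(0, 1))                  # 4. E[i,j] = Z[i-I, j-J]
                  for (I, J) in offsets])
    H = C * E                                                      # 5. mask
    return H > t_final                                             # 6. threshold (on H, assumed)


if __name__ == "__main__":
    C = np.zeros((2, 16, 16))
    C[0, 6, 8] = 1.0      # feature 0, two rows above the centroid at (8, 8)
    C[1, 10, 8] = 1.0     # feature 1, two rows below the centroid
    C[0, 2, 3] = 1.0      # isolated false alarm with no supporting feature 1
    F = fuse_features(C, [(-2, 0), (2, 0)], a=50.0, t=0.08, t_final=0.5)
    print(np.argwhere(F))  # (feature, row, col) of surviving detections
```

In this toy case a direct threshold on C would report three detections, whereas the fused result keeps only the two responses that agree on a common centroid and rejects the isolated one.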
Conclusions
In this paper we have discussed the architecture of the ANTE processor and how it applies to a processing
paradigm of rich-pixel image recognition, where synthetic images are generated and then fused. This
architecture provides for the possibility of end-to-end sophisticated image recognition at video frame rates. It also
allows for the mixing of evidence among spatial, motion and spectral features, all done in parallel at a pixel level.
The ANTE hardware should be functional by the end of summer 1997, and the results from running algorithms
similar to those shown should be available then. Future directions for research include extensive mapping of
algorithms to this architecture and the architecture's expansion to include more highly-integrated and streamlined
circuitry, particularly in the areas of internal communications and the Point Operations Processor (POP).
References
1. J. Carson, On focal plane array feature extraction using a 3-D artificial neural network (3DANN), Proc. SPIE, vol.
1541, Part I: pp. 141-144, Part II: pp. 227-231, 1991.
2. T. Duong, S. Kemeny, T. Daud, A. Thakoor, C. Saunders, and J. Carson, Analog 3-D neuroprocessor for fast frame
focal plane image processing, SIMULATION, vol. 65, no. 1, pp. 11-24, 1995.
3. T. Duong, T. Thomas, T. Daud, A. Thakoor, and B. Lee, 64x64 Analog input array for 3-dimensional neural network
processor, Proceedings of the 3rd International Conference on Neural Networks and Their Applications, Marseilles,
France, 1997.
4. D. Hammerstrom, E. Means, M. Griffin, G. Tahara, K. Knopp, R. Pinkham, and B. Riley, An 11 million transistor
digital neural network execution engine, Proceedings of the IEEE International Solid-State Circuits Conference, pp.
180-181, 1991.
5. M. Turk and A. Pentland, Eigenfaces for recognition, J. of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
6. C. Padgett, G. Cottrell, and R. Adolphs, Categorical perception in facial emotion classification, Proceedings of the
18th Annual Conference of the Cognitive Science Society, Hilldale, pp. 201-207, 1996.
7. C. Padgett, M. Zhu, and S. Suddarth, Detection and object identification using VIGILANTE processing system, Proc.
SPIE, vol. 3077, 1997.