IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 1, JANUARY 2010
Abstract—Stereo vision is a well-known ranging method because it resembles the basic mechanism of the human eye. However, the computational complexity and large amount of data access make real-time processing of stereo vision challenging because of the inherent instruction cycle delay within conventional computers. In order to solve this problem, the past 20 years of research have focused on the use of dedicated hardware architecture for stereo vision. This paper proposes a fully pipelined stereo vision system providing a dense disparity image with additional sub-pixel accuracy in real-time. The entire stereo vision process, such as rectification, stereo matching, and post-processing, is realized using a single field programmable gate array (FPGA) without the necessity of any external devices. The hardware implementation is more than 230 times faster when compared to a software program operating on a conventional computer, and shows stronger performance over previous hardware-related studies.

Index Terms—Field programmable gate arrays, integrated circuit design, stereo vision, video signal processing.

Manuscript received October 29, 2008; revised January 16, 2009. First version published July 7, 2009; current version published January 7, 2010. This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier Research and Development Programs, funded by the Ministry of Commerce, Industry and Energy, Korea. This paper was recommended by Associate Editor Y.-K. Chen.
S. Jin and J. W. Jeon are with the School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Gyeonggi-do 440-746, Korea (e-mail: coredev@ece.skku.ac.kr; jwjeon@yurim.skku.ac.kr).
J. Cho is with the Department of Computer Science and Engineering, University of California, San Diego, CA 92093-0404 USA (e-mail: jucho@cs.ucsd.edu).
X. D. Pham is with the Information Technology Faculty, Saigon Institute of Technology, Hochiminh City, Vietnam (e-mail: xuanpd@saigontech.edu.vn).
K. M. Lee is with the Department of Electrical and Computer Engineering, Seoul National University, Seoul 151-742, Korea (e-mail: kyoungmu@snu.ac.kr).
S.-K. Park and M. Kim are with the Center for Intelligent Robotics, Korea Institute of Science and Technology, Seoul 136-791, Korea (e-mail: skee@kist.re.kr; munsang@kist.re.kr).
Digital Object Identifier 10.1109/TCSVT.2009.2026831
1051-8215/$26.00 © 2010 IEEE

I. Introduction

Stereo vision is a traditional method for acquiring 3-D information from a stereo image pair. Stereo vision has many advantages over other 3-D sensing methods in terms of safety, undetectable characteristics, cost, operating range, and reliability [1]. For these reasons, stereo vision is widely used in many application areas including intelligent robots, autonomous vehicles, human–computer interfaces, and security and defense applications [2]–[5].

In spite of its usefulness as a range sensor, stereo vision has limitations for real-life applications due to its considerable computational expense [6]. The instruction cycle time delay caused by numerous repetitive operations makes the real-time processing of stereo vision difficult when using a conventional computer. For example, several seconds are required to execute a medium-sized stereo vision algorithm for a single pair of images on a 1 GHz general-purpose microprocessor [7]. This low frame rate results in limited applicability of stereo vision, especially for real-time applications, because quick decisions must be made based on the vision data [1].

To overcome this limitation, various approaches have been developed since the late 1980s in order to perform the 3-D depth calculation with stereo vision in real-time using hardware-based systems such as digital signal processors (DSP), field programmable gate arrays (FPGA), and application-specific integrated circuits (ASIC). Kimura et al. [8] designed a convolver-based nine-eye stereo machine called SAZAN which generates dense stereo depth maps with 25 stereo disparities with 320 × 240 images at 20 frames per second (f/s). Woodfill et al. [1] developed a DeepSea stereo vision system based on the DeepSea processor, an ASIC, which computes absolute depth at 200 f/s with a 512 × 480 input image pair and 52 stereo disparities. FPGA-based stereo vision systems have also been introduced due to the rapid development of programmable devices. Darabiha et al. [9] developed a phase-based stereo vision design which generates 20 stereo disparities using 256 × 360 pixel images at 30 f/s on four Xilinx XCV2000E FPGAs. Jia et al. [10] developed MSVM-III, which computes trinocular stereopsis using one Xilinx XC2V2000 FPGA. This system runs at approximately 30 f/s with 640 × 480 pixel images within a 64 pixel disparity search range. Even though considerable progress has been made, improvements are still needed because the previously proposed systems have various deficiencies in regard to the stereo matching method, frame rate, scalability, and one-chip integration of pre- and post-processing functions. All of the real-time stereo vision systems developed so far have only partially satisfied the requirements from an architectural point of view.

This paper proposes dedicated hardware architecture for real-time stereo vision and integrates it within a single chip to overcome these limitations. The entire stereo vision process, including rectification, census transform, stereo matching, and post-processing, is designed and implemented using a single
FPGA. Regarding the issue of scalability, the proposed system is intensively pipelined and synchronized with a pixel sampling clock. In general, a sequential bottleneck is eliminated when we synchronize the entire system with an individual data element, rather than with the flow of control. As a result, the overall performance improves [11]. In the same manner, for vision-related systems, the throughput of the implemented system increases continuously with the input pixel frequency, as long as the pixel frequency does not exceed the maximum allowed frequency of the design. For this reason, the implemented system can generate disparity images of VGA (640 × 480) resolution at conventional video-frame rates, 30 and 60 f/s, with a pixel clock of 12.2727 and 24.5454 MHz, respectively. Because no other processing clock is used, the implemented system can generate disparity images with the same resolution at the designated frame rate when connected to cameras with different pixel clock frequencies, without requiring adaptation of the internal system design. As a result, the implemented system is flexible with respect to the camera input parameters, such as frame rate and image size.

The remainder of this paper is organized as follows: Section II presents common background information about the stereo vision process, including pre- and post-processing, Section III gives a detailed description of the implementation of the proposed real-time stereo vision system, Section IV discusses and evaluates the experimental performance results, and Section V draws the conclusion.

II. Stereo Vision Background

Stereo vision refers to the issue of determining the 3-D structure of a scene from two or more images taken from distinct viewpoints [13]. In the case of binocular stereo, the depth information for the scene is determined by searching for the corresponding pairs of each pixel within the image. Since this search method is based on a pixel-by-pixel comparison, which consumes much computational power, most stereo matching algorithms make assumptions about the camera calibration and epipolar geometry [14].

An image transformation, known as rectification, is applied in order to obtain a pair of rectified images from the original images as in (1)–(2). In the equations, (x, y) and (x′′, y′′) are the coordinates of a pixel in the original images and the rectified images, respectively. To avoid problems such as reference duplication, reverse mapping is used with interpolation. Once the image pair is rectified, 1-D searching along the corresponding line is sufficient to evaluate the disparity [15]–[18]

$$\begin{bmatrix} x'_l \\ y'_l \\ z'_l \end{bmatrix} = H_l^{-1} \begin{bmatrix} x''_l \\ y''_l \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} x'_r \\ y'_r \\ z'_r \end{bmatrix} = H_r^{-1} \begin{bmatrix} x''_r \\ y''_r \\ 1 \end{bmatrix} \tag{1}$$

$$\begin{bmatrix} x_l \\ y_l \end{bmatrix} = \begin{bmatrix} x'_l/z'_l \\ y'_l/z'_l \end{bmatrix}, \qquad \begin{bmatrix} x_r \\ y_r \end{bmatrix} = \begin{bmatrix} x'_r/z'_r \\ y'_r/z'_r \end{bmatrix}. \tag{2}$$

After rectification, stereo matching (given a pixel in one image, find the corresponding pixel in the other image) is applied to solve the correspondence problem. Since pixels with the same intensity value can appear many times within the image, a group of pixels, called a window, is used to compare the corresponding points. Several stereo vision algorithms have been introduced based on different correspondence methods: local and global. The local methods include block matching, feature matching, and gradient-based optimization, while the global methods include dynamic programming, graph cuts, and belief propagation. Because we can restrict the searching range of stereo matching to one dimension, local methods can be very efficient compared with global methods. Moreover, even though local methods experience difficulties in locally ambiguous regions, they provide acceptable depth information for the scene with the aid of accurate calibration [13]. For this reason, we employed the local stereo method as the cost function of the proposed system. In particular, we used census-based correlation because of its robustness to random noise within a window and its bit-oriented cost computation [19].

The census transform maps the window surrounding the pixel p to a bit vector representing the local information about the pixel p and its neighboring pixels. If the intensity value of a neighboring pixel is less than the intensity value of pixel p, then the corresponding bit is set to 1; otherwise, it is set to 0. The dissimilarity between two bit strings can be measured through the Hamming distance, which determines the number of bits that differ between these two bit strings. To compute the correspondence, the sum of these Hamming distances over the correlation window is calculated as in (3), where $I'_1$ and $I'_2$ represent the census transforms of the template window $I_1$ and the candidate window $I_2$

$$\sum_{(u,v)\in W} \mathrm{Hamming}\big(I'_1(u,v),\ I'_2(x+u,\ y+v)\big). \tag{3}$$

The results of stereo matching provide a disparity, which indicates the distance between the two corresponding points. For all possibilities, the disparity should be computed and evaluated at every pixel within the searching range. However, reliable disparity estimations cannot be calculated on surfaces with no texture or repetitive texture when utilizing a local stereo method [20]. In this case, the disparities at multiple locations within one image may point to the same location in the other image, despite each location within one image being assigned, at most, one disparity. The uniqueness test method tracks the three smallest matching results, $v_1$, $v_2$, and $v_3$, instead of seeking only the minimum. The pixel has a unique minimum if the minimum, $v_1$, is sufficiently smaller than $v_2$ and $v_3$, as in (4), where N is an experimental parameter. Usually, the value of N lies between 1.25 and 1.5; we used a value of 1.33 in all our experiments

$$v_2 > N v_1 \quad \text{and} \quad v_3 > N v_1. \tag{4}$$

Occlusions can also mislead the disparity result, since a scene is captured from two different viewpoints and no correlated information exists between the image pair. The left–right consistency check (LR-check) marks as occluded the points where the two images are not negatives of each other [21]. An experimental parameter, T, is compared with Error in (5) and (6) in order to evaluate the disparity result, where $x_r$ is the right coordinate, $x'_l$ is an estimated left match at disparity $d^r_{x_r}$, and $d^l_{x'_l}$ is the left-based estimated disparity.

JIN et al.: FPGA DESIGN AND IMPLEMENTATION OF A REAL-TIME STEREO VISION SYSTEM

Fig. 1. High-level hardware architecture of the proposed real-time stereo vision system.

the size of the image. In our experiments, we used 400 as a threshold value.
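As a concrete sketch of the census transform and Hamming-distance matching cost described above, the following C fragment uses a 3 × 3 window for brevity (the implemented design uses 11 × 11 windows and 120-bit vectors); the helper names `census3x3` and `hamming8` are illustrative, not from the paper:

```c
#include <assert.h>
#include <stdint.h>

/* Census transform of the 3x3 neighborhood around (x, y): each neighbor
 * contributes one bit, set to 1 when the neighbor is darker than the
 * center pixel. This yields 8 bits per pixel here, versus the 120-bit
 * vectors produced by the paper's 11x11 window. */
static uint8_t census3x3(const uint8_t *img, int w, int x, int y)
{
    uint8_t center = img[y * w + x], vec = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;                 /* center pixel contributes no bit */
            vec = (uint8_t)((vec << 1) |
                            (img[(y + dy) * w + (x + dx)] < center));
        }
    return vec;
}

/* Hamming distance between two census vectors: the number of bits
 * in which they differ, as used in the matching cost of (3). */
static int hamming8(uint8_t a, uint8_t b)
{
    uint8_t v = a ^ b;
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}
```

In the hardware, one such distance is computed per candidate disparity and accumulated over the correlation window; the bit-oriented form is what makes the cost cheap in gate logic.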
it requires more than one clock cycle when fetching data

$$B_u = \max\!\left(\max_{y''=0}^{I_h}\big(y'' - H_l(y'')\big),\ \max_{y''=0}^{I_h}\big(y'' - H_r(y'')\big)\right),$$

$$B_l = \max\!\left(\min_{y''=0}^{I_h}\big(y'' - H_l(y'')\big),\ \min_{y''=0}^{I_h}\big(y'' - H_r(y'')\big)\right)$$
Fig. 4. Hardware architecture for the parallel processing of window pixels.

B. Stereo Matching Module

After the rectification process, stereo matching is performed between the two corresponding images. As shown in Fig. 3, stereo matching is divided into two stages, the census transform stage and the correlation stage. In the census transform stage, the left and right images are transformed into images with census vector pixel values instead of gray-level intensity pixel values. We use an 11 × 11 window size for the census transform to generate a 120 bit long transformed bit vector for each pixel. Since the census transform is a type of window-based processing, the neighboring pixels of the processing pixel need to be accessed simultaneously. To achieve this, we use a scan-line buffer and a window buffer. A scan-line buffer is a buffer which is able to contain several scan-lines of the input image, and a window buffer is a set of shift registers which contain the pixels in the window. Since the window buffer consists of registers, it guarantees instant access to its elements.

The scan-line buffer used in the proposed system consists of 10 dual-port memories, and each memory can store one scan-line of an input image. Assuming that the coordinates of the current input pixel are (x, y) and the intensity value of the pixel is I(x, y), the connections between the memories are shown in Fig. 4. Fig. 4 also shows the scan-line buffer conceptually converting a single pixel input into a pixel column vector output. A window buffer is a square-shaped set of shift registers, and each register can store one pixel intensity value of the input image. The right-most column of the window buffer acquires the pixel column vector, which is generated from the scan-line buffer, and the intensity values in each register are shifted from right to left at each pixel clock. As a result, all 11 × 11 pixels exist in the window buffer, and each pixel can be accessed simultaneously. After a pixel
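The interplay of the scan-line buffer and window buffer described above can be modeled in C as follows, shrunk to a 3 × 3 window over a 5-pixel-wide line for brevity (the hardware uses 10 line memories and an 11 × 11 register window; all names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define W   5        /* image width (illustrative) */
#define WIN 3        /* window size: 3x3 here, 11x11 in the paper */

/* Scan-line buffer: WIN-1 previous lines (the paper uses 10 dual-port
 * memories for its 11x11 window). Window buffer: WIN x WIN registers. */
static uint8_t lines[WIN - 1][W];
static uint8_t win[WIN][WIN];

/* One pixel-clock step: the scan-line buffer turns the single input
 * pixel into a column of WIN vertically adjacent pixels, which is
 * pushed into the right-most column of the window buffer while the
 * older columns shift from right to left. */
static void push_pixel(int x, uint8_t v)
{
    uint8_t col[WIN];
    col[0] = lines[0][x];                  /* oldest buffered line */
    col[1] = lines[1][x];
    col[2] = v;                            /* current line */

    lines[0][x] = lines[1][x];             /* rotate the line memories */
    lines[1][x] = v;

    for (int r = 0; r < WIN; r++) {        /* shift window left by one */
        for (int c = 0; c < WIN - 1; c++)
            win[r][c] = win[r][c + 1];
        win[r][WIN - 1] = col[r];
    }
}
```

After pixels are fed in raster order, `win` always holds the WIN × WIN neighborhood ending at the current pixel, with every element readable in the same cycle, which is the property the register-based window buffer provides in hardware.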
TABLE I
Coefficients of Interpolation Filter
TABLE II
Performance Summary of Reported Stereo Vision Systems

| Implemented system | Image size | Matching method | Disparity range | Window size | Rectification | Frames per second |
|---|---|---|---|---|---|---|
| SAZAN (FPGA) | 320 × 240 | SSAD | 25 | 13 × 9 | No | 20 |
| Kuhn et al. (ASIC) | 256 × 192 | SSD/Census | 25 | Census (3 × 3), Corr (10 × 3) | No | 50 |
| Darabiha et al. (FPGA) | 256 × 360 | Local weighted | 20 | N/A | No | 30 |
| MSVM-III (FPGA) | 640 × 480 | SSAD/Trinocular | 64 | N/A | Hard-wired | 30 |
| DeepSea (ASIC) | 512 × 480 | Census | 52 | Census (N/A), Corr (N/A) | Firmware | 200 |
| Software program (3.2 GHz, SSE2) | 640 × 480 | Census | 64 | Census (11 × 11), Corr (15 × 15) | Software | 1.1 |
| This paper (FPGA) | 640 × 480 | Census | 64 | Census (11 × 11), Corr (15 × 15) | Hard-wired | 230 |
TABLE III
Evaluation Result of the Proposed System Using Middlebury Stereo-Pairs

| Stereo-pair | Percent of bad pixels (δd = 1.0): nonocc | all | disc |
|---|---|---|---|
| Tsukuba | 9.79 | 11.56 | 20.29 |
| Venus | 3.59 | 5.27 | 36.82 |
| Teddy | 12.50 | 21.50 | 30.57 |
| Cones | 7.34 | 17.58 | 21.01 |
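The percent-of-bad-pixels figures in Table III follow from a direct implementation of the Middlebury metric, sketched here in C (δd = 1.0 as in the paper; occlusion masks are omitted and the function name is ours):

```c
#include <assert.h>
#include <math.h>

/* Percentage of bad pixels: the fraction of pixels whose computed
 * disparity dC differs from the ground truth dT by more than the
 * tolerance delta_d (1.0 in the paper's Middlebury evaluation).
 * Illustrative sketch over a flat array of n pixels. */
static double bad_pixel_percent(const double *dC, const double *dT,
                                int n, double delta_d)
{
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (fabs(dC[i] - dT[i]) > delta_d)
            bad++;
    return 100.0 * bad / n;
}
```

The nonocc/all/disc columns of Table III apply this same count restricted to non-occluded pixels, all pixels, and pixels near depth discontinuities, respectively.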
whether the selected disparity is a unique minimum, double minimum or non-unique minimum [22], [24]. If a disparity result passes the LR-check and uniqueness test, sub-pixel estimation and spike removal are applied to increase the reliability and accuracy. The sub-pixel estimation again utilizes C(di), C(di − 1), and C(di + 1), which were tracked in the uniqueness test module, with C(di − 2) and C(di + 2) giving additional precision. By using the parabola fitting method in (7), the sub-pixel estimation is performed with ease. To decrease the complexity and latency of the operations, a combination of shifting, addition, and subtraction is used for each multiplication and division, as shown in
Fig. 10. Resultant disparity images at different post-processing levels. Images were captured online in a common indoor scene. (a) Raw image (left). (b) Disparity result. (c) LR-check, uniqueness test, and sub-pixel estimation. (d) Final disparity with spike removal.
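The parabola-fitting sub-pixel step referenced as (7) can be sketched with the standard three-point fit. Since (7) itself is not reproduced in this excerpt, the formula below is the textbook form; in the hardware, the division is approximated with shifts, additions, and subtractions:

```c
#include <assert.h>
#include <math.h>

/* Three-point parabola fit for sub-pixel disparity: the parabola through
 * the matching costs c_m = C(d-1), c0 = C(d), c_p = C(d+1) attains its
 * minimum at d + (c_m - c_p) / (2*(c_m - 2*c0 + c_p)). Illustrative
 * sketch; the paper's hardware replaces the divide with shift-add logic. */
static double subpixel_disparity(int d, double c_m, double c0, double c_p)
{
    double denom = 2.0 * (c_m - 2.0 * c0 + c_p);
    if (denom == 0.0)
        return d;             /* flat cost curve: keep integer disparity */
    return d + (c_m - c_p) / denom;
}
```

Because the integer cost minimum lies at d, the correction term is always in (−0.5, 0.5), so the refined disparity stays within half a pixel of the winning candidate.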
Fig. 7. After a total of 23 cycles of clock latency, the calculated sub-pixel value is obtained and synchronized with the pixel clock.

Spike removal is implemented based on the window processing architecture, and is described as part of the stereo matching module. Starting with the label "1," the disparity output generated by the stereo matching module is compared with its neighboring disparity values and labeled following (8). Generally, a spike occupies only a small region and does not have concave behavior. For this reason, we can skip the sorting of the equivalent label pairs to simplify the removal process. Fig. 8 shows a block diagram of the implemented spike removal module. While counting the number of disparities with identical labels, each disparity value is stored within the line buffer because the vertical size of the region should be considered. If the non-spike condition is satisfied, the resultant disparity value is delivered to the system output, including any latency caused by the line buffer. Otherwise, the disparity value is replaced with a blank.

IV. Experimental Results

The proposed real-time stereo vision system is designed and coded using VHDL and implemented using a Virtex-4 XC4VLX200-10 FPGA from Xilinx. Fig. 9 shows the implemented system. The implemented system interfaces two VCC-8350CL cameras from the CIS corporation through the standard camera-link format, or directly controls a MT9M112 CMOS sensor from Micron as a stereo camera pair. Table I summarizes the device utilization reports from the Xilinx synthesis tool in ISE release 9.1.03i, XST J.33. The number of used slices and the maximum allowed frequency of the proposed system are 51,191 (about 57% of the device) and 93.0907 MHz, as shown in Table I. Since the system is based on local pixel processing, the size of the input image does not affect the device utilization directly. However, both the disparity range and window size are required to be adjusted to maintain the resulting disparity quality if the size of the image is changed. Among the sub-modules listed in Table I, the amount of logic resources consumed by the census transform and correlation modules increases linearly as the disparity range and window size increase. The other modules of the system are relatively less affected.

The performance verification of the proposed system is completed using cameras with different frames per second values. Both the 30 and 60 f/s cameras passed the working test just under 12.2727 and 24.5454 MHz, respectively. In the same manner, we can expect the maximum performance of the proposed system to occur when using a camera with 230 f/s, especially when considering the under-estimation characteristic of the synthesis tool [12]. Since the maximum frame rate of the camera available for this research is only 60 f/s, the performance of the proposed system is verified through timing simulation using Mentor Graphics ModelSim 6.1f as the simulation environment, with test vectors which describe the actual behavior of the camera.

The software program corresponding to the proposed system was implemented using the ANSI-C language for performance and disparity quality comparison. Several optimization methods and processor-specific functions, such as multimedia extensions (MMX) and streaming single instruction multiple data extensions 2 (SSE2), were considered for the full utilization of the hardware resources [25]. The experimental environment for the software program is described as follows: Microsoft Visual C++ 7.0, Microsoft Windows XP SP2 Professional, Pentium 4 3.2 GHz, and 2 GB DDR SDRAM. Table II shows the performance comparison result. As shown in Table II, the proposed system, even though it was implemented in an FPGA, shows improvements over the previous systems in both performance and integration.

Table III shows the evaluation results of the cost function which was used in the proposed system. We measured the percentage of bad pixels in the Middlebury stereo-pairs, which consist of Tsukuba, Venus, Teddy, and Cones [26]. The percentage of bad pixels was obtained as in (10), where $d_C(x, y)$ and $d_T(x, y)$ are the computed disparity map and the ground truth map, respectively, N is the total number of pixels, and $\delta_d$ is the disparity error tolerance. We used $\delta_d = 1.0$ for further comparison with the state-of-the-art stereo vision algorithms on the Middlebury stereo evaluation page

$$B = \frac{1}{N} \sum_{(x,y)} \big(|d_C(x, y) - d_T(x, y)| > \delta_d\big). \tag{10}$$

Since the hardware was built for real-time processing of an incoming image, the disparity results of the proposed design were generated through HDL functional simulation,
Fig. 11. Online results of the proposed stereo vision system. (a) Raw image (left). (b) Disparity result (proposed). (c) Disparity result (Bumblebee). (d) Raw image (left). (e) Disparity result (proposed). (f) Disparity result (Bumblebee).
using the disparity search range limitation for each stereo-pair. As shown in the figure, the proposed system provided acceptable disparity accuracy, considering the real-time performance.

Fig. 10 shows the resultant disparity images captured in a common indoor environment. The images were processed in real-time and obtained online from the implemented system at different post-processing levels. As the post-processing levels advanced, the uncertain pixels were removed to increase the confidence of the disparity result. Because the disparity ground truth of a common scene is difficult to obtain in real-time, a qualitative comparison was performed using a commercial product, the Digiclops stereo vision software with a Bumblebee camera from Point Grey Research, Inc. (PGR). To compare the disparity results with those of a similar configuration, the Digiclops stereo vision system was configured to have a disparity range of 0–63 and a 21 × 21 stereo mask. As shown in Fig. 11, the proposed system generated disparity results comparable to the commercial product, with real-time performance.

V. Conclusion

In this paper, we have built an FPGA-based stereo vision system using the census transform, which can provide dense disparity information with additional sub-pixel accuracy in real-time. The proposed system was implemented within a single FPGA including all the pre- and post-processing functions such as rectification, LR-check, and uniqueness test. The real-time performance of the system was verified in a common real-life environment to measure its further applicability.

To achieve the targeted performance and flexibility, we have focused on the intensive use of pipelining and modularization while designing the dedicated datapath of the proposed stereo vision system. We made all the functional elements operate in synch with a single pixel clock, as well as utilizing the same format of video signals as their input and output. Because the video signals define the format of the image, each module can be considered as an independent imaging sensor. As a result, the flexibility of the entire system, as well as of individual modules, increased with respect to resolution and frame rate. In addition, we can expect performance enhancement since the proposed system is designed to be synchronized on individual pixel data flow, which results in the elimination of the sequential bottleneck caused by instruction cycle delay.

In the future, the proposed real-time stereo vision system will be implemented with an ASIC for the purpose of commercial use. Considering the efficiency of implementing gate logic within an ASIC, we can expect performance enhancement in both quality and speed. The proposed system can be used for higher level vision applications such as intelligent robots, surveillance, automotive systems, and navigation. The additional application areas in which the proposed stereo vision system can be used will continue to be evaluated and explored.

References

[1] J. I. Woodfill, G. Gordon, and R. Buck, "Tyzx DeepSea high speed stereo vision system," in Proc. IEEE Comput. Soc. Workshop Real-Time 3-D Sensors Use Conf. Comput. Vision Pattern Recog., Washington D.C., Jun. 2004, pp. 41–46.
[2] M. Bertozzi and A. Broggi, "GOLD: A parallel real-time stereo vision system for generic obstacle and lane detection," IEEE Trans. Image Process., vol. 7, no. 1, pp. 62–81, Jan. 1998.
[3] A. Bensrhair, M. Bertozzi, A. Broggi, A. Fascioli, S. Mousset, and G. Toulminet, "Stereo vision-based feature extraction for vehicle detection," in Proc. IEEE Intell. Vehicle Symp., Paris, France, vol. 2. Jun. 2002, pp. 465–470.
[4] N. Uchida, T. Shibahara, T. Aoki, H. Nakajima, and K. Kobayashi, "3-D face recognition using passive stereo vision," in Proc. IEEE Int. Conf. Image Process., Genoa, Italy, vol. 2. Sep. 2005, pp. 950–953.
[5] D. Murray and C. Jennings, "Stereo vision-based mapping and navigation for mobile robots," in Proc. IEEE Int. Conf. Robotics Autom., Albuquerque, NM, vol. 2. Apr. 1997, pp. 1694–1699.
[6] J. Woodfill, B. von Herzen, and R. Zabih, Frame-rate robust stereo on a PCI board [Online]. Available: http://www.cs.cornell.edu/rdz/Papers/Archive/fpga.pdf
[7] M. Kuhn, S. Moser, O. Isler, F. K. Gurkaynak, A. Burg, N. Felber, H. Kaeslin, and W. Fichtner, "Efficient ASIC implementation of a real-time depth mapping stereo vision system," in Proc. IEEE Int. Circuits Syst. (MWSCAS '03), Cairo, Egypt, vol. 3. Dec. 2003, pp. 1478–1481.
[8] S. Kimura, T. Shinbo, H. Yamaguchi, E. Kawamura, and K. Nakano, "A convolver-based real-time stereo machine (SAZAN)," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., Fort Collins, CO, vol. 1. Jun. 1999, pp. 457–463.
[9] A. Darabiha, J. Rose, and W. J. Maclean, "Video-rate stereo depth measurement on programmable hardware," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit., Madison, WI, vol. 1. Jun. 2003, pp. 203–210.
[10] Y. Jia, X. Zhang, M. Li, and L. An, "A miniature stereo vision machine (MSVM-III) for dense disparity mapping," in Proc. 17th Int. Conf. Pattern Recognit., Cambridge, U.K., vol. 1. Aug. 2004, pp. 728–731.
[11] W. J. Dally and S. Lacy, "VLSI architecture: Past, present, and future," in Proc. 20th Anniv. Conf. Adv. Res. VLSI, Atlanta, GA, Mar. 1999, pp. 232–241.
[12] J. Diaz, E. Ros, F. Pelayo, E. M. Ortigosa, and S. Mota, "FPGA-based real-time optical-flow system," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 2, pp. 274–279, Feb. 2006.
[13] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in computational stereo," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 8, pp. 993–1008, Aug. 2003.
[14] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," Int. J. Comput. Vision, vol. 47, pp. 7–42, 2002.
[15] L. D. Stefano, "A PC-based real-time stereo vision system," Mach. Graphics Vision Int. J., vol. 13, no. 3, pp. 197–220, Jan. 2004.
[16] M. Sonka, V. Hlavac, and R. Boyle, "3-D vision, geometry," in Image Processing, Analysis, and Machine Vision, 2nd ed., Pacific Grove, CA: PWS Publishing, ch. 11, 1999.
[17] R. I. Hartley and A. Zisserman, "Epipolar geometry and the fundamental matrix," in Multiple View Geometry. Cambridge, U.K.: Cambridge University Press, ch. 9, 2000.
[18] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach, 1st ed., Englewood Cliffs, NJ: Prentice Hall, 2002.
[19] R. Zabih and J. Woodfill, "Non-parametric local transforms for computing visual correspondence," in Proc. 3rd Eur. Conf. Comput. Vision, Stockholm, Sweden, May 1994, pp. 151–158.
[20] M. H. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 8, pp. 1073–1078, Aug. 2004.
[21] G. Egnal and R. P. Wildes, "Detecting binocular half-occlusions: Empirical comparisons of five approaches," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 8, pp. 1127–1133, Aug. 2002.
[22] K. Mühlmann, D. Maier, J. Hesser, and R. Männer, "Calculating dense disparity maps from color stereo images, an efficient implementation," Int. J. Comput. Vision, vol. 47, nos. 1–3, pp. 79–88, Apr. 2002.
[23] R. Yang, M. Pollefeys, and S. Li, "Improved real-time stereo on commodity graphics hardware," in Proc. Conf. Comput. Vision Pattern Recognit., Jun. 2004, pp. 36–43.
[24] C. L. Zitnick and T. Kanade, "A cooperative algorithm for stereo matching and occlusion detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 7, pp. 675–684, Jul. 2000.
[25] Middlebury stereo vision page [Online]. Available: http://vision.middlebury.edu/stereo/
[26] D. Murray and J. Little, "Using real-time stereo vision for mobile robot navigation," Auton. Robots, vol. 8, no. 2, pp. 161–171, 2000.
[27] K. Mühlmann, D. Maier, J. Hesser, and R. Männer, "Calculating dense disparity maps from color stereo images, an efficient implementation," Int. J. Comput. Vision, vol. 47, pp. 79–88, 2002.

Seunghun Jin received the B.S. and M.S. degrees in electrical and computer engineering from Sungkyunkwan University, Suwon, Korea, in 2005 and 2006, respectively. He is currently working toward the Ph.D. degree at Sungkyunkwan University. His research interests include image and speech signal processing, embedded systems, and real-time applications.

Junguk Cho received the B.S., M.S., and Ph.D. degrees from the School of Information and Communication Engineering, Sungkyunkwan University, Suwon, Korea, in 2001, 2003, and 2006, respectively. From 2006 to 2007, he was a Research Instructor in the School of Information and Communication Engineering, Sungkyunkwan University. In 2008, he was with the Department of Computer Science and Engineering, University of California, San Diego, as a Postdoctoral Scholar. His research interests include motion control, image processing, and embedded systems.

Xuan Dai Pham received the B.S. and M.S. degrees in computer science from the University of Hochiminh City, Hochiminh City, Vietnam, in 1994 and 2001, respectively, and the Ph.D. degree in electrical and computer engineering from Sungkyunkwan University, Suwon, Republic of Korea, in 2008. In 2008, he was with the Information Technology Faculty, Saigon Institute of Technology, Hochiminh City, Vietnam, as a Lecturer and a Researcher. His research interests include image processing and computer vision.

Kyoung Mu Lee received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1984 and 1986, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1993. From 1994 to 1995, he was with Samsung Electronics Company Limited, Korea, as a Senior Researcher. Since September 2003, he has been with the Department of Electrical and Computer Engineering, Seoul National University, as a Professor. His primary research interests include computer vision, image understanding, pattern recognition, robot vision, and multimedia applications.

Sung-Kee Park received the B.S. and M.S. degrees in mechanical engineering from Seoul National University, Seoul, Korea, in 1987 and 1989, respectively, and the Ph.D. degree from the Department of Automation and Design Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2000. Since 2000, he has been with the Center for Intelligent Robotics, Korea Institute of Science and Technology, Seoul, Korea, as a Research Scientist. His current research interests include computer vision, robot vision, and intelligent robots.

Munsang Kim received the B.S. and M.S. degrees in mechanical engineering from Seoul National University, Seoul, Korea, in 1980 and 1982, respectively, and the Ph.D. degree in engineering robotics from the Technical University of Berlin, Berlin, Germany, in 1987. Since 1987, he has been with the Center for Intelligent Robotics, Korea Institute of Science and Technology, Seoul, Korea, as a Research Scientist. He has led the Advanced Robotics Research Center since 2000, and became the Director of the Intelligent Robotics Development Program, Frontier 21 Program, in 2003. His current research interests are the design and control of novel mobile manipulation systems, haptic device design and control, and sensor applications for intelligent robots.