
{Your Projects Name} Specification Document

FPGA Stereovision
ChandraKanth Pamrthi, Prasanth Verma
Submitted for CoreEL Digilent Design Contest 2013
19-04-2013
Advisor: Joycee Mekie
IIT Gandhinagar
Ahmedabad, Gujarat


FPGA Stereovision

Brief Overview:
This project describes the development of an integrated stereovision sensor intended for mobile platforms such as robots and intelligent vehicles. Companies that build Intelligent Transportation Systems (ITS) are eager to integrate sensors and perception algorithms into cars for a range of applications: obstacle detection on motorways or in urban traffic, lane departure detection, parking assistance, navigation, and cockpit and driver monitoring. Monocular vision has been proposed for detecting obstacles (cars or pedestrians) in urban scenes, but without assumptions about the environment (a flat-road approximation, for example) it cannot cope with complex situations, and it is generally combined with other kinds of sensors (e.g. radar or laser devices).

Stereovision is widely used in the robotics community, typically to evaluate terrain navigability at short distances (Matthies 1992). Several companies (such as Videre Design) offer stereo rigs with a short baseline (10 cm), well suited to indoor perception. Stereovision has also been evaluated in ITS applications for many years, but the real-time requirements, the limitations of the depth field, and the lack of robustness make it difficult to use in changing contexts. It is nonetheless used in many applications, e.g. as the main sensor on outdoor terrestrial robots, for detecting free parking slots and assisting parking manoeuvres, and for pedestrian detection in urban scenes.

Design Overview:
Initially, the original right and left images are processed independently. The distortion correction and rectification step produces two aligned, rectified images. The key components of this project are:
- Multiple image sensors (only 2 cameras in this project)
- FPGA interfacing with Ethernet
- Interface board to connect the image sensors to the FPGA board

Image rectification is a crucial first step in many image processing tasks, especially in stereovision. Most stereovision algorithms depend on the input images conforming to a simplified epipolar geometry with coplanar image planes. This allows the assumption that a given point in one image can be found in the same row of the other image (provided that the point is not occluded), which dramatically reduces the search space.
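To illustrate why the same-row assumption shrinks the search space, here is a minimal Python sketch of a one-dimensional correspondence search. It is not the project's matching algorithm: the sum-of-absolute-differences (SAD) cost, the function name `row_disparity`, and the parameters `half_window` and `max_disparity` are all assumptions made for illustration. It also assumes the left image is the reference, so a point at column x in the left row appears at column x - d in the right row.

```python
def row_disparity(left_row, right_row, x, half_window=1, max_disparity=8):
    """Find the disparity d that best matches left_row around x to
    right_row around x - d, using a SAD cost (illustrative choice)."""
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window around x must lie inside the row")
    best_d, best_cost = 0, None
    # Only max_disparity + 1 candidates in the SAME row are examined,
    # instead of a 2-D neighbourhood of the other image.
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # window would fall off the left edge of the right row
        cost = sum(
            abs(left_row[x + k] - right_row[xr + k])
            for k in range(-half_window, half_window + 1)
        )
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# A bright feature at column 3 of the right row appears at column 6 of the left row.
right = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
left = [10, 10, 10, 10, 10, 50, 90, 50, 10, 10]
print(row_disparity(left, right, x=6, half_window=1, max_disparity=5))
```

With unrectified images the matching point could lie on any row, so the inner loop would have to cover a two-dimensional region instead of a single line.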

We cannot easily build a system with cameras and lenses in perfect alignment, so the image rectification step is required to take real-world image data and turn it into something resembling the ideal case. A calibration process is run on the uncorrected stereo image data to determine what transformation the rectification step has to perform.

FPGA logic: The most straightforward way for an FPGA to implement this rectification step is with look-up tables: for each rectified output pixel, a table entry indicates the source pixel. A naive implementation that allows each table entry to reference any location in the entire source image would be very memory-intensive, and sub-pixel resolution would only compound matters. A better implementation might, for example, encode coordinate differences between adjacent pixels (under the perfectly reasonable assumption that the source coordinate for a particular output pixel will be very similar to the source coordinate for its neighbor). An 8-bit value could encode both a 4-bit X difference and a 4-bit Y difference, which could themselves be signed fixed-point fractions (e.g. with a range of -4.00 to +3.75).
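The difference encoding can be sketched in Python as follows. This is a software model under stated assumptions, not the FPGA implementation: the names `pack_delta`, `unpack_delta` and `decode_row` are hypothetical, each field is assumed to be a 4-bit two's-complement value with 2 fractional bits (steps of 0.25, so one field covers -2.00 to +1.75), and the X difference is assumed to sit in the high nibble. Other splits between integer and fractional bits trade range against resolution.

```python
FRAC_BITS = 2               # assumption: 2 fractional bits, i.e. steps of 0.25
SCALE = 1 << FRAC_BITS      # 4 quantisation steps per pixel


def _quantise(value):
    """Convert a pixel difference to a signed 4-bit integer (-8..7)."""
    q = round(value * SCALE)
    if not -8 <= q <= 7:
        raise ValueError("difference does not fit in a signed 4-bit field")
    return q


def _sign_extend4(nibble):
    """Interpret a 4-bit field as a two's-complement number."""
    return nibble - 16 if nibble & 0x8 else nibble


def pack_delta(dx, dy):
    """Pack X and Y differences into one byte (X high nibble, Y low nibble)."""
    return ((_quantise(dx) & 0xF) << 4) | (_quantise(dy) & 0xF)


def unpack_delta(byte):
    """Recover the (dx, dy) differences, in pixels, from one packed byte."""
    qx = _sign_extend4((byte >> 4) & 0xF)
    qy = _sign_extend4(byte & 0xF)
    return qx / SCALE, qy / SCALE


def decode_row(start_x, start_y, packed):
    """Rebuild the source coordinates of one output row by accumulating
    the per-pixel differences onto a full-precision start coordinate."""
    x, y = start_x, start_y
    coords = [(x, y)]
    for byte in packed:
        dx, dy = unpack_delta(byte)
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


row = [pack_delta(1.0, 0.0), pack_delta(1.25, -0.25), pack_delta(0.75, 0.25)]
print(decode_row(10.0, 20.0, row))
```

In this model each output pixel after the first costs one byte of table storage, compared with a full coordinate pair per pixel in the naive table.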


Alternatively (or possibly in conjunction with this), one could use a lower-resolution look-up table with simple linear interpolation between entries. The size of that lower-resolution table would depend on the severity of the distortion being corrected and on the amount of error that is tolerable (relative to an ideal full-resolution table).

With coordinates in hand, the FPGA can then use a simple sampling algorithm (e.g. bilinear interpolation) to generate the output pixels. For all but nearest-neighbor interpolation, the sampling algorithm needs to read multiple source pixels for each output pixel, so a cache is needed to reduce external memory accesses.

For well-controlled distortion, it would be possible to perform rectification in a streaming fashion without any reliance on external memory. Each image sensor would write directly into an internal memory large enough to hold several rows' worth of image data, and each output row could then be generated entirely from this input buffer. Within each output row, the range of source Y coordinates would have to span less than the height of the input buffer.
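The three ideas in this part (a coarse table with linear interpolation, bilinear sampling, and the row-buffer constraint) can be modelled in a few lines of Python. This is a hedged sketch rather than the design itself: the function names `interp_lut`, `bilinear_sample` and `rows_needed` are hypothetical, the coarse table is assumed to hold one coordinate component sampled every `step` output pixels, and images are plain lists of rows.

```python
import math


def interp_lut(coarse, step, i):
    """Linearly interpolate a coarse table that stores one source
    coordinate for every `step`-th output pixel."""
    k, r = divmod(i, step)
    if r == 0:
        return coarse[k]
    t = r / step
    return coarse[k] * (1 - t) + coarse[k + 1] * t


def bilinear_sample(image, x, y):
    """Sample `image` at fractional (x, y) from its four nearest pixels.
    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right edge
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the bottom edge
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rows_needed(source_ys):
    """Input rows that must be buffered to produce one output row with
    bilinear sampling: from floor(min y) through floor(max y) + 1."""
    return math.floor(max(source_ys)) + 1 - math.floor(min(source_ys)) + 1


print(interp_lut([0.0, 8.0, 20.0], step=4, i=6))
print(bilinear_sample([[0, 10], [20, 30]], 0.5, 0.5))
print(rows_needed([10.2, 11.7, 12.5]))
```

Streaming rectification is feasible for a given output row when `rows_needed` for that row's source Y coordinates does not exceed the number of rows the internal buffer can hold.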

Implementation plans:
Development environment:
- ISE WebPACK Simulator and Analysis Tool, PlanAhead
- ChipScope Pro

FPGA board chosen:
- Spartan-3E FPGA


Digilent products requested for the project:

No.  Product              Quantity  Notes
1.   Spartan 3E           1
2.   MT9V032 LVDS camera  2         Image sensor


{Participants printed name, signature}

{Advisors printed name, signature}

{Participants printed name, signature}

{Participants printed name, signature}
