
Machine Vision Camera

A machine vision camera is a digital camera that captures an image using a light-sensitive sensor and sends the image to a processor or processing unit for image processing and analysis. The camera is one of the most important pieces of hardware in a machine vision system.
A Machine Vision Camera

Basic Mechanism:
The sensor used in the camera is made up of a number of pixels. Each pixel captures light and converts photons (quanta of electromagnetic energy) into electrical charge. The amount of electrical charge indicates the light intensity/brightness falling on that pixel. These electrical charges are then converted into an electronic image, either within the camera itself or in the frame grabber, and displayed on a monitor.
Types of sensor:
There are basically two types of light sensors (imagers) used within all generally
available digital cameras at present:
CCD - Charge Coupled Device

CMOS - Complementary Metal Oxide Semiconductor

Both CCD and CMOS are pixelated metal oxide semiconductors (photodiodes) made from silicon. They have basically the same sensitivity within the visible and near-IR spectrum, they convert the light that falls onto them into electrons by the same process, and they can be considered basically similar in operation. Both CMOS and CCD imagers can only sense the level/amount of light, not its color.
Neither 'CCD' nor 'CMOS' has anything to do with image sensing: 'Charge Coupled Device' describes the technology used to move and store the electron charge, and 'Complementary Metal Oxide Semiconductor' is the name of the technology used to make a transistor on a silicon wafer.
The fundamental difference between the two technologies is the architecture of the imager within the chip and camera. The difference is less 'what' they are and more 'how' they are used, because the imager must not only measure the light that falls on it but also undertake a host of other electronic tasks. Where and how these tasks are handled is what differentiates the two types of sensor.

Block diagram of a CCD sensor

Within a CCD imager all image processing is done off-chip, away from the sensor, allowing for a versatile approach to sensor and camera design. Only the 'photon-to-electron' conversion is done within the pixel, leaving the maximum amount of space within each pixel for capturing image information. The 'electron-to-voltage' conversion is done on the chip, so data still leaves the CCD in analogue form, to be digitized within the supporting camera circuitry before downloading to memory.

Block diagram of a CMOS sensor

With the CMOS imager both the 'photon-to-electron' conversion and the 'electron-to-voltage' conversion are done within the pixel, leaving less room for the light-receptive part of the sensor. This means the CMOS chip has less area with which to actually receive the light, and normally some form of micro-lens is needed to capture the light the pixel would otherwise miss.
The extra sensing area within the CCD imager allows a CCD-based camera to capture more light, which will normally provide higher image quality than a CMOS-based camera. On the other hand, the CMOS chip has the advantage of having everything it needs to work within the chip, so it is able to provide a 'camera-on-a-chip', whereas the CCD camera needs several other secondary chips (from 3 to 8) in addition to the imager to allow it to work. A CCD produces analogue signals that are moved away from the chip before being digitized and converted to 1's and 0's by an external dedicated chip; a CMOS sensor undertakes the digitization within the chip, as part of the image-capture process within each pixel.
Until quite recently the CCD-based camera was considered the 'de facto' standard within the digital market, and a great deal of development has been invested in producing these sensors to maximize their quality potential. They can offer high resolution (depending on how they are utilized) and high quality, albeit at a fairly high price.
The CMOS-based camera is generally much simpler to manufacture, as there are far fewer components, most of the processing technology being included within the chip. At the time of writing both CMOS and CCD sensors have their strengths and weaknesses, and developers have overcome most of the limitations of the early CMOS sensors.
A comparison of CCD and CMOS image sensors:

    CCD                                        | CMOS
    -------------------------------------------|----------------------------------------------
    Long history of high quality performance   | Lower performance in past, now comparable quality
    High dynamic range                         | Moderate dynamic range
    Low noise                                  | Noisier, but getting better quickly
    Well established technology                | Newer technology
    High power consumption                     | Relatively low power consumption
    Moderately reliable                        | More reliable due to on-chip integration
    Small pixel size                           | Larger pixel size
    Needs lots of external circuitry           | All circuitry on chip
    High fill factor                           | Lower fill factor
    Analogue output, digitized off the chip    | Digital output created on chip

Camera Parameters:

Resolution
Speed
Interface standard
Color Format
Trigger
Optical Format of Sensor

Resolution of a Camera:
In general, the resolution of a camera refers to the number of pixels in the image. A pixel is the smallest picture element of the image and of the imaging sensor. The image is made up of these pixels arranged in a matrix format. The number of bits each pixel occupies is called the bit depth; it depends on the output format of the camera. The shape of the pixels is important in a machine vision system, and it is always better to go with square pixels. The resolution is given as (number of columns x number of rows), e.g. 640 x 480, 1024 x 768 or 2048 x 1536. The resolution for a machine vision system is calculated based on the smallest feature that has to be detected by the system, taking into account a number of other parameters such as contrast level, lighting variations and pixel errors.
Speed:
The rate at which the camera can send images is defined as the speed of the camera. It is largely dependent on the speed of the interface, the sensor speed, the resolution of the camera and the bit depth (number of bits per pixel).
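As a rough illustration of how these factors combine (the numbers here are assumed for the example), the raw data rate a camera produces is:

Data rate = No. of pixels per frame x bytes per pixel x frames per second

For example, a 640 x 480 camera with an 8-bit (1 byte) pixel depth running at 100 frames/s produces about 640 x 480 x 1 x 100 ≈ 30.7 Mbytes/s, which the chosen interface must be able to sustain.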
Frame Grabber:
A frame grabber is an electronic plug-in board installed in a computer to interface the camera and the computer. It takes the image data from the camera and passes it to the PC, and it also provides signals to control camera parameters such as exposure time and trigger. A frame grabber has on-board memory and does not involve the computer's processor in the data transfer from the camera; it therefore uses very little of the computer's RAM and allows the computer to concentrate on processing the images rather than communicating with the camera. Some computers have ports that allow the camera to be connected directly, without a frame grabber; in that case the communication between the camera and the computer takes up a lot of the computer's processing capability and RAM, reducing what is available for processing images or any other work. In addition to the functions mentioned above, an analog frame grabber has an analog-to-digital converter that converts the analog signal into digital images.

A Frame Grabber

Driver:
A driver is a software program that takes care of the communication between a hardware device and the computer. It sits in the inner layers of the operating system and communicates with both the software and the hardware that we use; in general, it acts as an intermediary between the two. It is only because of this software program, also called a device driver, that our software is able to access the hardware.
Cameras, too, need drivers to communicate with computers. Different drivers are needed for different kinds of interfaces, and these can also change from one vendor to another.
Interface standard:
There are many kinds of interfaces available to connect the camera to the computer, either directly or through a frame grabber. Some standard interfaces are:
Camera Link
USB
GigE
FireWire
Analog
Camera Link:
Camera Link is a serial communication protocol designed for computer vision applications. It was designed to standardize scientific and industrial video products, including cameras, cables and frame grabbers. The standard is maintained and administered by the Automated Imaging Association (AIA), the global machine vision industry's trade group. Camera Link has three configurations, Base, Medium and Full, which differ in the speed with which data is transferred: the bandwidth ranges from 250 Mbytes/s for the Base configuration to 650 Mbytes/s for Full. One major disadvantage of the Camera Link interface is that no power is transmitted through the Camera Link cable; power has to be supplied to the camera separately.
USB:

USB (Universal Serial Bus) is the most widely used interface for computer peripherals, and machine vision has also adopted it for communication between the camera and the computer. USB can support up to 60 Mbytes/s. The advantages of a USB interface are that it is very inexpensive and can also supply power to the camera. The major disadvantages are that it uses much of the CPU's processing capability for data transfer, and that the cables are not of industrial standard and hence are prone to noise interference.
GigE:
The GigE standard for machine vision was developed based on the Gigabit Ethernet standard, by a group of machine vision component manufacturers under the supervision of the Automated Imaging Association (AIA). The GigE interface allows the longest cable length (about 100 m) without the use of repeaters or boosters. Data is transferred at a rate of 125 Mbytes/s, and communication uses the standard TCP or UDP protocols. The GigE interface does not transmit power to the camera.
FireWire (IEEE 1394):
The IEEE 1394 interface is a serial bus standard for high-speed communications and isochronous real-time data transfer, frequently used by personal computers as well as in digital audio, digital video, automotive and aeronautics applications. IIDC (Instrumentation & Industrial Digital Camera) is the FireWire data format standard for live video. It was designed for machine vision systems, but is also used for other computer vision applications and for some webcams. Although the two are easily confused, since both run over FireWire, IIDC is different from, and incompatible with, the ordinary DV (Digital Video) camcorder protocol. The speed of this interface is around 400 Mbits/s for the IEEE 1394a standard and 800 Mbits/s for IEEE 1394b. FireWire can carry power and hence can supply the camera.
Analog Interface PAL/NTSC:
Analog interfaces are generally prone to noise, and the standard interfaces cannot support high resolution images; they generally support VGA resolution for both color and monochrome images. As mentioned earlier, with an analog camera the digitization of the image is done on the frame grabber and not in the camera. The data transfer rate of analog interfaces is 62 Mbits/s. Analog interfaces are becoming outdated and are slowly being replaced by digital interfaces.
Color Format:
The output of the camera can be either a color image or a monochrome image. Generally the sensors used in cameras measure only the intensity of the light falling on them, not its color, so to get a color image either a filter or a beam splitter is used. In general, the majority of applications can be addressed using just a monochrome camera.
Monochrome camera:

A monochrome camera gives out an image in shades of gray (intensity information) with no color information. The photon-to-voltage conversion in the CCD or CMOS is used as the image signal, which is converted to the digital domain using an ADC (on-chip in the case of CMOS). Each pixel is then represented by a binary value of a certain size, usually 8 bits: the value 0 corresponds to no light (dark level) and 255 corresponds to saturation (white level).
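As a minimal sketch of this convention, the Python snippet below loads an image as 8-bit monochrome and inspects individual pixel values (OpenCV and the file name are assumed here purely for illustration):

    import cv2  # OpenCV

    # 'part.png' is a hypothetical file name standing in for a captured image.
    img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

    print(img.dtype)             # uint8, i.e. 8 bits per pixel
    print(img.min(), img.max())  # intensities present, between 0 (dark) and 255 (white)
    print(img[0, 0])             # intensity of the top-left pixel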
Bayer Decoded Image:
A Bayer image is a color image captured using a sensor fitted with a Bayer filter. A Bayer filter is a color filter array (CFA) for arranging RGB color filters on a grid of photosensors. Its particular arrangement of color filters is used in most single-chip digital image sensors to create a color image. The filter pattern is 50% green, 25% red and 25% blue, and is hence also called GRGB (or another permutation such as RGGB).

A Bayer Filter Pattern

A cross section of the sensor

Representation of Bayer Filter on a sensor

The sensor array captures the light as passed by the filter. Each pixel, as shown, is sensitive to only one color component, so the image carries encoded color information. The actual color can be decoded by one of the many Bayer decoding algorithms.
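As an illustration, a minimal decoding sketch using OpenCV's built-in demosaicing is shown below; the raw frame source, its size and the RGGB pattern are assumptions, and the exact COLOR_Bayer* constant must match the sensor's actual pattern alignment:

    import cv2
    import numpy as np

    # Hypothetical raw 8-bit Bayer frame (single channel); in practice this
    # comes from the camera driver or frame grabber.
    raw = np.fromfile("frame.raw", dtype=np.uint8).reshape(480, 640)

    # Interpolate the two missing color components at every pixel.
    color = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)

    print(raw.shape, "->", color.shape)  # (480, 640) -> (480, 640, 3)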
RGB Color 3 chip sensor:
Light entering from the lens is split using a trichroic prism assembly, which separates the image into its red, green and blue color components. Each component is projected onto its own sensor, and the outputs of the three sensors are then combined to produce a color image.
Three-CCD cameras are generally regarded as providing superior image quality to cameras with only one CCD. By taking a separate reading of the red, green and blue values for each pixel, three-CCD cameras achieve much better precision than single-CCD cameras. Almost all single-CCD cameras use a Bayer filter, which allows them to detect only one-third of the color information for each pixel; the other two-thirds must be interpolated with a demosaicing algorithm to 'fill in the gaps'.
Three-CCD cameras are generally much more expensive than single-CCD
cameras because they require three times more elements to form the image detector, and
because they require a precision color-separation beam-splitter optical assembly.

Functioning of a 3 Chip sensor

Trigger:
A trigger is used to tell the camera when to take the image. The trigger can be given in software or by connecting a trigger source directly to the camera; in the latter case the camera's trigger ratings have to be adhered to. For some cameras the trigger can only be given in software.

Timing diagram for a trigger based acquisition

Optical formats of Sensor:


The imaging sensors (CCD and CMOS) used in machine vision applications come in some standard formats. These formats are based on the Vidicon tubes of the 1950s: the format designation refers to the tube diameter, and the diagonal of the sensor is roughly two-thirds of this value. Typically, machine vision cameras have 1/4", 1/3", 1/2", 2/3" or 1" sensor formats with an aspect ratio of 4:3. The imaging sensor format has to be equal to or smaller than the lens format in order to utilize all the pixels in the sensor.

Some image sensor formats:

    Type           1/3.6"  1/3.2"  1/3"   1/2.7"  1/2.5"  1/2"   1/1.8"  1/1.7"  2/3"   1"     4/3"
    Diagonal (mm)  5.00    5.68    6.00   6.72    7.18    8.00   8.93    9.50    11.0   16.0   21.6
    Width (mm)     4.00    4.54    4.80   5.37    5.76    6.40   7.18    7.60    8.80   12.8   17.3
    Height (mm)    3.00    3.42    3.60   4.04    4.29    4.80   5.32    5.70    6.60   9.6    13.0
    Area (mm²)     12.0    15.5    17.3   21.7    24.7    30.7   38.2    43.3    58.1   123    225

Imaging Terminology:
Sensitivity:
The amount of light that a sensor's photodiode can collect is termed its sensitivity. The more light that is collected, the stronger the signal, and therefore the better the image quality. The surface area of the pixel is directly proportional to the sensitivity, and better sensitivity implies a better (less noisy) image.
Signal to Noise Ratio (SNR):
The amount of noise present at a given signal level is expressed as the signal-to-noise ratio. A high ratio gives better picture quality than a low ratio.
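SNR is commonly quoted in decibels:

SNR (dB) = 20 x log10 (signal level / noise level)

For example (with assumed numbers), a signal 100 times the noise level corresponds to 20 x log10(100) = 40 dB.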
Dynamic Range:
The ratio of the amount of light it takes to saturate the sensor to the least amount of light detectable above the background noise is termed the dynamic range. A good dynamic range allows very bright and very dim areas to be seen simultaneously, and it is used to describe the number of gray levels that can be resolved. A typical industrial scene can have a 100,000:1 contrast ratio, while a standard machine vision camera can resolve 256:1; if the camera has a very low dynamic range, information will be lost.
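Using the same decibel convention (the figures below simply restate the ratios already given):

Dynamic range (dB) = 20 x log10 (saturation level / noise floor)

So a camera resolving 256:1 covers about 20 x log10(256) ≈ 48 dB, while a 100,000:1 scene spans 20 x log10(100000) = 100 dB, which is why detail at one end of such a scene is lost.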

Low dynamic range image and high dynamic range image

Responsivity:
Responsivity is a measure of the input-output gain of the sensor: the amount of signal the sensor delivers per unit of input optical energy. CMOS imagers are, in general, marginally superior to CCDs in this respect, because gain elements are easier to place on a CMOS image sensor.
Spectral Response:
The spectral response gives the response of the sensor across different wavelengths of light, and the requirements for optical filters are judged from it. For example, if the sensor responds strongly to IR wavelengths, an IR filter can be fitted to the camera or lens to prevent IR light from affecting the image. An example of the spectral response of a camera for different colors is shown below.

Spectral Response

Fill Factor:
The percentage of a pixel devoted to collecting light is called the pixel's fill factor. CCDs have a 100% fill factor, but CMOS sensors have much less because some of the related circuitry is built into the pixel. The lower the fill factor, the less sensitive the sensor is, and correspondingly longer exposure times must be used.
Blooming:
Blooming happens when more photons arrive at a pixel than it can hold: the pixel overflows and the excess electrons spill into adjacent pixels. It occurs when imaging very bright objects or light sources, and is similar to overexposure in film photography, except that in digital imaging the result is a number of vertical and/or horizontal streaks appearing from the light source in the picture.

Effects of Blooming

Windowing:
One unique capability of CMOS technology is the ability to read out only a portion of the image sensor. This allows elevated frame or line rates for small regions of interest. CCDs generally have no, or only limited, windowing abilities. Windowing is an adjustable parameter (for CMOS cameras) that can be set by the user.
Brightness:
Brightness is an adjustable parameter used to increase the overall brightness of the image. It is an additive adjustment: a constant offset is added to the intensity value of each pixel.
Gain:
Gain is another adjustable parameter, used to increase the contrast. It is a multiplicative adjustment: the intensity value of each pixel is multiplied by a constant factor, so the gain parameter effectively adjusts the responsivity. Note that increasing the gain also increases the noise level in the image.
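A minimal sketch of how the two adjustments act on pixel values is shown below; in the camera these corrections are applied in hardware, and NumPy is used here purely for illustration:

    import numpy as np

    def adjust(img, gain=1.0, brightness=0):
        # Apply gain (multiplicative) then brightness (additive) to an 8-bit image.
        out = img.astype(np.float32) * gain + brightness
        # Clip back into the valid 8-bit range; values above 255 saturate.
        return np.clip(out, 0, 255).astype(np.uint8)

    pixels = np.array([[10, 100, 200]], dtype=np.uint8)  # assumed sample intensities
    print(adjust(pixels, brightness=30))  # [[ 40 130 230]] - everything brighter
    print(adjust(pixels, gain=1.5))       # [[ 15 150 255]] - stretched; bright end saturates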
White Balancing:
In machine vision, white balancing is used only for Bayer-decoded color images. In a Bayer image there are twice as many green pixels as red or blue, so the image can appear with a greenish tint; white balancing is used to remove or reduce this tint, and is done by adjusting the gain individually for the three color planes. Please note that not all cameras have a white balancing feature. Also, white balancing is only an aesthetic feature; it does not significantly improve the performance of the system.

Images before white balancing and after white balancing
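A minimal sketch of per-plane gain adjustment is shown below, using the common 'gray world' assumption (that the scene should average to neutral gray) to choose the gains; real cameras may derive their gains differently:

    import numpy as np

    def gray_world_balance(img):
        # img: H x W x 3 color image, 8-bit. Returns a white-balanced copy.
        out = img.astype(np.float32)
        means = out.reshape(-1, 3).mean(axis=0)  # average of each color plane
        gains = means.mean() / means             # a plane with a high mean gets gain < 1
        out *= gains                             # e.g. damping green removes a green tint
        return np.clip(out, 0, 255).astype(np.uint8)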

Exposure Time/ Integration time:


Exposure time is the period for which the sensor is open/sensitive to the external light. Typically the exposure time in a machine vision application will be in microseconds or milliseconds. The exposure time directly affects the amount of light falling on the sensor and hence the brightness of the image produced; for high speed applications the exposure time is kept short to avoid motion blur. Integration time is the sum of the exposure time and the time taken to read the image out of the sensor.
Area Scan & Line Scan Cameras:
Cameras are classified according to whether the effective pixels in the sensor are arranged in a matrix format (a 2D array) or in a single line. If they are arranged in a matrix (similar to consumer digital cameras) it is called an area scan camera; if they are arranged in a single line it is called a line scan camera.

An Area scan sensor and a Line scan sensor

Area Scan Camera:


Area scan cameras in machine vision are used for inspecting components that are stationary or relatively slow moving. Setting up an area scan camera is relatively easy compared to a line scan camera. These cameras are widely used in industry and can address almost all applications except those where the component is moving at very high speed or rotating. They come with two kinds of electronic shutter: rolling shutter and global shutter. The rolling shutter is used to take images of stationary or very slow moving objects: integration of the pixels (exposure to light) happens one row at a time (W), row by row across the entire array (H), so there is a possibility of skew if the object moves quickly in front of the camera. In general, rolling shutter cameras have a better fill factor. With a global shutter the integration happens one frame at a time (W x H), so no skew results from the movement of the object, and such cameras can therefore also be used for high speed applications.

Line Scan Cameras:


These cameras are almost exclusively used in industrial settings to capture images of a constant stream of moving material. Data comes from a line scan camera at a certain frequency: the camera scans a line, waits, and repeats. The data is commonly processed by a computer, which collects the one-dimensional lines and assembles them into a two-dimensional image; the resulting image is then processed by image-processing methods for industrial purposes. The line rate (frequency) of the line scan camera has to be matched to the speed of the moving material to get a proper image; if the line rate is not set properly, the component will look either elongated or compressed in the image. Typically a line scan camera needs more illumination, but concentrated over a small area.

How an image is constructed using a line scan camera
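A sketch of that assembly step is given below; read_line() is a hypothetical stand-in for the real driver call, and the line width and number of lines are assumed values:

    import numpy as np

    def read_line():
        # Placeholder for the driver call that returns one scan line
        # (here 2048 pixels, 8-bit) each time the line trigger fires.
        return np.zeros(2048, dtype=np.uint8)

    # Collect successive 1D lines as the material moves past the camera,
    # then stack them into a 2D image for processing.
    lines = [read_line() for _ in range(1024)]
    image = np.vstack(lines)  # shape: (1024, 2048)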

How to choose a Machine Vision Camera:


The most important factor to remember when choosing a machine vision camera is the resolution (number of pixels) needed for the particular application. To calculate the number of pixels one must know the smallest feature that has to be detected on the component (for a presence/absence type of application) or the measurement accuracy needed (for a dimensioning type of application). Two pixels are enough to resolve a feature under ideal conditions, i.e. when the contrast is high and there are no lighting variations, positional variations and so on. But real world conditions are far from ideal, so we take eight to ten pixels to resolve a feature in the image. Once the number of pixels needed to resolve the smallest feature is decided, the resolution can be calculated using the following formulas:
No. of pixels in X plane = [Length of the FOV (in mm) x 8]/ smallest feature (in mm)
No. of pixels in Y plane = [Width of the FOV (in mm) x 8]/ smallest feature (in mm)
Here the 8 stands for the number of pixels needed for detecting the smallest feature.
Therefore the resolution is given by
Resolution = No of pixels in X plane x No of pixels in Y plane.

As mentioned earlier, the cameras available in the market come in standard resolutions such as 640 x 480, 1024 x 768 and 2048 x 1536; after this calculation, choose the standard resolution closest to your calculated value.
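As a worked example with assumed numbers, suppose the field of view is 100 mm x 75 mm and the smallest feature to detect is 1 mm:

    fov_x, fov_y = 100.0, 75.0   # field of view in mm (assumed)
    feature = 1.0                # smallest feature in mm (assumed)
    pixels_per_feature = 8       # rule of thumb from the formula above

    px_x = fov_x * pixels_per_feature / feature  # 800 pixels
    px_y = fov_y * pixels_per_feature / feature  # 600 pixels
    # The nearest standard resolution at or above 800 x 600 is 1024 x 768.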
Similarly, for a dimensioning type of application, each pixel should cover approximately one-tenth of the required accuracy. Using this principle and the above formula, the resolution can be calculated.
The next important factor is the speed of the application. If the component is moving in front of the camera at low speed, even a rolling shutter camera can address the application, but if the component is moving at high speed only a global shutter camera will work. If the component is moving at a very high speed that even a global shutter camera cannot address, a line scan camera has to be used.
The other things to keep in mind are the camera interface standard, color or monochrome image output, provision for an external trigger, and the optical format of the sensor. The optical format of the camera should be equal to or smaller than that of the lens, and the lens mount should match that of the camera.
Common errors and Trouble shooting:
The most common errors people make with machine vision cameras occur during installation. They include improper installation of the drivers, non-rigid mounting of the camera, improper selection of the lens, insufficient lighting, improper installation of frame grabber cards, incorrectly connected cables, an incorrectly connected power connector (where present), and triggers not configured to the needed specification.
