
Artificial Intelligence

Chapter 03
Computer Vision
The Computer Vision Process
• A computer mimics human sight in four basic steps.
• These are image acquisition, image processing, image analysis, and image understanding.
• Let’s consider each of these steps in more detail.

Image Acquisition
• A computer vision system needs an eye.
• In most computer vision systems, that eye is a TV camera.
• The camera translates a scene or image into electrical signals.
• These signals can be translated into binary numbers which
the computer can work with.
• The output of the television camera is an analog signal whose
frequency and amplitude represent the brightness detail in a
scene.
• The camera observes a scene a line at a time, scanning it and dividing it into hundreds of fine horizontal lines.
• Each line creates an analog signal whose amplitude
represents the brightness changes along that line.
Image Processing
• The next stage of computer vision involves some initial
manipulation of the binary data.
• Image processing helps improve the quality of the image
to analyze and comprehend it more efficiently.
• Image processing improves the signal-to-noise ratio.
• The signal, of course, is information representing objects
in the image.
• Noise is any interference, flaw or aberration that
obscures the objects.
• Through various computational means, it is possible to
improve the signal-to-noise ratio.
• For example, the contrast in a scene can be
improved.
• Flaws, such as unwanted reflections, can be
removed.
• The process is somewhat like retouching a photograph to improve its quality.
• Once the image has been cleaned up and enhanced,
it is ready for analysis.
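As a rough sketch of this kind of clean-up, the example below stretches the contrast of an 8-bit grayscale image so its values span the full black-to-white range (the NumPy array `scene` and its value range are invented for illustration):

```python
import numpy as np

def stretch_contrast(image):
    """Linearly rescale gray levels so the darkest pixel becomes 0
    and the brightest becomes 255, improving visible contrast."""
    lo, hi = int(image.min()), int(image.max())
    if hi == lo:                          # flat image: nothing to stretch
        return image.copy()
    scaled = (image.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return scaled.astype(np.uint8)

# Illustrative low-contrast scene whose gray levels span only 100..150
scene = np.random.randint(100, 151, size=(512, 512), dtype=np.uint8)
enhanced = stretch_contrast(scene)        # gray levels now span 0..255
```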
Image Analysis
• Image analysis explores the scene to determine what
is there.
• A computer program begins looking through the
numbers that represent the visual information to
identify specific features and characteristics.
• More specifically, the image analysis program looks
for edges and boundaries.
• The computer produces a simple line drawing of all
the objects in the scene, just as an artist would draw
outlines of all the objects.
Image Comprehension
• The final step in the computer vision process is
understanding, by identifying specific objects and their
relationship.
• This portion of the computer vision process employs artificial intelligence techniques.
• Understanding what is in a scene requires template
matching.
• The computer is preprogrammed with pre-stored binary
images or templates that represent specific objects.
• When a match occurs, an object is identified.
• The computer then knows what is being viewed.
Image Acquisition
Video Cameras
• The two devices most commonly used in computer vision cameras to convert light into an electrical signal are the vidicon tube and the CCD array.
• The vidicon tube has been around for many years and is
still the primary device used in commercial television
cameras.
• However, for computer vision systems, charge-coupled devices (CCDs) are far more widely used.
• These semiconductor devices offer smaller size, greater light sensitivity, and lower-power operation than vidicons.
• However, both the vidicon and the CCD remain in use.
Vidicon Tubes
[Figure: vidicon tube]

Charge Coupled Device
[Figure: CCD sensor array]

Charge Coupled Devices
• CCD stands for "Charge-Coupled Device."
• CCDs are sensors used in digital cameras and video
cameras to record still and moving images.
• The CCD captures light and converts it to digital data that is
recorded by the camera.
• The quality of an image captured by a CCD depends on the
resolution of the sensor.
• In digital cameras, the resolution is measured in megapixels.
• Therefore, an 8MP digital camera can capture twice as
much information as a 4MP camera. The result is a larger
photo with more detail.
• CCDs in video cameras are usually measured by
physical size.
• For example, most consumer digital cameras use
a CCD around 1/6 or 1/5 of an inch in size.
• More expensive cameras may have CCDs 1/3 of
an inch in size or larger.
• The larger the sensor, the more light it can
capture, meaning it will produce better video in
low light settings.
Analog-to-Digital Conversion
• The video output signal from the camera is fed
to an analog-to-digital converter (ADC).
• The ADC then periodically samples the analog
signal and converts the amplitude into a
parallel binary number.
• Many different methods are used to produce
analog-to-digital conversions.
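A minimal sketch of the sampling step, assuming the camera signal swings between 0 and 1 volt (the voltage range and 8-bit width are illustrative):

```python
def adc_sample(voltage, v_min=0.0, v_max=1.0, bits=8):
    """Quantize one analog amplitude into an n-bit binary number."""
    levels = 2 ** bits                            # 256 levels for 8 bits
    voltage = min(max(voltage, v_min), v_max)     # clamp to the input range
    return int((voltage - v_min) / (v_max - v_min) * (levels - 1))

code = adc_sample(0.5)            # mid-brightness sample
print(code, format(code, "08b"))  # 127 as a parallel binary number: 01111111
```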
Pixels
• Each time the video signal is sampled by the ADC, we say
that a pixel has been created.
• A pixel is the value of light intensity at one particular
point on a scan line.
• A pixel, therefore, is a small element into which each
scan line is broken.
• Each scan line will contain approximately 200 to 500
pixels.
• These samples then give a fairly accurate representation of the intensity variation across the scan line.
• Naturally, the more pixels per line, the higher the
definition.
• In any case, the pixel is a point of light that is, in effect,
some shade of gray.
• This shade of gray is designated by a particular binary
number.
• By sampling the video signal we are converting each
scan line into dots of light of varying gray levels.
• The effect is to represent the entire scene by a matrix
of pixels.
• Each pixel represents a light value occurring during the
sampling process.
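The sketch below shows how sampling turns one scan line into a row of gray-level pixels, and how stacking the scan lines represents the scene as a pixel matrix (the sinusoidal brightness profile and the 320 x 240 dimensions are invented for illustration):

```python
import numpy as np

# Illustrative analog brightness along one scan line (0.0 = black, 1.0 = white)
def brightness(t):
    return 0.5 + 0.5 * np.sin(2 * np.pi * t)

samples_per_line = 320                     # within the 200-500 pixels per line above
t = np.linspace(0.0, 1.0, samples_per_line)
scan_line = (brightness(t) * 255).astype(np.uint8)  # one row of 8-bit gray levels

# Repeating this for every scan line yields the matrix of pixels for the scene
lines = 240
scene = np.vstack([scan_line] * lines)     # shape (240, 320)
```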
RAM
• Each pixel is represented by an 8-bit binary number
that is stored in a large random access memory (RAM).
• Semiconductor RAM chips are used in these memories.
• Their storage access time must be extremely fast to
accept the high speed output from the ADC.
• The memory must be very large to store the many pixel
bytes that make up a scene.
• For example, if a 512 x 512 CCD is used, the scene will
contain 512 x 512 = 262,144 pixels.
• This means that a RAM capable of storing 262,144 bytes is required.
• In most computer vision systems, this RAM is separate
from the RAM used in the computer.
• It is usually called a buffer RAM or frame buffer.
• Computer vision systems have their own dedicated RAM.
• At this point, the computer has stored in its memory a
digital representation of a scene to be analyzed and
understood.
• Once this binary image of the scene is in memory, the
computer can take over and perform many different
operations on the scene to enhance it, analyze it,
translate it into different forms, and to ultimately
comprehend what is there.
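The frame-buffer arithmetic from the 512 x 512 example works out as follows, assuming one 8-bit byte per pixel:

```python
width, height = 512, 512
bytes_per_pixel = 1                         # one 8-bit gray level per pixel

pixels = width * height                     # 512 * 512 = 262,144 pixels
buffer_bytes = pixels * bytes_per_pixel     # 262,144 bytes of frame buffer
print(pixels, buffer_bytes, buffer_bytes // 1024)   # 262144 262144 256 (KB)
```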
3D to 2D
• Video cameras do not see in 3D.
• What we get is a two-dimensional representation of anything the
camera looks at.
• We see the accurate height and width of our subjects, but the
missing dimension is depth.
• Without depth information, it is difficult to determine the distance between different objects in the scene.
• A more direct approach to overcoming this problem is to more
accurately emulate the human vision system.
• We are able to perceive depth for one reason: we have two eyes.
• As a result, the brain gets two similar but slightly different images
of a scene because of the spacing between two eyes.
• To sense depth in a computer vision system, the answer is to
use two cameras.
• This produces binocular or stereo vision which permits
depth to be determined.
• In 3D vision systems, the same scene is viewed by two
cameras.
• The scenes from the two cameras are then digitized and
stored in memory.
• Once objects in the scene have been identified, the
computer can perform various mathematical operations to
help compute the distances to objects and between objects.
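One of those mathematical operations is the standard pinhole stereo relation Z = f x B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity (the pixel shift of the same point between the two images). A minimal sketch, with the focal length and baseline values invented for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Distance to a point from the shift (disparity) between its
    positions in the left and right camera images: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("point must appear in both images with positive disparity")
    return focal_px * baseline_m / disparity_px

# Illustrative values: 700-pixel focal length, cameras spaced 6 cm apart
z = depth_from_disparity(focal_px=700, baseline_m=0.06, disparity_px=14)
print(f"{z:.2f} m")   # 3.00 m
```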
Image Processing
Image Processing
• With the binary version of the scene stored in
memory, image processing can now begin.
• Image processing, also known as image enhancement,
is the process of improving the quality of the image.
• Anything that can be done to make the image clearer
will simplify analysis and lead to improved
understanding.
• Extremely low light levels can produce a scene which
is difficult for the camera to see.
• The camera itself may not be sensitive enough to
clearly capture the fine definition in the scene.
• Another problem is noise.
• In an electrical sense, noise is any unwanted addition that obscures the desired signal.
• Noise shows up as “snow” or a salt-and-pepper background that obscures features in the scene.
• Regardless of the sources of the degradation of the signal,
processing techniques can be used to eliminate or minimize
these problems.
• In fact, many processing techniques are designed to enhance
the desired features while ignoring the noise and distortion.
• This process is known as image enhancement.
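One common technique for the salt-and-pepper noise described above is a median filter, sketched here with plain NumPy (the 3 x 3 window size is a typical choice, not mandated by the text):

```python
import numpy as np

def median_filter_3x3(image):
    """Replace each pixel with the median of its 3 x 3 neighborhood,
    which discards isolated 'salt and pepper' outliers."""
    padded = np.pad(image, 1, mode="edge")
    out = np.empty_like(image)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + 3, x:x + 3])
    return out
```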
Preprocessing
• Before image enhancement occurs, some preprocessing can
take place to improve the scene.
• First, optical filtering can be used.
• Filters can be attached to the lens to control the amount of
light, its color, and the contrast of the various objects in the
scene.
• Second, many computer vision systems operate in a controlled
environment where it is possible not only to control
illumination level, but to position light sources or the objects
to be viewed for maximum visibility and comprehension.
• When a computer vision system is set up, the camera is
pointed toward the scene and it is monitored on a video
screen.
Noise Reduction
• Image averaging helps to eliminate noise and
distortion.
• In this process, the vision system captures sequential
views of the scene and then averages them.
• In any case, the averaging process takes several views
of the scene and stores them in memory.
• Corresponding pixels in the various binary images are averaged by adding them and dividing by the number of views averaged.
• The result is a composite scene that usually has better
clarity.
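A minimal sketch of that averaging process, assuming the sequential views arrive as equally sized 8-bit NumPy arrays (the scene, noise level, and frame count are invented for illustration):

```python
import numpy as np

def average_frames(frames):
    """Average several views of the same scene pixel by pixel.
    Random noise tends to cancel out; the underlying scene does not."""
    stack = np.stack([f.astype(np.float64) for f in frames])
    return stack.mean(axis=0).astype(np.uint8)

# Illustrative: four noisy captures of the same flat gray scene
rng = np.random.default_rng(0)
clean = np.full((480, 640), 128, dtype=np.uint8)
views = [np.clip(clean + rng.normal(0, 20, clean.shape), 0, 255).astype(np.uint8)
         for _ in range(4)]
composite = average_frames(views)   # noticeably less noisy than any single view
```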
Image Analysis
Image Analysis
• Up to this point we have been generally vague in describing the
scene.
• It could be an outdoor landscape, an aerial photograph, or a human face.
• Image analysis begins the process of locating and defining the
various objects in the scene.
• The artificial intelligence process then attempts to determine
what the objects are.
• Image analysis is accomplished by identifying regions and
boundaries, or edges.
• Edges represent boundaries where two surfaces come together.
• They also identify the interface between two different surfaces
or between an object and a background.
• The line between an object and its shadow, and the
outline of the shadow itself form edges.
• Edges and regions or surfaces completely define the
scene.
• Regions are large, flat areas of an object or scene
that have the same intensity value and occur
between the various edges and boundary lines.
• Various mathematical techniques have been
developed for detecting edges and surfaces, and
these form the core of image analysis.
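A minimal sketch of one such technique: marking a pixel as an edge wherever intensity jumps sharply between neighbors (the threshold value and the test scene are invented for illustration):

```python
import numpy as np

def detect_edges(image, threshold=30):
    """Flag pixels where intensity changes sharply between neighbors;
    large horizontal or vertical differences indicate an edge."""
    img = image.astype(np.int32)
    gx = np.abs(np.diff(img, axis=1))[:-1, :]   # horizontal intensity change
    gy = np.abs(np.diff(img, axis=0))[:, :-1]   # vertical intensity change
    return (gx + gy) > threshold                # boolean edge map

# A dark square on a light background: edges trace the square's outline
scene = np.full((100, 100), 200, dtype=np.uint8)
scene[30:70, 30:70] = 50
edges = detect_edges(scene)
```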
Image Comprehension
• Up to this point in the computer vision process, a lot of
computation has taken place.
• Yet, none of it is what you could really call Artificial
Intelligence.
• Even though an image has been acquired, enhanced,
and analyzed, the computer still does not know what
the scene means.
• The computer is not aware of the contents of the scene,
what objects are represented, and how they are related
to one another.
• The final stage of computer vision then is to give the
computer some knowledge about the things it may see
in a scene.
• A knowledge base of object shapes, together with an AI search and pattern-matching program, will enable the computer to examine the incoming scene and compare it to the objects in the knowledge base.
• The computer should be able to identify the objects there
and thus understand what it sees.
• A simple template matching technique can be used to pick
out specific object shapes.
• The template, which is stored in memory, is an outline of an object that the system knows.
• The comparison process that takes place during search and pattern matching can produce identification.
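A minimal sketch of template matching, sliding a stored outline across the scene and scoring how closely the pixels agree (the scoring rule and threshold are invented for illustration):

```python
import numpy as np

def match_template(scene, template, threshold=0.95):
    """Slide a stored template across the scene; report positions where
    the pixels agree closely enough to count as an identification."""
    sh, sw = scene.shape
    th, tw = template.shape
    matches = []
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            window = scene[y:y + th, x:x + tw].astype(np.float64)
            score = 1.0 - np.abs(window - template).mean() / 255.0
            if score >= threshold:              # 1.0 would be a perfect match
                matches.append((y, x, score))
    return matches
```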
Applications of Computer Vision
• Machine Vision
• Robot Vision
Machine Vision
• The biggest application of computer vision is machine
vision.
• Machine vision refers to the use of computer vision equipment and techniques in manufacturing processes usually carried out by some type of machine.
• The purpose of machine vision in manufacturing applications is to replace people in some tasks and to help speed up or simplify the manufacturing process in others.
• For some applications, particularly highly repetitive and
boring tasks, a machine does a better job.
• Human beings get tired and make mistakes; machines don’t.
Robot Vision
• One of the major applications of computer vision is in robots.
• By themselves, robots are not smart because they cannot think for themselves.
• However, when attached to a computer with an
artificial intelligence program, they do take on more
intelligent characteristics.
• But to be truly intelligent, a robot must have sight.
• This sight provides feedback that allows it to adjust its
operation to fit varying conditions.
• Computer vision, therefore, helps make robots
intelligent.
