Chapter 03: Computer Vision

The Computer Vision Process
• A computer mimics human sight in four basic steps.
• These are image acquisition, image processing, image analysis, and image understanding.
• Let’s consider each of these steps in more detail.
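The four steps above can be sketched as a simple processing pipeline. This is an illustrative skeleton only; every function name below is a hypothetical placeholder, not a real API, and the "scene" is reduced to a tiny grid of gray levels.

```python
# Illustrative sketch of the four-step computer vision pipeline.
# All function names are hypothetical placeholders, not a real API.

def acquire_image():
    # Image acquisition: a camera scan reduced here to a tiny 2D
    # grid of gray levels (0 = black, 255 = white).
    return [[0, 0, 255],
            [0, 255, 255],
            [255, 255, 255]]

def process_image(image):
    # Image processing: enhancement / noise cleanup (identity here).
    return image

def analyze_image(image):
    # Image analysis: extract a simple feature, e.g. count bright pixels.
    return sum(1 for row in image for v in row if v > 128)

def understand_image(features):
    # Image understanding: interpret the extracted features.
    return "mostly bright" if features >= 5 else "mostly dark"

scene = acquire_image()
scene = process_image(scene)
features = analyze_image(scene)
print(understand_image(features))  # 6 bright pixels -> "mostly bright"
```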
Image Acquisition
• A computer vision system needs an eye.
• In most computer vision systems, that eye is a TV camera.
• The camera translates a scene or image into electrical signals.
• These signals can be translated into binary numbers that the computer can work with.
• The output of the television camera is an analog signal whose frequency and amplitude represent the brightness detail in a scene.
• The camera observes a scene one line at a time, scanning and dividing it into hundreds of fine horizontal lines.
• Each line creates an analog signal whose amplitude represents the brightness changes along that line.

Image Processing
• The next stage of computer vision involves some initial manipulation of the binary data.
• Image processing helps improve the quality of the image so that it can be analyzed and comprehended more efficiently.
• Image processing improves the signal-to-noise ratio.
• The signal, of course, is the information representing objects in the image.
• Noise is any interference, flaw, or aberration that obscures the objects.
• Through various computational means, it is possible to improve the signal-to-noise ratio.
• For example, the contrast in a scene can be improved.
• Flaws, such as unwanted reflections, can be removed.
• The process is somewhat like retouching a photograph to improve its quality.
• Once the image has been cleaned up and enhanced, it is ready for analysis.

Image Analysis
• Image analysis explores the scene to determine what is there.
• A computer program begins looking through the numbers that represent the visual information to identify specific features and characteristics.
• More specifically, the image analysis program looks for edges and boundaries.
• The computer produces a simple line drawing of all the objects in the scene, just as an artist would draw outlines of all the objects.

Image Comprehension
• The final step in the computer vision process is understanding: identifying specific objects and their relationships.
• This portion of the computer vision process employs artificial intelligence techniques.
• Understanding what is in a scene requires template matching.
• The computer is preprogrammed with pre-stored binary images, or templates, that represent specific objects.
• When a match occurs, an object is identified.
• The computer then knows what is being viewed.

Image Acquisition

Video Cameras
• The two devices most commonly used in computer vision cameras to convert light into an electrical signal are the vidicon tube and the CCD array.
• The vidicon tube has been around for many years and is still the primary device used in commercial television cameras.
• However, for computer vision systems, charge-coupled devices (CCDs) are far more widely used.
• These semiconductor devices offer smaller size, greater light sensitivity, and lower-power operation than vidicons.
• Nevertheless, both the vidicon and the CCD are still in use.

Vidicon Tubes
Charge Coupled Devices
• CCD stands for "Charge-Coupled Device."
• CCDs are sensors used in digital cameras and video cameras to record still and moving images.
• The CCD captures light and converts it to digital data that is recorded by the camera.
• The quality of an image captured by a CCD depends on the resolution of the sensor.
• In digital cameras, the resolution is measured in megapixels.
• Therefore, an 8 MP digital camera can capture twice as much information as a 4 MP camera. The result is a larger photo with more detail.
• CCDs in video cameras are usually measured by physical size.
• For example, most consumer digital cameras use a CCD around 1/6 or 1/5 of an inch in size.
• More expensive cameras may have CCDs 1/3 of an inch in size or larger.
• The larger the sensor, the more light it can capture, meaning it will produce better video in low-light settings.

Analog-to-Digital Conversion
• The video output signal from the camera is fed to an analog-to-digital converter (ADC).
• The ADC periodically samples the analog signal and converts each amplitude into a parallel binary number.
• Many different methods are used to perform analog-to-digital conversion.

Pixels
• Each time the video signal is sampled by the ADC, we say that a pixel has been created.
• A pixel is the value of light intensity at one particular point on a scan line.
• A pixel, therefore, is a small element into which each scan line is broken.
• Each scan line will contain approximately 200 to 500 pixels.
• These samples then give a fairly accurate representation of the intensity variation across the scan line.
• Naturally, the more pixels per line, the higher the definition.
• In any case, the pixel is a point of light that is, in effect, some shade of gray.
• This shade of gray is designated by a particular binary number.
• By sampling the video signal, we are converting each scan line into dots of light of varying gray levels.
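The sampling and quantization just described can be sketched in a few lines. This is a minimal illustration, assuming the analog amplitude has been normalized to the range 0.0–1.0; the function names are hypothetical.

```python
# Minimal sketch of ADC sampling: an analog scan line (here a list of
# normalized brightness amplitudes, 0.0 = black, 1.0 = white) is
# quantized into 8-bit pixel values (0-255).

def quantize(amplitude, bits=8):
    # Map a 0.0-1.0 amplitude onto 2**bits discrete gray levels.
    levels = (1 << bits) - 1          # 255 for 8 bits
    return int(amplitude * levels)

def digitize_scan_line(samples):
    # One pixel is created for each sample taken along the scan line.
    return [quantize(s) for s in samples]

line = [0.0, 0.25, 0.5, 1.0]          # four samples along one scan line
print(digitize_scan_line(line))       # [0, 63, 127, 255]
```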
• The effect is to represent the entire scene by a matrix of pixels.
• Each pixel represents a light value captured during the sampling process.

RAM
• Each pixel is represented by an 8-bit binary number that is stored in a large random access memory (RAM).
• Semiconductor RAM chips are used in these memories.
• Their storage access time must be extremely fast to accept the high-speed output from the ADC.
• The memory must be very large to store the many pixel bytes that make up a scene.
• For example, if a 512 x 512 CCD is used, the scene will contain 512 x 512 = 262,144 pixels.
• This means that a RAM capable of storing 262,144 bytes is required.
• In most computer vision systems, this RAM is separate from the RAM used in the computer.
• It is usually called a buffer RAM or frame buffer.
• Computer vision systems have their own dedicated RAM.
• At this point, the computer has stored in its memory a digital representation of a scene to be analyzed and understood.
• Once this binary image of the scene is in memory, the computer can take over and perform many different operations on the scene to enhance it, analyze it, translate it into different forms, and ultimately comprehend what is there.

3D to 2D
• Video cameras do not see in 3D.
• What we get is a two-dimensional representation of anything the camera looks at.
• We see the accurate height and width of our subjects, but the missing dimension is depth.
• Without depth information, it is difficult to determine the distance between different objects in the scene.
• A more direct approach to overcoming this problem is to more accurately emulate the human vision system.
• We are able to perceive depth for one reason: we have two eyes.
• As a result, the brain gets two similar but slightly different images of a scene because of the spacing between the two eyes.
• To sense depth in a computer vision system, the answer is to use two cameras.
• This produces binocular, or stereo, vision, which permits depth to be determined.
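The two-camera depth idea can be sketched with the standard triangulation relation Z = f · B / d, where f is the focal length in pixels, B is the baseline distance between the two cameras, and d is the disparity (the pixel shift of the same object between the two images). The numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
# Sketch of stereo depth from two cameras: the same point appears
# shifted (disparity) between the left and right images, and depth
# follows from triangulation: Z = f * B / d.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    # focal_px:     focal length expressed in pixels
    # baseline_m:   distance between the two camera centers, in meters
    # disparity_px: horizontal pixel shift of the object between views
    return focal_px * baseline_m / disparity_px

# Hypothetical setup: 700-pixel focal length, cameras 10 cm apart.
# An object shifted by 35 pixels between the two views:
print(depth_from_disparity(700, 0.10, 35))  # 2.0 meters away
```

Note that a larger disparity means a nearer object: halving the distance doubles the pixel shift.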
• In 3D vision systems, the same scene is viewed by two cameras.
• The scenes from the two cameras are then digitized and stored in memory.
• Once objects in the scene have been identified, the computer can perform various mathematical operations to compute the distances to objects and between objects.

Image Processing
• With the binary version of the scene stored in memory, image processing can now begin.
• Image processing, also known as image enhancement, is the process of improving the quality of the image.
• Anything that can be done to make the image clearer will simplify analysis and lead to improved understanding.
• Extremely low light levels can produce a scene that is difficult for the camera to see.
• The camera itself may not be sensitive enough to clearly capture the fine definition in the scene.
• Another problem is noise.
• In an electrical sense, noise is any unwanted addition that obscures the desired signal.
• Noise shows up as "snow" or a salt-and-pepper background that obscures features in the scene.
• Regardless of the source of the degradation, processing techniques can be used to eliminate or minimize these problems.
• In fact, many processing techniques are designed to enhance the desired features while ignoring the noise and distortion.
• This process is known as image enhancement.

Preprocessing
• Before image enhancement occurs, some preprocessing can take place to improve the scene.
• First, optical filtering can be used.
• Filters can be attached to the lens to control the amount of light, its color, and the contrast of the various objects in the scene.
• Second, many computer vision systems operate in a controlled environment where it is possible not only to control the illumination level, but also to position light sources or the objects to be viewed for maximum visibility and comprehension.
• When a computer vision system is set up, the camera is pointed toward the scene, which is monitored on a video screen.
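One common enhancement mentioned above, improving contrast, can be sketched as contrast stretching: the gray levels actually present in the image are spread over the full 0–255 range. This is a minimal sketch operating on a flat list of pixel values; the function name is hypothetical.

```python
# Minimal sketch of contrast stretching: remap the gray levels so the
# darkest pixel becomes 0 and the brightest becomes 255.

def stretch_contrast(pixels):
    lo, hi = min(pixels), max(pixels)
    if lo == hi:                       # flat image: nothing to stretch
        return pixels[:]
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

# A low-contrast scan line using only gray levels 100-150:
print(stretch_contrast([100, 125, 150]))  # [0, 127, 255]
```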
Noise Reduction
• Image averaging helps to eliminate noise and distortion.
• In this process, the vision system captures sequential views of the scene and then averages them.
• In any case, the averaging process takes several views of the scene and stores them in memory.
• Corresponding pixels in the various binary images are averaged by adding them and dividing by the number of views averaged.
• The result is a composite scene that usually has better clarity.

Image Analysis
• Up to this point we have been generally vague in describing the scene.
• It could be an outdoor landscape, an aerial photograph, or a human face.
• Image analysis begins the process of locating and defining the various objects in the scene.
• The artificial intelligence process then attempts to determine what the objects are.
• Image analysis is accomplished by identifying regions and boundaries, or edges.
• Edges represent boundaries where two surfaces come together.
• They also identify the interface between two different surfaces, or between an object and a background.
• The line between an object and its shadow, and the outline of the shadow itself, form edges.
• Edges and regions, or surfaces, completely define the scene.
• Regions are large, flat areas of an object or scene that have the same intensity value and occur between the various edges and boundary lines.
• Various mathematical techniques have been developed for detecting edges and surfaces, and these form the core of image analysis.

Image Comprehension
• Up to this point in the computer vision process, a lot of computation has taken place.
• Yet, none of it is what you could really call artificial intelligence.
• Even though an image has been acquired, enhanced, and analyzed, the computer still does not know what the scene means.
• The computer is not aware of the contents of the scene, what objects are represented, or how they are related to one another.
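The edge detection described under Image Analysis above can be sketched with a simple intensity-gradient test: wherever brightness changes sharply between neighboring pixels, an edge is marked. This is a bare-bones illustration (horizontal differences only, hypothetical threshold), not a production edge detector.

```python
# Bare-bones sketch of edge detection: mark an edge wherever the
# gray-level difference between horizontal neighbors exceeds a threshold.

def detect_edges(image, threshold=50):
    edges = []
    for y, row in enumerate(image):
        for x in range(len(row) - 1):
            if abs(row[x + 1] - row[x]) > threshold:
                edges.append((y, x))   # edge between column x and x+1
    return edges

# A dark square (0) against a bright background (255):
image = [[255, 255, 255, 255],
         [255,   0,   0, 255],
         [255,   0,   0, 255],
         [255, 255, 255, 255]]
print(detect_edges(image))  # [(1, 0), (1, 2), (2, 0), (2, 2)]
```

The reported positions trace the left and right boundaries of the dark square, which is exactly the outline-drawing behavior described above.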
• The final stage of computer vision, then, is to give the computer some knowledge about the things it may see in a scene.
• Object shapes and some kind of AI search and pattern matching program will enable the computer to examine the incoming scene and compare it to the objects in the knowledge base.
• The computer should then be able to identify the objects there and thus understand what it sees.
• A simple template matching technique can be used to pick out specific object shapes.
• The template, which is stored in memory, is an outline of an object that the computer knows.
• The comparison process that takes place during search and pattern matching can produce an identification.

Applications of Computer Vision
• Machine Vision
• Robot Vision

Machine Vision
• The biggest application of computer vision is machine vision.
• Machine vision refers to the use of computer vision equipment and techniques in manufacturing processes, usually carried out by some type of machine.
• The purpose of machine vision in manufacturing applications is to replace people in some tasks and to help speed up or simplify the manufacturing process in others.
• For some applications, particularly highly repetitive and boring tasks, a machine does a better job.
• Human beings get tired and make mistakes; machines don't.

Robot Vision
• One of the major applications of computer vision is with robots.
• Robots by themselves are not smart because they cannot think for themselves.
• However, when attached to a computer with an artificial intelligence program, they take on more intelligent characteristics.
• But to be truly intelligent, a robot must have sight.
• Sight provides feedback that allows the robot to adjust its operation to fit varying conditions.
• Computer vision, therefore, helps make robots intelligent.
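The template matching described under Image Comprehension can be sketched as a sliding-window comparison: the stored template is slid across the binary image, and a match is declared wherever the two agree pixel for pixel. This is a minimal sketch on tiny binary images; real systems tolerate partial mismatches, but the exact-match version shows the idea.

```python
# Minimal sketch of template matching: slide a stored binary template
# over the scene and report positions where every pixel agrees.

def match_template(scene, template):
    th, tw = len(template), len(template[0])
    matches = []
    for y in range(len(scene) - th + 1):
        for x in range(len(scene[0]) - tw + 1):
            if all(scene[y + i][x + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                matches.append((y, x))  # top-left corner of the match
    return matches

# A 2x2 "L" shape stored as a template, sought in a 4x4 binary scene:
template = [[1, 0],
            [1, 1]]
scene = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
print(match_template(scene, template))  # [(1, 1)]
```

When a match is found, the computer "knows" that the object represented by the template is present at that location, which is the identification step described above.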