
1. Introduction

The Problem

Face recognition has recently attracted much attention in the network multimedia information access community. Areas such as network security, content indexing and retrieval, and video compression benefit from face recognition technology because "people" are the center of attention in much video content. Network access control via face recognition not only makes it virtually impossible for hackers to steal one's "password", but also improves the user-friendliness of human-computer interaction. Indexing and/or retrieving video data based on the appearance of particular persons is useful for users such as news reporters, political scientists, and moviegoers. In videophone and teleconferencing applications, face recognition also enables a more efficient coding scheme.

The project is divided into two stages. The first stage detects a human face in an image captured by the webcam; this stage is called face detection. The second stage matches the captured face against our stored database of images; this stage is called face recognition.

Face detection is a technique for automatically detecting human faces in digital images. The system relies on a two-step process that first detects regions likely to contain human skin in the color image and then extracts information from these regions that might indicate the location of a face in the image. The skin detection is performed using a skin filter that relies on color and texture information. The face detection is performed on a grayscale image containing only the detected skin areas. A combination of thresholding and mathematical morphology is used to extract object features that would indicate the presence of a face. The face detection process works predictably and fairly reliably, as test results show. The detection of faces in this project is based on a two-step approach. First, the image is filtered so that only regions likely to contain human skin are marked. This filter was designed using basic mathematical and image processing functions in MATLAB and was based on an existing skin filter; modifications to the filter algorithm were made to offer subjective improvements to the output. The second stage involves taking the marked skin regions and removing the darkest and brightest regions from the map.
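A minimal MATLAB sketch of this two-step approach is given below. The chroma and brightness thresholds, the structuring-element size and the file name are illustrative assumptions, not the exact values used in the project.

% Step 1: mark pixels whose chroma falls in an assumed skin range.
rgb   = imread('frame.jpg');              % frame captured from the webcam (hypothetical file)
ycbcr = rgb2ycbcr(rgb);
cb = ycbcr(:,:,2);
cr = ycbcr(:,:,3);
skinMask = cb >= 77 & cb <= 127 & cr >= 133 & cr <= 173;

% Step 2: keep only the skin regions of the grayscale image, drop the darkest
% and brightest pixels, and clean up the map with mathematical morphology.
grey = rgb2gray(rgb);
grey(~skinMask) = 0;
faceMap = grey > 40 & grey < 230;               % assumed brightness limits
faceMap = imopen(faceMap, strel('disk', 5));    % remove small speckles
faceMap = imfill(faceMap, 'holes');
faceRegions = regionprops(faceMap, 'BoundingBox', 'Area');   % candidate face regions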

1.1 Objective
The objective of the project is to detect a human face in an image taken from an image acquisition device and then to match that image against the stored images of a database.

BLOCK DIAGRAM OF FACE RECOGNITION

[Block diagram: the stages are Image Acquisition, Smoothing, Skin Colour, Skin Region Detection, Face Detected Area and Recognition, with an Image Database used at the Recognition stage.]

2. DESCRIPTION

2.1 IMAGE:-
An image (from Latin: imago) is an artifact, for example a two-dimensional picture, that has a
similar appearance to some subject—usually a physical object or a person.

Characteristics

Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices such as cameras, mirrors, lenses, telescopes and microscopes, or by natural objects and phenomena such as the human eye or water surfaces.

A volatile image is one that exists only for a short period of time. This may be a reflection of an
object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode ray tube.
A fixed image, also called a hard copy, is one that has been recorded on a material object, such as paper or textile, by photography or digital processes.

A still image is a single static image, as distinguished from a moving image. This phrase is used
in photography, visual media and the computer industry to emphasize that one is not talking
about movies, or in very precise or pedantic technical writing such as a standard.

A film still is a photograph taken on the set of a movie or television program during production,
used for promotional purposes.

2.1.1 Pixel:-

The pixel (a word invented from "picture element") is the basic unit of programmable color on a
computer display or in a computer image. Think of it as a logical - rather than a physical - unit.
The physical size of a pixel depends on how you've set the resolution for the display screen. A pixel is the basic building block of a digital image. In digital imaging, a pixel, or pel (picture
element) is a single point in a raster image, or the smallest addressable screen element in
a display device; it is the smallest unit of picture that can be represented or controlled. Each pixel
has its own address. The address of a pixel corresponds to its coordinates. Pixels are normally
arranged in a two-dimensional grid, and are often represented using dots or squares. Each pixel is
a sample of an original image; more samples typically provide more accurate representations of
the original. The intensity of each pixel is variable. In color image systems, a color is typically
represented by three or four component intensities such as red, green, and blue, or cyan,
magenta, yellow, and black.

In some contexts (such as descriptions of camera sensors), the term pixel is used to refer to a
single scalar element of a multi-component representation (more precisely called a photosite in
the camera sensor context, although the neologism sensel is also sometimes used to describe the elements of a digital camera's sensor), while in others the term may refer to the entire set of such
component intensities for a spatial position. In color systems that use chroma subsampling, the
multi-component concept of a pixel can become difficult to apply, since the intensity measures
for the different color components correspond to different spatial areas in such a representation.

The word pixel is based on a contraction of pix ("pictures") and el (for "element"); similar
formations with el for "element" include the words voxel and texel.
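As a small illustration, the MATLAB snippet below reads an image and inspects the address and component intensities of one pixel; the file name and coordinates are arbitrary examples.

img = imread('photo.jpg');              % M-by-N-by-3 array of uint8 samples
[rows, cols, channels] = size(img);     % grid dimensions and number of color components
p = img(50, 120, :);                    % the pixel addressed by row 50, column 120
fprintf('R = %d, G = %d, B = %d\n', p(1), p(2), p(3));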

2.2 IMAGE FORMATS:-

There are multiple image formats available for storing an image. The important and related ones are as follows:

2.2.1 YCbCr:-

YCbCr or Y′CbCr, sometimes written YCBCR or Y′CBCR, is a family of color spaces used as a
part of the color image pipeline in video and digital photography systems. Y′ is the luma
component and CB and CR are the blue-difference and red-difference chroma components. Y′
(with prime) is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded using gamma correction. Y′CbCr is not an absolute color space; it is a way of encoding RGB information. The actual color displayed depends on the actual RGB colorants used to display the signal. Therefore, a value expressed as Y′CbCr is predictable only if standard RGB colorants or an ICC profile are used.
YCbCr and Y′CbCr are a practical approximation to color processing and perceptual uniformity,
where the primary colors corresponding roughly to red, green and blue are processed into
perceptually meaningful information. By doing this, subsequent image/video processing,
transmission and storage can do operations and introduce errors in perceptually meaningful
ways. Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high resolution or
transmitted at high bandwidth, and two chroma components (CB and CR) that can be bandwidth-
reduced, subsampled, compressed, or otherwise treated separately for improved system
efficiency.
One practical example would be decreasing the bandwidth or resolution allocated to "color" compared to "black and white", since humans are more sensitive to the black-and-white information.
YCbCr is sometimes abbreviated to YCC. Y′CbCr is often called YPbPr when used for analog
component video, although the term Y′CbCr is commonly used for both systems, with or without
the prime. Y′CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used interchangeably, leading to some confusion; when referring to signals in video or digital form, the term "YUV" mostly means "Y′CbCr". Y′CbCr signals (prior to scaling and offsets to place the signals into digital form) are called YPbPr, and are created from the corresponding gamma-adjusted RGB (red, green and blue) source using two defined constants KB and KR, where KB and KR are ordinarily derived from the definition of the corresponding RGB space. (The equivalent matrix manipulation is often referred to as the "color matrix".) Here, the prime ′ symbols mean gamma correction is being used; thus R′, G′ and B′
nominally range from 0 to 1, with 0 representing the minimum intensity (e.g., for display of the
color black) and 1 the maximum (e.g., for display of the color white). The resulting luma (Y) value will then have a nominal range from 0 to 1, and the chroma (PB and PR) values will have a
nominal range from -0.5 to +0.5. The reverse conversion process can be readily derived by
inverting the above equations. When representing the signals in digital form, the results are
scaled and rounded, and offsets are typically added. For example, the scaling and offset applied
to the Y′ component per specification (e.g. MPEG-2[1]) results in the value of 16 for black and
the value of 235 for white when using an 8-bit representation. The standard has 8-bit digitized
versions of CB and CR scaled to a different range of 16 to 240. Consequently, rescaling by the
fraction (235-16)/(240-16) = 219/224 is sometimes required when doing color matrixing or
processing in YCbCr space, resulting in quantization distortions when the subsequent processing
is not performed using higher bit depths. The scaling that results in the use of a smaller range of
digital values than what might appear to be desirable for representation of the nominal range of
the input data allows for some "overshoot" and "undershoot" during processing without
necessitating undesirable clipping. This "head-room" and "toe-room" can also be used for extension of the nominal color gamut, as specified by xvYCC. Since the equations defining
YCbCr are formed in a way that rotates the entire nominal RGB color cube and scales it to fit
within a (larger) YCbCr color cube, there are some points within the YCbCr color cube that
cannot be represented in the corresponding RGB domain (at least not within the nominal RGB
range). This causes some difficulty in determining how to correctly interpret and display some
YCbCr signals. These out-of-range YCbCr values are used by xvYCC to encode colors outside
the BT.709 gamut.

JFIF usage of JPEG allows Y′CbCr where Y′, CB and CR have the full 8-bit range of 0-255.

Digital Y′CbCr (8 bits per sample) is derived from analog R'G'B' as follows:
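The conversion equations appeared as a figure in the original report; the sketch below reproduces the widely used ITU-R BT.601 8-bit form (KR = 0.299, KB = 0.114), which is also what MATLAB's rgb2ycbcr implements, so it should be read as a representative example rather than the report's exact figure.

% Gamma-corrected R', G', B' in the range [0, 1] (example values).
Rp = 0.8;  Gp = 0.5;  Bp = 0.2;

% 8-bit "studio swing" Y'CbCr: Y' in [16, 235], Cb and Cr in [16, 240].
Y  =  16 +  65.481*Rp + 128.553*Gp +  24.966*Bp;
Cb = 128 -  37.797*Rp -  74.203*Gp + 112.000*Bp;
Cr = 128 + 112.000*Rp -  93.786*Gp -  18.214*Bp;

% Equivalent toolbox call on a whole image:
% ycbcr = rgb2ycbcr(rgbImage);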

[Figure: sample CbCr color planes at Y = 0, Y = 0.5 and Y = 1.]

2.2.2 RGB:-

The RGB color model is an additive color model in which red, green, and blue light are added
together in various ways to reproduce a broad array of colors. The name of the model comes
from the initials of the three additive primary colors, red, green, and blue.

The main purpose of the RGB color model is for the sensing, representation, and display of
images in electronic systems, such as televisions and computers, though it has also been used in
conventional photography. Before the electronic age, the RGB color model already had a solid
theory behind it, based in human perception of colors.

RGB is a device-dependent color model: different devices detect or reproduce a given RGB
value differently, since the color elements (such as phosphors or dyes) and their response to the
individual R, G, and B levels vary from manufacturer to manufacturer, or even in the same
device over time. Thus an RGB value does not define the same color across devices without
some kind of color management.

Typical RGB input devices are color TV and video cameras, image scanners, and digital
cameras. Typical RGB output devices are TV sets of various technologies (CRT, LCD, plasma,
etc.), computer and mobile phone displays, video projectors, multicolor LED displays, and large
screens such as JumboTron, etc. Color printers, on the other hand, are not RGB devices, but
subtractive color devices (typically CMYK color model).

This section discusses concepts common to all the different color spaces that use the RGB color model, which are used in one implementation or another in color image-producing technology.

RGB COLOR SCALE

Additive primary colors

To form a color with RGB, three colored light beams (one red, one green, and one blue) must be
superimposed (for example by emission from a black screen, or by reflection from a white
screen). Each of the three beams is called a component of that color, and each of them can have
an arbitrary intensity, from fully off to fully on, in the mixture.

The RGB color model is additive in the sense that the three light beams are added together, and
their light spectra add, wavelength for wavelength, to make the final color's spectrum.[1][2]

Zero intensity for each component gives the darkest color (no light, considered the black), and
full intensity of each gives a white; the quality of this white depends on the nature of the primary
light sources, but if they are properly balanced, the result is a neutral white matching the
system's white point. When the intensities for all the components are the same, the result is a
shade of gray, darker or lighter depending on the intensity. When the intensities are different, the
result is a colorized hue, more or less saturated depending on the difference of the strongest and
weakest of the intensities of the primary colors employed.

When one of the components has the strongest intensity, the color is a hue near this primary
color (reddish, greenish, or bluish), and when two components have the same strongest intensity, then the color is a hue of a secondary color (a shade of cyan, magenta or yellow). A secondary
color is formed by the sum of two primary colors of equal intensity: cyan is green+blue, magenta
is red+blue, and yellow is red+green. Every secondary color is the complement of one primary
color; when a primary and its complementary secondary color are added together, the result is
white: cyan complements red, magenta complements green, and yellow complements blue.

The RGB color model itself does not define what is meant by red, green,
and blue colorimetrically, and so the results of mixing them are not specified as absolute, but
relative to the primary colors. When the exact chromaticities of the red, green, and blue primaries
are defined, the color model then becomes an absolute color space, such as sRGB or Adobe
RGB; see RGB color spaces for more details.
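The additive behavior described above can be demonstrated directly in MATLAB. The snippet below builds full-intensity primary patches and shows that pairwise sums give the secondary colors; the patch size is arbitrary.

patch = @(r, g, b) cat(3, r*ones(100), g*ones(100), b*ones(100));   % 100x100 color swatch
red   = patch(1, 0, 0);
green = patch(0, 1, 0);
blue  = patch(0, 0, 1);
yellow  = red + green;             % [1 1 0]
magenta = red + blue;              % [1 0 1]
cyan    = green + blue;            % [0 1 1]
white   = red + green + blue;      % [1 1 1]
imshow([red green blue; yellow magenta cyan]);    % secondaries shown below the primaries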

Physical principles for the choice of red, green, and blue

The choice of primary colors is related to the physiology of the human eye; good primaries are
stimuli that maximize the difference between the responses of the cone cells of the human retina
to light of different wavelengths, and that thereby make a large color triangle.[3]

The normal three kinds of light-sensitive photoreceptor cells in the human eye (cone cells)
respond most to yellow (long wavelength or L), green (medium or M), and violet (short or S)
light (peak wavelengths near 570 nm, 540 nm and 440 nm, respectively[3]). The difference in the
signals received from the three kinds allows the brain to differentiate a wide gamut of different
colors, while being most sensitive (overall) to yellowish-green light and to differences
between hues in the green-to-orange region.

As an example, suppose that light in the orange range of wavelengths (approximately 577 nm to
597 nm) enters the eye and strikes the retina. Light of these wavelengths would activate both the
medium and long wavelength cones of the retina, but not equally — the long-wavelength cells
will respond more. The difference in the response can be detected by the brain and associated
with the concept that the light is orange. In this sense, the orange appearance of objects is simply
the result of light from the object entering our eye and stimulating the relevant kinds of cones
simultaneously but to different degrees.

Use of the three primary colors is not sufficient to reproduce all colors; only colors within
the color triangle defined by the chromaticities of the primaries can be reproduced by additive
mixing of non-negative amounts of those colors of light.

Photography

The first experiments with RGB in early color photography were made in 1861 by James Clerk Maxwell, and involved the process of three color-filtered separate takes. To reproduce the color
photograph, three matching projections over a screen in a dark room were necessary.

The additive RGB model and variants such as orange–green–violet were also used in
the Autochrome Lumière color plates and other screen-plate technologies such as the Joly color
screen and the Paget process in the early twentieth century. Color photography by taking three
separate plates was used by other pioneers, such as Russian Sergey Prokudin-Gorsky in the
period 1909 through 1915.[5] Such methods lasted until about 1960, using the expensive and extremely complex tri-color carbro Autotype process.

When employed, the reproduction of prints from three-plate photos was done by dyes or
pigments using the complementary CMY model, by simply using the negative plates of the
filtered takes: reverse red gives the cyan plate, and so on.

Television
Before the development of practical electronic TV, there were patents on mechanically scanned
color systems as early as 1889 in Russia. The color TV pioneer John Logie Baird demonstrated
the world's first RGB color transmission in 1928, and also the world's first color broadcast in
1938, in London. In his experiments, scanning and display were done mechanically by spinning
colorized wheels.

The Columbia Broadcasting System (CBS) began an experimental RGB field-sequential color
system in 1940. Images were scanned electrically, but the system still used a moving part: the
transparent RGB color wheel rotating at above 1,200 rpm in synchronism with the vertical scan.
The camera and the cathode-ray tube (CRT) were both monochromatic. Color was provided by
color wheels in the camera and the receiver. More recently, color wheels have been used in
field-sequential projection TV receivers based on the Texas Instruments monochrome DLP
imager.

The modern RGB shadow mask technology for color CRT displays was patented by Werner
Flechsig in Germany in 1938.

Personal computers
Early personal computers of the late 1970s and early 1980s, such as those
from Apple, Atari and Commodore, did not use RGB as their main method to manage colors, but
rather composite video. IBM introduced a 16-color scheme (one bit each for RGB and Intensity) with the Color Graphics Adapter (CGA) for its first IBM PC (1981), later improved with the Enhanced Graphics Adapter (EGA) in 1984. The first manufacturer of a truecolor graphic
card for PCs (the TARGA) was Truevision in 1987, but it was not until the arrival of the Video
Graphics Array (VGA) in 1987 that RGB became popular, mainly due to the analog signals in
the connection between the adapter and the monitor which allowed a very wide range of RGB
colors.

RGB devices
One common application of the RGB color model is the display of colors on a cathode ray
tube (CRT), liquid crystal display (LCD),plasma display, or LED display such as a television, a
computer’s monitor, or a large scale screen. Each pixel on the screen is built by driving three
small and very close but still separated RGB light sources. At common viewing distance, the
separate sources are indistinguishable, which tricks the eye into seeing a given solid color. All the pixels together, arranged in the rectangular screen surface, make up the color image.

During digital image processing each pixel can be represented in the computer memory or
interface hardware (for example, a graphics card) as binary values for the red, green, and blue
color components. When properly managed, these values are converted into intensities or
voltages via gamma correction to correct the inherent nonlinearity of some devices, such that the
intended intensities are reproduced on the display.

The Quattron released by Sharp uses RGB color and adds yellow as a sub-pixel, supposedly
allowing an increase in the number of available colors.

Video electronics

RGB is also the term referring to a type of component video signal used in the video electronics
industry. It consists of three signals—red, green, and blue—carried on three separate cables/pins.

Extra cables are sometimes needed to carry synchronizing signals. RGB signal formats are often
based on modified versions of the RS-170 and RS-343 standards for monochrome video. This
type of video signal is widely used in Europe since it is the best quality signal that can be carried
on the standard SCART connector. This signal is known as RGBS (4 BNC/RCA terminated
cables exist as well), but it's not directly compatible with RGBHV used for computer monitors
(usually carried on 15-pin cables terminated with 15-pin D-sub or 5 BNC connectors), which
carries separate horizontal and vertical sync signals. There exist integrated circuits that decode
the composite sync signal into vertical and horizontal components, for instance the LM1881. The newer LMH1980 and LMH1981 are advertised as supporting composite RGB sync separation. They require less additional circuitry, and by auto-detecting the video format can be used for
separating the sync components in other types of component video with the same circuit.

Outside Europe, RGB is not very popular as a video signal format; S-Video takes that spot in
most non-European regions. However, almost all computer monitors around the world use RGB.

Video Framebuffer
A framebuffer is a digital device for computers which stores data in the so-called video
memory (comprising an array of Video RAM or similar chips). This data goes either to
three digital-to-analog converters (DACs) (for analog monitors), one per primary color, or
directly to digital monitors. Driven by software, the CPU (or other specialized chips) writes the appropriate bytes into the video memory to define the image. Modern systems encode pixel color
values by devoting eight bits to each of the R, G, and B components. RGB information can be
either carried directly by the pixel bits themselves, or provided by a separate Color Look-Up
Table (CLUT) if indexed color graphic modes are used.

A CLUT is a specialized RAM that stores R, G, and B values that define specific colors. Each
color has its own address (index) -- consider it as a descriptive reference number that provides that specific color when the image needs it. The content of the CLUT is much like a palette of
colors. Image data that uses indexed color specifies addresses within the CLUT to provide the
required R, G, and B values for each specific pixel, one pixel at a time. Of course, before
displaying, the CLUT has to be loaded with R, G, and B values that define the palette of colors
required for each image to be rendered.

This indirect scheme restricts the number of available colors in an image (typically 256),
although each color in the table has typically 8 bits for each of the R, G, and B primaries. This
means that any given color can be one of approx. 16.7 million possible colors. However, the
advantage is that an indexed-color image file can be significantly smaller than it would be with 8
bits per pixel for each primary. Modern storage, however, is far less costly, greatly reducing the
need to minimize image file size.
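A minimal sketch of the indexed-color scheme just described: a tiny image of palette indices plus a color look-up table, expanded to a true-color image with MATLAB's ind2rgb (the palette and indices are invented for illustration).

clut = [0 0 0;         % index 1: black
        1 0 0;         % index 2: red
        0 1 0;         % index 3: green
        0 0 1];        % index 4: blue
idx = [1 2;            % 2-by-2 image of palette indices
       3 4];
rgb = ind2rgb(idx, clut);    % 2-by-2-by-3 true-color image looked up from the CLUT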

By using an appropriate combination of red, green, and blue intensities, many colors can be displayed. Current typical display adapters use up to 24 bits of information for each pixel: 8 bits per component multiplied by three components. With this system, 16,777,216 (256^3 or 2^24) discrete combinations of R, G and B values are allowed, providing millions of different (though not necessarily distinguishable) hue, saturation, and lightness shades.

For images with a modest range of brightnesses from the darkest to the lightest, eight bits per
primary color provides good-quality images, but extreme images require more bits per primary
color as well as advanced display technology. For more information see High Dynamic
Range (HDR) imaging.

2.2.3 GRAYSCALE:-

In photography and computing, a grayscale or greyscale digital image is an image in which the
value of each pixel is a single sample, that is, it carries only intensity information. Images of this
sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest.

Grayscale images are distinct from one-bit black-and-white images, which in the context of
computer imaging are images with only the two colors, black, and white (also
called bilevel or binary images). Grayscale images have many shades of gray in between.
Grayscale images are also called monochromatic, denoting the absence of any chromatic variation (i.e., no color).

Grayscale images are often the result of measuring the intensity of light at each pixel in a single
band of the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.), and in such
cases they are monochromatic proper when only a given frequency is captured. They can also be synthesized from a full-color image; see the section on converting to grayscale below.

Numerical representation

The intensity of a pixel is expressed within a given range between a minimum and a maximum, inclusive. This range is represented in an abstract way as a range from 0 (total absence, black) to 1 (total presence, white), with any fractional values in between. This notation is used in academic papers, but it does not define what "black" or "white" is in terms of colorimetry.

Another convention is to employ percentages, so the scale is then from 0% to 100%. This is a more intuitive approach, but if only integer values are used, the range encompasses a total of only 101 intensities, which are insufficient to represent a broad gradient of grays. The percentile notation is also used in printing to denote how much ink is employed in halftoning, but there the scale is reversed, with 0% being the paper white (no ink) and 100% a solid black (full ink).

In computing, although the grayscale can be computed through rational numbers, image pixels
are stored in binary, quantized form. Some early grayscale monitors could only show up to sixteen (4-bit) different shades, but today grayscale images (such as photographs) intended for visual display (both on screen and printed) are commonly stored with 8 bits per sampled pixel, which allows 256 different intensities (i.e., shades of gray) to be recorded, typically
on a non-linear scale. The precision provided by this format is barely sufficient to avoid
visible banding artifacts, but very convenient for programming due to the fact that a single
pixel then occupies a single byte.

Technical uses (e.g. in medical imaging or remote sensing applications) often require more
levels, to make full use of the sensor accuracy (typically 10 or 12 bits per sample) and to
guard against roundoff errors in computations. Sixteen bits per sample (65,536 levels) is a
convenient choice for such uses, as computers manage 16-bit words efficiently. The TIFF and PNG (among others) image file formats support 16-bit grayscale natively, although browsers and many imaging programs tend to ignore the low-order 8 bits of each pixel.

No matter what pixel depth is used, the binary representations assume that 0 is black and the
maximum value (255 at 8 bpp, 65,535 at 16 bpp, etc.) is white, if not otherwise noted.

Converting color to grayscale

Conversion of a color image to grayscale is not unique; different weightings of the color channels effectively represent the effect of shooting black-and-white film with different-colored photographic filters on the camera. A common strategy is to match the luminance of the
grayscale image to the luminance of the color image. To convert any color to a grayscale
representation of its luminance, first one must obtain the values of its red, green, and blue
(RGB) primaries in linear intensity encoding, by gamma expansion. Then, add together
30% of the red value, 59% of the green value, and 11% of the blue value (these weights
depend on the exact choice of the RGB primaries, but are typical). Regardless of the scale
employed (0.0 to 1.0, 0 to 255, 0% to 100%, etc.), the resultant number is the desired linear
luminance value; it typically needs to be gamma compressed to get back to a conventional
grayscale representation.
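A short MATLAB sketch of the weighting just described follows; the file name is a placeholder. Note that rgb2gray applies essentially the same 0.30/0.59/0.11 weights, although it operates on the gamma-encoded values rather than on linear intensities.

rgb  = im2double(imread('photo.jpg'));                          % hypothetical input image
grey = 0.30*rgb(:,:,1) + 0.59*rgb(:,:,2) + 0.11*rgb(:,:,3);     % weighted sum of R, G and B
greyToolbox = rgb2gray(imread('photo.jpg'));                    % toolbox equivalent (uint8 output)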

This is not the method used to obtain the luma in the Y'UV and related color models, used in
standard color TV and video systems such as PAL and NTSC, as well as in the L*a*b color model. These systems directly compute a gamma-compressed luma as a linear combination
of gamma-compressed primary intensities, rather than use linearization via gamma
expansion and compression.

To convert a gray intensity value to RGB, simply set all three primary color components red, green and blue to the gray value, correcting to a different gamma if necessary.

Grayscale as single channels of multichannel color images

Color images are often built of several stacked color channels, each of them representing value
levels of the given channel. For example, RGB images are composed of three independent
channels for red, green and blue primary color components; CMYK images have four
channels for cyan, magenta, yellow and black ink plates, etc.

[Figure: color-channel splitting of a full RGB color image; the left column shows the isolated color channels in natural colors, and the right column shows their grayscale equivalents.]

2.2.4 BINARY IMAGE:-

A binary image is a digital image that has only two possible values for each pixel. Typically the two colors used for a binary image are black and white, though any two colors can be used. The color used for the objects in the image is the foreground color, while the rest of the image is the background color.

Binary images are also called bi-level or two-level. This means that each pixel is stored as a
single bit (0 or 1). The names black-and-white, B&W, monochrome or monochromatic are often
used for this concept, but may also designate any images that have only one sample per pixel,
such as grayscale images. In Photoshop parlance, a binary image is the same as an image in
"Bitmap" mode.

Binary images often arise in digital image processing as masks or as the result of certain
operations such as segmentation, thresholding, and dithering. Some input/output devices, such
as laser printers, fax machines, and bilevel computer displays, can only handle bilevel images.

A binary image is usually stored in memory as a bitmap, a packed array of bits. A 640×480
image requires 37.5 KiB of storage.

Binary images can be interpreted as subsets of the two-dimensional integer lattice Z2; the field
of morphological image processing was largely inspired by this view.

Interpretation

The interpretation of a pixel's binary value is also device-dependent. Some systems interpret the bit value of 0 as black and 1 as white, while others reverse the meaning of the values.
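The snippet below creates a binary (logical) image by thresholding and checks the storage figure quoted above; cameraman.tif is a test image shipped with MATLAB's Image Processing Toolbox.

grey = imread('cameraman.tif');     % 8-bit grayscale test image
bw   = imbinarize(grey);            % logical image: true = foreground, false = background
packedBytes = 640*480/8;            % 38400 bytes, i.e. 37.5 KiB at one bit per pixel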

2.3 Digital image processing:-

Digital image processing is the use of computer algorithms to perform image processing on digital images. As a subcategory or field of digital signal processing,
digital image processing has many advantages over analog image processing. It allows
a much wider range of algorithms to be applied to the input data and can avoid
problems such as the build-up of noise and signal distortion during processing. Since
images are defined over two dimensions (perhaps more) digital image processing may
be modeled in the form of Multidimensional Systems.

Many of the techniques of digital image processing, or digital picture processing as it often was
called, were developed in the 1960s at the Jet Propulsion Laboratory, Massachusetts
Institute of Technology, Bell Laboratories, University of Maryland, and a few other
research facilities, with application to satellite imagery, wire-photo standards
conversion, medical imaging, videophone, character recognition, and photograph
enhancement.[1] The cost of processing was fairly high, however, with the computing
equipment of that era. That changed in the 1970s, when digital image processing
proliferated as cheaper computers and dedicated hardware became available. Images then
could be processed in real time, for some dedicated problems such as television standards
conversion. As general-purpose computers became faster, they started to take over the role
of dedicated hardware for all but the most specialized and computer-intensive operations.

With the fast computers and signal processors available in the 2000s, digital image processing
has become the most common form of image processing and generally, is used because it
is not only the most versatile method, but also the cheapest.

Digital image processing technology for medical applications was inducted into the Space Foundation Space Technology Hall of Fame in 1994.

In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.

When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), then the input data will be transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, it is expected that the feature set will capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.

Digital camera images

Digital cameras generally include dedicated digital image processing chips to convert the raw
data from the image sensor into a color-corrected image in a standard image file format.
Images from digital cameras often receive further processing to improve their quality, a
distinct advantage that digital cameras have over film cameras. The digital image
processing typically is executed by special software programs that can manipulate the
images in many ways.

Many digital cameras also enable viewing of histograms of images, as an aid for the photographer to understand the rendered brightness range of each shot more readily.

Fundamental steps in image processing:

1. Image acquisition:
to acquire a digital image
2. Image preprocessing:
to improve the image in ways that increase the chances of success
of the other processes
3. Image segmentation:
to partition an input image into its constituent parts or objects.
4. Image representation:
to convert the input data to a form suitable for computer processing.
5. Image description:
to extract features that result in some quantitative information of
interest or features that are basic for differentiating one class of objects from another.
6. Image recognition:
to assign a label to an object based on the information provided by
its descriptors.
7. Image interpretation:
to assign meaning to an ensemble of recognized objects.
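Read as a pipeline, these steps can be sketched in MATLAB as below; the file name, the filters and the final matching step are placeholders rather than the project's actual code.

img  = imread('input.jpg');                      % 1. image acquisition
grey = medfilt2(rgb2gray(img));                  % 2. preprocessing (noise removal)
mask = imbinarize(grey);                         % 3. segmentation
feat = regionprops(mask, 'Area', 'Centroid');    % 4./5. representation and description
% 6./7. recognition and interpretation would compare the extracted features
%       against the stored database of faces and assign a meaning to the result.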

2.3.1 IMAGE ACQUISITION:-

Digital imaging or digital image acquisition is the creation of digital images, typically from a
physical scene. The term is often assumed to imply or include the processing,
compression, storage, printing, and display of such images.

Digital imaging was developed in the 1960s and 1970s, largely to avoid the operational
weaknesses of film cameras, for scientific and military missions including the KH-11 program.
As digital technology became cheaper in later decades it replaced the old film methods for many
purposes.

A digital image may be created directly from a physical scene by a camera or similar device. Alternatively, it may be obtained from another image in an analog medium, such as photographs, photographic film, or printed paper, by an image scanner or similar device. Many technical images—such as those acquired with tomographic equipment, side-scan sonar, or radio telescopes—are actually obtained by complex processing of non-image data. This digitalization of analog real-world data is known as digitizing, and involves sampling (discretization) and quantization.

Finally, a digital image can also be computed from a geometric model or mathematical formula.
In this case the name image synthesis is more appropriate, and it is more often known
as rendering.

Digital image authentication is an issue for the providers and producers of high resolution digital
images such as health care organizations, law enforcement agencies and insurance companies.
There are methods emerging in forensic science to analyze a digital image and determine if it has been altered.

You can choose between two main types of cameras – analog and digital. Digital cameras can be
further classified into parallel digital, Camera Link, and IEEE 1394. The following sections
contain information about these cameras and their advantages and disadvantages, which can help
you choose the right camera for your application.

Analog Cameras

Analog cameras are cameras that generate a video signal in analog format. The analog signal is
digitized by an image acquisition device. The video signal is based on the television standard,
making analog the most common standard for representing video signals.

You may have heard the term charge-coupled device (CCD), and wondered how it relates to the
analog video signal. A CCD is an array of hundreds of thousands of interconnected
semiconductors. Each pixel is a solid-state, photosensitive element that generates and stores an
electric charge when it is illuminated. The pixel is the building block for the CCD imager, a
rectangular array of pixels on which an image of the scene is focused. In most configurations, the
sensor includes the circuitry that stores and transfers its charge to a shift register, which converts
the spatial array of charges in the CCD imager into a time-varying video signal. Timing
information for the vertical and horizontal positions and the sensor value combine to form the
video signal.

For standard analog cameras, the lines of the CCD are interlaced to increase the perceived image
update rate. This means that the odd-numbered rows (the odd field) are scanned first. Then the
even-numbered rows (the even field) are scanned. The two fields make up one frame. Electronic
Industries Association (EIA) RS-170 and NTSC cameras update at 30 frames/s with a resolution
of 640 columns x 480 rows. CCIR and PAL cameras update at 25 frames/s with a resolution of
768 columns x 576 rows.

Analog cameras are low in cost and easy to interface with a standard analog acquisition device.
Therefore, they can solve numerous applications at an attractive price.

Digital Cameras

Digital cameras have several advantages over analog cameras. Analog video is more susceptible
to noise during transmission than digital video. By digitizing at the camera level rather than at
the image acquisition device, the signal-to-noise ratio is typically higher, resulting in better
accuracy. Because digital cameras are not required to conform to television standards, they can
offer larger image sizes and faster frame rates, as well as higher pixel resolutions. Digital
cameras come with 10 to 16-bit gray levels of resolution as a standard for machine vision,
astronomy, microscopy, and thermal imaging applications. Digital cameras use the same CCD-type devices for acquiring images as analog cameras; they simply digitize the video before sending it to the frame grabber.

Parallel Digital Cameras

Until recently, parallel digital cameras were the only type of digital cameras available. They
offer all of the benefits mentioned above. However, parallel digital cameras have no clear
physical or protocol standards, and interfacing to digital acquisition devices can be difficult.
Parallel digital cameras often require custom cables to connect with image acquisition devices.
You also must be certain that your camera is compatible with your image acquisition device.

Fortunately, a large base of parallel cameras exists on the market for almost any imaging
application. National Instruments provides cables and camera configuration files to make
connecting to parallel digital cameras easy. To determine whether one of the NI acquisition
devices is compatible with your camera, visit Camera Advisor on ni.com.

Camera Link

Camera Link is an interface specification for cables that connect digital cameras to image
acquisition devices. It preserves the benefits of digital cameras – such as flexibility for many
types of sensors – yet it has only a small connector and one or two identical cables, which work with all Camera Link image acquisition devices. Camera Link greatly simplifies cabling, which
can be a complex task when working with standard digital cameras. To determine whether one of
the NI acquisition devices is compatible with your camera, visit Camera Advisor on ni.com.

IEEE 1394

IEEE 1394 is a serial bus standard used by many PC peripherals, including digital cameras. IEEE
1394 cameras use a simple, flexible, 4 or 6-wire power cable; and in some cases, the bus can
supply power to the camera. However, because IEEE 1394 is a shared bus, there is a bandwidth
limitation of approximately 40 MB/s when no other device is connected to the bus. IEEE 1394
cameras also require processor control to move the image data, which limits available processor
bandwidth for image processing.

IEEE 1394 is a standard that also includes functions for enumerating and setting up the camera
capabilities. You can acquire images from any industrial IEEE 1394 camera and OHCI-
compliant IEEE 1394 adapter using NI-IMAQ for IEEE 1394 Cameras driver software, which
you can purchase at ni.com.

2.3.2 IMAGE PREPROCESSING:-

Images from the brain scanner may be pre-processed before any statistical comparison takes
place to remove noise or correct for sampling errors.

A study will usually scan a subject several times. To account for the motion of the head between
scans, the images will usually be adjusted so each of the voxels in the images corresponds
(approximately) to the same site in the brain. This is referred to as realignment or motion
correction, see image realignment.

Functional neuroimaging studies usually involve several participants, who will have slightly
differently shaped brains. All are likely to have the same gross anatomy, but there will be minor
differences in overall brain size, individual variation in topography of the gyri and sulci of
the cerebral cortex, and morphological differences in deep structures such as the corpus
callosum. To aid comparisons, the 3D image of each brain is transformed so that superficial
structures line up, a process known as spatial normalization. Such normalization typically
involves not only translation and rotation, but also scaling and nonlinear warping of the brain
surface to match a standard template. Standard brain maps such as the Talairach-Tournoux or
templates from the Montréal Neurological Institute (MNI) are often used to allow researchers
from across the world to compare their results.

Images are often smoothed (similar to the 'blur' effect used in some image-editing software) by
which voxels are averaged with their neighbours, typically using a Gaussian filter or by wavelet transformation, to make the data less noisy.
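A minimal example of the Gaussian smoothing mentioned above; the kernel size, sigma and file name are illustrative assumptions.

img      = im2double(imread('scan.png'));        % hypothetical input image
h        = fspecial('gaussian', [7 7], 1.5);     % 7x7 Gaussian kernel with sigma = 1.5
smoothed = imfilter(img, h, 'replicate');        % each pixel is averaged with Gaussian-weighted neighbours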

2.3.3 IMAGE SEGMENTATION:-
In computer vision, segmentation refers to the process of partitioning a digital image into
multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to
simplify and/or change the representation of an image into something that is more meaningful
and easier to analyze. Image segmentation is typically used to locate objects and boundaries
(lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a
label to every pixel in an image such that pixels with the same label share certain visual
characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or
a set of contours extracted from the image (see edge detection). Each of the pixels in a region are
similar with respect to some characteristic or computed property, such as color, intensity,
or texture. Adjacent regions are significantly different with respect to the same characteristic(s).
[1]
When applied to a stack of images, as is typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
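A minimal segmentation sketch along these lines, using global thresholding followed by connected-component labelling and contour extraction (coins.png is a test image shipped with the Image Processing Toolbox):

grey = imread('coins.png');
bw   = imbinarize(grey, graythresh(grey));   % Otsu threshold separates foreground from background
L    = bwlabel(bw);                          % one integer label per connected region (segment)
B    = bwboundaries(bw);                     % contours (boundaries) of each region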

Smoothing:-

In image processing, to smooth a data set is to create an approximating function that attempts to
capture important patterns in the data, while leaving out noise or other fine-scale
structures/rapid phenomena. Many different algorithms are used in smoothing. One of the
most common algorithms is the "moving average", often used to try to capture important
trends in repeated statistical surveys. In image processing and computer vision, smoothing

ideas are used in scale-space representations. Smoothing may be distinguished from the related and partially overlapping concept of curve fitting in the following ways:

 Curve fitting often involves the use of an explicit function form for the result, whereas the immediate results from smoothing are the "smoothed" values with no later use made of a functional form if there is one.
 The aim of smoothing is to give a general idea of relatively slow changes of value with little attention paid to the close matching of data values, while curve fitting concentrates on achieving as close a match as possible.
 Smoothing methods often have an associated tuning parameter which is used to control the extent of smoothing.

However, the terminology used across applications is mixed. For example, use of an interpolating spline fits a smooth curve exactly through the given data points.
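For reference, a moving-average ("box") smoothing of a single-channel image can be written as a convolution with a constant kernel; the kernel size and file name below are arbitrary.

img      = im2double(imread('scan.png'));    % assumed to be a single-channel (grayscale) image
k        = ones(5) / 25;                     % 5x5 averaging kernel
smoothed = conv2(img, k, 'same');            % each pixel becomes the mean of its 5x5 neighbourhood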

2.3.4 IMAGE REPRESENTATION

Quantized images are commonly represented as sets of pixels encoding color/brightness information in matrix form. An alternative model is based on contour lines: a contour representation allows for easy retrieval of the full image in bitmap form. It has been used primarily for data compression of an image. The idea is to encode, for each gray level, the boundaries of the connected regions of pixels whose levels are greater than or equal to that level. It is easy to reconstruct an original image from those boundaries. There exist output-sensitive algorithms for computing contour representations.

One problem is how to store such representations in a compact manner. In practice one seldom
needs the entire contour representation. Typical use is in the form of a query asking for the
contours matching a given gray level. Current data structuring techniques should be called upon
to provide efficient solutions. So far, they have not. Attempts to remedy this are underway.

Here is a typical problem encountered with this kind of representation. Suppose that we wish to
erase wrinkles around the eyes of a person in a digitized picture given in contour representation.
Because wrinkles might intersect contour lines, these might become disconnected after removal
of the wrinkles. To reconnect them is not so easy. Dynamic programming might be a natural
approach for this problem. In fact, it has been tried successfully in several experiments. Dynamic
programming tends to be costly, however, and faster heuristics should be investigated.

Another problem ripe for a computational-geometric attack is resolution enhancement. Suppose we have an image scanner with 6-bit resolution for each of the three colors (RGB), and we wish to increase the resolution to, say, 8 bits. The naive approach, which consists of adding the gray
levels of four pictures of the same object, has immediate flaws. Of the numerous solutions
proposed in the computer vision literature, most suffer the drawback of causing blurring. As it
turns out, the problem has a natural formulation in terms of weighted Voronoi diagrams, a well-
studied construction in computational geometry.

Since such a Voronoi diagram is hard to compute especially in the presence of high degeneracy,
a different approach might be preferable. An important observation here is that the objects are
not continuous but discrete. This is the main difference with interpolation of contour lines in
geographic information systems. Indeed, suppose that we want to improve the resolution of an intensity level by a factor of k. This means partitioning the pixels at that intensity level into k finer levels. Let S denote the set of all pixels at that intensity level, and let B− and B+ denote the sets of contour edges between pixels with intensity levels lower (respectively higher) than it. For each pixel in S we may compute the Euclidean distance to the nearest boundary edge in B− and in B+. Using the ratio between these two distances we can thus classify those pixels into finer levels. Without getting into details, it is apparent that several variants of this heuristic can be designed. Efficient implementations and classifications of these heuristics would be very useful.

2.3.5 IMAGE DESCRIPTION
When a file such as an image, video or sound clip is uploaded to Wikipedia or the Wikimedia Commons, an associated file page is created. The purpose of these pages is to provide
information about the file, such as the author, date of creation, who uploaded the file, any
modifications that may have been made, an extended description about the file's subject or
context, where the file is used, and license or copyright information. In the case of an image, the
file page shows a higher resolution version of the image, if available. To view the file page for an
image or video, click on the image itself.

Image summary

Most articles that use images will have a caption, but this will likely be shorter than the image's
full description, and more closely related to the text of the article.

Keep in mind that everyone who sees this image in an article and clicks on it for more
information (or to enlarge it) arrives at the file description page.

If you made the image yourself, there are certain questions which only you can answer. Because
you may not be around to answer those questions later, you should include this information in
the description page when you upload the image. This will help other editors to make better use
of the image, and it will be more informative for readers.

For pictures:

 Where was the picture taken?

 When was the picture taken?
 What are the names of all the people and notable objects visible in the picture?
 What is happening in the picture?
 Who was the photographer?

For synthetic pictures:

 Diagrams and markings should be explained as completely as possible.


 If necessary, a legend or key should be provided.

Technical information for pictures:

 If a film camera was used, provide the model number, lens information and exposure settings.
 What post-production modifications were made? (adjustments to color, contrast, etc.)

Technical information for synthetic images:

 What software was used to create or edit the image?


 What pre-existing sources (free images, photos, etc.) were used as inputs?

2.3.6 IMAGE RECOGNITION
Computer vision is the science and technology of machines that see, where "see" in this case
means that the machine is able to extract information from an image that is necessary to solve
some task. As a scientific discipline, computer vision is concerned with the theory behind
artificial systems that extract information from images. The image data can take many forms,
such as video sequences, views from multiple cameras, or multi-dimensional data from a medical
scanner.

As a technological discipline, computer vision seeks to apply its theories and models to the
construction of computer vision systems. Examples of applications of computer vision include
systems for:

 Controlling processes (e.g., an industrial robot or an autonomous vehicle).


 Detecting events (e.g., for visual surveillance or people counting).
 Organizing information (e.g., for indexing databases of images and image sequences).
 Modeling objects or environments (e.g., industrial inspection, medical image analysis or
topographical modeling).
 Interaction (e.g., as the input to a device for computer-human interaction).

Computer vision is closely related to the study of biological vision. The field of biological vision
studies and models the physiological processes behind visual perception in humans and other
animals. Computer vision, on the other hand, studies and describes the processes implemented in
software and hardware behind artificial vision systems. Interdisciplinary exchange between
biological and computer vision has proven fruitful for both fields.

Computer vision is, in some ways, the inverse of computer graphics. While computer graphics
produces image data from 3D models, computer vision often produces 3D models from image
data. There is also a trend towards a combination of the two disciplines, e.g., as explored
in augmented reality.

Sub-domains of computer vision include scene reconstruction, event detection, video tracking, object recognition, learning, indexing, motion estimation, and image restoration.

The classical problem in computer vision, image processing, and machine vision is that of
determining whether or not the image data contains some specific object, feature, or activity.
This task can normally be solved robustly and without effort by a human, but is still not
satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary
situations. The existing methods for dealing with this problem can at best solve it only for
specific objects, such as simple geometric objects (e.g., polyhedra), human faces, printed or
hand-written characters, or vehicles, and in specific situations, typically described in terms of
well-defined illumination, background, and pose of the object relative to the camera.

Different varieties of the recognition problem are described in the literature:

 Object recognition: one or several pre-specified or learned objects or object classes can
be recognized, usually together with their 2D positions in the image or 3D poses in the scene.
 Identification: An individual instance of an object is recognized. Examples:
identification of a specific person's face or fingerprint, or identification of a specific vehicle.
 Detection: the image data is scanned for a specific condition. Examples: detection of
possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic
road toll system. Detection based on relatively simple and fast computations is sometimes
used for finding smaller regions of interesting image data which can be further analysed by
more computationally demanding techniques to produce a correct interpretation.

Several specialized tasks based on recognition exist, such as:

 Content-based image retrieval: finding all images in a larger set of images which have
a specific content. The content can be specified in different ways, for example in terms of
similarity relative to a target image (give me all images similar to image X), or in terms of high-level search criteria given as text input (give me all images which contain many houses, are taken during winter, and have no cars in them).
 Pose estimation: estimating the position or orientation of a specific object relative to the
camera. An example application for this technique would be assisting a robot arm in
retrieving objects from a conveyor belt in an assembly line situation.
 Optical character recognition (OCR): identifying characters in images of printed or
handwritten text, usually with a view to encoding the text in a format more amenable to editing or indexing (e.g. ASCII).

Motion analysis

Several tasks relate to motion estimation, where an image sequence is processed to produce an estimate of the velocity either at each point in the image or in the 3D scene, or even of the camera that produces the images. Examples of such tasks are:

• Egomotion: determining the 3D rigid motion (rotation and translation) of the camera
from an image sequence produced by the camera.
• Tracking: following the movements of a (usually) smaller set of interest points or objects
(e.g., vehicles or humans) in the image sequence.
• Optical flow: determining, for each point in the image, how that point is moving relative
to the image plane, i.e., its apparent motion. This motion is a result both of how the
corresponding 3D point is moving in the scene and how the camera is moving relative to the
scene; a short MATLAB sketch of this is given below.
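
Optical flow can be estimated directly in MATLAB. The following is a minimal sketch, assuming
the Computer Vision Toolbox is available; 'visiontraffic.avi' is a sample clip that ships with
that toolbox and is used here only for illustration.

% Minimal dense optical-flow sketch (requires the Computer Vision Toolbox).
vidReader = VideoReader('visiontraffic.avi');        % open an image sequence
opticFlow = opticalFlowLK('NoiseThreshold', 0.009);  % Lucas-Kanade flow estimator

while hasFrame(vidReader)
    frameRGB  = readFrame(vidReader);
    frameGray = rgb2gray(frameRGB);                  % flow is estimated on grayscale frames
    flow = estimateFlow(opticFlow, frameGray);       % apparent motion of every pixel
    imshow(frameRGB); hold on;
    plot(flow, 'DecimationFactor', [5 5], 'ScaleFactor', 10);  % overlay flow vectors
    hold off;
    drawnow;
end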

Face detection and recognition:-

A special case of image recognition is face detection and recognition. As one of the most
successful applications of image analysis and understanding, face recognition has
recently received significant attention, especially during the past several years. At
least two reasons account for this trend: the first is the wide range of commercial and
law enforcement applications, and the second is the availability of feasible
technologies after 30 years of research. Even though current machine recognition
systems have reached a certain level of maturity, their success is limited by the
conditions imposed by many real applications. For example, recognition of face
images acquired in an outdoor environment with changes in illumination and/or pose
remains a largely unsolved problem. In other words, current systems are still far
away from the capability of the human perception system. This paper provides an up-
to-date critical survey of still- and video-based face recognition research. There are
two underlying motivations for us to write this survey paper: the first is to provide an
up-to-date review of the existing literature, and the second is to offer some insights
into the studies of machine recognition of faces. To provide a comprehensive survey,
we not only categorize existing recognition techniques but also present detailed
descriptions of representative methods within each category. In addition, relevant
topics such as psychophysical studies, system evaluation, and issues of illumination
and pose variation are covered.

2.3.7 IMAGE INTERPRETATION
Photographic interpretation can be defined as: “the act of examining photographic images for the
purpose of identifying objects and judging their significance.”

Principles of image interpretation have been developed empirically for more than 150 years. The
most basic of these principles are the elements of image interpretation. They are: location, size,
shape, shadow, tone/color, texture, pattern, height/depth and site/situation/association. These are
routinely used when interpreting an aerial photo or analyzing a photo-like image. A well-trained
image interpreter uses many of these elements during his or her analysis without really thinking
about them. However, a beginner may not only have to force himself or herself to consciously
evaluate an unknown object with respect to these elements, but also analyze its significance in
relation to the other objects or phenomena in the photo or image.

Elements of Interpretation
The following are the elements of aerial photographic and satellite image interpretation.
1 Location:- There are two primary methods to obtain precise location in the form of
coordinates: 1) survey in the field using traditional surveying techniques or global positioning
system instruments, or 2) collect remotely sensed data of the object, rectify the image and then
extract the desired coordinate information. Most scientists who choose option 1 now use
relatively inexpensive GPS instruments in the field to obtain the desired location of an object. If
option 2 is chosen, most aircraft used to collect the remotely sensed data have a GPS receiver.
This allows the aircraft to obtain exact latitude/longitude coordinates each time a photo is taken.

2 Size:- The size of an object is one of the most distinguishing characteristics and one of the
more important elements of interpretation. Most commonly, length, width and perimeter are
measured. To be able to do this successfully, it is necessary to know the scale of the photo.
Measuring the size of an unknown object allows the interpreter to rule out possible alternatives.
It has proved to be helpful to measure the size of a few well-known objects to give a comparison
to the unknown object. For example, field dimensions of major sports like soccer, football, and
baseball are standard throughout the world. If objects like this are visible in the image, it is
possible to determine the size of the unknown object by simply comparing the two.
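
As a small worked example of this comparison (the numbers below are hypothetical), the scale
implied by a reference object of known size can be applied directly to the unknown object:

% Size by comparison: a soccer pitch of known length gives the image scale,
% which is then applied to an object of unknown size (hypothetical numbers).
knownLengthMetres   = 105;   % real-world length of the reference object
knownLengthPixels   = 350;   % its measured length in the photo
unknownLengthPixels = 60;    % measured length of the unknown object
metresPerPixel      = knownLengthMetres / knownLengthPixels;
unknownLengthMetres = unknownLengthPixels * metresPerPixel   % approximately 18 m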

3 Shape:- There is an infinite number of uniquely shaped natural and man-made objects in the
world. A few examples of shape are the triangular shape of modern jet aircraft and the shape of a
common single-family dwelling. Humans have modified the landscape in very interesting ways
that have given shape to many objects, but nature also shapes the landscape in its own ways. In
general, straight, recti-linear features in the environment are of human origin. Nature produces
more subtle shapes.

4 Shadow:- Virtually all remotely sensed data is collected within 2 hours of solar noon to avoid
extended shadows in the image or photo. This is because shadows can obscure other objects that
could otherwise be identified. On the other hand, the shadow cast by an object may be key to the
identity of another object. Take for example the Washington Monument in Washington D.C.
While viewing this from above it can be difficult to discern the shape of the monument, but with
a shadow cast, this process becomes much easier. It is good practice to orient the photos so that
the shadows are falling towards the interpreter. A pseudoscopic illusion can be produced if the
shadow is oriented away from the observer. This happens when low points appear high and high
points appear low.

5 Colour:- Real world materials like vegetation, water and bare soil reflect different proportions
of energy in the blue, green, red, and infrared portions of the electro-magnetic spectrum. An

interpreter can document the amount of energy reflected from each at specific wavelengths to
create a spectral signature. These signatures can help explain why certain objects appear as
they do on black-and-white or color imagery. On black-and-white imagery the shades of gray are
referred to as tone; the darker an object appears, the less light it reflects. Color imagery is often
preferred because, as opposed to shades of gray, humans can detect thousands of different colors,
and color aids the process of photo interpretation.

6 Texture:- This is defined as the “characteristic placement and arrangement of repetitions of
tone or color in an image.” Adjectives often used to describe texture are smooth (uniform,
homogeneous), intermediate, and rough (coarse, heterogeneous). It is important to remember that
texture is a product of scale. On a large-scale depiction, objects could appear to have an
intermediate texture; but, as the scale becomes smaller, the texture could appear to be more
uniform, or smooth. A few examples of texture could be the “smoothness” of a paved road, or
the “coarseness” of a pine forest.

7 Pattern:- Pattern is the spatial arrangement of objects in the landscape. The objects may be
arranged randomly or systematically. They can be natural, as with a drainage pattern of a river,
or man-made, as with the squares formed from the United States Public Land Survey System.
Typical adjectives used in describing pattern are: random, systematic, circular, oval, linear,
rectangular, and curvilinear to name a few.

8 Height and Depth:- Height and depth, also known as “elevation” and “bathymetry”, are among
the most diagnostic elements of image interpretation. This is because any object, such as a
building or electric pole, that rises above the local landscape will exhibit some sort of radial
relief. Objects that exhibit this relief also cast shadows that can provide information about their
height or elevation. A good example of this would be the buildings of any major city.

9 Site, Situation, and Association:- Site refers to the unique physical characteristics of a
location, which might include elevation, slope, and type of surface cover (e.g., grass, forest,
water, bare soil). Site can also have socioeconomic characteristics, such as the value of land or
the closeness to water. Situation refers to how the objects in the photo or image are organized
and “situated” with respect to each other; most power plants, for example, have materials and
buildings arranged in a fairly predictable manner. Association refers to the fact that when you
find a certain activity within a photo or image, you usually encounter related or “associated”
features or activities. Site, situation, and association are rarely used independently of each other
when analyzing an image. An example would be a large shopping mall: usually there are multiple
large buildings and massive parking lots, and it is usually located near a major road or
intersection.

3. Conclusion
Our contribution to the project proved fruitful: we were able to extract the human face from an
input image captured by an image acquisition device at a particular instant of time and then
match that image against the stored image database. Face recognition was performed
successfully, and it can be applied in various applications that require authentication.

4. Future Enhancements

• Face detection is the first step of face recognition, and it has many uses of its own, e.g.
counting the number of people in a scene or tagging people, as commonly seen on Facebook™.
• Face recognition can be used for authentication in place of typed passwords, protecting the
user from hackers who might otherwise steal the password.

• Another very useful and important enhancement would be checking the attendance of
students and authenticating employees in an organization.
• It can be used to match images captured by CCTV cameras at public places against the
images present in criminal records.
• Face recognition can be used for verifying visas and can be embedded in an e-passport
microchip to verify that the holder is the rightful owner of the passport.

So this project can be further enhanced to incorporate many more functionalities, and it can also
be refined to make it faster and smaller.

5. References

• MathWorks documentation on image processing and image acquisition: www.mathworks.com.
• Image Acquisition Toolbox and Image Processing Toolbox sections of the MATLAB HELP
browser.
• Yu-Tang Pai, Shanq-Jang Ruan, Mon-Chau Shie, Yi-Chi Liu, "A simple and accurate face
detection algorithm in complex background".
• Wikipedia, http://www.wikipedia.org/
• Google, http://www.google.co.in

Source code

% initialising webcam
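
% imaqhwinfo lists the installed image-acquisition adaptors and devices; videoinput
% then opens device 1 of the 'winvideo' adaptor in the 160x120 YUY2 format, and
% preview displays a live feed from the webcam.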

imaqhwinfo

imaqhwinfo('winvideo',1)

v=videoinput('winvideo',1,'YUY2_160X120')

preview (v);

% creating database
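
% Capture 10 snapshots from the webcam, convert each YUY2 (YCbCr) frame to RGB and
% save it as ah1.jpg ... ah10.jpg. The processing loop below assumes 30 such stored
% images in total (presumably 10 per enrolled subject, matching the three names used
% in the recognition stage), although this particular capture loop writes only 1-10.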

rootname='ah';

extension='.jpg';

for i=1:10

filename=[rootname,int2str(i),extension];

d=getsnapshot(v);

d=ycbcr2rgb(d);

pause(1);

figure,imshow(d);

imwrite(d,filename);

end

for k=1:30

filename=[rootname,int2str(k),extension];

I=imread(filename);

l=I;

I=double(I);

H=size(I,1);

W=size(I,2);

R=I(:,:,1);

G=I(:,:,2);

B=I(:,:,3);

YCbCr=rgb2ycbcr(I);

Y=YCbCr(:,:,1);

minY=min(min(Y));

maxY=max(max(Y));

Y=255.0*(Y-minY)./(maxY-minY);

YEye=Y;

Yavg=sum(sum(Y))/(W*H);
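
% Lighting compensation: when the average luma Yavg is low (<64) the R and G channels
% are raised to the power T=1.4, and when it is high (>192) they are raised to T=0.6;
% otherwise the channels are left unchanged.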

T=1;

if (Yavg<64)

T=1.4;

elseif (Yavg>192)

T=0.6;

end

if (T~=1)

RI=R.^T;

GI=G.^T;

else

RI=R;

GI=G;

end

C=zeros(H,W,3);

C(:,:,1)=RI;

C(:,:,2)=GI;

C(:,:,3)=B;

figure,imshow(C/255);

title('Lighting compensation');

%skin Extraction
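
% Pixels whose Cr (red-chrominance) component falls inside the empirically chosen band
% 10 < Cr < 45 are marked as skin in the binary map S.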

YCbCr=rgb2ycbcr(C);

Cr=YCbCr(:,:,3);

S=zeros(H,W);

[SkinIndexRow,SkinIndexCol] =find(10<Cr & Cr<45);

for i=1:length(SkinIndexRow)

S(SkinIndexRow(i),SkinIndexCol(i))=1;

end

figure,imshow(S);

title('skin');

% REMOVE NOISE
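
% 5x5 majority filter: a block of the output map SN is marked as skin only if more than
% 12 of the 25 pixels in the 5x5 window starting at (i,j) were marked as skin in S.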

SN=zeros(H,W);

for i=1:H-5

for j=1:W-5

localSum=sum(sum(S(i:i+4, j:j+4)));

SN(i:i+5, j:j+5)=(localSum>12);

end

end

figure,imshow(SN);

title('skin with noise removal');

% Skin Block
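
% Label connected skin regions (8-connectivity) and take their bounding boxes as face
% candidates; the loop below rejects regions whose height/width ratio lies outside
% [0.75, 1.75] or whose size is smaller than about 20x20 pixels.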

L = bwlabel(SN,8);

BB = regionprops(L, 'BoundingBox');

bboxes= cat(1, BB.BoundingBox);

widths=bboxes(:,3);

heights=bboxes(:,4);

hByW=heights./widths;

lenRegions=size(bboxes,1);

foundFaces=zeros(1,lenRegions);

rgb=label2rgb(L);

figure,imshow(rgb);

title('face candidates');

for i=1:lenRegions

if (hByW(i)>1.75 || hByW(i)<0.75)

continue;

end

if (heights(i)<20 && widths(i)<20)

continue;

end

% Get current region's bounding box

CurBB=bboxes(i,:);

XStart=CurBB(1);

YStart=CurBB(2);

WCur=CurBB(3);

HCur=CurBB(4);

% Crop current region

rangeY=int32(YStart):int32(YStart+HCur-1);

rangeX= int32(XStart):int32(XStart+WCur-1);

RIC=RI(rangeY, rangeX);

GIC=GI(rangeY, rangeX);

BC=B(rangeY, rangeX);

figure, imshow(RIC/255);

title('Possible face R channel');

if size(l,3)==3, l=rgb2gray(l); end % convert the stored image to grayscale once (guarded so later loop iterations do not fail)

M=zeros(HCur, WCur);
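
% Mouth map: theta is the hue angle of each pixel (the arc-cosine term used for the hue
% angle in the HSI colour model); pixels whose angle falls below a quarter of the
% regional mean are marked in M as possible mouth pixels.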

theta=acos( 0.5.*(2.*RIC-GIC-BC) ./ sqrt( (RIC-GIC).*(RIC-GIC) + (RIC-BC).*(GIC-BC) ) );

theta(isnan(theta))=0;

thetaMean=mean2(theta);

[MouthIndexRow,MouthIndexCol] =find(theta<thetaMean/4);

for j=1:length(MouthIndexRow)

M(MouthIndexRow(j),MouthIndexCol(j))=1;

end

imagesc(l); hold on; colormap gray;
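
% fdmex is an external MEX face detector, not a built-in MATLAB function; judging from
% how its output is used below, each row of u appears to hold the centre coordinates
% and the size of one detected face.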

u = fdmex(l', 0);

Hist=zeros(1, HCur);
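
% Count mouth-map pixels per row; the candidate is rejected below if the widest such
% row is narrower than one sixth of the region width (i.e., no plausible mouth).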

for j=1:HCur

Hist(j)=length(find(M(j,:)==1));

end

wMax=find(Hist==max(Hist));

wMax=wMax(1); % just take one of them.

if (wMax < WCur/6)

% reject: no plausible mouth region found

continue;

end

end

% draw rectangle

p=l;

figure,imshow(p)

for i=1:size(u,1)

h = rectangle('Position',[u(i,1)-u(i,3)/2,u(i,2)-u(i,3)/2,u(i,3),u(i,3)], ...
'EdgeColor', [1,0,0], 'linewidth', 2);

end

axis equal;

axis off

% cropping the detected face from each database image and saving it back at 54x54

p=imcrop(p,[u(i,1)-u(i,3)/2,u(i,2)-u(i,3)/2,u(i,3),u(i,3)]);

figure,imshow(p);

h=imresize(p,[54 54]);

imwrite(h,filename);

end
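
% --- Recognition stage: capture a live test frame from the webcam, run the same
% lighting-compensation / skin-detection / face-candidate pipeline on it, and then
% match the cropped face against the stored database images. ---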

I=getsnapshot(v);

I=ycbcr2rgb(I);

figure,imshow(I);

x=I;

I=double(I);

H=size(I,1);

W=size(I,2);

R=I(:,:,1);

G=I(:,:,2);

B=I(:,:,3);

YCbCr=rgb2ycbcr(I);

Y=YCbCr(:,:,1);

minY=min(min(Y));

maxY=max(max(Y));

Y=255.0*(Y-minY)./(maxY-minY);

YEye=Y;

Yavg=sum(sum(Y))/(W*H);

T=1;

if (Yavg<64)

T=1.4;

elseif (Yavg>192)

T=0.6;

end

if (T~=1)

RI=R.^T;

GI=G.^T;

else

RI=R;

GI=G;

end

C=zeros(H,W,3);

C(:,:,1)=RI;

C(:,:,2)=GI;

C(:,:,3)=B;

figure,imshow(C/255);

title('Lighting compensation');

%skin Extraction

YCbCr=rgb2ycbcr(C);

Cr=YCbCr(:,:,3);

S=zeros(H,W);

[SkinIndexRow,SkinIndexCol] =find(10<Cr & Cr<45);

for i=1:length(SkinIndexRow)

S(SkinIndexRow(i),SkinIndexCol(i))=1;

end

figure,imshow(S);

title('skin');

x=rgb2gray(x);

% REMOVE NOISE

SN=zeros(H,W);

for i=1:H-5

for j=1:W-5

localSum=sum(sum(S(i:i+4, j:j+4)));

SN(i:i+5, j:j+5)=(localSum>12);

end

end

figure,imshow(SN);

title('skin with noise removal');

% Skin Block

L = bwlabel(SN,8);

BB = regionprops(L, 'BoundingBox');

bboxes= cat(1, BB.BoundingBox);

widths=bboxes(:,3);

heights=bboxes(:,4);

hByW=heights./widths;

lenRegions=size(bboxes,1);

foundFaces=zeros(1,lenRegions);

rgb=label2rgb(L);

figure,imshow(rgb);

title('face candidates');

for i=1:lenRegions

if (hByW(i)>1.75 || hByW(i)<0.75)

continue;

end

if (heights(i)<20 && widths(i)<20)

continue;

end

% Get current region's bounding box

CurBB=bboxes(i,:);

XStart=CurBB(1);

YStart=CurBB(2);

WCur=CurBB(3);

HCur=CurBB(4);

% Crop current region

rangeY=int32(YStart):int32(YStart+HCur-1);

rangeX= int32(XStart):int32(XStart+WCur-1);

RIC=RI(rangeY, rangeX);

GIC=GI(rangeY, rangeX);

BC=B(rangeY, rangeX);

figure,imshow(RIC/255);

title('Possible face R channel');

M=zeros(HCur, WCur);

theta=acos( 0.5.*(2.*RIC-GIC-BC) ./ sqrt( (RIC-GIC).*(RIC-GIC) + (RIC-BC).*(GIC-BC) ) );

theta(isnan(theta))=0;

thetaMean=mean2(theta);

[MouthIndexRow,MouthIndexCol] =find(theta<thetaMean/4);

for j=1:length(MouthIndexRow)

M(MouthIndexRow(j),MouthIndexCol(j))=1;

end

imagesc(x); hold on; colormap gray;

s = fdmex(x', 0);

Hist=zeros(1, HCur);

for j=1:HCur

Hist(j)=length(find(M(j,:)==1));

end

wMax=find(Hist==max(Hist));

wMax=wMax(1); % just take one of them.

if (wMax < WCur/6)

% reject: no plausible mouth region found

continue;

end

end

% draw rectangle

figure,imshow(x)

for i=1:size(s,1)

h = rectangle('Position',[s(i,1)-s(i,3)/2,s(i,2)-s(i,3)/2,s(i,3),s(i,3)], ...
'EdgeColor', [1,0,0], 'linewidth', 2);

end

axis equal;

axis off

q=imcrop(x,[s(i,1)-s(i,3)/2,s(i,2)-s(i,3)/2,s(i,3),s(i,3)]); % crop the detected face region from the grayscale test image x

figure,imshow(q);

imwrite(q,'2.jpg');

h=imread('2.jpg');

h=imresize(h,[54 54]); % match the 54x54 size of the stored database faces so the distance computation below is dimension-consistent

imwrite(h,'2.jpg');

im1=imread('2.jpg');
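
% Matching: compare the cropped test face (im1) against each stored face using a
% normalised Euclidean distance D; D < 0.4 is treated as a match, with indices 1-10,
% 11-20 and 21-30 mapped to the three enrolled subjects.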

rootname='ah';

extension='.jpg';

matched=0; % flag set when a stored face is close enough to the test face

for i=1:30

filename=[rootname,int2str(i),extension];

im2=imread(filename);

D = sqrt(sum((double(im2(:)) - double(im1(:))).^2)) / sqrt(sum(double(im1(:)).^2)); % normalised Euclidean distance; double() avoids uint8 saturation in the subtraction

if(D<0.4)

matched=1;

if(i>=1 && i<=10)

figure,imshow(filename);

title('face is correctly matched with hemant')

break;

elseif(i>=11 && i<=20)

figure,imshow(filename)

title('face is correctly matched with akash')

break;

else

figure,imshow(filename);

title('face is correctly matched with abhay')

break;

end

end

end

if(~matched) % no stored image was within the matching threshold

fprintf('face is not correctly matched with database ');

figure,imshow('failed.jpg');

end

SNAPSHOTS

• Captured image in YCbCr format
• Captured image converted to RGB format
• Image database
• Detected skin
• Skin after noise removal
• Possible face candidates
• Cropped face images
