The process works like this. First, we will load the original image and convert it to
grayscale. Then, we will use a comparison operator such as > to apply the threshold t, a number in
the closed range [0.0, 1.0]. Pixels with grayscale values on one side of t will be turned
“on,” while pixels with grayscale values on the other side will be turned “off.” In
order to do this, we have to determine a good value for t. How might
we do that? One way is to look at a grayscale histogram of the image. Here
is the histogram.
Since the image has a white background, most of the pixels in the image are white.
This corresponds nicely to what we see in the histogram: there is a spike near the
value of 1.0. If we want to select the shapes and not the background, we want to
turn off the white background pixels, while leaving the pixels for the shapes
turned on. So, we should choose a value of t somewhere before the large peak
and turn pixels above that value “off”.
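The histogram reasoning above can be sketched with NumPy. The image here is a small synthetic one, an assumption standing in for the lesson's picture: a white background with two darker shapes. The names `gray`, `counts` and `background_bin` are illustrative only.

```python
import numpy as np

# Hypothetical stand-in image: white background (1.0) with two darker shapes.
# Pixel values are floats in the closed range [0.0, 1.0].
gray = np.ones((100, 100))
gray[10:40, 10:40] = 0.2   # dark square, 900 pixels
gray[60:90, 50:95] = 0.5   # mid-gray rectangle, 1350 pixels

# Grayscale histogram with 10 equal-width bins over [0.0, 1.0].
counts, bin_edges = np.histogram(gray, bins=10, range=(0.0, 1.0))

for left, right, n in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.1f}-{right:.1f}: {n}")

# The tallest bin is the last one (values near 1.0): the white background.
background_bin = int(np.argmax(counts))
```

Any threshold placed below the last bin and above the shape values would separate the shapes from the background in this synthetic case.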
Setting t = 0.8 and applying the threshold gives a binary mask.
We can now apply the mask to the original colored image. What we are left with
is only the colored shapes from the original, as shown in this image:
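A minimal sketch of the thresholding and masking steps, assuming a synthetic color image and a simple channel-mean grayscale conversion (real libraries usually use a weighted luminance formula). The comparison `gray < t` turns the shapes on; `gray > t` would select the background instead.

```python
import numpy as np

# Hypothetical stand-in image: white background with two colored shapes.
# RGB values are floats in [0.0, 1.0].
color = np.ones((100, 100, 3))
color[10:40, 10:40] = [0.8, 0.1, 0.1]   # red square
color[60:90, 50:95] = [0.1, 0.2, 0.8]   # blue rectangle

# Assumed grayscale conversion: unweighted mean of the three channels.
gray = color.mean(axis=2)

t = 0.8
mask = gray < t            # True ("on") for the shapes, False ("off") for the background

# Apply the mask to the original color image: background pixels become black.
selection = color.copy()
selection[~mask] = 0.0
```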
Image Thresholding
The simplest method of segmenting a grayscale image is to threshold it. By
selecting a certain value, and setting all pixel values below it to one (white) and
all pixel values above it to zero (black), we get a binary image. The best threshold
value changes from one image to another. It also depends on location, because
some parts of the text are darker or lighter than others. So, the larger the image,
the harder it is to choose a globally acceptable threshold value. A good estimate
can be found by examining histograms (or single pixel values) of areas
inside characters and choosing an upper boundary appropriately. Because thresholding
is very fast, the best value can then be found quickly by manual trial and error.
Fourier transform
The Fourier transform is used to convert a signal into the frequency domain.
It states that a non-periodic signal whose area
under the curve is finite can be represented as an integral of sines and
cosines, each multiplied by a certain weight.
The Fourier transform has many applications, including image compression
(e.g., JPEG compression), filtering, and image analysis.
Since we are dealing with digital images, we
will be working with the discrete Fourier transform (DFT).
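Each sinusoid in the Fourier representation of an image is described by three quantities:
• Spatial frequency
• Magnitude
• Phase
The spatial frequency describes how rapidly the brightness varies across the image.
The magnitude of the sinusoid relates directly to contrast. Contrast is the
difference between the maximum and minimum pixel intensity.
The phase describes how the sinusoid is shifted relative to the origin, that is, where the pattern sits in the image.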
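These three quantities can be inspected with NumPy's FFT. The sketch below uses an assumed test image containing a single horizontal sinusoid (4 cycles across 64 pixels). It reads the magnitude at that frequency, then shifts the image to show that only the phase changes.

```python
import numpy as np

# Hypothetical test image: one horizontal sinusoid, 4 cycles across N pixels,
# with mean brightness 0.5 and amplitude 0.25.
N = 64
x = np.arange(N)
row = 0.5 + 0.25 * np.cos(2 * np.pi * 4 * x / N)
image = np.tile(row, (N, 1))

F = np.fft.fft2(image)
magnitude = np.abs(F)
phase = np.angle(F)

# The zero-frequency term holds the sum of all pixels, i.e. the mean brightness.
mean_brightness = magnitude[0, 0] / image.size
# The term at horizontal frequency 4 holds half the sinusoid's amplitude.
amplitude = 2 * magnitude[0, 4] / image.size

# Shifting the pattern 3 pixels to the right leaves the magnitude unchanged
# and changes only the phase of that term.
F_shifted = np.fft.fft2(np.roll(image, 3, axis=1))
phase_shifted = np.angle(F_shifted[0, 4])
```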
The formula for the two-dimensional discrete Fourier transform is given below.
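A commonly used form of the two-dimensional DFT, for an M x N image f(x, y), is sketched here. The symbol names and the placement of the normalization factor are assumptions, since conventions vary between texts.

```latex
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\,
          e^{-j 2\pi \left( \frac{u x}{M} + \frac{v y}{N} \right)},
\qquad u = 0, \dots, M-1, \quad v = 0, \dots, N-1
```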
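[Figure: pinhole camera model, showing the image plane П, the pinhole O, the axes i, j, k, the image center C′, the distance f′, and a scene point P with its image P′]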
The line perpendicular to П and passing through the pinhole is called the optical axis,
and the point C′ where it intersects П is called the image center. This point can be used as the
origin of the image coordinate system and plays an important role in camera calibration and
image processing. Let P denote a scene point with coordinates (x, y, z) and P′ denote its image
with coordinates (x′, y′, z′). Since P′ lies in the image plane, z′ = f′. Also, the three points
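P, O and P′ are collinear, therefore
\[ \overrightarrow{OP'} = \lambda \, \overrightarrow{OP} \]
for some constant λ, so
\[
\begin{cases} x' = \lambda x \\ y' = \lambda y \\ z' = \lambda z \end{cases}
\iff \lambda = \frac{x'}{x} = \frac{y'}{y} = \frac{z'}{z} \tag{1.1}
\]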
Since z′ = f′, the position of point P′ in the image plane is given by
\[
\begin{cases} x' = f' \, \dfrac{x}{z} \\[6pt] y' = f' \, \dfrac{y}{z} \end{cases} \tag{1.2}
\]
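Equations (1.1) and (1.2) can be tried out with a short function. This is a minimal sketch: the function name `project_pinhole` and the sample point are illustrative assumptions, and coordinates are taken relative to the pinhole O.

```python
def project_pinhole(point, f_prime):
    """Project a scene point (x, y, z) onto the image plane z' = f'."""
    x, y, z = point
    if z == 0:
        raise ValueError("z = 0: the point has no image under this model")
    lam = f_prime / z                     # lambda = z'/z with z' = f', from (1.1)
    return (lam * x, lam * y, f_prime)    # x' = f' x / z, y' = f' y / z, from (1.2)


# Example: a point 10 units in front of the pinhole, image plane at f' = 0.5.
p_image = project_pinhole((2.0, 4.0, 10.0), f_prime=0.5)
print(p_image)
```

Doubling z while keeping x and y fixed halves x′ and y′, which is the familiar effect of distant objects appearing smaller.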
What is a mask?
A mask is also a signal. It can be represented by a two-dimensional matrix. The mask
is usually of order 1x1, 3x3, 5x5, or 7x7. A mask should always have odd dimensions,
because otherwise it has no middle element.
A mask is a filter. Masking is also known as spatial filtering, or simply
filtering.
-1 0 1
-1 0 1
-1 0 1
Example of a mask
Why do we need to find the middle of the mask? The answer lies below, in the topic of how
to perform convolution.
What is filtering?
The process of filtering is also known as convolving a mask with an image. Because
this process is the same as convolution, filter masks are also known as convolution
masks.
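As a minimal sketch of convolving a mask with an image, the function below flips the mask, centers it on each pixel, and sums the products. Zero padding at the borders and the helper name `convolve2d` are assumptions made for illustration. The mask is the 3x3 example shown earlier.

```python
import numpy as np


def convolve2d(image, mask):
    """Convolve a 2-D image with an odd-sized mask (zero padding, same-size output)."""
    mh, mw = mask.shape
    if mh % 2 == 0 or mw % 2 == 0:
        raise ValueError("mask dimensions must be odd so that the mask has a middle")
    flipped = mask[::-1, ::-1]            # convolution flips the mask in both axes
    ph, pw = mh // 2, mw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + mh, j:j + mw]   # neighborhood centered on (i, j)
            out[i, j] = np.sum(window * flipped)
    return out


mask = np.array([[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]])

image = np.zeros((5, 6))
image[:, 3:] = 1.0                        # dark left half, bright right half

response = convolve2d(image, mask)
print(response[2])                        # middle row of the result
```

Away from the borders, the response is nonzero only in the two columns next to the brightness step, which is why this mask is useful for finding vertical edges.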
Types of filters
Generally, there are two types of filters. One type is linear filters, also called smoothing
filters, and the other is frequency domain filters.
Filters are applied to images for multiple purposes. The two most common uses are
as follows:
Blurring
In blurring, we simply blur an image. An image looks sharper or more detailed if
we are able to perceive all the objects and their shapes correctly in it. For example,
an image of a face looks clear when we are able to identify the eyes, ears, nose, lips,
forehead, etc. very clearly. The shape of an object is due to its edges. So in blurring,
we simply reduce the edge content and make the transition from one color to the
other very smooth.
Types of linear filters
Blurring can be achieved in many ways. The common types of filters used to
perform blurring are:
• Mean filter
• Weighted average filter
• Gaussian filter
Mean filter
The mean filter is also known as the box filter or average filter. A mean filter has the
following properties: the mask is odd ordered, all of its elements are equal, and its elements sum to 1.
A 3x3 mask has 9 cells. The condition that the elements
sum to 1 can be achieved by dividing each value by 9, since
1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 = 9/9 = 1
The blurring can be increased by increasing the size of the mask. The larger the
mask, the stronger the blurring, because a larger mask averages over a greater number
of pixels and so produces a smoother transition.
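The effect of mask size can be checked on a sharp edge. This is a sketch under stated assumptions: the helper names `mean_mask` and `apply_mask` are illustrative, the test image is synthetic, and borders are handled by replicating edge pixels.

```python
import numpy as np


def mean_mask(size):
    """Return a size x size mean (box) mask whose elements sum to 1."""
    if size % 2 == 0:
        raise ValueError("mask size must be odd")
    return np.full((size, size), 1.0 / (size * size))


def apply_mask(image, mask):
    """Slide a square mask over the image, replicating edge pixels at the borders.

    For a symmetric mask such as the mean mask, this equals convolution.
    """
    size = mask.shape[0]
    k = size // 2
    padded = np.pad(image.astype(float), k, mode="edge")
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * mask)
    return out


image = np.zeros((9, 9))
image[:, 5:] = 1.0                      # sharp vertical edge

blur3 = apply_mask(image, mean_mask(3))
blur5 = apply_mask(image, mean_mask(5))
print(np.round(blur3[4], 2))            # edge spread over 2 pixels
print(np.round(blur5[4], 2))            # edge spread over 4 pixels
```

With the 3x3 mask the step from 0 to 1 passes through two intermediate values; with the 5x5 mask it passes through four, so the transition is smoother.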
Two of the properties, 1 and 3, are satisfied, but property 2 is not. To
satisfy it, we simply divide the whole filter by 16, that is, multiply it by 1/16.
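The mask this passage refers to is not reproduced here. As a sketch, assume a 3x3 weighted average mask whose center weight is the largest and whose entries sum to 16. The specific weights below are an assumption chosen to match the factor 1/16, not taken from the text.

```python
import numpy as np

# Assumed example weights (hypothetical): odd ordered, center weighted most,
# but the entries sum to 16 rather than 1.
weights = np.array([[1, 2, 1],
                    [2, 4, 2],
                    [1, 2, 1]], dtype=float)

total = weights.sum()        # 16.0
mask = weights / total       # dividing by 16 makes the entries sum to 1

print(total, mask.sum())
```

Normalizing this way keeps the overall brightness of the image unchanged after filtering.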
With a Gaussian filter, smoothing with a larger standard deviation suppresses more noise, but also blurs the image more.
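A minimal sketch of how the standard deviation shapes a Gaussian mask. The helper name `gaussian_kernel` and the sizes and sigma values are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian mask, normalized so its elements sum to 1."""
    if size % 2 == 0:
        raise ValueError("mask size must be odd")
    ax = np.arange(size) - size // 2          # offsets from the center, e.g. -3..3
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


narrow = gaussian_kernel(7, sigma=0.8)
wide = gaussian_kernel(7, sigma=2.0)

# A larger sigma spreads the weight away from the center, so each output pixel
# averages over more neighbors: more noise suppression, more blur.
print(round(narrow[3, 3], 3), round(wide[3, 3], 3))
```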
Sharpening
Sharpening is the opposite of blurring. In blurring, we reduce the edge content, and in
sharpening, we increase it. So in order to increase the edge content
in an image, we have to find the edges first.
Types of edges
• Horizontal edges
• Vertical edges
• Diagonal edges
Common masks for edge detection
• Prewitt operator
• Sobel operator
The Prewitt operator is used for detecting edges horizontally and vertically.
The Sobel operator is very similar to the Prewitt operator. It is also a derivative mask and
is used for edge detection. It also calculates edges in both the horizontal and vertical
directions.
The Gradient-Sobel filter uses two 3 by 3 kernels to detect gradients in the horizontal and
vertical directions.
The first mask is for horizontal edges, and the second is for vertical edges.
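The masks themselves are not reproduced here, so the sketch below assumes the commonly used 3x3 Prewitt and Sobel kernels. The Prewitt kernel for vertical edges is the same as the example mask shown earlier. The helper `correlate_valid` is illustrative and slides a mask over the image without padding.

```python
import numpy as np

# Commonly used kernels (assumed here; sign conventions vary between texts).
PREWITT_VERTICAL = np.array([[-1, 0, 1],
                             [-1, 0, 1],
                             [-1, 0, 1]], dtype=float)
SOBEL_VERTICAL = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]], dtype=float)
# Transposing gives the kernels that respond to horizontal edges.
PREWITT_HORIZONTAL = PREWITT_VERTICAL.T
SOBEL_HORIZONTAL = SOBEL_VERTICAL.T


def correlate_valid(image, mask):
    """Slide the mask over the image without padding; the output is smaller."""
    mh, mw = mask.shape
    rows = image.shape[0] - mh + 1
    cols = image.shape[1] - mw + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + mh, j:j + mw] * mask)
    return out


image = np.zeros((6, 6))
image[:, 3:] = 1.0                                   # one vertical edge

gx = correlate_valid(image, SOBEL_VERTICAL)          # responds to the vertical edge
gy = correlate_valid(image, SOBEL_HORIZONTAL)        # no horizontal edges: all zeros
magnitude = np.hypot(gx, gy)                         # combined edge strength

prewitt_gx = correlate_valid(image, PREWITT_VERTICAL)
print(gx[0], prewitt_gx[0])
```

The Sobel kernel weights the middle row more heavily, so its response at the edge is larger than the Prewitt response on the same image.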
Coordinate Systems
Coordinates are an ordered set of values which specify a location relative to
some origin. There are a variety of coordinate systems. Some examples
follow:
2D vs. 3D
A 2D coordinate system is used to specify all locations in 2D space.
A 3D coordinate system is used to specify all locations in 3D space.
Cartesian (rectangular) coordinate systems
In a 2D Cartesian coordinate system each location is specified by an ordered set of
two distances, an x-coordinate and a y-coordinate, represented as (x, y). The two
coordinates are "ordered" because their order matters. The x-coordinate of a point
location comes first; the y-coordinate comes next. For instance, (2, 3) and (3, 2)
specify two different locations.
A Euclidean plane with a chosen Cartesian coordinate system is called a Cartesian plane. In a
Cartesian plane one can define canonical representatives of certain geometric figures, such as
the unit circle (with radius equal to the length unit, and center at the origin), the unit
square (whose diagonal has endpoints at (0, 0) and (1, 1)), the unit hyperbola, and so on.
The two axes divide the plane into four right angles, called quadrants. The quadrants may be
named or numbered in various ways, but the quadrant where all coordinates are positive is
usually called the first quadrant.
If the coordinates of a point are (x, y), then its distances from the X-axis and from the Y-axis are
|y| and |x|, respectively, where |...| denotes the absolute value of a number.
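A small sketch of these two ideas. The function names are illustrative. Numbering quadrants 2 to 4 counterclockwise from the first is the usual convention, but it is an assumption here, since the text only fixes the first quadrant.

```python
def quadrant(x, y):
    """Return the quadrant number (1-4) of the point (x, y), or 0 if it lies on an axis."""
    if x == 0 or y == 0:
        return 0
    if x > 0:
        return 1 if y > 0 else 4
    return 2 if y > 0 else 3


def distances_from_axes(x, y):
    """Return (distance from the X-axis, distance from the Y-axis)."""
    return abs(y), abs(x)


print(quadrant(2, 3), quadrant(3, 2))      # both in the first quadrant, but different points
print(distances_from_axes(-3, 4))          # 4 from the X-axis, 3 from the Y-axis
```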
A Cartesian coordinate system for a three-dimensional space consists of
an ordered triplet of lines (the axes) that go through a common point
(the origin), and are pair-wise perpendicular; an orientation for each axis;
and a single unit of length for all three axes. As in the two-dimensional
case, each axis becomes a number line. For any point P of space, one
considers a hyperplane through P perpendicular to each coordinate axis,
and interprets the point where that hyperplane cuts the axis as a
number. The Cartesian coordinates of P are those three numbers, in the
chosen order.