
Thresholding

Thresholding is a type of image segmentation, where we change the pixels of an image to make the image easier to analyze. In thresholding, we convert an image from color or grayscale into a binary image, i.e., one that is simply black and white. Most frequently, we use thresholding as a way to select areas of interest in an image.

The process works like this. First, we will load the original image and convert it to grayscale. Then, we will use the > operator to apply the threshold t, a number in
the closed range [0.0, 1.0]. Pixels with color values on one side of t will be turned
“on,” while pixels with color values on the other side will be turned “off.” In order to use this approach, we have to determine a good value for t. How might
we do that? Well, one way is to look at a grayscale histogram of the image. Here
is the histogram.
Since the image has a white background, most of the pixels in the image are white.
This corresponds nicely to what we see in the histogram: there is a spike near the
value of 1.0. If we want to select the shapes and not the background, we want to
turn off the white background pixels, while leaving the pixels for the shapes
turned on. So, we should choose a value of t somewhere before the large peak
and turn pixels above that value “off”.

Set t = 0.8. Applying the threshold with this value gives us a binary mask. We can now apply the mask to the original colored image. What we are left with is only the colored shapes from the original, as shown in the resulting image.
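A minimal sketch of this workflow in Python with scikit-image (the filename is an assumption for illustration):

```python
import skimage.io
import skimage.color

# load the original image and convert it to grayscale
image = skimage.io.imread("shapes.png")   # assumed filename
gray = skimage.color.rgb2gray(image)      # grayscale values in [0.0, 1.0]

# apply the threshold t with the > operator: background pixels
# (brighter than t) are turned "off", shape pixels stay "on"
t = 0.8
background = gray > t                     # True for the white background

# apply the mask to the original colored image
selection = image.copy()
selection[background] = 0                 # keep only the colored shapes
```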

Image Thresholding
The simplest method of segmenting a gray-scale image is to threshold it. By selecting a certain value and setting all pixel values below it to one (white) and all pixel values above it to zero (black), we get a binary image. The best threshold value changes from one image to another. It also depends on location, because some parts of the text are darker or lighter than others; so the larger the image, the harder it is to choose a globally acceptable threshold value. A good estimate can easily be found by examining histograms (or single pixel values) of areas inside characters and choosing an upper boundary appropriately. Since thresholding itself is very quick, the best value can also be found manually by trial and error.
Fourier transform
The Fourier transform is used to convert a signal into the frequency domain.
The Fourier transform states that non-periodic signals whose area under the curve is finite can also be represented as integrals of sines and cosines after being multiplied by a certain weight.
The Fourier transform has many applications, including image compression (e.g. JPEG compression), filtering and image analysis.

Discrete Fourier transform

Since we are dealing with digital images, we will be working with the discrete Fourier transform.

Consider the Fourier term of a sinusoid. It includes three things:

• Spatial Frequency
• Magnitude
• Phase
The spatial frequency directly relates to the brightness of the image.
The magnitude of the sinusoid directly relates to the contrast. Contrast is the difference between the maximum and minimum pixel intensity.
Phase contains the color information.
The formula for the 2-dimensional discrete Fourier transform of an M x N image is

F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}

The discrete Fourier transform is actually the sampled Fourier transform, so it contains a finite set of samples that describes the image. In the formula above, f(x, y) denotes the image and F(u, v) denotes its discrete Fourier transform. The formula for the 2-dimensional inverse discrete Fourier transform is

f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}

The inverse discrete Fourier transform converts the Fourier transform back to the image.

The same image can then be represented in the frequency domain, for example by the magnitude of its transform.
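As a small illustration, NumPy's fft2 computes this 2-D DFT; the image here is a synthetic array used purely for illustration:

```python
import numpy as np

# a small synthetic grayscale "image": a bright square on a dark background
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0

# 2-D discrete Fourier transform F(u, v) of the image f(x, y)
F = np.fft.fft2(f)

# magnitude and phase of each sinusoidal component
magnitude = np.abs(F)
phase = np.angle(F)

# the inverse DFT converts the transform back to the image
f_back = np.real(np.fft.ifft2(F))
assert np.allclose(f_back, f)
```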

1.1 Optics of a Pinhole Camera Image Formation


Figure 1.1 shows a pinhole camera and the optics of image formation. The image is formed on the image plane of the camera by light rays emerging from the scene facing the box. The perspective projection creates an inverted image on the image plane. It is sometimes convenient to consider a virtual image associated with a plane lying in front of the pinhole camera at the same distance from it as the actual image plane, as shown in figure 1.1. This virtual image is not inverted, but is equivalent to the actual image. Depending on the context and computational convenience, the real or the virtual image may be used.

Figure 1.1: The pinhole imaging model

1.1.1.1 Perspective Projection


Consider a coordinate system (O, i, j, k) attached to a pinhole camera, whose origin O coincides with the pinhole, and whose vectors i and j form a basis for a vector plane parallel to the image plane Π, which is located at a positive distance f′ from the pinhole along the vector k, as shown in figure 1.2.

Figure 1.2: Diagram for perspective projection equations

The line perpendicular to Π and passing through the pinhole is called the optical axis, and the point C′ where it intersects Π is called the image center. This point can be used as the origin of the image coordinate system and plays an important role in camera calibration and image processing. Let P denote a scene point with coordinates (x, y, z) and P′ denote its image with coordinates (x′, y′, z′). Since P′ lies in the image plane, z′ = f′. Also, the three points P, O and P′ are collinear, therefore

\overrightarrow{OP'} = \lambda \overrightarrow{OP}

for some constant λ, so

\begin{cases} x' = \lambda x \\ y' = \lambda y \\ z' = \lambda z \end{cases}
\;\Leftrightarrow\;
\lambda = \frac{x'}{x} = \frac{y'}{y} = \frac{z'}{z}
\qquad (1.1)
Since z′ = f′, the position of the point P′ in the image plane is given by

\begin{cases} x' = f' \dfrac{x}{z} \\[4pt] y' = f' \dfrac{y}{z} \end{cases}
\qquad (1.2)
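A small sketch of equation (1.2) as a Python function (the focal length f′ = 2 and the test point are arbitrary values chosen for illustration):

```python
def project(point, f_prime):
    """Perspective projection of a scene point (x, y, z) onto the image plane."""
    x, y, z = point
    if z == 0:
        raise ValueError("point lies in the plane of the pinhole (z = 0)")
    # equation (1.2): x' = f' * x / z,  y' = f' * y / z
    return f_prime * x / z, f_prime * y / z

# example: a point 4 units in front of the pinhole, focal length f' = 2
print(project((1.0, 0.5, 4.0), 2.0))   # -> (0.5, 0.25)
```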

1.1.1.2 Digital Image (Anatomy)


Modern digital cameras use a charge-coupled device (CCD) to capture the amount and colour of light falling on the image plane. A CCD sensor, as shown in figure 1.3, uses a
rectangular grid of electron collection sites laid over a thin silicon wafer to record a measure
of the amount of light energy reaching them. Each site is formed by growing a layer of silicon
oxide on the wafer and then depositing a conductive gate structure over the oxide. When
photons strike the silicon, electron-hole pairs are generated and the electrons are captured by
the potential well formed by applying a positive electrical potential to the corresponding gate.
The electrons generated at each site are collected over a fixed period of time. The number of electrons collected at each site is quantized to an integer value and represents the intensity value of a pixel.
An image is a two dimensional array of pixels. The color CCD cameras essentially use the
same concept except that successive rows or columns of sensors are made sensitive to red,
green or blue light, often using a filter coating that blocks complementary light. The individual
color channels are either digitized separately to generate RGB output or combined into a
composite video signal or into a component video format separating color and brightness
information, as required by downstream applications.

What is a mask?
A mask is also a signal. It can be represented by a two-dimensional matrix. The mask is usually of order 1x1, 3x3, 5x5 or 7x7. A mask should always have odd dimensions, because otherwise you cannot find the center of the mask.
A mask is a filter. The concept of masking is also known as spatial filtering, so masking is also known as filtering.
-1 0 1
-1 0 1
-1 0 1
Example of a mask
Why do we need to find the center of the mask? The answer lies below, in the next topic: how to perform convolution.

How to perform convolution?


In order to perform convolution on an image, the following steps should be taken.

• Slide the mask onto the image.
• Multiply the corresponding elements and then add them.
• Repeat this procedure until all values of the image have been calculated.

The general process of filtering and applying masks consists of moving the filter mask from point to point in an image. At each point (x, y) of the original image, the response of the filter is calculated by a predefined relationship. The filter values are predefined and standard.
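A minimal sketch of these steps in NumPy. It computes only the "valid" region (no border handling) and, like the description above, multiplies corresponding elements without flipping the mask, which is strictly speaking correlation:

```python
import numpy as np

def apply_mask(image, mask):
    """Slide the mask over the image; at each position, multiply
    corresponding elements and add them up.

    Only the 'valid' region is computed, so the output is slightly
    smaller than the input; the mask is assumed to have odd dimensions.
    """
    mh, mw = mask.shape
    ih, iw = image.shape
    out = np.zeros((ih - mh + 1, iw - mw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            region = image[y:y + mh, x:x + mw]
            out[y, x] = np.sum(region * mask)   # multiply and add
    return out

# example: the example mask shown earlier
mask = np.array([[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]])
image = np.random.rand(5, 5)
print(apply_mask(image, mask))   # 3x3 response
```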

What is filtering?
The process of filtering is also known as convolving a mask with an image. Since this process is the same as convolution, filter masks are also known as convolution masks.

Types of filters

Generally there are two types of filters. One type is called linear (or smoothing) filters, and the other is called frequency domain filters.

Why filters are used?

Filters are applied to an image for multiple purposes. The two most common uses are the following:

• Filters are used for blurring, smoothing and noise reduction.
• Filters are used for edge detection and sharpening.

Blurring
In blurring, we simply blur an image. An image looks sharper or more detailed if we are able to perceive all the objects and their shapes in it correctly. For example, an image with a face looks clear when we are able to identify the eyes, ears, nose, lips, forehead, etc. very clearly. The shape of an object is due to its edges. So in blurring, we simply reduce the edge content and make the transition from one color to the other very smooth.
Types of linear filters
Blurring can be achieved in many ways. The common types of filters used to perform blurring are:

• Mean filter
• Weighted average filter
• Gaussian filter

Mean filter

The mean filter is also known as the box filter or average filter. A mean filter has the following properties.

• It must be odd ordered.
• The sum of all the elements should be 1.
• All the elements should be the same.

If we follow these rules for a 3x3 mask, we get the following result.
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9

Since it is a 3x3 mask, it has 9 cells. The condition that all the elements should sum to 1 is achieved by dividing each value by 9, since
1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 + 1/9 = 9/9 = 1
The blurring can be increased by increasing the size of the mask: the larger the mask, the greater the blurring, because a larger mask averages over a greater number of pixels and produces a smoother transition.

The result of a mask of 5x5 on an image is shown below
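A short sketch of applying the mean (box) filter with SciPy; uniform_filter averages over a size x size neighborhood, so size=5 corresponds to the 5x5 mask (the test image is random data, for illustration only):

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(100, 100)   # placeholder grayscale image

# 3x3 mean (box) filter: every weight is 1/9, so the weights sum to 1
mean3 = ndimage.uniform_filter(image, size=3)

# a larger mask gives stronger blurring
mean5 = ndimage.uniform_filter(image, size=5)
```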



Weighted average filter, weighted filter or Gaussian filter
In the weighted average filter, we give more weight to the center value, due to which the contribution of the center becomes greater than that of the rest of the values. With weighted average filtering, we can actually control the blurring.
Properties of the weighted average filter are:

• It must be odd ordered.
• The sum of all the elements should be 1.
• The weight of the center element should be greater than that of all the other elements.

A common example is the mask

1 2 1
2 4 2
1 2 1

Two of the properties (1 and 3) are satisfied, but property 2 is not, because the elements sum to 16. In order to satisfy it, we simply divide the whole filter by 16, or equivalently multiply it by 1/16.

Smoothing with larger standard deviations suppresses noise, but also blurs the image.
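A short sketch of weighted-average smoothing with SciPy, using the 1/16 mask above and, in the same spirit, a Gaussian filter whose standard deviation controls the amount of blurring (the test image is random data for illustration):

```python
import numpy as np
from scipy import ndimage

# weighted average mask: the center has the largest weight,
# and dividing by 16 makes the elements sum to 1
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]]) / 16.0

image = np.random.rand(100, 100)   # placeholder grayscale image
weighted = ndimage.convolve(image, kernel)

# a true Gaussian filter: larger sigma suppresses more noise but blurs more
mild = ndimage.gaussian_filter(image, sigma=1.0)
strong = ndimage.gaussian_filter(image, sigma=5.0)
```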
Sharpening

Sharpening is the opposite of blurring. In blurring, we reduce the edge content, and in sharpening, we increase the edge content. So in order to increase the edge content in an image, we have to find the edges first.

This is one way of sharpening an image.
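The text does not fix a particular method, but one common way to realize this idea is unsharp masking: blur the image, treat the difference from the original as the edge content, and add a scaled amount of it back. A rough sketch (the sigma and amount values are arbitrary):

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(100, 100)            # placeholder grayscale image

# edge content = original minus its blurred (low-frequency) version
blurred = ndimage.gaussian_filter(image, sigma=2.0)
edges = image - blurred

# sharpened image = original plus a scaled amount of the edge content
amount = 1.5
sharpened = np.clip(image + amount * edges, 0.0, 1.0)
```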


----------------------------------------------------------------------------------------------------------
What are edges
Edge detection includes a variety of mathematical methods that aim at identifying points in
a digital image at which the image brightness changes sharply or, more formally, has
discontinuities. The points at which image brightness changes sharply are typically organized into
a set of curved line segments termed edges.
We can also say that sudden changes or discontinuities in an image are called edges. Significant transitions in an image are called edges.

Types of edges

Generally edges are of three types:

• Horizontal edges
• Vertical Edges
• Diagonal Edges

Why detect edges

Most of the shape information of an image is enclosed in edges. So we first detect the edges in an image using edge-detection filters, and then, by enhancing those areas of the image which contain edges, the sharpness of the image will increase and the image will become clearer.
Some of the masks for edge detection that we will discuss are:

• Prewitt Operator
• Sobel Operator
The Prewitt operator is used for detecting edges horizontally and vertically.
The Sobel operator is very similar to the Prewitt operator. It is also a derivative mask and is used for edge detection. It also calculates edges in both the horizontal and vertical directions.

The Gradient-Sobel filter uses two 3 by 3 kernels to detect gradients in the horizontal and
vertical directions.

The first mask is for horizontal edges, and the second is for vertical edges.
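A small sketch of applying the two Sobel kernels with SciPy and combining the responses into a gradient magnitude (the test image is random data for illustration):

```python
import numpy as np
from scipy import ndimage

# Sobel kernels: the first responds to horizontal edges,
# the second to vertical edges
sobel_horizontal = np.array([[-1, -2, -1],
                             [ 0,  0,  0],
                             [ 1,  2,  1]])
sobel_vertical = np.array([[-1, 0, 1],
                           [-2, 0, 2],
                           [-1, 0, 1]])

image = np.random.rand(100, 100)            # placeholder grayscale image
gx = ndimage.convolve(image, sobel_horizontal)
gy = ndimage.convolve(image, sobel_vertical)

# gradient magnitude is large where brightness changes sharply (edges)
magnitude = np.hypot(gx, gy)
```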

• Morphological image processing


Morphological image processing is a collection of non-linear operations related
to the shape or morphology of features in an image.

• Coordinate Systems
Coordinates are an ordered set of values which specify a location relative to
some origin. There are a variety of coordinate systems. Some examples
follow:

2D vs. 3D
A 2D coordinate system is used to specify all locations in 2D space.
A 3D coordinate system is used to specify all locations in 3D space.
Cartesian (rectangular) coordinate systems
In a 2D Cartesian coordinate system each location is specified by an ordered set of
two distances, an x-coordinate and a y-coordinate, represented as (x, y). The two
coordinates are "ordered" because their order matters. The x-coordinate of a point
location comes first; the y-coordinate comes next. For instance, (2, 3) and (3, 2)
specify two different locations.

In a 3D Cartesian coordinate system each location is specified by an ordered set of three coordinates, an x-coordinate, a y-coordinate, and a z-coordinate, (x, y, z).

Two dimensions



A Cartesian coordinate system in two dimensions (also called a rectangular coordinate system or an orthogonal coordinate system[6]) is defined by an ordered pair of perpendicular lines (axes), a single unit of length for both axes, and an orientation for each axis. The point where the axes meet is taken as the origin for both, thus turning each axis into a number line.
In mathematics, physics, and engineering, the first axis is usually defined or depicted as
horizontal and oriented to the right, and the second axis is vertical and oriented upwards.
(However, in some computer graphics contexts, the ordinate axis may be oriented downwards.)
The origin is often labeled O, and the two coordinates are often denoted by the letters X and Y,
or x and y. The axes may then be referred to as the X-axis and Y-axis. The choices of letters
come from the original convention, which is to use the latter part of the alphabet to indicate
unknown values. The first part of the alphabet was used to designate known values.

A Euclidean plane with a chosen Cartesian coordinate system is called a Cartesian plane. In a
Cartesian plane one can define canonical representatives of certain geometric figures, such as
the unit circle (with radius equal to the length unit, and center at the origin), the unit
square (whose diagonal has endpoints at (0, 0) and (1, 1)), the unit hyperbola, and so on.

The two axes divide the plane into four right angles, called quadrants. The quadrants may be
named or numbered in various ways, but the quadrant where all coordinates are positive is
usually called the first quadrant.

If the coordinates of a point are (x, y), then its distances from the X-axis and from the Y-axis are
|y| and |x|, respectively; where |...| denotes the absolute value of a number.
A Cartesian coordinate system for a three-dimensional space consists of
an ordered triplet of lines (the axes) that go through a common point
(the origin), and are pair-wise perpendicular; an orientation for each axis;
and a single unit of length for all three axes. As in the two-dimensional
case, each axis becomes a number line. For any point P of space, one
considers a hyperplane through P perpendicular to each coordinate axis,
and interprets the point where that hyperplane cuts the axis as a
number. The Cartesian coordinates of P are those three numbers, in the
chosen order.
