
DOCUMENTATION: IMAGE PROCESSING

DAY 1
1. An image is represented as a 2D array (grayscale) or as a 3D array (coloured: BGR).
2. Each element of the array is called a pixel and takes a value from 0 to 255
(black to white).
3. To declare an image, we use the Mat class:
a. Mat img(rows, cols, CV_8UC1, Scalar(number)); // for grayscale
b. Mat img(rows, cols, CV_8UC3, Scalar(number, number, number)); // for coloured

4. 8U stands for 8-bit unsigned, which gives the range of grayscale values
(0 to 2^8 - 1). We can change the 8 to vary the size of the scale (the bit depth).
5. The header files used in image processing are:
a. #include "opencv2/highgui/highgui.hpp"
b. #include "opencv2/imgproc/imgproc.hpp"
c. #include "opencv2/core/core.hpp"
d. This is followed by: using namespace cv;

6. The following functions are also used:
a. namedWindow("name", mode): creates and names the window in
which the image will open.
b. imshow("name", img): shows the image in the named window.
c. waitKey(number): sets how long the window stays open. 0 means it
stays until any key is pressed; otherwise the number is a time in
milliseconds.
d. at<uchar>(i,j) and at<Vec3b>(i,j)[number]: used to read or change the
pixel at (i,j) of a grayscale or coloured image respectively. The number
in the second form selects one of the BGR channels = {0,1,2}.
e. img.rows and img.cols give the number of rows and columns of the
image respectively.
Sample Code:
https://drive.google.com/open?id=18FdjgnP_yMCf3r11tbdLkFlHIuuivcwS
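A minimal sketch putting the above together (the image size, window names and pixel positions are only illustrative, not taken from the linked sample code):

// Sketch: create and display images with the functions listed above.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat gray(300, 400, CV_8UC1, Scalar(128));         // grayscale, mid-gray
    Mat colour(300, 400, CV_8UC3, Scalar(255, 0, 0)); // BGR, all blue

    gray.at<uchar>(10, 20) = 255;                     // white pixel at row 10, col 20
    colour.at<Vec3b>(10, 20)[2] = 255;                // set the R channel (B=0, G=1, R=2)

    namedWindow("gray", WINDOW_AUTOSIZE);
    namedWindow("colour", WINDOW_AUTOSIZE);
    imshow("gray", gray);
    imshow("colour", colour);
    waitKey(0);                                       // wait until any key is pressed
    return 0;
}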
CODES:
1. // Create bottle image:
https://drive.google.com/open?id=1EToQa1qSczpmtcS6ypASXx67Gf6w6Urz
2. // Create Chessboard:
https://drive.google.com/open?id=1adniitFIsg9EBqGB-KB1ofsbb7dAvXkO
3. //Mirror Image
https://drive.google.com/file/d/1NGP-1ESkOsnBQH977cACJ1mVkgZIMwxY/view?us
p=sharing

DAY 2
1. To read an image from the computer, we use the function imread().
Syntax: Mat img_name = imread("path", 0/1);
Here path is the address of the image on the computer; 0 loads the image in
grayscale and 1 loads it in colour.
2. To save an image we use the function imwrite().
Syntax: imwrite("path", variable_of_the_image);
Here path is the address where the image will be saved, and the second
argument is the Mat object of the image.
3. Converting a coloured image to grayscale:
a. Take the average of the 3 colours (R,G,B), i.e. (R+G+B)/3
b. Lightness: (max(R,G,B)+min(R,G,B))/2
c. Luminosity: 0.21R + 0.72G + 0.07B
This method is the scientifically studied one: it gives each colour a
weightage according to how our eyes perceive it.
Code:
https://drive.google.com/file/d/1V3RqZSPOhFYrZ78zl52dbshtLTN9TPNO/view?usp=
sharing
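A rough sketch combining imread(), imwrite() and the luminosity conversion above; the file paths are placeholders and this is not the linked code:

// Sketch: read a colour image, convert to grayscale with 0.21R + 0.72G + 0.07B,
// then save and display the result.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 1);                 // 1 = load in colour (BGR)
    Mat gray(img.rows, img.cols, CV_8UC1, Scalar(0));

    for (int i = 0; i < img.rows; i++)
        for (int j = 0; j < img.cols; j++) {
            Vec3b p = img.at<Vec3b>(i, j);            // p[0]=B, p[1]=G, p[2]=R
            gray.at<uchar>(i, j) = (uchar)(0.21 * p[2] + 0.72 * p[1] + 0.07 * p[0]);
        }

    imwrite("gray.jpg", gray);                        // save to disk
    imshow("gray", gray);
    waitKey(0);
    return 0;
}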

4. To convert from grayscale to binary, we use a trackbar. The trackbar sets a
threshold value: pixels above it become white and pixels below it become black.
To create a trackbar we use the createTrackbar() function.
Syntax: createTrackbar("name_of_trackbar", "window_name", &var,
max_value_of_variable, callback)
Here var is the variable (the threshold value) controlled by the trackbar.
5. Callback function: the function called whenever the trackbar moves; in it we
tell the trackbar what to do.
Syntax: void callback(int t, void* c)
Here t is the current trackbar position (the threshold value).
Code (Segmentation):

https://drive.google.com/file/d/1Ljjqa8YqnyUsOcX5oOo88SRdGnCYs8tM/view?usp=
sharing
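A possible sketch of trackbar-based thresholding; the file path, window name and starting threshold are assumptions, not taken from the linked segmentation code:

// Sketch: grayscale-to-binary thresholding controlled by a trackbar.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

Mat gray, binary;
int threshold_value = 128;

// Callback: called whenever the trackbar moves; t is the new position.
void on_trackbar(int t, void*) {
    for (int i = 0; i < gray.rows; i++)
        for (int j = 0; j < gray.cols; j++)
            binary.at<uchar>(i, j) = (gray.at<uchar>(i, j) > t) ? 255 : 0;
    imshow("binary", binary);
}

int main() {
    gray = imread("input.jpg", 0);                   // 0 = grayscale
    binary = gray.clone();
    namedWindow("binary", WINDOW_AUTOSIZE);
    createTrackbar("threshold", "binary", &threshold_value, 255, on_trackbar);
    on_trackbar(threshold_value, 0);                 // draw once initially
    waitKey(0);
    return 0;
}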
6. Resizing images:
2x2 -> 4x4 -> 8x8 is upscaling; 8x8 -> 4x4 -> 2x2 is downscaling.
To upscale, we copy one pixel into multiple pixels.
To downscale, we take the average of a block of pixels and store it in one pixel.
Downscaling involves loss of data. To resize by a ratio that is not a whole
number (e.g. 1.5x), we first upscale and then downscale.
Codes:
1. Downscaling
https://drive.google.com/file/d/1-Lk9o8D74PLgI_eZy-U-yiI3vYE2Ejpi/view?usp
=sharing

2. //Upscaling
https://drive.google.com/file/d/1oG4iT_llPumSYfshK4zwRxLPdYVVugMp/view?usp=
sharing
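A sketch of 2x upscaling by pixel copying and 2x downscaling by averaging, assuming a grayscale input with even dimensions; this is not the linked code:

// Sketch: upscale by copying each pixel into a 2x2 block, and
// downscale by averaging each 2x2 block into one pixel.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);

    // Upscaling: every source pixel fills a 2x2 block of the destination.
    Mat up(img.rows * 2, img.cols * 2, CV_8UC1);
    for (int i = 0; i < up.rows; i++)
        for (int j = 0; j < up.cols; j++)
            up.at<uchar>(i, j) = img.at<uchar>(i / 2, j / 2);

    // Downscaling: every destination pixel is the average of a 2x2 block.
    Mat down(img.rows / 2, img.cols / 2, CV_8UC1);
    for (int i = 0; i < down.rows; i++)
        for (int j = 0; j < down.cols; j++) {
            int sum = img.at<uchar>(2 * i, 2 * j) + img.at<uchar>(2 * i, 2 * j + 1)
                    + img.at<uchar>(2 * i + 1, 2 * j) + img.at<uchar>(2 * i + 1, 2 * j + 1);
            down.at<uchar>(i, j) = (uchar)(sum / 4);
        }

    imshow("up", up);
    imshow("down", down);
    waitKey(0);
    return 0;
}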

7. Rotating an image:
We use the axes-rotation formula:

[X]   [cosθ  -sinθ] [x]
[Y] = [sinθ   cosθ] [y]

We usually rotate about a centre (ic, jc).
Image indices and Cartesian coordinates are related by x = j - jc and
y = -(i - ic), since the row index i grows downwards.
Hence:
X = (j - jc)*cosθ + (i - ic)*sinθ
Y = (j - jc)*sinθ - (i - ic)*cosθ
We also enlarge the output image to accommodate the rotation: its side is
diag = √(rows² + cols²), with its centre at (diag/2, diag/2).
Mapping the rotated point back to indices of the new image:
j_new = (diag/2) + X
i_new = (diag/2) - Y
Code:
//​!!!WRONG
https://drive.google.com/file/d/1zsbUwoYG508ymFJqZsfoLmL5i9gXsKWg/view?usp
=sharing
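Since the linked code is marked as wrong, here is a rough sketch of rotation about the centre using the formulas above. It uses inverse mapping (for each output pixel we look up the source pixel), which avoids the holes that forward mapping leaves; the angle and file path are placeholders:

// Sketch: rotate a grayscale image by theta about its centre (inverse mapping).
#include <cmath>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    double theta = 30.0 * CV_PI / 180.0;          // rotation angle in radians

    int diag = (int)std::ceil(std::sqrt((double)(img.rows * img.rows + img.cols * img.cols)));
    Mat out(diag, diag, CV_8UC1, Scalar(0));      // black canvas big enough for any angle

    double ic = img.rows / 2.0, jc = img.cols / 2.0;   // centre of the input
    double icn = diag / 2.0, jcn = diag / 2.0;         // centre of the output

    for (int i = 0; i < out.rows; i++)
        for (int j = 0; j < out.cols; j++) {
            // Cartesian coordinates of the output pixel relative to its centre.
            double X = j - jcn;
            double Y = -(i - icn);
            // Rotate backwards by theta to find the source point.
            double x =  X * std::cos(theta) + Y * std::sin(theta);
            double y = -X * std::sin(theta) + Y * std::cos(theta);
            int src_i = (int)(ic - y + 0.5);
            int src_j = (int)(jc + x + 0.5);
            if (src_i >= 0 && src_i < img.rows && src_j >= 0 && src_j < img.cols)
                out.at<uchar>(i, j) = img.at<uchar>(src_i, src_j);
        }

    imshow("rotated", out);
    waitKey(0);
    return 0;
}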
8. Morphing: the process in which we create a trackbar; at either extreme of the
trackbar we see one of two different images, and in between we get a weighted
average (blend) of the two images.
Code:
https://drive.google.com/file/d/1q_4hZmvWLp_PorLf1Fgj9X0wu-0hh6lQ/view?usp=s
haring
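A minimal morphing sketch using OpenCV's addWeighted() for the weighted average (this built-in is not named in the notes); the two images are assumed to have the same size and type, and the paths are placeholders:

// Sketch: "morph" between two images with a trackbar. Position 0 shows img1,
// the maximum shows img2, and in between we show alpha*img2 + (1-alpha)*img1.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

Mat img1, img2, blend;
int pos = 0;
const int MAX_POS = 100;

void on_trackbar(int t, void*) {
    double alpha = (double)t / MAX_POS;
    addWeighted(img2, alpha, img1, 1.0 - alpha, 0.0, blend);
    imshow("morph", blend);
}

int main() {
    img1 = imread("first.jpg", 1);      // placeholder paths; both images must
    img2 = imread("second.jpg", 1);     // have the same size and type
    namedWindow("morph", WINDOW_AUTOSIZE);
    createTrackbar("mix", "morph", &pos, MAX_POS, on_trackbar);
    on_trackbar(0, 0);
    waitKey(0);
    return 0;
}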

DAY 3
Kernel: A kernel is a matrix (usually 3x3) used to perform an operation on an
image.
Padding: Adding an extra layer around an image so that the kernel does not
overflow outside the image.
Types:
a. 0 Padding: Adding a layer of black (zero-valued) pixels around the image as
a frame. For a row [a b c d], the padded row is [0 a b c d 0].

b. Reflection Padding: We reflect the second-last layer of each side about the
last layer to add a layer of padding. For a row [a b c d], the padded row is
[b a b c d c].

c. Wrap Around Padding: The padding on each side is copied from the opposite
side, as if the image were tiled. For a row [a b c d], the padded row is
[d a b c d a].

*If we don't want to do padding, we can use an isValid function (a bounds check that skips neighbours outside the image) instead, as illustrated below.
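For reference, OpenCV's built-in copyMakeBorder() (not mentioned in the notes) implements all three padding types; a small sketch with a placeholder file path:

// Sketch: add a 1-pixel border with each of the padding types above.
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    Mat zero, reflect, wrap;

    // 0 padding: a frame of black (constant 0) pixels.
    copyMakeBorder(img, zero, 1, 1, 1, 1, BORDER_CONSTANT, Scalar(0));
    // Reflection padding: second-last layer reflected about the last one.
    copyMakeBorder(img, reflect, 1, 1, 1, 1, BORDER_REFLECT_101);
    // Wrap-around padding: border copied from the opposite side.
    copyMakeBorder(img, wrap, 1, 1, 1, 1, BORDER_WRAP);

    imshow("zero", zero);
    imshow("reflect", reflect);
    imshow("wrap", wrap);
    waitKey(0);
    return 0;
}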
Blurs:

a. Mean Blur: In this, we take a kernel, compute the average of all the pixels in
the kernel and store it in the centre pixel of the kernel.
Code (Mean Blur):
https://drive.google.com/file/d/1_egVXkIn_tqOWbKab19Cpv5BlyjcK-FA/view?
usp=sharing
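A sketch of a manual 3x3 mean blur that uses an isValid() bounds check instead of padding, as suggested above; the file path is a placeholder and this is not the linked code:

// Sketch: manual 3x3 mean blur with an isValid() bounds check.
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

bool isValid(int i, int j, int rows, int cols) {
    return i >= 0 && i < rows && j >= 0 && j < cols;
}

int main() {
    Mat img = imread("input.jpg", 0);
    Mat out = img.clone();

    for (int i = 0; i < img.rows; i++)
        for (int j = 0; j < img.cols; j++) {
            int sum = 0, count = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (isValid(i + di, j + dj, img.rows, img.cols)) {
                        sum += img.at<uchar>(i + di, j + dj);
                        count++;
                    }
            out.at<uchar>(i, j) = (uchar)(sum / count);   // average of the kernel
        }

    imshow("mean blur", out);
    waitKey(0);
    return 0;
}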

b. Median Blur: This blur takes the median of all the elements of the kernel
except the centre element and stores the value in the centre pixel of the kernel.
Code:
https://drive.google.com/file/d/1V4hW462ZveAHptm334SVEfZARh34vw6c/view?usp
=sharing
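A manual sketch of the median blur as described above (median of the 8 neighbours, centre excluded); note that OpenCV's own medianBlur() includes the centre pixel:

// Sketch: manual median blur over the 8 neighbours of each pixel.
#include <algorithm>
#include <vector>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    Mat out = img.clone();

    for (int i = 1; i < img.rows - 1; i++)
        for (int j = 1; j < img.cols - 1; j++) {
            std::vector<uchar> v;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if (di != 0 || dj != 0)                 // skip the centre element
                        v.push_back(img.at<uchar>(i + di, j + dj));
            std::sort(v.begin(), v.end());
            out.at<uchar>(i, j) = v[v.size() / 2];          // median of the 8 values
        }

    imshow("median blur", out);
    waitKey(0);
    return 0;
}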

c. Gaussian Blur: Like a Gaussian curve, this blur weights each cell of the
kernel according to its distance from the centre pixel. It adds the pixel
values with those weightages and stores the weighted sum in the centre pixel.
A common 3x3 Gaussian kernel:

1/16  1/8  1/16
 1/8  1/4   1/8
1/16  1/8  1/16


Code:
https://drive.google.com/file/d/1rusOjPIqeE6ZsyOPeBSGTzcwL-19aPb0/view?usp=s
haring
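A sketch applying the 3x3 Gaussian kernel above via filter2D() (a built-in not named in the notes; cv::GaussianBlur() would give a similar result):

// Sketch: convolve with the 3x3 Gaussian kernel shown above.
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);

    // Kernel weights: 1/16 1/8 1/16 ; 1/8 1/4 1/8 ; 1/16 1/8 1/16
    Mat kernel = (Mat_<float>(3, 3) << 1.f/16, 1.f/8, 1.f/16,
                                       1.f/8,  1.f/4, 1.f/8,
                                       1.f/16, 1.f/8, 1.f/16);
    Mat out;
    filter2D(img, out, -1, kernel);   // -1: output has the same depth as the input

    imshow("gaussian blur", out);
    waitKey(0);
    return 0;
}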

Erosion: We slide a kernel over the image. If even one pixel under the kernel
is black, the centre pixel becomes black.
It is also defined as the minimum over the kernel.
Erosion increases the black content of the image.

Dilation: We slide a kernel over the image. If even one pixel under the kernel
is white, the centre pixel becomes white.
It is also defined as the maximum over the kernel.
Dilation increases the white content of the image.

Erosion followed by Dilation is called Opening.

Dilation followed by Erosion is called Closing.

Code:
https://drive.google.com/file/d/1vBi8nQf6sAw1WYtZhHLDqb0DC3mSVr-t/view?usp=
sharing
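A sketch of erosion, dilation, opening and closing using OpenCV's built-in functions (the notes describe the manual versions; the kernel size and file path are assumptions):

// Sketch: morphological operations with a 3x3 rectangular kernel.
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("binary.jpg", 0);
    Mat kernel = getStructuringElement(MORPH_RECT, Size(3, 3));

    Mat eroded, dilated, opened, closed;
    erode(img, eroded, kernel);                      // minimum over the kernel
    dilate(img, dilated, kernel);                    // maximum over the kernel
    morphologyEx(img, opened, MORPH_OPEN, kernel);   // erosion then dilation
    morphologyEx(img, closed, MORPH_CLOSE, kernel);  // dilation then erosion

    imshow("eroded", eroded);
    imshow("dilated", dilated);
    imshow("opened", opened);
    imshow("closed", closed);
    waitKey(0);
    return 0;
}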

Edge Detection:
a. Using Erosion and Dilation: Perform dilation on an image and store it in a
separate image (say img1). Perform erosion on the same image and store it in
another image (say img2). Compute img1 - img2 to get the final image.
This works because dilation enlarges the white regions and erosion shrinks
them; when we subtract, only the places where the edges are remain.
b. Using blur: Apply a mean/Gaussian blur to a binary image. The edges become
grayish. Subtract the new and old images to get the edges. (A sketch covering
both of these subtraction-based methods follows.)
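A sketch of the two subtraction-based edge detectors, (a) and (b), using OpenCV built-ins that the notes do not name (dilate, erode, blur, absdiff); kernel size and file path are assumptions:

// Sketch: edge detection by (a) dilation - erosion and (b) blur - original.
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("binary.jpg", 0);
    Mat kernel = getStructuringElement(MORPH_RECT, Size(3, 3));

    // (a) dilation - erosion: only the band around the edges survives.
    Mat img1, img2, edges_morph;
    dilate(img, img1, kernel);
    erode(img, img2, kernel);
    edges_morph = img1 - img2;

    // (b) |blurred - original|: the difference is non-zero only near edges,
    // because blurring only changes pixels close to an edge.
    Mat blurred, edges_blur;
    blur(img, blurred, Size(3, 3));        // mean blur
    absdiff(blurred, img, edges_blur);

    imshow("edges (morphology)", edges_morph);
    imshow("edges (blur)", edges_blur);
    waitKey(0);
    return 0;
}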

c. Prewitt Filter:
A. We take 2 matrices to indicate changes in the x and y directions:

         -1  0  1
Gx = 1/6 -1  0  1
         -1  0  1

         -1 -1 -1
Gy = 1/6  0  0  0
          1  1  1

B. We multiply the kernel (the image patch) element-wise with Gx and with Gy.
For each of the two results, we add up the elements and store the absolute
values in gx and gy respectively. Then we compute √(gx² + gy²) and compare it
with a threshold. If it is greater than the threshold, we consider that pixel
an edge; otherwise not.

Code:
https://drive.google.com/file/d/1xDJd7NUtfY9YhD5Ch18nIjW1kE_S_4AW/view?usp=
sharing
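A rough sketch of the Prewitt steps above done manually; the threshold value and file path are arbitrary assumptions, and this is not the linked code:

// Sketch: manual Prewitt edge detection with the Gx and Gy kernels above.
#include <cmath>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    Mat edges(img.rows, img.cols, CV_8UC1, Scalar(0));

    int Gx[3][3] = {{-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}};
    int Gy[3][3] = {{-1, -1, -1}, {0, 0, 0}, {1, 1, 1}};
    double threshold = 30.0;                       // tuning parameter

    for (int i = 1; i < img.rows - 1; i++)
        for (int j = 1; j < img.cols - 1; j++) {
            double gx = 0, gy = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++) {
                    int pixel = img.at<uchar>(i + di, j + dj);
                    gx += Gx[di + 1][dj + 1] * pixel / 6.0;
                    gy += Gy[di + 1][dj + 1] * pixel / 6.0;
                }
            double magnitude = std::sqrt(gx * gx + gy * gy);
            edges.at<uchar>(i, j) = (magnitude > threshold) ? 255 : 0;
        }

    imshow("prewitt edges", edges);
    waitKey(0);
    return 0;
}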

d. Sobel Filter:
It is exactly the same as the Prewitt Filter except that the matrices change.
It is considered more effective than the Prewitt Filter.

         -1  0  1
Gx = 1/8 -2  0  2
         -1  0  1

         -1 -2 -1
Gy = 1/8  0  0  0
          1  2  1

Code:
https://drive.google.com/file/d/1HDORugdR6agARvUQIOYli_XbyVRc-0ew/vie
w?usp=sharing
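OpenCV also ships a built-in Sobel() (not required by the notes); a short usage sketch with placeholder paths:

// Sketch: Sobel edge detection using OpenCV's built-in Sobel().
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    Mat gx, gy, ax, ay, edges;

    Sobel(img, gx, CV_16S, 1, 0, 3);            // gradient in the x direction
    Sobel(img, gy, CV_16S, 0, 1, 3);            // gradient in the y direction
    convertScaleAbs(gx, ax);                    // |gx|, back to 8-bit
    convertScaleAbs(gy, ay);                    // |gy|, back to 8-bit
    addWeighted(ax, 0.5, ay, 0.5, 0, edges);    // approximate combined magnitude

    imshow("sobel edges", edges);
    waitKey(0);
    return 0;
}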

Dealing With Frames:


Sample Code:
https://drive.google.com/file/d/1L-KmznqF0wlTNU7nCB7s4JmhmHd38mak/vi
ew?usp=sharing
In place of DO SOMETHING, we can perform any image processing
operation. It will apply to every frame of the video.
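A sketch of the frame loop described above, assuming the default webcam (index 0); the grayscale conversion stands in for DO SOMETHING and is only an example:

// Sketch: read frames from a camera or video, process each one, display it.
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    VideoCapture cap(0);               // 0 = default camera; a file path also works
    if (!cap.isOpened()) return -1;

    Mat frame, processed;
    while (true) {
        cap >> frame;                  // grab the next frame
        if (frame.empty()) break;      // end of the video

        // DO SOMETHING: e.g. convert the frame to grayscale.
        cvtColor(frame, processed, COLOR_BGR2GRAY);

        imshow("frame", processed);
        if (waitKey(30) >= 0) break;   // stop when any key is pressed
    }
    return 0;
}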

DAY 4

Graphs:
A graph is a data structure in which nodes are connected to other nodes by
edges.

Edges can be directed or undirected and can have weights.

Traversal of a graph:
a. Depth First Search:
PS: Given a binary image, find a path from (0,0) to
(img.rows-1, img.cols-1).
Code:
https://drive.google.com/open?id=11DFFou8TMt5tkb2qxNWw1dh3LZ8h19rs

b. Breadth First Search:


Code:
https://drive.google.com/file/d/1LocC_vV4eUa8MuWGPUFA8S4T3LYGUJfg/v
iew?usp=sharing
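A sketch of BFS on a binary image treated as a grid graph, answering the same path question as the DFS problem statement above; it assumes white (255) pixels are traversable, 4-connectivity, and a placeholder file name:

// Sketch: BFS from (0,0) to (rows-1, cols-1) over white pixels.
#include <queue>
#include <vector>
#include <iostream>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("maze.png", 0);
    std::vector<std::vector<bool> > visited(img.rows, std::vector<bool>(img.cols, false));
    int di[] = {-1, 1, 0, 0}, dj[] = {0, 0, -1, 1};

    std::queue<Point> q;                     // Point stores (x = column, y = row)
    if (img.at<uchar>(0, 0) == 255) {        // the start itself must be white
        q.push(Point(0, 0));
        visited[0][0] = true;
    }

    bool found = false;
    while (!q.empty()) {
        Point p = q.front(); q.pop();
        if (p.y == img.rows - 1 && p.x == img.cols - 1) { found = true; break; }
        for (int k = 0; k < 4; k++) {
            int ni = p.y + di[k], nj = p.x + dj[k];
            if (ni >= 0 && ni < img.rows && nj >= 0 && nj < img.cols &&
                !visited[ni][nj] && img.at<uchar>(ni, nj) == 255) {
                visited[ni][nj] = true;
                q.push(Point(nj, ni));
            }
        }
    }
    std::cout << (found ? "path exists" : "no path") << std::endl;
    return 0;
}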

CONTOURS:
Closed edges in an image are called contours.
Contours are detected in an image using the findContours() function and drawn
using the drawContours() function.
*Read about it from the documentation*
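A minimal usage sketch of findContours() and drawContours(); the retrieval mode, approximation method and file path are assumptions:

// Sketch: detect and draw contours in a binary image.
#include <vector>
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat binary = imread("binary.jpg", 0);

    std::vector<std::vector<Point> > contours;
    std::vector<Vec4i> hierarchy;
    // Pass a clone: older OpenCV versions modify the input image.
    findContours(binary.clone(), contours, hierarchy, RETR_TREE, CHAIN_APPROX_SIMPLE);

    Mat drawing = Mat::zeros(binary.size(), CV_8UC3);
    for (int i = 0; i < (int)contours.size(); i++)
        drawContours(drawing, contours, i, Scalar(0, 255, 0), 2);  // thickness 2; -1 would fill

    imshow("contours", drawing);
    waitKey(0);
    return 0;
}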

DIJKSTRA AND A* ALGORITHMS:

Used to find the shortest path between 2 nodes in a graph.
*Read about them on GeeksForGeeks*

HOUGH TRANSFORM:
Used to find accurate lines in an image after Canny. Canny gives us edge
pixels, but they do not always lie exactly on a straight line, so to recover
the lines we use the Hough transform.
For each white (edge) pixel in the image we take all values of θ from 0° to
180° and, for each θ, find the value of r using the formula:
x·cosθ + y·sinθ = r
This gives the parameters of every line passing through that point. We form
another plane of (r, θ), where each point defines a line; this is called the
Hough plane. Each white point votes for the lines passing through it: whenever
a line is voted for, its intensity in the (r, θ) plane is increased by a
constant amount. After doing this for all the points, the points with the
highest intensity in the Hough plane give the correct lines' values of r and θ.
CODE:
https://drive.google.com/file/d/1e7xfoRy7zv_aL36fF4MnGEpS3pjqIeTJ/view?usp=sh
aring
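A sketch using Canny followed by OpenCV's HoughLines(), then drawing the detected (r, θ) lines back onto the image; the Canny thresholds, vote threshold and file path are guesses, not values from the linked code:

// Sketch: standard Hough transform on Canny edges.
#include <cmath>
#include <vector>
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);
    Mat edges, colour;
    Canny(img, edges, 50, 150);
    cvtColor(edges, colour, COLOR_GRAY2BGR);

    std::vector<Vec2f> lines;                      // each element is (r, theta)
    HoughLines(edges, lines, 1, CV_PI / 180, 150); // vote threshold = 150

    for (size_t k = 0; k < lines.size(); k++) {
        float r = lines[k][0], theta = lines[k][1];
        // x*cos(theta) + y*sin(theta) = r  ->  draw a long segment along the line
        double a = std::cos(theta), b = std::sin(theta);
        Point p1(cvRound(a * r + 1000 * (-b)), cvRound(b * r + 1000 * a));
        Point p2(cvRound(a * r - 1000 * (-b)), cvRound(b * r - 1000 * a));
        line(colour, p1, p2, Scalar(0, 0, 255), 2);
    }

    imshow("hough lines", colour);
    waitKey(0);
    return 0;
}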

HISTOGRAM:
It is a bar graph of the pixel values (0-255) against the frequency of each
value, i.e. it gives an idea of the count of each pixel value in an image.
CODE:
https://drive.google.com/file/d/1im-dmJd9hS4xfNlnNVaQn3-A_M76lPXS/view?usp=s
haring
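A sketch that counts pixel values manually and draws the histogram as a simple bar graph; the canvas size, bar width and file path are arbitrary choices, not taken from the linked code:

// Sketch: grayscale histogram by counting pixel values, drawn as bars.
#include <algorithm>
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
using namespace cv;

int main() {
    Mat img = imread("input.jpg", 0);

    int count[256] = {0};
    for (int i = 0; i < img.rows; i++)
        for (int j = 0; j < img.cols; j++)
            count[img.at<uchar>(i, j)]++;          // frequency of each value 0-255

    int maxCount = 1;
    for (int v = 0; v < 256; v++) maxCount = std::max(maxCount, count[v]);

    // One 2-pixel-wide bar per value, scaled to a 256-pixel-high canvas.
    Mat hist(256, 512, CV_8UC3, Scalar(255, 255, 255));
    for (int v = 0; v < 256; v++) {
        int h = (count[v] * 255) / maxCount;
        rectangle(hist, Point(2 * v, 255 - h), Point(2 * v + 1, 255),
                  Scalar(0, 0, 0), -1);            // filled black bar
    }

    imshow("histogram", hist);
    waitKey(0);
    return 0;
}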
VIGNETTE FILTER:
https://drive.google.com/file/d/18PhYNhrICosp2N8Dy9HWPtcdSJqWFYlk/view?usp=
sharing

*​CORNER DETECTION HERE*
