THEORETICAL REVIEW
Digital image processing is a discipline that studies techniques for
processing images. In this context, image processing is carried out
digitally using a computer.
A digital image is stored as an array of real or complex values, each
represented by a fixed sequence of bits. In a computer, a digital image is
mapped onto a grid of pixel elements in the form of a 2-dimensional matrix
whose entries are numbers that represent the color channels, and these
numbers are stored sequentially by the computer (Pulung Nurtantio, 2017).
A digital image can be represented by a matrix consisting of M columns
and N rows. The intersection of a column and a row is called a pixel, which
is the smallest element of an image. A pixel has two parameters, namely its
coordinates (x, y) and its color or intensity. The value at coordinates
(x, y) is f(x, y), the intensity of the pixel at that point (Figure 2.1).
\[
f(x,y) =
\begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,M-1) \\
f(1,0) & f(1,1) & \cdots & f(1,M-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(N-1,0) & f(N-1,1) & \cdots & f(N-1,M-1)
\end{bmatrix}
\tag{2.1}
\]
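The matrix view of an image can be sketched in a few lines of Python. This is only an illustration: the 3x4 pixel values and the helper name `intensity` are made up for the example, and a nested list stands in for the matrix.

```python
# A tiny grayscale image with N = 3 rows and M = 4 columns.
# The intensity values are hypothetical.
image = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
]


def intensity(img, x, y):
    """Return f(x, y): the intensity stored at row x, column y."""
    return img[x][y]


n_rows = len(image)       # N
n_cols = len(image[0])    # M
top_left = intensity(image, 0, 0)
bottom_right = intensity(image, n_rows - 1, n_cols - 1)
```

The last entry of the matrix is read with indices (N-1, M-1), matching the bottom-right corner of equation (2.1).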
2.3 Artificial Intelligence
Artificial Intelligence (AI) is a branch of science used to solve
problems by applying approaches modeled on human thinking to machines.
Besides being associated with computer science, AI is also related to
various other disciplines such as mathematics, psychology, biology,
philosophy, and other fields. Artificial intelligence studies how to make
machines (computers) able to do human work as well as, or better than,
humans do.
With its association with a wide range of scientific fields, AI can
continue to grow and ultimately be useful to produce solutions to complex
problems that have not been solved in certain fields of science.
Artificial Intelligence (AI) is a discipline with a broad scope. Its
major areas include Expert Systems, Natural Language Processing (NLP),
Speech Recognition, Computer Vision, and others. An expert system aims to
transfer the expertise of an expert to a computer so that it can then be
used by others (who are not experts). Natural language processing was
developed so that users can communicate with a computer in everyday
language. Speech recognition enables humans to communicate with computers
using voice. Computer vision is used to interpret objects or images
presented to a computer so that the computer can understand the
characteristics of the object or image.
AI systems that hold human expertise in a particular domain are built on
soft computing. Soft computing is a relatively new field for building
intelligent systems that work better and adapt easily. The methodology
most commonly used in soft computing is the Neural Network, also called
the Artificial Neural Network. Other methods used are Fuzzy Systems,
Probabilistic Reasoning, and Evolutionary Computing (Pannu, 2015).
2.3.1 Machine Learning
Machine learning is a method used to build programs that can learn
from data. Learning takes place dynamically and independently, unlike
conventional computer programs, which cannot acquire knowledge by
learning from data.
Machine learning works by recognizing and forming patterns from
examples of the data or objects analyzed, so that the answer to a
question can be determined independently from the patterns that have been
learned. Programs built with machine learning methods to solve a problem
include face detection, medical diagnostics, price prediction, digit
recognition, stock trading, etc. (Judith Hurwitz, 2018).
The neural network is one category of soft computing that is
commonly applied in building machine learning algorithms. Neural
networks work in a way that resembles the human brain: they receive
stimuli, process them, and produce outputs that vary with the stimuli
and with what was learned during training (Judith Hurwitz, 2018).
2.3.2 Machine Learning Method
Based on how learning is carried out, machine learning methods can be
classified into 3 types, namely supervised learning, unsupervised
learning, and reinforcement learning.
2.3.2.1 Supervised Learning
Supervised learning is a learning method in which the expected
output is known beforehand. The availability of data is therefore very
important for supervised learning. In this method, each input pattern
with a known output is presented to the neurons in the input layer and
propagated through the neural network to the neurons in the output layer,
which generate an output pattern that is compared with the target output
pattern.
If the output pattern produced by learning differs from the target
output pattern, an error occurs, and if the error is large enough, further
learning is needed. Examples of artificial neural network algorithms that
use supervised learning methods are Adaline, backpropagation, Boltzmann,
Hopfield, and perceptron (Judith Hurwitz, 2018).
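The compare-and-correct loop described above can be sketched with the perceptron learning rule. This is a minimal illustration rather than the procedure used in this work: the AND data set, the learning rate of 1, and the function names are assumptions chosen so that the arithmetic stays in whole numbers.

```python
def step(z):
    """Threshold activation: fire when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def predict(weights, bias, x):
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, targets, lr=1, epochs=20):
    """Adjust weights whenever the output differs from the known target."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            error = target - predict(weights, bias, x)  # 0 when correct
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical training set: the logical AND function.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]

w, b = train_perceptron(and_inputs, and_targets)
predictions = [predict(w, b, x) for x in and_inputs]
```

Once every prediction matches its target, the error term is zero and the weights stop changing, which is the stopping condition the paragraph above refers to.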
2.3.2.2 Unsupervised Learning
Unsupervised learning is a learning method that does not require a
target output, so the expected result of learning cannot be determined in
advance. The purpose of this method is to group similar units into a
certain area. Examples of artificial neural network algorithms that use
unsupervised learning are Learning Vector Quantization (LVQ) and
neocognitron (Judith Hurwitz, 2018).
2.3.2.3 Reinforcement Learning
Reinforcement learning is a method that relies on feedback or
mixed learning phases. The learner actively interacts with its external
environment to obtain information and receives a response for every
action it takes. An application of reinforcement learning can be found in
AI for strategy games such as Go, where AlphaGo has surpassed human
expertise.
2.3.3 Deep Learning
Deep learning is an area of machine learning that uses artificial
neural networks to solve problems involving large datasets. By adding
more layers to the model, the data being learned, such as images, can be
labeled better, so deep learning techniques provide a strong architecture
for supervised learning.
The application of deep learning algorithms in the growing field of
machine learning makes computers able to learn with accuracy, speed, and
large-scale data. The research community and industry have implemented
deep learning algorithms as solutions to problems that require large data
processing such as computer vision, natural language processing, and
speech recognition.
Feature engineering is a technique, and one of the main aspects of
deep learning, used to improve prediction results by
extracting useful patterns from the data so that the model can more easily
distinguish classes. However, because data are collected in different
forms and under different conditions, feature engineering is difficult to
learn and master, since different technical approaches are needed.
The CNN (Convolutional Neural Network) algorithm is commonly
used in feature engineering because it is well known for finding good
features in the image and passing them to the next layer to form a
non-linear hypothesis that can increase the complexity of a model
(Danukusumo, 2017).
An Artificial Neural Network (ANN) is an adaptive system that can
change its structure to solve a problem based on internal or external
information. Its flexibility with respect to input data allows it to
produce consistent output responses.
An ANN is a parallel computing model that mimics the function and
workings of the nervous system of the human brain, which has billions of
neurons connected to one another by synapses (Yu-Chen Hu, 2018).
A neuron consists of a cell nucleus, which is responsible
for processing information, one axon, which is the medium for
information propagation, and at least one dendrite, which is the entry and
exit point for information. The structure of a neuron can be seen in
Figure 2.2.
2.3.4 Neural Network Component
The various types of neural networks share the same components.
A neural network consists of several neuron units that are
interconnected, and each unit can transform the information it
receives and pass it through its connections to other neurons. The
connection between units is usually known as a weight. The
information processed is stored as a certain value at a certain weight.
The structure of neurons in a neural network can be seen in Figure 2.3.
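The transformation a single unit performs can be sketched as a weighted sum followed by an activation. The input values, weights, bias, and the threshold activation below are hypothetical and serve only to show how the weights scale the incoming information.

```python
def step(z):
    """Threshold activation: output 1 when the sum is non-negative."""
    return 1 if z >= 0 else 0


def neuron_output(inputs, weights, bias, activation):
    """Weight each incoming value, add the bias, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


# Hypothetical unit with three incoming connections.
weights = [2, -1, 1]
bias = -2

active = neuron_output([1, 0, 1], weights, bias, step)    # sum is 1
inactive = neuron_output([0, 1, 0], weights, bias, step)  # sum is -3
```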
2.3.5 Neural Network Architectures
In neural networks, two important factors determine the behavior
of a neuron, namely its weights and the activation function it
uses.
The architectures that can be built with an ANN vary: a network can
consist of one neuron, of multiple neurons in one layer, or of
multiple neurons in multiple layers. Different architectures have
different capabilities. The more complex a network is, the more complex
the problems it can solve.
Based on the number of layers, neural network architectures can be
divided into:
1. Single Layer Neural Network: A network with a single layer
consists of 1 input layer where each neuron is always
connected to each neuron contained in 1 output layer. Incoming
input is processed into output without going through hidden
layers.
2. Multiple Layers Neural Network: A multi-layered network
has 3 types of layers, namely the input layer, the output layer,
and the hidden layer. Networks of this form can solve more
complex problems than single-layer networks, but
require a training process that tends to take longer.
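The difference between the two architectures can be illustrated with the XOR function, a standard example of a problem that no single-layer network with a threshold activation can represent but that one hidden layer can. The weights below are set by hand for the illustration and are not learned; all names are hypothetical.

```python
def step(z):
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: every unit sees every input (fully connected)."""
    return [
        step(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hand-picked weights (assumption): hidden unit 1 acts as OR,
# hidden unit 2 acts as AND, and the output fires for "OR but not AND".
HIDDEN_W = [[1, 1], [1, 1]]
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1, -1]]
OUTPUT_B = [-0.5]


def xor_network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)   # input -> hidden layer
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]    # hidden -> output layer


results = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```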
Convolutional Neural Network (CNN) is a type of deep neural network
developed from the multi-layer perceptron (MLP). Its high network depth
makes CNN widely applied to image data. Image classification can also be
done with an MLP, but the MLP is not well suited to the task because it
does not retain the spatial information in image data and treats each
pixel as an independent feature, so the results obtained are poor.
Technically, CNN is an architecture that can be trained in several stages.
The input and output of each stage consist of several arrays
called feature maps. Each stage consists of three layers, namely the
convolution layer, the activation function layer, and the pooling layer.
The following is the Convolutional Neural Network architecture:
Based on Figure 2.7, the first stage of CNN architecture is the convolution
stage. This stage is done by using a kernel with a certain size depending on the
number of features produced.
After that, the process continues to the activation function stage. The
function most commonly used is the ReLU (Rectified Linear Unit) activation
function. Once the activation function has been applied, the process
continues to the pooling stage. This series of processes is repeated
several times until the feature map is sufficient to be passed to the
fully connected layer, after which it is forwarded to the output class.
2.4.1 Activation Function
An activation function is a function that describes the relationship
between the level of internal activation (summation function) and the
output, and it can be
linear or non-linear. The purpose of this function is to determine whether
a neuron is activated or not. Some activation functions that are often used
in neural networks are as follows (Sena, 2017):
Figure 2.9 Linear Function
Figure 2.11 ReLU Function
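The two functions named in these figure captions can be written directly. The definitions below are the standard ones and are given here as an assumption about what the figures plot; the sample inputs are arbitrary.

```python
def linear(z):
    """Linear (identity) activation: the output equals the summed input."""
    return z


def relu(z):
    """ReLU activation: negative inputs are clipped to zero."""
    return max(0.0, z)


samples = [-2.0, 0.0, 3.0]
linear_outputs = [linear(z) for z in samples]
relu_outputs = [relu(z) for z in samples]
```

For the negative sample the linear function passes the value through unchanged, while ReLU returns zero, which is what leaves the neuron inactive.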
Figure 2.12 Backpropagation Algorithm
1. Forward Pass
This stage computes the values from the input layer forward to the
output layer using the predetermined activation function.
2. Backward Pass
This stage is carried out when a difference, that is, an error, is
found between the network output and the desired target. The
error is propagated backwards, starting from the connections that are
directly connected to each unit in the output layer.
3. Modification
This stage modifies the weights to reduce the level of errors
that occur.
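The three stages can be sketched on the smallest possible network, a single sigmoid neuron trained on one example. The squared-error loss, the learning rate, and the training example are assumptions made for the illustration; a real network repeats the same steps layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w, b, x, target, lr=0.5):
    # 1. Forward pass: compute the output from the input.
    y = sigmoid(w * x + b)
    loss = 0.5 * (y - target) ** 2

    # 2. Backward pass: propagate the error back through the activation.
    delta = (y - target) * y * (1.0 - y)   # dLoss/dz for a sigmoid unit
    grad_w = delta * x
    grad_b = delta

    # 3. Modification: move the weights against the gradient.
    return w - lr * grad_w, b - lr * grad_b, loss


w, b = 0.0, 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=1.0, target=1.0)
    losses.append(loss)
```

The recorded loss shrinks from one iteration to the next, which is the effect the modification stage is meant to have.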
2.4.3 Convolutional Layer
The convolutional layer is one of the stages in building a CNN
architecture. In the convolution layer, the convolution operation is
applied to the output of the previous layer. Convolution is a mathematical
term for applying one function to the output of another function
repeatedly.
The convolution operation is an operation on two functions of a
real-valued argument that produces an output function, the feature map
of the input image. The input and the kernel are seen as the two arguments.
The convolution operation can be written with the following formula:
\[
s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{2.6}
\]
where:
s(t) = the result of the convolution operation
x = input
w = weight (kernel)
The function s(t) produces a single output in the form of a feature
map. The first argument is the input x and the second argument is w, the
kernel or filter. If the input is a two-dimensional image, t can be treated
as a pixel and expressed with the indices i and j.
The convolution operation defined by this formula is
commutative. With K as the kernel and I as the input, this property
arises because the kernel is flipped relative to the input. An alternative
view of the convolution operation is as a multiplication between the
input image and the kernel, in which each output value is computed as a
dot product.
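For a two-dimensional input, the sum in the formula becomes a double sum over the kernel indices. The sketch below computes it with plain lists, flipping the kernel as the definition requires; the 3x3 image and 2x2 kernel are made-up values, and no padding or stride is applied.

```python
def convolve2d(image, kernel):
    """Valid 2-D convolution: flip the kernel, then take sliding dot products."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # flip rows and columns
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the flipped kernel and the image patch.
            row.append(sum(image[i + m][j + n] * flipped[m][n]
                           for m in range(kh) for n in range(kw)))
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

feature_map = convolve2d(image, kernel)
```

Many deep learning libraries skip the flip and compute cross-correlation instead. Because the kernel values are learned, the two variants behave the same way in practice.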
Hyperparameters can be used to determine the output
volume of each layer. The hyperparameters in the equation below
are used to calculate the number of activation neurons in one output. The
equation is as follows:
A convolutional layer consists of neurons arranged to form a filter
with a length and height (in pixels). For example, the first convolution
layer in feature extraction may have a filter with a length of 5 pixels, a
height of 5 pixels, and a depth of 3 (5x5x3), matching the number of
channels of the image. The filter is shifted over all parts of the image.
At each shift, a "dot" operation is carried out between the input and the
filter values, producing an output called a feature map. An illustration
can be seen in Figure 2.13.
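How far the filter can be shifted determines the size of the feature map. A commonly used relation is (W - F + 2P) / S + 1, where W is the input size, F the filter size, P the padding, and S the stride. These symbols and the 32-pixel input are assumptions introduced for this sketch and are not taken from the text.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Number of filter positions along one dimension of the input."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, padding, and stride do not fit the input")
    return span // stride + 1


# Hypothetical 32-pixel-wide image with the 5-pixel filter from the text.
no_padding = conv_output_size(32, 5)                 # filter shifted 1 pixel at a time
same_size = conv_output_size(32, 5, padding=2)       # padding keeps the width
strided = conv_output_size(32, 4, padding=0, stride=2)
```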
dimensions of the output volume on the feature map, so that the number of
parameters and calculations in the network decreases, which helps control
overfitting. In general, the pooling layer uses a 2x2 filter. An example
of a max pooling operation can be illustrated with the following image:
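Max pooling with a 2x2 filter can also be sketched in code. The version below assumes a stride of 2, so the windows do not overlap, and uses a made-up 4x4 feature map.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[i][j], feature_map[i][j + 1],
                           feature_map[i + 1][j], feature_map[i + 1][j + 1]))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool_2x2(feature_map)
```

The 4x4 input is reduced to 2x2, so later layers process a quarter as many values.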
character of the data (I Wayan Suartika, 2016). An illustration
of the fully connected layer can be seen in Figure 2.15:
Adaptive Moment Estimation (Adam) was introduced by
Diederik P. Kingma and Jimmy Lei Ba in 2014. The
algorithm is adaptive and can be seen as a variant that
combines RMSProp with momentum, using the momentum term
to smooth the gradients. Adam also applies a bias
correction to its estimates of the first and second
moments to account for their initialization at the
starting point.
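The update can be sketched for a single parameter. The hyperparameter values follow the defaults commonly quoted for Adam, except for the learning rate, which is enlarged here so the toy problem converges quickly; the objective (x - 3)^2 and the function name are assumptions for the illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSProp-like)
        m_hat = m / (1 - beta1 ** t)           # bias correction: both
        v_hat = v / (1 - beta2 ** t)           # estimates start at zero
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Hypothetical objective (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```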
Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is often used for
machine learning and deep learning. The most important
parameter of this algorithm is the learning rate. The gradient
estimate used by SGD contains noise that does not disappear
even when the minimum point has been reached, so the
learning rate needs to be reduced gradually over time.
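The effect of the learning rate can be seen on a toy problem: finding the value x that minimizes the average of (x - d)^2 over two data points, whose true minimum is their mean. The sketch below is an assumption-laden illustration: it cycles through the data instead of sampling at random so that the result is reproducible, and the decay schedule lr0 / (1 + decay * t) is one simple choice among many.

```python
def sgd(data, x0=0.0, lr0=0.25, decay=0.0, steps=1000):
    """Per-sample gradient descent on the squared distance to each data point."""
    x = x0
    for t in range(steps):
        d = data[t % len(data)]        # deterministic stand-in for sampling
        grad = 2.0 * (x - d)           # gradient for this single sample
        lr = lr0 / (1.0 + decay * t)   # constant when decay is 0
        x -= lr * grad
    return x


data = [1.0, 3.0]                      # hypothetical data; the mean is 2.0
x_constant = sgd(data, decay=0.0)      # keeps bouncing between the samples
x_decayed = sgd(data, decay=0.1)       # settles close to the mean
```

With a constant learning rate the estimate keeps jumping toward whichever sample was seen last, while the decayed schedule lets it settle near the minimum.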
2.4.8 Cross Entropy Loss Function
A loss function is a function that describes the loss associated
with all the possible outcomes produced by a model. The loss function
comes into play when the learning model makes errors that matter and must
be taken into account. A good loss function is one that produces the
lowest expected error.
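For classification, the cross entropy loss named in this section's title can be sketched as follows. The definition used is the standard one for a single example with one correct class, the negative logarithm of the probability assigned to that class, and the probability vectors are made up.

```python
import math


def cross_entropy(probabilities, true_index):
    """Loss for one example: -log of the probability of the correct class."""
    return -math.log(probabilities[true_index])


# Hypothetical predictions for an example whose true class is index 1.
confident = cross_entropy([0.1, 0.8, 0.1], true_index=1)
mistaken = cross_entropy([0.7, 0.2, 0.1], true_index=1)
```

The loss is small when the model puts most of the probability on the correct class and grows quickly as that probability falls.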
Figure 2.16 Dropout Regularization
In Figure 2.16, part (a) is a normal neural network with
two hidden layers, and part (b) is a neural network in which several
neurons are no longer active after dropout is applied. Applying
this technique to the CNN model affects the
performance of the model during training and reduces overfitting.
In an ordinary artificial neural network, the feedforward
computation can be written as follows. Let $y^{l}$ be the output of layer
l and $z^{l}$ the input to layer l, and let $W^{l}$ and $b^{l}$ be the
weights and bias of layer l. For unit i with activation function f, the
computation is given by formula (2.8) (I. Jindal, 2016):
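\[
z_i^{(l+1)} = W_i^{(l+1)} y^{l} + b_i^{(l+1)}, \qquad
y_i^{(l+1)} = f\left(z_i^{(l+1)}\right) \tag{2.8}
\]
\[
\tilde{y}^{l} = r^{l} * y^{l} \tag{2.9}
\]
\[
z_i^{(l+1)} = W_i^{(l+1)} \tilde{y}^{l} + b_i^{(l+1)}, \qquad
y_i^{(l+1)} = f\left(z_i^{(l+1)}\right)
\]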
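The masked feedforward step can be sketched for one unit. In the usual formulation of dropout each mask entry is drawn independently and equals 1 with a keep probability p. That distribution, the parameter name `keep_prob`, and the example values are assumptions added here for illustration.

```python
import random


def dropout_forward(y, weights, bias, keep_prob, activation, rng):
    """Feedforward for one unit with a random mask applied to its inputs."""
    r = [1 if rng.random() < keep_prob else 0 for _ in y]   # mask r
    y_tilde = [rj * yj for rj, yj in zip(r, y)]             # thinned outputs
    z = sum(w * yt for w, yt in zip(weights, y_tilde)) + bias
    return activation(z), r


def relu(z):
    return max(0.0, z)


rng = random.Random(0)
y_prev = [1.0, 2.0, 3.0]          # hypothetical outputs of layer l
weights = [0.5, 0.5, 0.5]
bias = 1.0

kept_all, mask_all = dropout_forward(y_prev, weights, bias, 1.0, relu, rng)
kept_none, mask_none = dropout_forward(y_prev, weights, bias, 0.0, relu, rng)
```

With a keep probability of 1 the result equals the ordinary feedforward value, and with 0 only the bias reaches the activation. Values in between drop a random subset of the inputs on each call.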
The notation $f_j$ is the result of the function for the jth element of
the class output vector, and the argument z is the hypothesis given by the
trained model, which the softmax function then classifies. Softmax also
gives more intuitive results and has a better probabilistic interpretation
than other classification algorithms because probabilities are
calculated for all labels. The result is a vector whose values lie between
zero and one and sum to one, obtained by transforming the vector of real
values taken from the label (Z. Dong, 2015).
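The properties described above can be checked with a short sketch of the standard softmax definition, exp(z_j) divided by the sum of exp(z_k) over all classes. Subtracting the largest score before exponentiating is a common numerical-stability step and does not change the result; the scores are made up.

```python
import math


def softmax(z):
    """Turn a vector of real scores into probabilities that sum to one."""
    shift = max(z)                              # stability: avoid large exponents
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]                        # hypothetical class scores
probs = softmax(scores)
predicted_class = probs.index(max(probs))
```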
2.4.11 Confusion Matrix
A confusion matrix provides the values used as a benchmark when
evaluating the performance of a classification model. The
following is an example of a confusion matrix illustrated with an image:
Figure 2.17 Confusion Matrix
\[
\text{Accuracy} = \frac{TP + TN}{P + N} \tag{2.11}
\]
Accuracy measures the ability of a model to classify correctly
across all actual classes. Precision, in turn, is calculated with the
following formula:
\[
\text{Precision} = \frac{TP}{TP + FP} \tag{2.12}
\]
\[
\text{Sensitivity} = \frac{TP}{TP + FN} \tag{2.13}
\]
\[
\text{Specificity} = \frac{TN}{TN + FP} \tag{2.14}
\]
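Formulas (2.11) to (2.14) can be evaluated together from the four cells of a confusion matrix. The counts below are hypothetical, and the sketch uses P = TP + FN and N = TN + FP for the totals of actual positives and negatives.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four measures from confusion-matrix counts."""
    positives = tp + fn     # P: all actual positives
    negatives = tn + fp     # N: all actual negatives
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for a binary classifier evaluated on 100 samples.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```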
2.4.12 ROC AUC Score
Receiver Operating Characteristic (ROC) is a graphical technique
used for visualizing and measuring the performance of a classification
model. ROC graphs have long been used in signal detection theory to
illustrate the trade-off between hit rate and false alarm rate (Swets et
al., 2000). ROC analysis was later extended to machine
learning, where it is used for evaluating and comparing algorithms. The ROC
graph can be illustrated with the following picture (Fawcett, 2015):
The closer the AUC score is to 1, the better the model is at
distinguishing the features of each image during classification,
because its sensitivity approaches the ideal and it can separate the
characteristics of the existing classes well. AUC values can be
interpreted with the following table (Shengping Yang, 2017):
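The AUC score can be computed without drawing the curve, using its interpretation as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below counts such pairs directly; the labels and scores are made up, and the pairwise loop is only practical for small data sets.

```python
def auc_score(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and model scores for four samples.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_score(labels, scores)
perfect = auc_score([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```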
2.5 Python
Python is a free and open source programming language. Its design is
simple and emphasizes user convenience.
Python supports object-oriented and functional
programming, and its code is easy to read, learn, maintain, and reuse. This
makes Python widely used as a solution for research in the fields of
computational science, robotics, economics and other fields.
Python also has libraries developed by third parties, which allow
users to apply Python to problems in particular disciplines. Examples
are OpenCV for computer vision applications and Pandas for
data analysis.
2.5.1 Keras
Keras is a high-level neural network API written in Python
and capable of running on TensorFlow, Theano, or CNTK. Its
multi-platform capability supports fast and easy prototyping through
modularity and extensibility, and it focuses on the
development of deep learning. Keras is widely used for
prototyping convolutional networks, recurrent networks,
and combinations of both (Chollet, 2018).
2.5.2 TensorFlow
TensorFlow is an open source software library used for
numerical computing with data flow graphs. Its flexible architecture
allows computation to be deployed to one or several CPUs or GPUs on a
server, desktop, or mobile device using one API.
TensorFlow then continues to be developed as a library that
supports machine learning programming and other computing needs.
2.6 Kaggle
Kaggle is an online community of machine learning users that provides
a variety of cloud-based services. Kaggle was formed in 2010 by holding a
competition for users to solve a given problem.
The Kaggle service has expanded to let users publish datasets and
provides a kernel for running code along with a text editor. In addition,
Kaggle provides a free GPU quota, which is limited to a certain number of
hours and reset once a week. Kaggle also supports a variety of
popular libraries, among them Keras,
TensorFlow, NumPy, PyTorch, etc.
With this service, users with limited hardware do not
need to worry about developing their knowledge in the field of machine
learning.