
Design of Intelligent Systems

CS-715
MS (Computer Science)

Artificial Neural Network


BY: Bushra Saif
Introduction
• The human brain is a complex system made of billions of neurons that opens up new mysteries with every discovery about it. Attempts to mimic the structure and function of the human brain led to a new field of study called Deep Learning.
• Artificial Neural Networks, also known as Neural Networks, are inspired by the neural networks of the human brain and are a component of Artificial Intelligence.
• The human brain is a network of billions of densely connected neurons; it is highly complex, nonlinear and has trillions of synapses. A biological neuron mainly consists of dendrites, an axon, a cell body (soma) with a nucleus, and synapses. Dendrites are responsible for receiving input from other neurons, and the axon is responsible for transmitting signals from one neuron to another.
Biological Neural Network

The molecular and biological machinery of neural networks is based on electrochemical signaling. Neurons fire electrical impulses only when certain conditions are met. Some of the neural structure of the brain is present at birth, while other parts are developed through learning, especially in the early stages of life, to adapt to the environment (new inputs).
Artificial Neural Network
• ANNs are composed of multiple nodes that imitate the biological neurons of the human brain. These nodes are connected to one another by links, through which they interact with each other.

• The term neural network is composed of two individual words:
• Neural: a neuron here is just a node that holds a value between 0 and 1.
• Network: describes how these neurons are connected to one another to form a network.
How ANN Works
• For example, suppose we have a 28*28-pixel image of the number 9. These 784 pixels on the screen are just neurons, and each neuron holds some value ranging from 0 to 1. This number inside a neuron is called its “activation”.
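As a minimal sketch of this idea (the random image array below is only a stand-in for an actual 28*28 digit image):

```python
import numpy as np

# Stand-in for a 28x28 grayscale digit image with pixel values 0-255
image = np.random.randint(0, 256, size=(28, 28))

# Flatten the 28x28 grid into a vector of 784 "neurons"
pixels = image.reshape(784)

# Scale each pixel into the range 0-1; this number is the neuron's activation
activations = pixels / 255.0

print(activations.shape)                      # (784,)
print(activations.min(), activations.max())   # both between 0.0 and 1.0
```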

How ANN Works
• So all 784 of these neurons in our image make up the 1st layer of our network. The last layer in this case will have 10 neurons onto which the output is mapped. These 10 neurons (for the digits 0–9) also have an activation value, which expresses how sure the network is that the corresponding image is that digit.

How ANN Works
• The way the network operates is that the activations in one layer determine the activations of the next layer.
• In other words, depending on the brightness of the pixels in the input image, that pattern of activation causes some very specific pattern in the next layer, which passes the final values on to the output layer. The network then chooses the brightest (most activated) neuron of that output layer.
• This is helpful for detecting edges and patterns in the image.

ANN Layers
• Artificial Neural Networks are made up of layers of connected input units and output units called neurons.
• A single-layer neural network is called a perceptron. Multiple hidden layers may also be present in an artificial neural network. The input units (receptors), connection weights, summing function, computation and output units (effectors) are what make up an artificial neuron.
1. Input layer
• The number of neurons is equal to the number of pixels in the input image; these pixel values are used only for activation.
2. Convolutional / Hidden layer
• Researchers do not actually know exactly what these layers do internally. But in a digit-recognition NN we assume the hidden layers perform edge detection, using the pixel values to narrow down our problem.
ANN Layers
• And the next layers find patterns from these edges for identification.
3. Output layer
• Used to show our final results. The number of neurons in this layer depends upon the number of classes in our problem.

Network parameters
• A neural network can be considered as a weighted directed graph in which the neurons correspond to the nodes.
• The connections between two neurons are the weighted edges. A weight can be positive or negative.
w1a1 + w2a2 + w3a3 + … + wnan
• The activation should be between 0 and 1; a common function that does this is the sigmoid function, also called the logistic curve:
sigmoid(w1a1 + w2a2 + w3a3 + … + wnan)
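As a small illustration of this formula for a single neuron (the activations and weights below are made-up values):

```python
import numpy as np

def sigmoid(z):
    # Logistic curve: squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Made-up incoming activations a1..a4 and connection weights w1..w4
a = np.array([0.9, 0.2, 0.0, 0.7])
w = np.array([1.5, -2.0, 0.5, 0.8])   # weights may be positive or negative

weighted_sum = np.dot(w, a)           # w1a1 + w2a2 + w3a3 + w4a4
activation = sigmoid(weighted_sum)    # squashed to a value between 0 and 1
print(activation)
```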
• The processing element of a neuron receives many signals (both from other neurons and as input signals from the external world).
• Signals are sometimes modified at the receiving synapse, and the weighted inputs are summed at the processing element. If the sum crosses the threshold, it goes as input to other neurons (or as output to the external world) and the process repeats.
Network parameters

In general terms, these weights typically represent the strength of the interconnection amongst the neurons inside the artificial neural network.
Network parameters
If the weighted sum equates to zero, a bias is added to make the output non-zero, or otherwise to scale up the system’s response. The bias has a weight, and its input is always equal to 1. Here the sum of weighted inputs can be in the range of 0 to positive infinity. To keep the response within the limits of the desired value, a certain threshold value is benchmarked, and the sum of weighted inputs is then passed through the activation function.

Example 1

Churn Prediction is essentially predicting which clients are most likely to cancel a subscription, i.e. 'leave a company', based on their usage of the service.
-Neural network for Example 1

-Basics of ANN we need
• The equation for the neural network is a linear combination of the independent variables and their respective weights, plus a bias (or intercept) term for each neuron. The neural network equation looks like this:

Z = Bias + W1X1 + W2X2 + … + WnXn

where,
• Z is the symbol denoting the output of the above graphical representation of the ANN,
• Wi are the weights or the beta coefficients,
• Xi are the independent variables or the inputs, and
• Bias or intercept = W0
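A quick sketch of this equation for one neuron, using four illustrative inputs and weights (the numbers are placeholders, not real churn data):

```python
import numpy as np

# Illustrative inputs X1..X4 (e.g. customer usage features) and weights W1..W4
X = np.array([0.3, 1.2, 0.0, 0.8])
W = np.array([0.5, -0.7, 1.1, 0.2])
bias = 0.1                           # the intercept term W0

Z = bias + np.dot(W, X)              # Z = W0 + W1X1 + W2X2 + W3X3 + W4X4
print(Z)
```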

- No. of layers and nodes in Example 1
• It has the following neural network, with an architecture of [4, 5, 3, 2], depicted below:
1. 4 independent variables or the Xs in the input layer, L1
2. 5 neurons in the first hidden layer, L2
3. 3 neurons in the second hidden layer, L3, and
4. 2 in the output layer L4 with two nodes, Q1 and Q2

• For our purpose here, we will refer to the neurons in Hidden Layer L2 as N1, N2, N3, N4, N5 and to those in Hidden Layer L3 as N6, N7, N8, in the linear order of their occurrence.
• In a classification problem, the output layer can either have one node or have as many nodes as there are classes or categories.
-How to form the basic equations
• The nodes in the hidden layer L2 are dependent on the X’s (independent variables or the inputs) present in the input layer; therefore, the equations will be the following:
N1 = W11*X1 + W12*X2 + W13*X3 + W14*X4 + W10
N2 = W21*X1+ W22*X2 + W23*X3 + W24*X4+ W20
N3 = W31*X1+ W32*X2 + W33*X3 + W34*X4 + W30
N4 = W41*X1+ W42*X2 + W43*X3 + W44*X4 + W40
N5 = W51*X1+ W52*X2 + W53*X3 + W54*X4 + W50
• Similarly, the nodes in the hidden layer L3 are derived from the
neurons in the previous hidden layer L2, hence their respective
equations will be:
N6 = W61*N1+ W62*N2 + W63*N3 + W64*N4 + W65*N5 +W60
N7 = W71*N1+ W72*N2 + W73*N3 + W74*N4 + W75*N5 +W70
N8 = W81*N1+ W82*N2 + W83*N3 + W84*N4 + W85*N5 + W80
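These per-neuron equations can also be computed all at once as matrix operations; the sketch below uses random placeholder weights and a single record purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.random(4)             # one record with the 4 input variables X1..X4

W_L2 = rng.random((5, 4))     # the weights W11..W54 for the 5 neurons in L2
b_L2 = rng.random(5)          # the bias terms W10..W50

N1_to_N5 = W_L2 @ X + b_L2    # the five equations N1..N5 in one step

W_L3 = rng.random((3, 5))     # the weights W61..W85 for the 3 neurons in L3
b_L3 = rng.random(3)          # the bias terms W60..W80

N6_to_N8 = W_L3 @ N1_to_N5 + b_L3
print(N1_to_N5.shape, N6_to_N8.shape)   # (5,) (3,)
```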
-No. of Weights in network
• The output layer nodes come from the hidden layer L3, which makes the equations:

Q1 = WO11 * N6 + WO12 * N7 + WO13 * N8 + WO10
Q2 = WO21 * N6 + WO22 * N7 + WO23 * N8 + WO20

• The weights we need to adjust in this case number 51.

• The number of weights for the 1st hidden layer L2 is determined as (4 + 1) * 5 = 25, where 5 is the number of neurons in L2 and there are 4 input variables in L1. Each of the 5 neurons in L2 also has a bias term, which makes 5 bias terms and accounts for the '+ 1'.
• Similarly, the number of weights for the 2nd hidden layer L3 = (5 + 1) * 3 = 18 weights, and for the output layer the number of weights = (3 + 1) * 2 = 8.
-No. of Weights in network
• The total number of weights for this neural network is the sum of
the weights from each of the individual layers which is = 25 + 18
+ 8 = 51
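The same count can be re-derived for any architecture with a short loop; this sketch simply reproduces the 25 + 18 + 8 = 51 figure:

```python
# Architecture [4, 5, 3, 2]: input layer, two hidden layers, output layer
layers = [4, 5, 3, 2]

# Each layer contributes (incoming nodes + 1 bias) * neurons weights
weights_per_layer = [(layers[i] + 1) * layers[i + 1] for i in range(len(layers) - 1)]

print(weights_per_layer)        # [25, 18, 8]
print(sum(weights_per_layer))   # 51
```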

• So now we know the number of weights in the network; how, then, do we calculate the weights? In the first iteration, we assign randomized values between 0 and 1 to the weights. In the following iterations, these weights are adjusted to converge at the optimal, minimized error term.

• We are so persistent about minimizing the error because the error tells us how much our model deviates from the actual observed values. Therefore, to improve the predictions, we constantly update the weights so that the loss or error is minimized.
-No. of Weights in network
• This adjustment of weights is also called the correction of the
weights.
• There are two methods, Forward Propagation and Backward Propagation, for correcting the betas or the weights to reach convergence.

Back Propagation
• To train the network through supervised learning, the
model’s predicted output is compared to the actual output
(that is known to be correct) and the difference between
these two results is measured and is known as the cost or
cost value.
• The purpose of training is to reduce the cost value until the model’s prediction closely matches the correct output. This is achieved by incrementally tweaking the network’s weights until the lowest possible cost value is obtained.
• This process of training the neural network is called backpropagation. Rather than navigating left to right, like how data is fed into a neural network, back-propagation is done in reverse and runs from the output layer on the right towards the input layer on the left.
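A heavily simplified sketch of this weight-tweaking idea, using a single weight, a squared-error cost and an arbitrary learning rate; a real network applies the same update to every weight, propagating the error backwards through the layers via the chain rule:

```python
# One weight and one training example: prediction = w * x
x, y_true = 2.0, 10.0
w = 0.5                     # random initial weight
learning_rate = 0.05        # arbitrary value

for _ in range(100):
    y_pred = w * x
    cost = (y_pred - y_true) ** 2        # how far the prediction is from the truth
    grad = 2 * (y_pred - y_true) * x     # derivative of the cost w.r.t. the weight
    w -= learning_rate * grad            # tweak the weight to lower the cost

print(round(w, 4))   # approaches 5.0, where the cost is lowest
```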
Squashing the Neural Net
• For a binary classification problem, we know that Sigmoid
is needed to transform the linear equation to a nonlinear
equation. For a particular node, the transformation is as
follows:
N1 = W11*X1 + W12*X2 + W13*X3 + W14*X4 + W10

• After applying the Sigmoid transformation, it becomes:
h1 = sigmoid(N1)
where,
sigmoid(N1) = exp(W11*X1 + W12*X2 + W13*X3 + W14*X4 + W10) / (1 + exp(W11*X1 + W12*X2 + W13*X3 + W14*X4 + W10))
• This transformation is applied to the hidden layers and the output layer and is known as the Activation or Squashing Function.
Other Activation functions
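The original slide shows a chart of activation functions; besides the sigmoid, a few widely used ones can be sketched as:

```python
import numpy as np

def tanh(z):
    # Squashes values into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: passes positive values through, zeroes out negatives
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope for negative inputs
    return np.where(z > 0, z, alpha * z)
```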

Forward propagation
• The process of going from left to right, i.e. from the Input Layer to the Output Layer, to correct or adjust the weights, is Forward Propagation.
• Our binary classification dataset has input X as a 4 * 8 matrix, with 4 input variables and 8 records, and the Y variable as a 2 * 8 matrix, with two columns for class 1 and class 0, respectively, over the same 8 records. It had some categorical variables, which were converted to dummy variables.
• The idea here is that we start with the input layer of the 4*8 matrix and want to get the output of (2*8). The hidden layers and the neurons in each of the hidden layers are hyperparameters and so are defined by the user. How we achieve the output is via matrix multiplication between the input variables and the weights of each layer.
Forward propagation
• We have seen above that the weights form a matrix for each of the respective layers. We perform matrix multiplication, starting from the input matrix of 4 * 8 with the weight matrix between the L1 and L2 layers to get the matrix of the next layer, L2. Similarly, we do this for each layer and repeat the process until we reach the output layer of dimensions 2 * 8.
Forward propagation
• Now, let’s break down the steps to understand how the matrix
multiplication in Forward propagation works:

1. First, the input matrix is 4 * 8, and the weight matrix between L1 and L2, which we refer to as Wh1, is 5 * 5 (as we saw in the previous slides).
2. The Wh1 = 5 * 5 weight matrix includes both the betas or coefficients and the bias terms.
3. For simplification, we break Wh1 into the beta weights and the bias. So the beta weights between L1 and L2 are of 4 * 5 dimension (as we have 4 input variables in L1 and 5 neurons in the Hidden Layer L2).
4. For understanding purposes, we will illustrate the multiplication for one layer:
Forward propagation

Forward propagation
• For matrix multiplication, the number of columns of the first matrix must be equal to the number of rows of the second matrix. Our first matrix of inputs has 8 columns and the second matrix of weights has 4 rows; hence, we can’t multiply the two matrices.

• So, we take the transpose of one of the matrices to conduct the multiplication. Transposing the weight matrix to 5 * 4 will help us resolve this issue.

• Note that for the next layer between L2 and L3, the input this time
will not be X but will be h1, which results from L1 and L2.

Z2 = Wh2 * h1 + bh2
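Putting the whole pass together, here is a minimal sketch for the [4, 5, 3, 2] architecture with a 4 * 8 input; all weights are random placeholders, and each weight matrix is already stored in its transposed (neurons-out by neurons-in) form:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((4, 8))          # 4 input variables, 8 records

# One weight matrix and one bias vector per layer
Wh1, bh1 = rng.random((5, 4)), rng.random((5, 1))
Wh2, bh2 = rng.random((3, 5)), rng.random((3, 1))
Wo,  bo  = rng.random((2, 3)), rng.random((2, 1))

h1  = sigmoid(Wh1 @ X  + bh1)   # L1 -> L2: (5, 4) @ (4, 8) -> (5, 8)
h2  = sigmoid(Wh2 @ h1 + bh2)   # L2 -> L3: (3, 5) @ (5, 8) -> (3, 8)
out = sigmoid(Wo  @ h2 + bo)    # L3 -> output: (2, 3) @ (3, 8) -> (2, 8)

print(out.shape)                # (2, 8): two class scores for each of the 8 records
```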
Conclusion
1. First, we initialize the weights with some random values,
which are mostly between 0 and 1.
2. Calculate the output, i.e. predict the values, and estimate the loss.
3. Then, adjust the weights such that the loss will be minimized.

• We repeat the above three steps until there is no further change in the loss function, i.e. it has reached its optimal value, or until the specified number of iterations (or epochs) is reached.
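A compact, self-contained sketch of this loop for a single sigmoid neuron; the toy data, the squared-error loss and the learning rate are all placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 features, 8 records, binary target
X = rng.random((4, 8))
y = rng.integers(0, 2, size=8)

# Step 1: initialize the weights (and bias) with random values between 0 and 1
w, b = rng.random(4), rng.random()
learning_rate, prev_loss = 0.5, np.inf

for epoch in range(5000):
    # Step 2: calculate the output and estimate the loss
    y_pred = sigmoid(w @ X + b)
    loss = np.mean((y_pred - y) ** 2)

    # Step 3: adjust the weights so that the loss is reduced (gradient descent)
    grad = 2 * (y_pred - y) * y_pred * (1 - y_pred)
    w -= learning_rate * (X @ grad) / y.size
    b -= learning_rate * grad.mean()

    # Repeat until the loss stops changing or the epoch budget runs out
    if abs(prev_loss - loss) < 1e-9:
        break
    prev_loss = loss

print(round(loss, 4))
```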

