ANN-Unit 3 - Regression & Multi-Layer Perceptron
Applied Neural Networks – Unit 3
Lecture Outline
▪ Machine Learning Basics
▪ Linear and Logistic Regression
▪ Neural Networks and Architecture
▪ Vector Analysis for Neural Networks
▪ Loss and Cost Functions
▪ Derivative Evaluation
▪ Vectorization
10/30/2023
Machine Learning
▪ As a broad subfield of artificial intelligence, machine learning is concerned with the
design and development of algorithms and techniques that allow computers
to "learn".
▪ A major focus of machine learning research is to automatically learn to recognize
complex patterns and make intelligent decisions based on data.
▪ Unsupervised Learning
▪ Machine learning algorithms used to draw inferences from datasets consisting of input data
without labeled responses
▪ Reinforcement Learning
▪ Learning from a series of reinforcements: rewards or punishments. For example, the lack of a tip at the end of a customer interaction or sale.
Classification
Classification: Definition
▪ Given a collection of records (training set)
▪ Each record contains a set of attributes; one of the attributes is the class.
▪ Find a model for the class attribute as a function of the values of the other attributes.
Classification Example
Venue    | Type of Wicket | Type of match | Batted first | Winning Team
Pakistan | Slow           | ODI           | Pakistan     | Pakistan

The input and output values can be discrete or continuous. For now we will concentrate on problems where the output has exactly two possible values; this is Boolean classification.
Classification
• Given a collection of records (training set)
– Each record contains a set of attributes; one of the attributes is the class (a categorical variable).
• Find a model for the class attribute as a function of the values of the other attributes (supervised learning).
Linear Regression
What is Regression?
▪ Regression is a parametric technique used to predict a continuous (dependent) variable given a set of independent variables.
▪ It is parametric in nature because it makes certain assumptions (discussed next)
based on the data set.
▪ If the data set follows those assumptions, regression gives reliable results; otherwise, it struggles to provide convincing accuracy.
[Figure: regression predicts a dependent (output) variable y]
Linear Regression
We want to find the best line (linear function y=f(X))
to explain the data.
[Figure: scatter of y against X with the fitted regression line]
Dr. Muhammad Usman Arif; Applied Neural Networks 10/30/2023 15
[Figure: multiple regression; inputs x2 (Sex), x3 (Experience), x4 (Age) predict y (Income)]
Linear Regression
The predicted value of y is given by:
$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j$
Y = b0 + b1X
▪ To find the coefficient values that minimize the objective function, we take the partial derivatives of the objective function (SSE) with respect to the coefficients, set them to 0, and solve.
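Carrying out that minimization for the simple model Y = b0 + b1X gives the standard closed-form estimates (the usual summation notation):

```latex
\hat{b}_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}
```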
Example I
▪ Find the least square regression line for the following set of data
{(-1 , 0),(0 , 2),(1 , 4),(2 , 5)}
x      | y       | xy       | x²
-1     | 0       | 0        | 1
0      | 2       | 0        | 0
1      | 4       | 4        | 1
2      | 5       | 10       | 4
Σx = 2 | Σy = 11 | Σxy = 14 | Σx² = 6
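Plugging the column sums into the closed-form least-squares formulas, a quick sanity check in Python (plain stdlib; variable names are mine):

```python
# Least-squares line through (-1, 0), (0, 2), (1, 4), (2, 5)
xs = [-1, 0, 1, 2]
ys = [0, 2, 4, 5]
n = len(xs)

sx = sum(xs)                              # Σx  = 2
sy = sum(ys)                              # Σy  = 11
sxy = sum(x * y for x, y in zip(xs, ys))  # Σxy = 14
sxx = sum(x * x for x in xs)              # Σx² = 6

b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
b0 = (sy - b1 * sx) / n                         # intercept
print(round(b0, 2), round(b1, 2))  # 1.9 1.7, i.e. y = 1.9 + 1.7x
```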
Example II
• Find the least square regression line for the following set of data. Estimate y when x = 10.
x       | y       | xy       | x²
0       | 2       | 0        | 0
1       | 3       | 3        | 1
2       | 5       | 10       | 4
3       | 4       | 12       | 9
4       | 6       | 24       | 16
Σx = 10 | Σy = 20 | Σxy = 49 | Σx² = 30
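The same closed-form formulas give the Example II line and the requested estimate at x = 10 (a stdlib sketch; names are mine):

```python
# Least-squares line for x = 0..4, y = 2, 3, 5, 4, 6
xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]
n = len(xs)

sx, sy = sum(xs), sum(ys)                 # Σx = 10, Σy = 20
sxy = sum(x * y for x, y in zip(xs, ys))  # Σxy = 49
sxx = sum(x * x for x in xs)              # Σx² = 30

b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # 45 / 50 = 0.9
b0 = (sy - b1 * sx) / n                         # 2.2
estimate = b0 + b1 * 10                         # predicted y at x = 10
print(round(estimate, 1))  # 11.2
```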
Example III
▪ The sales of a company (in million dollars) for each year are shown in the table
below.
Error Calculation
▪ Error is an inevitable part of the prediction-making process.
▪ No matter how powerful the algorithm we choose, there will always remain an irreducible error (ε) which reminds us that the "future is uncertain."
▪ We can only try to keep it as low as possible.
▪ Conceptually, the regression model tries to reduce the sum of squared
errors ∑[Actual(y) - Predicted(y')]² by finding the best possible value of regression
coefficients (β0, β1, etc).
Regression Model
▪ The first coefficient without an input is called
the intercept, and it adjusts what the model
predicts when all your inputs are 0.
Residual Errors
▪ We call the difference between the actual value and the model’s estimate a residual.
▪ If our collection of residuals is small, it implies that the model that produced them does a good job of predicting our output of interest.
▪ Conversely, if these residuals are generally large, it implies that the model is a poor estimator.
▪ We technically can inspect all of the residuals to judge the model’s accuracy but this
does not scale well.
▪ Statistical Computations
▪ Mean Absolute Error
▪ Mean Squared Error
▪ Root Mean Squared Error
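The three metrics differ only in how residuals are aggregated; a minimal sketch (function names are mine, the predictions are made up):

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average |residual|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean Squared Error: average squared residual."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: back in the units of the target."""
    return math.sqrt(mse(actual, predicted))

y_true = [2, 3, 5, 4, 6]            # actual values
y_pred = [2.2, 3.1, 4.0, 4.9, 5.8]  # some model's predictions (invented)
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because residuals are squared before averaging, MSE and RMSE penalize large errors more heavily than MAE does.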
Interpreting MAE
▪ The MAE is also the most intuitive of the metrics since we’re just looking at the
absolute difference between the data and the model’s predictions.
▪ Because we use the absolute value of the residual, the MAE does not
indicate underperformance or overperformance of the model (whether or not the
model under or overshoots actual data).
▪ Each residual contributes proportionally to the total amount of error, meaning that
larger errors will contribute linearly to the overall error.
▪ A small MAE suggests the model is great at prediction, while a large MAE suggests
that your model may have trouble in certain areas.
▪ An MAE of 0 means that your model is a perfect predictor of the outputs (but this will almost never happen).
▪ MSE is measured in units that are the square of the target variable.
▪ RMSE is measured in the same units as the target variable.
Multiple Linear Regression
Slides in this section are taken from the Instructor Resources of Applied Statistics and Probability for Engineers
by Montgomery and Runger (John Wiley and Sons).
Introduction
• Many applications of regression analysis involve situations in which there is more than one regressor variable.
• A regression model that contains more than one
regressor variable is called a multiple regression
model.
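With several regressors the fit is usually computed in matrix form; a sketch using NumPy's least-squares solver on made-up data that satisfies y = 1 + 2·x1 + 3·x2 exactly (names and numbers are mine):

```python
import numpy as np

# Four observations of two regressor variables (invented data)
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 3.0, 4.0, 6.0])   # exactly 1 + 2*x1 + 3*x2

A = np.column_stack([np.ones(len(X)), X])     # prepend the intercept column
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
print(beta)  # ≈ [1. 2. 3.]
```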
Logistic Regression
Logistic Regression
Rather than fitting a line to the given data, logistic regression fits a curve to the data.
The curve gives us the probability of the output variable being 1 or 0 based on the independent attributes.
In our figure, this gives us the probability of a mouse being obese based on the weight of the mouse.
π = Proportion of “Success”
In ordinary regression the model predicts the
mean Y for any combination of predictors.
What’s the “mean” of a 0/1 indicator variable?
ȳ = Σyᵢ / n = (# of 1's) / (# of trials) = proportion of "success"
Goal of logistic regression: Predict the “true”
proportion of success, π, at any value of the
predictor.
Sigmoid Function
[Figure: sigmoid function plot; y rises from near 0 to near 1 as x goes from -10 to 12]

y = exp(b0 + b1·x) / (1 + exp(b0 + b1·x))
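The S-shaped curve above is the logistic (sigmoid) function; a minimal sketch (the coefficient values would come from a fitted model):

```python
import math

def logistic(x, b0, b1):
    """P(output = 1 | x) under a logistic model with coefficients b0, b1."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))  # same as 1 / (1 + exp(-z))

# The curve passes through 0.5 where b0 + b1*x = 0 and flattens toward 0 and 1
print(logistic(0.0, 0.0, 1.0))  # 0.5
```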
Maximum Likelihood
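Instead of minimizing squared error, logistic regression picks b0, b1 to maximize the likelihood of the observed 0/1 labels. A sketch of the quantity being maximized (my own function and variable names):

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of 0/1 labels ys at inputs xs under a logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1 / (1 + math.exp(-(b0 + b1 * x)))          # P(y = 1 | x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total
```

Because no closed-form solution exists, the fitting procedure searches iteratively for the coefficients at which this quantity peaks.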
Specification of ANN
▪ The number of input attributes found within individual instances determines the
number of input layer nodes.
▪ The user specifies the number of hidden layers as well as the number of nodes
within a specific hidden layer.
▪ Neural networks can be used both for classification (to predict the class label of a given tuple) and for prediction (to predict a continuous-valued output).
▪ For classification, one output unit (node) may be used to represent two classes (where the
value 1 represents one class, and the value 0 represents the other).
▪ If there are more than two classes, then one output unit per class is used.
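The two bullets above pin down the input and output layer sizes from the data itself; a small helper to that effect (a sketch, names are mine):

```python
def layer_sizes(records, labels):
    """Input nodes = number of attributes; output nodes = 1 for two
    classes, otherwise one per class."""
    n_in = len(records[0])
    n_classes = len(set(labels))
    n_out = 1 if n_classes == 2 else n_classes
    return n_in, n_out

# Four attributes (Give Birth, Can Fly, Live in Water, Have Legs), two classes
records = [[1, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 1]]
labels = ["mammals", "non-mammals", "non-mammals"]
print(layer_sizes(records, labels))  # (4, 1)
```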
Architecture of NN?
▪ How many neurons are required in the input layer?
Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals
Architecture of NN?
▪ How many neurons are required in the input layer?
Outlook Temperature Humidity Windy Class
sunny hot high false N
sunny hot high true N
overcast hot high false P
rain mild high false P
rain cool normal false P
rain cool normal true N
overcast cool normal true P
sunny mild high false N
sunny cool normal false P
rain mild normal false P
sunny mild normal true P
overcast mild high true P
overcast hot normal false P
rain mild high true N
[Figure: inputs x1–x5 feed the input layer, then the hidden layer, then the output layer]

Output Format
[Figure: node j receives weighted inputs, e.g. w1j from node 1 and w3j from node 3]
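Each node in the hidden and output layers computes a weighted sum of its incoming values and squashes it with the sigmoid; a sketch of one node's computation (names are mine):

```python
import math

def node_output(incoming, weights, bias=0.0):
    """Sigmoid of the weighted sum of the node's incoming values."""
    z = sum(w * o for w, o in zip(weights, incoming)) + bias
    return 1 / (1 + math.exp(-z))

# e.g. node j fed by two upstream nodes through weights 0.5 and -0.5
print(node_output([0.6, 0.4], [0.5, -0.5]))  # sigmoid(0.1) ≈ 0.525
```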
Learning ANN
▪ The backpropagation algorithm performs learning on a multilayer
feed-forward neural network
▪ Learning is accomplished by modifying network connection
weights while a set of input instances is repeatedly passed
through the network.
▪ Once trained, an unknown instance passing through the network
is classified according to the value(s) seen at the output layer.
▪ Δwik = r × Error(k) × Oi
▪ where r is the learning-rate parameter (0 < r < 1), Error(k) is the computed error at node k, and Oi is the output of node i; each weight wik is incremented by Δwik.
Algorithm
▪ Initialize the network:
▪ Create the network topology by choosing the number of nodes for the input, hidden, and output layers.
▪ Initialize weights for all node connections to arbitrary values between -1.0 and 1.0.
▪ Choose a value between 0 and 1 for the learning parameter.
▪ Choose a terminating condition.
▪ For all the training instances:
▪ Feed the training instance through the network.
▪ Determine the output error.
▪ Update the network weights.
▪ If the terminating condition has not been met, repeat step 2.
▪ Test the accuracy of the network on a test dataset. If the accuracy is less than optimal,
change one or more parameters of the network topology and start over.
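The steps above can be sketched end to end on a tiny fixed network: one forward pass, one backward pass, one weight update, and the output error drops. All numbers below are invented for illustration, and the error terms use the usual sigmoid-derivative form.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Tiny network: 2 inputs -> 2 hidden nodes -> 1 output (all made-up numbers)
x, target = [1.0, 0.0], 1.0
W1 = [[0.2, -0.3], [0.4, 0.1]]   # W1[j][i]: weight from input i to hidden j
b1 = [0.1, -0.1]
W2 = [0.3, -0.2]                 # weights from hidden node j to the output
b2 = 0.2
r = 0.5                          # learning rate, 0 < r < 1

def forward():
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2)) + b1[j]) for j in range(2)]
    o = sigmoid(sum(W2[j] * h[j] for j in range(2)) + b2)
    return h, o

h, o = forward()
err0 = 0.5 * (target - o) ** 2

# Backward pass: output error uses the sigmoid derivative o*(1-o),
# then each hidden node receives its share of the output error.
delta_o = (target - o) * o * (1 - o)
delta_h = [h[j] * (1 - h[j]) * W2[j] * delta_o for j in range(2)]

# Update every weight by r * Error * (upstream output)
for j in range(2):
    W2[j] += r * delta_o * h[j]
    b1[j] += r * delta_h[j]
    for i in range(2):
        W1[j][i] += r * delta_h[j] * x[i]
b2 += r * delta_o

_, o_new = forward()
err1 = 0.5 * (target - o_new) ** 2   # smaller than err0 after the step
```

In practice the feed-forward and update steps repeat over the whole training set until the terminating condition is met.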
Training/Testing of ANN
▪ During the training phase, training instances are
repeatedly passed through the network while individual
weight values are modified.
▪ The purpose of changing the connection weights is to
minimize training set error rate.
▪ Network training continues until a specific terminating
condition is satisfied.
▪ The terminating condition can be convergence of the
network to a minimum total error value, a specific time
criterion, or a maximum number of iterations.