Lecture 3 - MachineLearning-CrashCourse2023

Machine learning: Crash Course
Kshitij Sharma
Department of Computer Science
Faculty of Information Technology and Electrical Engineering
What?
Why?
Machine Learning is the future
• Why Machine Learning Is The Future Of Business Culture?

How?
Supervised vs Unsupervised?
Feature vs variables?
Feature extraction
Feature space reduction
• Space required to store the data is reduced as the number of features comes down
• Fewer features lead to less computation/training time
• Some algorithms do not perform well when we have a large number of features, so reducing these features needs to happen for the algorithm to be useful
• It takes care of multicollinearity by removing redundant features. Such features are highly correlated to each other, so there is no point in keeping all of them
• It helps in visualizing data. It is very difficult to visualize data in bigger feature spaces, so reducing the space to 2D or 3D may allow for plotting and observing patterns more clearly
• Missing Values Ratio
  • Data columns with too many missing values are unlikely to carry much useful information
  • Data columns with a number of missing values greater than a given threshold can be removed
• Low Variance Filter
  • Data columns with little change in the data carry little information
  • Variance is range dependent; therefore normalization is required before applying this technique
• High Correlation Filter
  • Data columns with very similar trends are also likely to carry very similar information
  • Only one of them will suffice to feed the machine learning model
library(corrgram)
corrgram(<dataFrame>,
order=TRUE,
lower.panel=panel.shade,
upper.panel=panel.pie,
text.panel=panel.txt)
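The three filters above can be sketched in base R. This is a minimal illustration, not a definitive implementation: the thresholds and the two extra columns (mostlyMissing, nearConstant) are hypothetical and were added to iris only so that every filter has something to remove.

```r
# Hypothetical thresholds, chosen only for illustration
missingThreshold <- 0.4       # drop columns with more than 40% missing values
varianceThreshold <- 0.01     # drop columns whose normalised variance is below this
correlationThreshold <- 0.9   # drop a column that correlates this strongly with an earlier one

# iris measurements plus two synthetic columns (assumptions, not part of iris)
df <- iris[, 1:4]
df$mostlyMissing <- c(rep(NA, 100), seq_len(50))
df$nearConstant <- c(rep(1, nrow(df) - 1), 1.001)

# 1. Missing values ratio: share of NA per column
missingRatio <- colMeans(is.na(df))
df <- df[, missingRatio <= missingThreshold, drop = FALSE]

# 2. Low variance filter: normalise to [0, 1] first, because variance is range dependent
normalise <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
normalisedVariance <- sapply(df, function(x) var(normalise(x), na.rm = TRUE))
df <- df[, normalisedVariance >= varianceThreshold, drop = FALSE]

# 3. High correlation filter: keep the first column of every highly correlated pair
corMatrix <- abs(cor(df))
corMatrix[upper.tri(corMatrix, diag = TRUE)] <- 0
redundant <- apply(corMatrix, 1, function(r) any(r > correlationThreshold))
df <- df[, !redundant, drop = FALSE]

names(df)
```

The order matters: the correlation step runs last so that columns with many missing values do not distort the correlation matrix.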
• Forward Selection
• Backward Elimination
[Figure: forward selection and backward elimination illustrated on Feature 1 to Feature 5]
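A minimal sketch of both strategies, assuming a linear model on the built-in mtcars data and AIC as the selection criterion. The slides do not name a criterion or a dataset, so both are assumptions; step() from base R's stats package does the stepwise search.

```r
# All candidate features for predicting mpg
predictors <- setdiff(names(mtcars), "mpg")
upperScope <- reformulate(predictors)   # one-sided formula: ~ cyl + disp + ... + carb

# Forward selection: start from the intercept-only model and add one feature at a time
nullModel <- lm(mpg ~ 1, data = mtcars)
forwardModel <- step(nullModel,
                     scope = list(lower = ~1, upper = upperScope),
                     direction = "forward", trace = 0)

# Backward elimination: start from all features and remove one feature at a time
fullModel <- lm(mpg ~ ., data = mtcars)
backwardModel <- step(fullModel, direction = "backward", trace = 0)

# Features that survive each procedure
forwardFeatures <- attr(terms(forwardModel), "term.labels")
backwardFeatures <- attr(terms(backwardModel), "term.labels")
forwardFeatures
backwardFeatures
```

The two procedures need not end with the same feature set, because each one only ever looks one step ahead.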
Prediction/classification
Classification
• Centroid based
  • Compute centroids
  • Compute distances
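The two steps of the centroid-based approach can be sketched in base R on iris. This is an illustration under assumptions: Euclidean distance is assumed (the slides do not name a distance), and the model is evaluated on the same rows it was fitted on, only to keep the sketch short.

```r
features <- as.matrix(iris[, 1:4])
labels <- as.character(iris$Species)

# Compute centroids: one mean vector per class
centroids <- aggregate(iris[, 1:4], by = list(Species = labels), FUN = mean)
centroidMatrix <- as.matrix(centroids[, -1])

# Compute distances: Euclidean distance from every row to every centroid
# (result: one row per observation, one column per class)
distances <- apply(centroidMatrix, 1, function(centre) {
  sqrt(rowSums(sweep(features, 2, centre)^2))
})

# Assign every row to the class of its closest centroid
predicted <- as.character(centroids$Species)[apply(distances, 1, which.min)]
accuracyPct <- 100 * mean(predicted == labels)
accuracyPct
```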
Classification
• Nearest neighbour
• K - Nearest neighbour
# load data
data(iris)
summary(iris)
# normalise the data between 0 and 1
normalisedData <- as.data.frame(
  lapply(iris[, 1:4],
    function(x) {
      (x - min(x)) / (max(x) - min(x))
    }))
summary(normalisedData)
# randomly draw 80% of the row indices from the original data
set.seed(1234)
trainRandom <- sample(1:nrow(iris), 0.8 * nrow(iris))
# divide the original data into training and testing sets
trainData <- normalisedData[trainRandom, ]
testData <- normalisedData[-trainRandom, ]
targetClass <- iris[trainRandom, 5]
testClass <- iris[-trainRandom, 5]
# load the package class and run the knn classification
library(class)
prediction <- knn(trainData, testData, cl = targetClass, k = 13)
# accuracy function: percentage of correct predictions in a confusion matrix
accuracy <- function(x)
{
  100 * sum(diag(x)) / sum(x)
}
# create confusion matrix and compute accuracy
tab <- table(prediction, testClass)
accuracy(tab)
# try with different values of "k"
acc <- numeric()
for (i in 3:15)
{
  set.seed(1234)
  prediction <- knn(trainData, testData, cl = targetClass, k = i)
  tab <- table(prediction, testClass)
  acc[i - 2] <- accuracy(tab)
}
d1 <- data.frame(neighbours = 3:15, accuracy = acc)
# plot
library(ggplot2)
ggplot(d1, aes(x = neighbours, y = accuracy)) + geom_line() + geom_point() + theme_bw()
Classification
• Decision tree
# load packages
library(dplyr)
library(rpart)
library(rpart.plot)
#read data
set.seed(6789)
titanic <- read.csv('https://raw.githubusercontent.com/guru99-edu/R-Programming/master/titanic_data.csv')
head(titanic)
# clean data
# Drop useless information
# "pclass" and "survived" columns get labels instead of numbers (easy to understand)
# remove NAs
cleanData <- titanic %>%
  select(-c(home.dest, cabin, name, x, ticket)) %>%
  mutate(pclass = factor(
           pclass, levels = c(1, 2, 3),
           labels = c('Upper', 'Middle', 'Lower')),
         survived = factor(
           survived, levels = c(0, 1),
           labels = c('No', 'Yes'))) %>%
  na.omit()
glimpse(cleanData)
#Split into training and testing
set.seed(1234)
trainRandom <- sample(1:nrow(cleanData), 0.8 * nrow(cleanData))
trainData <- cleanData[trainRandom, ]
testData <- cleanData[-trainRandom, ]
# train the model and plot
treeModel <- rpart(survived~., data = trainData, method = 'class')
rpart.plot(treeModel, extra = 106)
# test the model (accuracy() is the helper defined in the k-NN example)
predicted <- predict(treeModel, testData, type = 'class')
tab <- table(testData$survived, predicted)
accuracy(tab)
# Acc = 79.38 (value reported on the slide)
# tune the parameters
control <- rpart.control(
  minsplit = 4,
  minbucket = round(5 / 3),
  maxdepth = 3,
  cp = 0)
# run the model with tuning
tuneModel <- rpart(survived~., data = trainData, method = 'class', control = control)
# predict with the tuned model, not with the untuned treeModel
predicted <- predict(tuneModel, testData, type = 'class')
tab <- table(testData$survived, predicted)
tab
testAccuracy <- 100 * sum(diag(tab)) / sum(tab)
testAccuracy
# Acc = 80.53 (value reported on the slide)
• Regression
• Linear Regression
  • Y = 14.81X + 88.33
[Figure: Sales vs Money on ads]
Problem with Linear regression
[Figure: Sales vs Money on ads]
• Polynomial Regression
  • Y = 14.81X + 2.89X^2 + 88.33
[Figure: Sales vs Money on ads]
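A minimal sketch of the comparison between the two fits. The data is synthetic: it is generated from the polynomial equation on the slide plus random noise, which is an assumption made only so the example is self-contained. The variable names ads and sales are illustrative.

```r
set.seed(1234)
# Synthetic data: money on ads (x) and sales (y), with a curved relationship
ads <- seq(1, 10, length.out = 50)
sales <- 88.33 + 14.81 * ads + 2.89 * ads^2 + rnorm(50, sd = 10)
adData <- data.frame(ads = ads, sales = sales)

# Linear regression: a straight line cannot follow the curvature
linearModel <- lm(sales ~ ads, data = adData)

# Polynomial regression: add a squared term; I() protects ^ inside a formula
polyModel <- lm(sales ~ ads + I(ads^2), data = adData)

# Compare how much variance each model explains
linearR2 <- summary(linearModel)$r.squared
polyR2 <- summary(polyModel)$r.squared
c(linear = linearR2, polynomial = polyR2)
coef(polyModel)
```

The polynomial model is still linear in its coefficients, which is why lm() can fit it.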
Quality
Prediction/classification Quality
Error
Error = original - predicted
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
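Error
• R-squared
• Precision, Recall, Accuracy
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$$
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$$
$$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{Total}}$$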
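The quality measures listed above can be computed by hand in base R. This is a sketch on small made-up vectors (the values are illustrative, not course data), using the standard definitions of MAE, RMSE and R-squared together with the error definition given on the slide.

```r
# Regression quality: error = original - predicted
original <- c(3, 5, 2, 7)
predicted <- c(2.5, 5, 4, 8)
error <- original - predicted

mae <- mean(abs(error))                 # Mean Absolute Error
rmse <- sqrt(mean(error^2))             # Root Mean Squared Error
rSquared <- 1 - sum(error^2) / sum((original - mean(original))^2)

# Classification quality, with "yes" as the positive class
actual <- factor(c("yes", "yes", "no", "no", "yes", "no"), levels = c("no", "yes"))
predictedLabel <- factor(c("yes", "no", "no", "yes", "yes", "no"), levels = c("no", "yes"))

truePositive <- sum(predictedLabel == "yes" & actual == "yes")
falsePositive <- sum(predictedLabel == "yes" & actual == "no")
falseNegative <- sum(predictedLabel == "no" & actual == "yes")
trueNegative <- sum(predictedLabel == "no" & actual == "no")

precision <- truePositive / (truePositive + falsePositive)
recall <- truePositive / (truePositive + falseNegative)
accuracyValue <- (truePositive + trueNegative) / length(actual)
```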
One package to rule them all - caret
library(caret)
names(getModelInfo())
More than 200 prediction/classification algorithms are supported
Caret
# training control: 10-fold cross-validation, repeated 10 times
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
# to see the tuning parameters of an individual algorithm, run -->
modelLookup("<name of the model>")
# for the list of models supported by caret (the names accepted by "modelLookup"), run -->
names(getModelInfo())
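The "repeatedcv" setting asks caret for repeated k-fold cross-validation. As a rough illustration of what one round of 10-fold cross-validation does, here is a base-R sketch; the model (a linear regression on mtcars) and the RMSE metric are assumptions chosen for brevity, and caret's own implementation differs in detail.

```r
set.seed(1234)
k <- 10
# Assign every row to one of k folds at random
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# For each fold: train on the other k - 1 folds, test on the held-out fold
foldRmse <- sapply(1:k, function(fold) {
  trainFold <- mtcars[folds != fold, ]
  testFold <- mtcars[folds == fold, ]
  fit <- lm(mpg ~ wt + hp, data = trainFold)
  predicted <- predict(fit, newdata = testFold)
  sqrt(mean((testFold$mpg - predicted)^2))
})

# The cross-validated estimate is the average over the folds
cvRmse <- mean(foldRmse)
cvRmse
```

Repeating this with different random fold assignments and averaging again gives the repeated variant.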
Caret
train(
formula,
data = …,
method=" … ",
trControl = …,
preProcess = …,
tuneGrid = …,
metric = …
)
predict(<trained model>,<test Data>)
confusionMatrix(<predicted values/labels>, <original values/labels>)

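Naïve bayes
$$P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}$$
• P(c | x): posterior probability
• P(x | c): likelihood
• P(c): class prior probability
• P(x): prior probability of predictor
Naïve bayes
$$P(c \mid X) \propto P(x_1 \mid c) \times P(x_2 \mid c) \times \dots \times P(x_n \mid c) \times P(c)$$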
Naïve bayes
Outlook   Temp  Humidity  Windy  Play
RAINY     HOT   HIGH      FALSE  NO
RAINY     HOT   HIGH      TRUE   NO
OVERCAST  HOT   HIGH      FALSE  YES
SUNNY     MILD  HIGH      FALSE  YES
SUNNY     COOL  NORMAL    FALSE  YES
SUNNY     COOL  NORMAL    TRUE   NO
OVERCAST  COOL  NORMAL    TRUE   YES
RAINY     MILD  HIGH      FALSE  NO
RAINY     COOL  NORMAL    FALSE  YES
SUNNY     MILD  NORMAL    FALSE  YES
RAINY     MILD  NORMAL    TRUE   YES
OVERCAST  MILD  HIGH      TRUE   YES
OVERCAST  HOT   NORMAL    FALSE  YES
SUNNY     MILD  HIGH      TRUE   NO
Naïve bayes
Frequency table (Outlook)
          YES  NO
SUNNY     3    2
OVERCAST  4    0
RAINY     2    3
Likelihood table (Outlook)
          YES  NO
SUNNY     3/9  2/5
OVERCAST  4/9  0/5
RAINY     2/9  3/5
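Naïve bayes
          YES   NO    Total
SUNNY     3/9   2/5   5/14
OVERCAST  4/9   0/5   4/14
RAINY     2/9   3/5   5/14
Total     9/14  5/14
P(SUNNY|YES) = 3/9 = 0.33
P(YES) = 9/14 = 0.64
P(SUNNY) = 5/14 = 0.36
Naïve bayes
$$P(\text{YES} \mid \text{SUNNY}) = \frac{P(\text{SUNNY} \mid \text{YES}) \times P(\text{YES})}{P(\text{SUNNY})} = \frac{0.33 \times 0.64}{0.36} = 0.60$$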
Naïve bayes
Likelihood tables
Outlook   YES  NO
SUNNY     3/9  2/5
OVERCAST  4/9  0/5
RAINY     2/9  3/5

Temp      YES  NO
HOT       2/9  2/5
MILD      4/9  2/5
COOL      3/9  1/5

Humidity  YES  NO
HIGH      3/9  4/5
NORMAL    6/9  1/5

Windy     YES  NO
FALSE     6/9  2/5
TRUE      3/9  3/5

New case to classify:
Outlook   Temp  Humidity  Windy  Play
RAINY     COOL  HIGH      TRUE   ?
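Naïve bayes
$$P(\text{YES} \mid X) \propto P(\text{RAINY} \mid \text{YES}) \times P(\text{COOL} \mid \text{YES}) \times P(\text{HIGH} \mid \text{YES}) \times P(\text{TRUE} \mid \text{YES}) \times P(\text{YES}) = \frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14} \approx 0.0053$$
$$P(\text{NO} \mid X) \propto P(\text{RAINY} \mid \text{NO}) \times P(\text{COOL} \mid \text{NO}) \times P(\text{HIGH} \mid \text{NO}) \times P(\text{TRUE} \mid \text{NO}) \times P(\text{NO}) = \frac{3}{5} \times \frac{1}{5} \times \frac{4}{5} \times \frac{3}{5} \times \frac{5}{14} \approx 0.0206$$
Naïve bayes
Normalising the two scores so that they sum to 1 (values computed from the likelihood tables):
$$P(\text{YES} \mid X) = \frac{0.0053}{0.0053 + 0.0206} \approx 0.20$$
$$P(\text{NO} \mid X) = \frac{0.0206}{0.0053 + 0.0206} \approx 0.80$$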
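The calculation for the case RAINY, COOL, HIGH, TRUE can be checked in a few lines of base R. This is a sketch that copies the probabilities from the likelihood tables by hand; the variable names are illustrative.

```r
# Conditional probabilities for the new case, read from the likelihood tables
likelihoodYes <- c(RAINY = 2/9, COOL = 3/9, HIGH = 3/9, WINDY_TRUE = 3/9)
likelihoodNo <- c(RAINY = 3/5, COOL = 1/5, HIGH = 4/5, WINDY_TRUE = 3/5)

# Class priors
priorYes <- 9/14
priorNo <- 5/14

# Naive assumption: features are independent given the class, so multiply
scoreYes <- prod(likelihoodYes) * priorYes
scoreNo <- prod(likelihoodNo) * priorNo

# Normalise the scores so that the two posteriors sum to 1
posteriorYes <- scoreYes / (scoreYes + scoreNo)
posteriorNo <- scoreNo / (scoreYes + scoreNo)

predictedPlay <- if (posteriorYes > posteriorNo) "YES" else "NO"
round(c(YES = posteriorYes, NO = posteriorNo), 2)
predictedPlay
```

The larger posterior belongs to NO, so the classifier predicts that no play takes place for this case.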
Naïve bayes - Implementation
library(dplyr)
library(ggplot2)
library(caret)
# caret's method = "nb" relies on the klaR package, which must be installed
# load data and remove NAs
data("PimaIndiansDiabetes2", package = "mlbench")
PimaIndiansDiabetes2 <- na.omit(PimaIndiansDiabetes2)
# split data
set.seed(1234)
trainingSamples <- PimaIndiansDiabetes2$diabetes %>% createDataPartition(p = 0.8, list = FALSE)
trainData <- PimaIndiansDiabetes2[trainingSamples, ]
testData <- PimaIndiansDiabetes2[-trainingSamples, ]
#train
model <- train(
  diabetes ~.,
  data = trainData,
  method = "nb",
  trControl = trainControl("cv", number = 10))
#test
predictions <- predict(model, testData)
#evaluate
confusionMatrix(predictions, testData$diabetes)
#parameter tuning
modelLookup("nb")
gridControl <- expand.grid(
  usekernel = c(TRUE, FALSE),
  fL = 0:5,
  # bandwidth adjustment of the kernel density; starts at 1 because 0 is not a usable bandwidth
  adjust = seq(1, 5, by = 1))
modelTune <- train(
  diabetes ~.,
  data = trainData,
  method = "nb",
  trControl = trainControl("cv", number = 10),
  tuneGrid = gridControl)
plot(modelTune)
#training performance
confusionMatrix(modelTune)
#test with the best model
predictions <- predict(modelTune, testData)
#evaluate the best model
confusionMatrix(predictions, testData$diabetes)
Naïve bayes – When to use?
Support vector machines
Support vector machines – kernel trick
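SVM - implementation
# trainData, featureSet (which must include the outcome "score") and fitControl
# are assumed to be defined beforehand; caret's SVM methods rely on the kernlab package
# tuning grids; values start above 0 because the cost C must be positive
svmLinearGrid <- expand.grid(
  C = seq(0.1, 1, length = 10))
svmPolyGrid <- expand.grid(
  C = seq(0.1, 1, length = 10),
  degree = c(1:10),
  scale = seq(0.1, 1, length = 10))
svmRadialGrid <- expand.grid(
  C = seq(0.1, 1, length = 10),
  sigma = seq(0.04, 1, length = 25))
# for the other kernels use method = "svmPoly" with svmPolyGrid,
# or method = "svmRadial" with svmRadialGrid
trainSVMLinear <- train(
  score~.,
  data = trainData[, featureSet],
  method = "svmLinear",
  trControl = fitControl,
  preProcess = c("center", "scale"),
  tuneGrid = svmLinearGrid,
  metric = "RMSE")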
SVM – When to use?
Neural networks
Basic Architectures
Neural networks
Basic Architecture
• Human Brain
  • Thalamocortical System
    • 3 Million Neurons
    • 476 Million Synapses
  • Full Brain
    • 10 Billion Neurons
    • 1,000 Trillion Synapses
• Artificial Neural network
  • ResNet 152 layers
    • 60 Million Synapses
Neural networks
Basic Architecture
Biological NN vs Artificial NN
• Synapses: biological networks have about 10 million times more synapses
• Topology: the biological topology is asynchronous and not constructed in layers
• Learning algorithm: for our biological networks we don't know it; for artificial neural networks it is backpropagation
• Power consumption: human brains are much more efficient
• Learning stages: in biological neural networks you really never stop learning; artificial networks often have a distinct training stage and a distinct testing stage, when you release the thing in the wild
Neural networks
Basic Architecture
[Figure: Perceptron without Bias and Perceptron with Bias]

Neural networks
Single computational layer: Perceptron
Classification
Regression
The weights w_i are calculated using gradient descent techniques
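A minimal sketch of a perceptron with bias trained for classification, in base R. Several choices are assumptions, not taken from the slides: two linearly separable iris classes, a sign activation, a learning rate of 0.1, and the classic perceptron update, which is a stochastic gradient step on the loss max(0, -y * (w . x)).

```r
# Two linearly separable classes: setosa (-1) vs versicolor (+1)
twoClasses <- iris[iris$Species != "virginica", ]
x <- cbind(bias = 1, as.matrix(twoClasses[, c("Petal.Length", "Petal.Width")]))
y <- ifelse(twoClasses$Species == "versicolor", 1, -1)

w <- rep(0, ncol(x))    # weights w_i, the first one is the bias weight
learningRate <- 0.1     # illustrative value

for (epoch in 1:1000) {
  mistakes <- 0
  for (i in seq_len(nrow(x))) {
    # a point is misclassified when y * (w . x) is not positive
    if (y[i] * sum(w * x[i, ]) <= 0) {
      # gradient step: move the weights towards the misclassified point
      w <- w + learningRate * y[i] * x[i, ]
      mistakes <- mistakes + 1
    }
  }
  if (mistakes == 0) break   # every training point is on the correct side
}

predicted <- ifelse(as.vector(x %*% w) > 0, 1, -1)
trainAccuracy <- mean(predicted == y)
w
trainAccuracy
```

Dropping the bias column from x gives the perceptron without bias, whose decision boundary is forced through the origin.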

Neural networks
Multi-layer Neural Networks
[Figure: Multi-layer NN without Bias and Multi-layer NN with Bias]

Neural networks
Multi-layer Neural Networks
[Figure: Multi-layer NN in scalar notation and Multi-layer NN in vector notation]

Basics of neural networks
Activation functions
Neural networks - caret
Deep Neural networks
Neural networks – When to use?
Resources
• Machine Learning Yearning – Andrew Ng
• The Hundred-Page Machine Learning Book – Andriy Burkov
• Machine Learning – Tom M. Mitchell
Thank you!!
Questions??
