
US-TCS-504

Learning
• There are three types of learning:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Supervised Learning
• In supervised learning, an AI system is presented with labelled data.
• This means that each example is already tagged with the correct answer.
• It can be compared to learning which takes
place in the presence of a supervisor or a
teacher.
Supervised learning
Types of supervised learning
Regression
• Regression is a supervised learning task where the output has a continuous value.
• It is used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
Classification
• Classification means grouping the output into a class.
• If the algorithm tries to label input into two
distinct classes, it is called binary
classification.
• Selecting between more than two classes is
referred to as multiclass classification.
Classification
Unsupervised learning
• Unsupervised learning is a type of machine learning in which models are trained on an unlabelled dataset and are allowed to act on that data without any supervision.
Types of Unsupervised learning
Clustering
• Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
Association
• An association rule learning problem is one where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y".
Reinforcement Learning
• A reinforcement learning algorithm, or agent,
learns by interacting with its environment.
• The agent receives rewards by performing
correctly and penalties for performing incorrectly.
• The agent learns without intervention from a
human by maximizing its reward and minimizing
its penalty.
• It is closely related to dynamic programming and trains algorithms using a system of reward and punishment.
Models
• There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends. These two forms are as follows:

• Classification
• Prediction
Prediction
• Prediction models predict continuous-valued functions.

• The prediction technique can be used in sales, for example, to predict future profit or loss.
Two Steps

Classification
• Classification models predict categorical class labels.

• A classification model may be built to categorize credit card transactions as either legitimate or fraudulent.
Binary Classification
• Binary classification refers to predicting one of two
classes.
• y=f(x), where y = categorical output
Multi class Classification
• If a classification problem has more than two outcomes, it is called multi-class classification.
Example: classification of types of crops
Two steps
1. Developing the Classifier or model creation
2. Applying classifier for classification
Step 1 (Developing)
• This step is the learning step or the learning phase.

• In this step, the classification algorithm builds the classifier.

• The classifier is built from the training set, which is made up of database tuples and their associated class labels.

• Each tuple in the training set belongs to a predefined category or class; these tuples are also referred to as samples, objects, or data points.
Step 1

Step 2
• In this step, the classifier is used for
classification.
• Here the test data is used to estimate the
accuracy of classification rules.
• The classification rules can be applied to the
new data tuples if the accuracy is considered
acceptable.
Step 2

Types Of Learners In Classification
• Lazy Learners – Lazy learners simply store the training data and wait until test data appears. E.g. k-nearest neighbour.

• Eager Learners – Eager learners construct a classification model from the given training data before receiving data for prediction.
• Due to this, they take a lot of time in training and less time for a prediction. E.g. decision trees, Naive Bayes, artificial neural networks.
Process

Applications
• Sentiment Analysis
• Document Classification
• Image Classification
Statement of the Problem
Training Set: T = {t1, …, tn}, a set of n examples.
Each example ti:
• is characterized by m features (ti(A1), …, ti(Am))
• belongs to one of k classes (Ci : 1 ≤ i ≤ k)
GOAL
• From the training data, find a model that describes the classes accurately and concisely using the data's features.
• The model will then be used to assign class labels to unknown (previously unseen) records.
KNN Classification

• The K-NN algorithm assumes similarity between the new case and the available cases, and puts the new case into the category that is most similar to the available categories.
• KNN is a non-parametric and lazy learning
algorithm.
KNN Classification
• K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the classification computation only when a new point has to be classified.
KNN
• Suppose P1 is the point whose label needs to be predicted.
• First, find the k points closest to P1, then classify P1 by a majority vote of its k neighbours.
• Each neighbour votes for its class, and the class with the most votes is taken as the prediction.
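• As a rough illustration of this majority-vote procedure, the sketch below implements a minimal k-NN classifier in plain Python; the function names, example points and the value of k are illustrative and not taken from the slides.

```python
import math
from collections import Counter

def euclidean(a, b):
    # straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    # distance of the query point to every stored training point
    distances = [(euclidean(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # keep the k nearest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # each neighbour votes for its class; the majority wins
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# e.g. knn_predict([(1, 0), (2, 1), (8, 9)], ["A", "A", "B"], (1.5, 0.5), k=3) -> "A"
```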
KNN

Optimal value of K ??

Optimal value of K

KNN example
• Here male is denoted with the numeric value 0 and female with 1.
• Find in which class Angelina will lie, given k = 3, age = 5 and gender = 1 (female).

NAME     AGE  GENDER  CLASS OF SPORTS
Ajay      32    0     Football
Mark      40    0     Neither
Sara      16    1     Cricket
Zaira     34    1     Cricket
Sachin    55    0     Neither
Rahul     40    0     Cricket
Pooja     20    1     Neither
Smith     15    0     Cricket
Laxmi     55    1     Football
Michael   15    0     Football
Finding distance

KNN
• To find the distance between two points:
d = √((age2 - age1)² + (gender2 - gender1)²)
• Distance between Angelina and Ajay:
d = √((5 - 32)² + (1 - 0)²)
d = √(729 + 1) = √730 ≈ 27.02
The distance calculated
Ajay 27.02
Mark 35.01
Sara 11.00
Zaira 29.00
Sachin 50.01
Rahul 35.01
Pooja 15.00
Smith 10.05
Laxmi 50.00
Michael 10.05
KNN Classification
• As the value of k = 3, the 3 closest points are:

Smith 10.05 Cricket
Michael 10.05 Football
Sara 11.00 Cricket

• So, according to the KNN algorithm, Angelina will be in the class of people who like cricket.
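• The whole example (distances plus the majority vote) can be reproduced with the short script below; the ages, genders and sports come from the table above, while the variable names and layout are chosen here for illustration.

```python
import math
from collections import Counter

# (name, age, gender, sport) from the example table; male = 0, female = 1
people = [
    ("Ajay", 32, 0, "Football"),  ("Mark", 40, 0, "Neither"),
    ("Sara", 16, 1, "Cricket"),   ("Zaira", 34, 1, "Cricket"),
    ("Sachin", 55, 0, "Neither"), ("Rahul", 40, 0, "Cricket"),
    ("Pooja", 20, 1, "Neither"),  ("Smith", 15, 0, "Cricket"),
    ("Laxmi", 55, 1, "Football"), ("Michael", 15, 0, "Football"),
]

angelina = (5, 1)  # age = 5, gender = 1 (female)

# Euclidean distance from Angelina to every person
dists = [(math.sqrt((age - angelina[0]) ** 2 + (g - angelina[1]) ** 2), sport, name)
         for name, age, g, sport in people]

nearest = sorted(dists)[:3]            # the three nearest: Smith, Michael (≈10.05) and Sara (11.0)
votes = Counter(sport for _, sport, _ in nearest)
print(nearest)
print(votes.most_common(1)[0][0])      # -> "Cricket"
```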
Decision tree
• Decision tree builds classification models in the form
of a tree structure.
• It breaks down a dataset into smaller and smaller
subsets
• The final result is a tree with decision nodes and leaf
nodes.
Steps
• It begins with the original set S as the root
node.
• On each iteration of the algorithm, it iterates through every unused attribute of the set S and calculates the Entropy (H) and Information Gain (IG) of that attribute.
Steps
• It then selects the attribute which has the smallest entropy or largest information gain.
• The set S is then split by the selected attribute to produce subsets of the data.
• The algorithm continues to recurse on each subset, considering only attributes never selected before.
ID3 (Iterative Dichotomiser 3)
• ID3 follows the rule that a branch with an entropy of zero is a leaf node, while a branch with entropy greater than zero needs further splitting.
Entropy
• Entropy: Entropy is a measure of the randomness
in the information being processed.
• Information Gain: Information gain is the decrease in entropy achieved when the node is split on an attribute.
• An attribute should have the highest information
gain to be selected for splitting.
• Based on the computed values of Entropy and
Information Gain, we choose the best attribute at
any particular step.
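• As a sketch of how these two quantities can be computed for a candidate split (the function and variable names below are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions in labels
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(labels, attribute_values):
    # IG = entropy(parent) - weighted entropy of the children produced by the split
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    weighted_child_entropy = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted_child_entropy
```

• For instance, a parent node with three "Yes" and two "No" labels gives entropy(["Yes"] * 3 + ["No"] * 2) ≈ 0.971, matching the value used on the following slides.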
Example
Decision tree
• Mathematically, entropy for one attribute is represented as:
Entropy(S) = - Σ pi log2(pi)

• where S → current state, and
• pi → probability of an event (class i) in S
Entropy
Information Gain
Information gain of all Attributes
Highest gain attribute
• To determine which of Temperature, Humidity or Wind has the highest information gain, first calculate the parent entropy E(S):

• E(S) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.971
• Now calculate the information gain of each attribute:
IG(S, Temperature) = 0.971 - 0.4 = 0.571
IG(S, Humidity) = 0.971
IG(S, Windy) = 0.020
• For Humidity (from the above table), play occurs whenever humidity is normal and does not occur when it is high, so Humidity gives the largest information gain and is chosen for the split.
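• These values can be checked with a few lines of Python; the humidity split counts (3 "normal" rows that play, 2 "high" rows that do not) are inferred from the statement above that humidity separates the classes perfectly.

```python
import math

def H(p, n):
    # entropy of a node with p positive and n negative examples
    probs = [c / (p + n) for c in (p, n) if c > 0]
    return -sum(q * math.log2(q) for q in probs)

E_S = H(3, 2)                                            # parent: 3 play, 2 no-play
ig_humidity = E_S - (3/5) * H(3, 0) - (2/5) * H(0, 2)    # both children are pure
print(round(E_S, 3), round(ig_humidity, 3))              # 0.971 0.971
```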
Underfitting
• A statistical model or a machine learning
algorithm is said to have underfitting when it
cannot capture the underlying trend of the
data.
• Underfitting destroys the accuracy of the
machine learning model.
Overfitting
• A statistical model is said to be overfitted when it learns the training data too closely.
• When this happens, the model starts learning from the noise and inaccurate data entries in our data set, and it then performs poorly on new data.
Curve Fitting
Pruning
• Pruning is a technique used to reduce the number of nodes in the tree that have to be evaluated.
• Pruning works by removing branches whose nodes would give worse results than the ones previously examined.
Cross-Validation
• Cross-Validation (CV) is a technique used to test a model's ability to predict unseen data, i.e. data not used to train the model.
• Holdout cross-validation
• k-fold cross-validation
• Leave-one-out cross-validation (LOOCV)
• Random subsampling
Holdout cross-validation
• The simplest approach is to randomly split the available data into a training set and a test set.
• The learning algorithm produces a hypothesis h from the training set, and the accuracy of h is then evaluated on the test set.
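• A minimal sketch of such a holdout split in plain Python; the 80/20 ratio and the function name are illustrative assumptions.

```python
import random

def holdout_split(data, test_fraction=0.2, seed=0):
    # shuffle a copy of the data, then slice off a test set of the requested size
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train set, test set)

train, test = holdout_split(list(range(10)))
```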
k- fold cross validation
• Split the data into k equal subsets.
• We then perform k rounds of learning.
• On each round, 1/k of the data is held out as a test set and the remaining examples are used as training data.
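• A sketch of the k-fold procedure described above; train_and_evaluate stands for whatever learner and accuracy measure are being used and is assumed here for illustration.

```python
def k_fold_indices(n_samples, k):
    # split the indices 0..n_samples-1 into k (nearly) equal folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def k_fold_cv(data, k, train_and_evaluate):
    # on each round, one fold is held out as the test set, the rest is training data
    folds = k_fold_indices(len(data), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_evaluate([data[j] for j in train_idx],
                                         [data[j] for j in test_idx]))
    return sum(scores) / k
```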
Leave-one-out cross-validation LOOCV
• Leave-one-out cross-validation (LOOCV) is k-fold cross-validation with k equal to N, the number of data points in the set.
• It is an exhaustive cross-validation technique.
Random subsampling
• Random subsampling validation, also referred to as Monte Carlo cross-validation, splits the dataset randomly into training and validation sets.
• In this technique, on each iteration a set of data points is randomly chosen from the dataset to form the test set, and the remaining data forms the training set.
Random subsampling
• The error rate of the model is the average of
the error rate of each iteration.
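• A sketch of random subsampling (Monte Carlo) cross-validation that averages the error rate over several random splits; the evaluate callback and the number of iterations are illustrative assumptions.

```python
import random

def monte_carlo_cv(data, evaluate, test_fraction=0.2, n_iterations=10, seed=0):
    # repeat: draw a random test set, train on the rest, record the error rate
    rng = random.Random(seed)
    errors = []
    for _ in range(n_iterations):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        errors.append(evaluate(train, test))   # evaluate returns an error rate
    # the model's error rate is the average over all iterations
    return sum(errors) / n_iterations
```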
Evaluation Metrics
• Evaluation metrics help in determining how well the model has been trained.
1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. ROC
Confusion matrix
• The confusion matrix visualizes the accuracy
of a classifier by comparing the actual and
predicted classes.
Evaluation metrics
• Recall or Sensitivity or TPR (True Positive Rate): TP / (TP + FN)

• Specificity or TNR (True Negative Rate): TN / (TN + FP)

• Precision: TP / (TP + FP)
Evaluation metrics
• Accuracy: (TP + TN) / (P + N)
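• These formulas (together with F1 from the next slide) translate directly into code; TP, FP, TN and FN are the four counts read off the confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    # metrics derived from the confusion matrix counts
    recall      = tp / (tp + fn)            # sensitivity / true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision   = tp / (tp + fp)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    f1          = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}
```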
F1, log loss
• ROC is a probability curve and AUC represents
the degree or measure of separability.
Bayes Theorem
• Bayes' theorem states that the probability of an event A given B equals the probability of B and A happening together divided by the probability of B:
P(A|B) = P(A ∩ B) / P(B)
Naïve Bayes
• Naive Bayes is a probabilistic machine learning
algorithm that can be used in a wide variety of
classification tasks.
• Typical applications include filtering spam,
sentiment prediction etc.
Naïve Bayes
• It is a classification technique based on Bayes’
Theorem with an assumption of
independence among predictors.
• Naive Bayes classifier assumes that the
presence of a particular feature in a class is
unrelated to the presence of any other
feature.
Naïve Bayes
• Given multiple features x1, x2, x3, …, xn and output y, the posterior probability can then be written as:
P(y | x1, …, xn) = [P(y) · P(x1 | y) · P(x2 | y) · … · P(xn | y)] / P(x1, …, xn)
Naïve Bayes
• The Naive Bayes classifier combines this
model with a decision rule. One common rule
is to pick the hypothesis that’s most probable.
• This is known as the maximum a
posteriori or MAP decision rule.
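• A minimal sketch of the MAP decision rule for Naive Bayes with categorical features: each class is scored by its prior multiplied by the likelihood of every observed feature value, and the highest-scoring class is returned. The dictionary layout used here is an assumption for illustration.

```python
def naive_bayes_predict(priors, likelihoods, sample):
    # priors:      {class: P(class)}
    # likelihoods: {class: {feature: {value: P(value | class)}}}
    # sample:      {feature: value}
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in sample.items():
            score *= likelihoods[cls][feature].get(value, 0.0)
        scores[cls] = score
    # MAP rule: pick the class with the highest posterior score
    return max(scores, key=scores.get), scores
```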
Dataset
In the learning phase, we will compute the likelihood tables.
Calculate P(Yes) and P(No)
Naïve Bayes
• If we have a new set of weather conditions, we can classify whether we will play football or not based on the likelihoods from the learning phase.
New Sample
• Outlook=Sunny,
• Temperature=Mild,
• Humidity=High and
• Wind=Weak

• Based on this, can we predict whether we will play or not?
Probability to Play
• P(play = Yes | x) =
2/9 * 4/9 * 3/9 * 6/9 * 9/14 = 0.0141
Probability to No Play
• P(play = No | x) =
3/5 * 2/5 * 4/5 * 2/5 * 5/14 = 0.0274
• Normalize:
• P(Yes | x) = 0.0141 / (0.0141 + 0.0274) = 0.34
• P(No | x) = 0.0274 / (0.0141 + 0.0274) = 0.66
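• The arithmetic of this example can be verified in a few lines:

```python
p_yes = (2/9) * (4/9) * (3/9) * (6/9) * (9/14)   # ≈ 0.0141
p_no  = (3/5) * (2/5) * (4/5) * (2/5) * (5/14)   # ≈ 0.0274
total = p_yes + p_no
print(round(p_yes / total, 2), round(p_no / total, 2))   # 0.34 0.66
```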
Naïve Bayes
• Since P(No|x) is greater than P(Yes|x) we
classify the new instance as No.
Rule based Classifier
• Rule-based classifier makes use of a set of IF-THEN
rules for classification.
IF condition THEN conclusion
1. The IF part of the rule is called rule antecedent or
precondition.
2. The THEN part of the rule is called rule consequent.
3. The antecedent part (the condition) consists of one or more attribute tests, and these tests are logically ANDed.
4. The consequent part consists of class prediction.
Example
• R1: (Give Birth = no) & (Can Fly = yes) → Birds
• R2: (Give Birth = no) & (Live in Water = yes) → Fishes
• R3: (Give Birth = yes) & (Blood Type = warm) → Mammals
• R4: (Give Birth = no) & (Can Fly = no) → Reptiles
• R5: (Live in Water = sometimes) → Amphibians
Example
• Applying rules R1–R5 above:
• Rule R1 covers a hawk => Bird
• Rule R3 covers the grizzly bear => Mammal
• A lemur triggers rule R3, so it is classified => Mammal
• A turtle triggers both R4 and R5
• A dogfish shark matches none of the rules
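• Rules R1–R5 can be written as simple antecedent/consequent pairs and matched against a record; a record may trigger several rules (the turtle) or none (the dogfish shark). The attribute names and dictionary layout below are illustrative assumptions.

```python
# each rule: (antecedent as a dict of required attribute values, consequent class)
rules = [
    ({"gives_birth": "no",  "can_fly": "yes"},        "Birds"),       # R1
    ({"gives_birth": "no",  "lives_in_water": "yes"}, "Fishes"),      # R2
    ({"gives_birth": "yes", "blood_type": "warm"},    "Mammals"),     # R3
    ({"gives_birth": "no",  "can_fly": "no"},         "Reptiles"),    # R4
    ({"lives_in_water": "sometimes"},                 "Amphibians"),  # R5
]

def matching_rules(record):
    # a rule fires when every attribute test in its antecedent is satisfied
    return [label for antecedent, label in rules
            if all(record.get(attr) == val for attr, val in antecedent.items())]

hawk  = {"gives_birth": "no", "can_fly": "yes", "lives_in_water": "no", "blood_type": "warm"}
lemur = {"gives_birth": "yes", "can_fly": "no", "lives_in_water": "no", "blood_type": "warm"}
print(matching_rules(hawk))    # ['Birds']
print(matching_rules(lemur))   # ['Mammals']
```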
Assessment of Rule
• A rule can be assessed based on two factors: coverage and accuracy.

• na = number of records covered by rule R
• nc = number of records correctly classified by rule R
• n = total number of records
Coverage of a rule
• Coverage of a rule: Fraction of records that
satisfy the rule’s antecedent describes rule
coverage.
Coverage (R) = na / n
Accuracy of a rule
• Accuracy of a rule: Fraction of records that
meet the antecedent and consequent value
defines rule accuracy.
Accuracy (R) = nc / na
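• A short sketch of these two measures, where a rule is represented by an antecedent predicate and a predicted class; the record format used here is an illustrative assumption.

```python
def rule_coverage_and_accuracy(records, antecedent, predicted_class, label_key="class"):
    # records covered by the rule: those satisfying its antecedent
    covered = [r for r in records if antecedent(r)]
    n_a = len(covered)
    # records correctly classified: covered records whose actual class matches the consequent
    n_c = sum(1 for r in covered if r[label_key] == predicted_class)
    coverage = n_a / len(records)          # Coverage(R) = na / n
    accuracy = n_c / n_a if n_a else 0.0   # Accuracy(R) = nc / na
    return coverage, accuracy
```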
Building Classification Rules

Direct Method
• Extract rules directly from data
• RIPPER, Holte’s 1R (OneR)
Indirect Method
• Extract rules from other classification models (e.g. decision trees).

Indirect Method
Convert from the decision tree:
