MACHINE LEARNING
DECISION TREE
A decision tree is a type of supervised machine learning model in which each internal node represents a test on a feature (e.g. whether a coin flip comes up heads or tails), each leaf node represents a class label (the decision taken after computing all features), and the branches represent conjunctions of features that lead to those class labels. The paths from root to leaf represent classification rules.
• The inductive bias of decision tree learning is a preference for smaller trees over larger ones
• Decision trees classify instances by sorting them down the tree from the root to some leaf node
• Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute
Decision trees are constructed via an algorithmic approach that identifies ways to split a data set based on different conditions. They are a supervised learning method used for both classification and regression tasks.
Decision trees – appropriate problems
Decision tree learning is generally best suited to problems with the following characteristics:
• Instances are represented by attribute-value pairs
• The target function has discrete output values
• Disjunctive descriptions may be required
• The training data may contain errors
• The training data may contain missing attribute values
ENTROPY.
Entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. Flipping a fair coin is an example of an action that produces random information. In decision tree learning, entropy measures the homogeneity of a set of examples.
(General equation) In general, for a collection S with c classes,
Entropy(S) ≡ Σ_{i=1..c} −p_i log2(p_i)
where p_i is the proportion of S belonging to class i.
(Formula) Given a collection S containing positive and negative examples of some target concept, the entropy of S relative to this boolean classification is
Entropy(S) ≡ −p⊕ log2(p⊕) − p⊖ log2(p⊖)
where p⊕ is the proportion of positive examples in S and p⊖ is the proportion of negative examples in S.
Example
• Suppose S is a collection of 14 examples of some boolean concept
• 9 positive and 5 negative examples, written [9+, 5−]
The entropy of S relative to this boolean classification is
Entropy([9+, 5−]) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Entropy is 0 if all members of S belong to the same class, either positive or negative.
Entropy is 1 when the collection contains an equal number of positive and negative examples (maximum randomness).
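The entropy calculation above can be sketched in a few lines. This is a minimal illustration, not from the lecture itself; the function name and the class-count representation are my own choices.

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution, given per-class example counts."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # 0 * log2(0) is treated as 0 by convention
            p = c / total
            result -= p * math.log2(p)
    return result

# The [9+, 5-] collection from the example above:
print(round(entropy([9, 5]), 3))  # 0.94
```

Note the two boundary cases stated above: `entropy([14])` is 0 (all one class), and `entropy([7, 7])` is 1 (an even split).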
INFORMATION GAIN.
Information gain is the measure used by the ID3 algorithm to select the best attribute at any particular node of the tree. Constructing a decision tree is all about finding the attribute that returns the highest information gain, i.e. the split that leaves the smallest entropy.
Information gain measures how much impurity is removed from a set of examples when a particular attribute is chosen at a particular node.
• The aim is to reduce that impurity so that, for each value of the chosen attribute, all the examples are either positive or negative
The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as
Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
where Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v.
• Gain ( S , A ) is therefore the expected reduction in entropy caused by knowing the value
of attribute A
Gain(S, Outlook) = 0.246 (Outlook provides the best prediction of the target concept)
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
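The Gain(S, A) definition can be checked numerically. A sketch follows; the class counts for the Wind attribute (Weak → [6+, 2−], Strong → [3+, 3−]) are assumed from the classic 14-example PlayTennis data set, since the table itself is not reproduced in these notes.

```python
import math

def entropy(counts):
    """Shannon entropy of a class distribution given per-class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, subsets):
    """Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv).
    `subsets` maps each attribute value v to the class counts of Sv."""
    total = sum(parent_counts)
    remainder = sum(sum(counts) / total * entropy(counts)
                    for counts in subsets.values())
    return entropy(parent_counts) - remainder

# Assumed partition for Wind over S = [9+, 5-]:
gain_wind = information_gain([9, 5], {"Weak": [6, 2], "Strong": [3, 3]})
print(round(gain_wind, 3))  # 0.048
```

The result matches the Gain(S, Wind) = 0.048 figure quoted above.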
LECTURE 5
What is Multivariate Regression?
Multivariate regression is a supervised machine learning algorithm that models a single outcome using multiple data variables. It is an extension of simple linear regression, used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable, while those used to calculate it are termed independent variables.
Mathematical equation
The simple linear regression model represents a straight line, meaning y is a function of x. When we have an extra dimension (z), the straight line becomes a plane; the plane is the function that expresses y in terms of x and z. The linear regression equation can now be expressed as:
y = m1·x + m2·z + c
More generally, with n independent variables:
y = β0 + β1·x1 + β2·x2 + … + βn·xn
where n represents the number of independent variables, β0–βn represent the coefficients, and x1–xn are the independent variables.
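Fitting such a plane means choosing c, m1, m2 to minimize the squared error. The sketch below does this from scratch via the normal equations (X^T X)·b = X^T y; the function name, the toy data, and the choice of Gaussian elimination are my own, not part of the lecture.

```python
def fit_plane(xs, zs, ys):
    """Least-squares fit of y = c + m1*x + m2*z via the normal equations,
    solved with Gaussian elimination (no external libraries)."""
    rows = [[1.0, x, z] for x, z in zip(xs, zs)]
    # Build the 3x3 normal-equation system A*coef = b.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back-substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef  # [c, m1, m2]

# Hypothetical data generated from y = 1 + 2x + 3z:
xs, zs = [0, 1, 2, 3, 1], [0, 1, 0, 2, 2]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]
print([round(v, 6) for v in fit_plane(xs, zs, ys)])  # recovers [1.0, 2.0, 3.0]
```

In practice one would use a library solver (e.g. least squares in NumPy) rather than hand-rolled elimination; the point here is only that the coefficients come from solving a small linear system.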
You can estimate missing data within your data range (Interpolation)
You can estimate future data outside your data range (Extrapolation)
Interpolation Versus Extrapolation.
In regression tasks, we use data to generalize a function that maps a set of input variables X to
output variables y. Using this function mapping, a y value can be predicted for any combination
of input variables. This process is referred to as interpolation when the input variables lie in
between the training data, whereas if the point of estimation lies outside this region it is
referred to as extrapolation.
Let's consider the example of college graduates.
• Let's assume we have access to somewhat sparse data where we know the number of
college graduates every 4 years, as shown in the scatter plot below.
We want to estimate the number of college graduates for all the missing years in between. We
can do this by fitting a line to the limited available data points. This process is called
interpolation.
Let’s assume we have access to limited data from the year 2001 to the year 2012, and
we want to predict the number of college graduates from the year 2013 to 2018.
It can be seen that the number of college graduates with master’s degrees increases almost
linearly with the year.
Hence, it makes sense to fit a line to the dataset. Using the 12 points to fit a line and then testing the prediction of this line on the following 6 points, we can see that the prediction is very close. This process is called extrapolation.
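Both interpolation and extrapolation use the same fitted line; only the query point differs. A minimal sketch, assuming hypothetical graduate counts (in thousands) that grow perfectly linearly for clarity; the actual figures from the lecture's plot are not available here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b (closed-form slope and intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x

# Hypothetical counts for 2001-2012: 500k graduates, +20k per year.
years = list(range(2001, 2013))
grads = [500 + 20 * (y - 2001) for y in years]
m, b = fit_line(years, grads)
predict = lambda year: m * year + b
print(round(predict(2005), 1))  # 580.0 -- interpolation, inside the data range
print(round(predict(2016), 1))  # 800.0 -- extrapolation, beyond the data range
```

Extrapolation is the riskier of the two: the linear trend is only verified inside the training range, so predictions degrade the further the query point lies outside it.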
Linear Regression
In simple words, linear regression is a supervised machine learning model in which the model finds the best-fit straight line between the independent and dependent variables, i.e. it finds the linear relationship between the dependent and independent variables.
Linear Regression is of two types:
1. Simple Linear Regression
Simple Linear Regression is where only one independent variable is present and the model has
to find the linear relationship of it with the dependent variable
2. Multiple Linear Regression (in Chapter 5).
VARIABLE ROLES
LECTURE 3
Concept learning
Learning generalized hypotheses from specific positive and negative examples
Concept Learning can be seen as a problem of searching through a predefined space of
potential hypotheses for the hypothesis that best fits the training examples.
– Search takes place in the hypothesis space
For example, the hypothesis <?, Cold, High, ?, ?, ?> says that EnjoySport is true on Cold days with High humidity, no matter whether the Sky is Sunny or Rainy, the Wind is Strong or Weak, the Water is Warm or Cool, and the Forecast is Same or Change.
• Most general hypothesis – Every day is a positive example <?, ?, ?, ?, ?, ?>
• Most specific hypothesis – No day is a positive example <0, 0, 0, 0, 0, 0>
• The EnjoySport concept learning task requires learning the set of days for which EnjoySport = yes
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Now consider the sets of instances that are classified positive by h1 and by h2.
– Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.
– In fact, any instance classified positive by h1 will also be classified positive by h2.
– Therefore, we say that h2 is more general than h1.
FIND-S Algorithm
The FIND-S algorithm starts from the most specific hypothesis and generalizes it by considering only positive examples.
The FIND-S algorithm ignores negative examples.
– As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contains no errors, ignoring negative examples does not cause any problems.
The FIND-S algorithm finds the most specific hypothesis within H that is consistent with the positive training examples.
We can start the search with the most specific hypothesis and then keep generalizing it until:
– all the positive examples satisfy it
– and no negative example is covered by it
Training data
h0 = <Ø, Ø, Ø, Ø, Ø, Ø>
x1 = <sunny, warm, normal, strong, warm, same>, +  →  h1 = <sunny, warm, normal, strong, warm, same>
x2 = <sunny, warm, high, strong, warm, same>, +  →  h2 = <sunny, warm, ?, strong, warm, same>
x3 = <rainy, cold, high, strong, warm, change>, −  →  h3 = <sunny, warm, ?, strong, warm, same> (negative example, ignored)
x4 = <sunny, warm, high, strong, cool, change>, +  →  h4 = <sunny, warm, ?, strong, ?, ?>
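The trace above can be reproduced in code. A minimal sketch of FIND-S, using '0' for the empty constraint Ø and '?' for "any value"; the function name and data encoding are my own.

```python
def find_s(examples):
    """FIND-S: start from the most specific hypothesis and minimally
    generalize it on each positive example; negative examples are ignored."""
    positives = [x for x, label in examples if label == "+"]
    h = ["0"] * len(positives[0])  # most specific hypothesis <0,0,0,0,0,0>
    for x in positives:
        for i, value in enumerate(x):
            if h[i] == "0":          # first positive example: copy its values
                h[i] = value
            elif h[i] != value:      # conflicting value: generalize to '?'
                h[i] = "?"
    return h

# The training data traced above:
data = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), "+"),
    (("sunny", "warm", "high",   "strong", "warm", "same"), "+"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "-"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "+"),
]
print(find_s(data))  # ['sunny', 'warm', '?', 'strong', '?', '?']
```

The output matches h4 in the trace: humidity, water, and forecast have been generalized to '?' because the positive examples disagree on them.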
– Assumptions:
o The hypothesis space H contains a hypothesis that describes the true target concept
o The current hypothesis is the most specific hypothesis in H consistent with the observed positive examples
– Algorithm:
o Initialize h to the most specific hypothesis in H
o For each positive training instance x:
For each attribute constraint ai in h:
If the constraint ai is satisfied by x, then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
o Output hypothesis h
Candidate-Elimination Algorithm
• FIND-S outputs a hypothesis from H that is consistent with the training examples, but this is just one of many hypotheses from H that might fit the training data equally well.
The key idea in the Candidate-Elimination algorithm is to output a description of the set of all
hypotheses consistent with the training examples.
Consistent Hypothesis
A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D.
Version Space
The version space is the collection of all consistent hypotheses.
• The set of all hypotheses consistent with the training data
– Learned by the candidate-elimination algorithm
• Definition: The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_{H,D} ≡ { h ∈ H | Consistent(h, D) }
Compact Representation of Version Spaces
A version space can be represented with its general and specific boundary sets.
The Candidate-Elimination algorithm represents the version space by storing only its most
general members G and its most specific members S.
Example Version Space
S0: <Ø, Ø, Ø, Ø, Ø, Ø>
S1: <Sunny, Warm, Normal, Strong, Warm, Same>
S2: <Sunny, Warm, ?, Strong, Warm, Same>
G0, G1, G2: <?, ?, ?, ?, ?, ?>
G3: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>
G4: <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>