

Notes on Machine Learning Algorithms

ITV
Applied Computing Group

Sergio Viademonte, PhD.



June 2016
Roadmap

• Data Mining Top Level Problems

• Decision Trees → Classification

• Association Rules → Association Pattern Mining, Classification



Roadmap
Data Mining Top Level Problems:

• Clustering
Given a data matrix D, partition its records into clusters S1…Sn such that
records in each cluster are similar to each other.

• Classification
Learning the structure of a dataset of examples that are already partitioned
into groups, referred to as categories or classes.

• Association Pattern Mining


Given a binary n x d data matrix D, determine all subsets of columns
such that all the values in these columns take on the value of 1 for at
least a fraction s of the rows in the matrix. The relative frequency of a
pattern is referred to as its support.

• Outlier Detection
Given a data matrix D, determine the records of D that are very
different from the remaining records in D.
Source: Charu Aggarwal (2015). Data Mining: The Textbook.
Roadmap

Supervised learning: learning by example

• Decision Trees → Classification

• Association Rules → Association Pattern Mining, Classification



Decision Trees

• Classification process is modeled as a set of hierarchical decisions on the feature variables, organized in a tree structure.

• Building trees: top-down tree construction, bottom-up tree pruning.

• Splitting criteria are supervised, i.e., guided by the class label.

• Univariate and Multivariate splits.



Decision Trees

Algorithm (Data Set D)

begin
  Create root node containing D;
  repeat
    Select an eligible node in the tree;
    Split that node into two or more child nodes based on the split criterion;
  until no more nodes are eligible for splitting;
  Prune overfitting nodes;
  Label each leaf node with its dominant class;
end
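A minimal Python sketch of this top-down procedure (not from the original slides; the data layout, function names and the majority-class error criterion are illustrative assumptions, and the pruning step is omitted):

from collections import Counter

def majority_class(labels):
    # Dominant class label in a node (used for leaf labeling).
    return Counter(labels).most_common(1)[0][0]

def split_error(partitions):
    # Weighted error rate of a candidate split: 1 - fraction of the dominant
    # class in each child, averaged over children by their relative size.
    total = sum(len(p) for p in partitions)
    err = 0.0
    for p in partitions:
        if p:
            dominant = Counter(p).most_common(1)[0][1]
            err += (len(p) / total) * (1 - dominant / len(p))
    return err

def grow_tree(rows, labels, features, min_size=2):
    # Stop splitting when the node is pure or too small; label it with its dominant class.
    if len(set(labels)) == 1 or len(rows) < min_size or not features:
        return {"leaf": majority_class(labels)}
    # Choose the feature whose (univariate) split gives the lowest weighted error.
    best_feat, best_err = None, float("inf")
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        e = split_error(list(groups.values()))
        if e < best_err:
            best_feat, best_err = f, e
    # Split the node and grow each child recursively (top-down construction).
    remaining = [f for f in features if f != best_feat]
    children = {}
    for value in {row[best_feat] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best_feat] == value]
        children[value] = grow_tree([rows[i] for i in idx],
                                    [labels[i] for i in idx],
                                    remaining, min_size)
    return {"split_on": best_feat, "children": children}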



Decision Trees
• Ex: decide whether to stay home or play outside, based on weather conditions.
• Label: ToDo (play / home)
• Features: Weather (sunny / rainy), Temperature (warm / cold)

[Figure: decision tree rooted at a weather test with branches sunny, rainy and overcast;
the sunny branch leads to a temperature test (warm → play, cold → home), and the
remaining branches end in leaf nodes labeled play / home.]

1 An internal node is a test on an attribute
2 A branch represents an outcome of the test, e.g. Weather = sunny
3 A leaf node represents a class label or class distribution
4 At each node, one attribute is chosen to split the training examples into distinct classes
5 A new case is classified by following a matching path to a leaf node
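A small scikit-learn sketch of this toy example (not part of the original slides; the encoded data, labels and parameters are illustrative assumptions, and OneHotEncoder's sparse_output flag requires scikit-learn >= 1.2):

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.preprocessing import OneHotEncoder

# Toy training data: (weather, temperature) -> ToDo label.
X_raw = [["sunny", "warm"], ["sunny", "cold"], ["rainy", "warm"], ["rainy", "cold"]]
y = ["play", "home", "home", "home"]

# One-hot encode the categorical features before fitting the tree.
enc = OneHotEncoder(sparse_output=False)
X = enc.fit_transform(X_raw)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Classify a new case by following a path from the root to a leaf.
new_case = enc.transform([["sunny", "warm"]])
print(clf.predict(new_case))   # ['play']
print(export_text(clf, feature_names=list(enc.get_feature_names_out())))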



Decision Trees
• Splitting attribute
A goodness function is used to evaluate attributes for splitting.
Typical goodness functions:

• error rate
Let p be the fraction of instances in a set of data points S that belong to the
dominant class label; the error rate is er = 1 – p.
Lower values of the error rate are better.
The quality of a split is the weighted average of the error rates of the individual
attribute-value subsets Si (a short code sketch follows below).

• gini index (CART / IBM Intelligent Miner)
• Developed by Corrado Gini, published in his 1912 paper "Variability and Mutability"
• Measures the discriminative power of a particular feature

The weighted Gini index of a split of S into subsets S1…Sr is

    Gini(S ⇒ S1…Sr) = Σ (i=1..r) (|Si| / |S|) · G(Si)

where G(Si) = 1 – Σj pj² is the Gini index of subset Si (pj being the fraction of
points in Si that belong to class j).

Calculate the overall Gini index based on the target attribute, G(Stg);
calculate the Gini index for each individual attribute/value, G(Si);
calculate the gain for attribute Si as G(Stg) – G(Si), and choose the attribute with the largest gain.
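A minimal Python sketch of these two split criteria (illustrative; the function names and toy split are assumptions, not from the slides):

from collections import Counter

def error_rate(labels):
    # er = 1 - fraction of the dominant class in this subset.
    return 1 - Counter(labels).most_common(1)[0][1] / len(labels)

def gini(labels):
    # G(S) = 1 - sum_j p_j^2 over the class fractions p_j.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_split_score(subsets, criterion):
    # Weighted average of the criterion over the subsets S1..Sr of a split.
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * criterion(s) for s in subsets)

# Example: splitting on Weather groups the ToDo labels as below.
split = [["play", "play", "home"],   # e.g. Weather = sunny
         ["home", "home"]]           # e.g. Weather = rainy
print(weighted_split_score(split, error_rate))  # 0.2
print(weighted_split_score(split, gini))        # ~0.27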
Decision Trees

• Splitting attribute

• information gain, entropy (ID3 / C4.5):

Let pj be the fraction of data points in class j among the points with attribute value vi;
then the class entropy E(vi) is defined as follows:

    E(vi) = – Σ (j=1..k) pj log2(pj)

Lower values of entropy are better (a value of 0 implies a perfect separation).
Here pj plays the role of the posterior distribution of the class given the attribute value vi.
E(vi) lies in the range [0, log2(k)].
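A small Python sketch of entropy and information gain (illustrative, not from the slides):

import math
from collections import Counter

def entropy(labels):
    # E = -sum_j p_j log2(p_j), where p_j is the fraction of points in class j.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    # Gain = E(parent) - weighted average entropy of the child subsets.
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(s) / n * entropy(s) for s in subsets)

print(entropy(["play", "home"]))           # 1.0  (maximum for k = 2 classes)
print(entropy(["play", "play", "play"]))   # -0.0, i.e. 0 (perfect separation)
print(information_gain(["play", "play", "home", "home"],
                       [["play", "play"], ["home", "home"]]))   # 1.0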



Decision Trees

• Splitting attribute

• ReliefF / Fisher linear discrimination

Let µj and σj be the mean and standard deviation of the data points belonging to
class j for a feature n, and pj the fraction of data points belonging to class j.
Let µ be the global mean of the data on feature n.
The Fisher score F for feature n is defined as:

    Fn = Σ (j=1..k) pj (µj – µ)² ⁄ Σ (j=1..k) pj σj²

The numerator quantifies the average interclass separation, and the denominator
quantifies the average intraclass separation.
Attributes with higher values of the Fisher score may be selected as predictors
for classification algorithms.
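A small numpy sketch of the Fisher score for a single numeric feature (illustrative; the function name and data are assumptions):

import numpy as np

def fisher_score(feature, labels):
    # F = sum_j p_j (mu_j - mu)^2 / sum_j p_j sigma_j^2 for one feature column.
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    mu = feature.mean()
    num = den = 0.0
    for cls in np.unique(labels):
        vals = feature[labels == cls]
        p = len(vals) / len(feature)
        num += p * (vals.mean() - mu) ** 2
        den += p * vals.var()          # population variance = sigma_j^2
    return num / den

# A feature that separates the two classes well gets a high score.
x = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
y = ["a", "a", "a", "b", "b", "b"]
print(fisher_score(x, y))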



Decision Trees

• Algorithms

• ID3, Iterative Dichotomiser 3 (Ross Quinlan)
• CHAID, CHi-squared Automatic Interaction Detection (Gordon V. Kass)
• C4.5, an extension of the basic ID3 algorithm (Ross Quinlan)
  • addresses some issues not dealt with by ID3:
    Overfitting the data
    Setting how deeply to grow a decision tree
    Reduced-error pruning
    Handling continuous attributes
    Handling training data with missing attribute values
• C4.8 (J4.8 in Weka)
• C5.0 (Ross Quinlan)
• CART, Classification and Regression Trees (Breiman, Friedman, Olshen, Stone, 1984)




Association Rules

I is a set of n binary attributes called items, I = {I1, I2, ... , In}

An association rule is an implication X ⇒ Y, where X, Y ⊂ I and X ∩ Y = Ø.

D = {T1, T2, ..., Tm} is a set of distinct transactions, where each transaction
Ti = {Ii1, Ii2, ..., Iik} is a set of items, with Iij ∈ I and Ti ⊆ I.

X is called the antecedent (left-hand side, LHS).
Y is called the consequent (right-hand side, RHS).



Association Rules
The strength of an association rule can be measured in terms of
its support s and confidence c.

Support of an itemset is defined as the proportion of transactions in the database
which contain the itemset: how often a rule is applicable to a given data set.

The rule X ⇒ Y has support s if s% of the transactions in D contain X ∪ Y:

s(X ⇒ Y) = σ(X ∪ Y) / N, where σ(X ∪ Y) is the number of transactions containing X ∪ Y
and N is the total number of transactions.

Confidence is the ratio of the number of transactions that contain X ∪ Y to the number of
transactions that contain X, given by the following expression:

c(X ⇒ Y) = support(X ∪ Y) / support(X)

How frequently items in Y appear in transactions that contain X.



Association Rules

TID  Bread  Milk  Coffee  Beer  Eggs
 1     1     1      0      0     0
 2     1     0      1      1     1
 3     0     1      1      1     0
 4     1     1      1      1     0
 5     1     1      1      0     0

Itemset: {Milk, Coffee, Beer}

Rule: {Milk, Coffee} ⇒ {Beer}

s({Milk, Coffee, Beer}) = (transactions containing the itemset) / N = 2/5 = 0.4

c({Milk, Coffee} ⇒ {Beer}) = s({Milk, Coffee, Beer}) / s({Milk, Coffee}) = 2/3 ≈ 0.67
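A small Python sketch that checks these numbers (not part of the original slides; the function names are illustrative):

# Transactions from the table above, as sets of items.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Coffee", "Beer", "Eggs"},
    {"Milk", "Coffee", "Beer"},
    {"Bread", "Milk", "Coffee", "Beer"},
    {"Bread", "Milk", "Coffee"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    # c(X => Y) = support(X ∪ Y) / support(X)
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"Milk", "Coffee", "Beer"}, transactions))        # 0.4
print(confidence({"Milk", "Coffee"}, {"Beer"}, transactions))    # 0.666...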



Association Rules
A common strategy adopted by many association rule mining
algorithms is to decompose the problem into two major subtasks:

1. Frequent Itemset Generation, whose objective is to find all the itemsets that
satisfy the minsup threshold. These itemsets are called frequent itemsets.

2. Rule Generation, whose objective is to extract all the high-confidence rules from
the frequent itemsets found in the previous step. These rules are called strong rules.
(A small Apriori-style sketch of the first subtask follows the algorithm list below.)

Some AR generator algorithms:

• AIS (Agrawal, Imielinski and Swami, 1993)
• SETM (Houtsma and Swami, 1993)
• DHP (Park, Chen and Yu, 1995)
• AprioriTid (Agrawal et al., 1998)
• Apriori (Agrawal et al., 1996).
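A compact, illustrative sketch of the first subtask (Apriori-style frequent itemset generation); the function name and the simplified candidate step are assumptions, not the exact published algorithm:

def frequent_itemsets(transactions, minsup):
    # Level-wise (Apriori-style) search: a k-itemset can only be frequent if its
    # (k-1)-subsets are frequent, so each level is built from the previous one.
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    level = [s for s in level if support(s) >= minsup]
    result, k = {}, 2
    while level:
        result.update({s: support(s) for s in level})
        # Simplified candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if support(c) >= minsup]
        k += 1
    return result

transactions = [{"Bread", "Milk"}, {"Bread", "Coffee", "Beer", "Eggs"},
                {"Milk", "Coffee", "Beer"}, {"Bread", "Milk", "Coffee", "Beer"},
                {"Bread", "Milk", "Coffee"}]
for itemset, s in frequent_itemsets(transactions, minsup=0.4).items():
    print(sorted(itemset), round(s, 2))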
