

Notes on Machine Learning Algorithms

ITV
Applied Computing Group

Sergio Viademonte, PhD.



June 2016
Roadmap

• Data Mining Top Level Problems

• Decision Trees → Classification

• Association Rules → Association Pattern Mining, Classification



Roadmap
Data Mining Top Level Problems:

• Clustering
Given a data matrix D, partition its records into clusters S1…Sn such that
records in each cluster are similar to each other.

• Classification
Learning the structure of a dataset of examples that are already partitioned
into groups, referred to as categories or classes.

• Association Pattern Mining


Given a binary n x d data matrix D, determine all subsets of columns
such that all the values in these columns take on the value of 1 for at
least a fraction s of the rows in the matrix. The relative frequency of a
pattern is referred to as its support.

• Outlier Detection
Given a data matrix D, determine the records of D that are very
different from the remaining records in D.
Source: Charu Aggarwal (2015). Data Mining: The Textbook.
Roadmap

Supervised learning: learning by example

• Decision Trees → Classification

• Association Rules → Association Pattern Mining, Classification



Decision Trees

• Classification process is modeled as a set of hierarchical decisions on the feature variables, organized in a tree structure.

• Building trees: top-down tree construction, bottom-up tree pruning.

• Splitting criteria are supervised, i.e., guided by the class label.

• Univariate and Multivariate splits.



Decision Trees

Algorithm (Data Set D)

begin
  Create root node containing D;
  repeat
    Select an eligible node in the tree;
    Split that node into two or more child nodes based on the split criterion;
  until no more nodes are eligible for splitting;
  Prune overfitting nodes;
  Label each leaf node with its dominant class;
end
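A minimal Python sketch of this top-down procedure (not from the original slides; the data layout, function names and the majority-class error criterion are illustrative assumptions, and the pruning step is omitted):

from collections import Counter

def majority_class(labels):
    # Dominant class label in a node (used for leaf labeling).
    return Counter(labels).most_common(1)[0][0]

def split_error(partitions):
    # Weighted error rate of a candidate split: 1 - fraction of the dominant
    # class in each child, averaged over children by their relative size.
    total = sum(len(p) for p in partitions)
    err = 0.0
    for p in partitions:
        if p:
            dominant = Counter(p).most_common(1)[0][1]
            err += (len(p) / total) * (1 - dominant / len(p))
    return err

def grow_tree(rows, labels, features, min_size=2):
    # Stop splitting when the node is pure or too small; label it with its dominant class.
    if len(set(labels)) == 1 or len(rows) < min_size or not features:
        return {"leaf": majority_class(labels)}
    # Choose the feature whose (univariate) split gives the lowest weighted error.
    best_feat, best_err = None, float("inf")
    for f in features:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        e = split_error(list(groups.values()))
        if e < best_err:
            best_feat, best_err = f, e
    # Split the node and grow each child recursively (top-down construction).
    remaining = [f for f in features if f != best_feat]
    children = {}
    for value in {row[best_feat] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best_feat] == value]
        children[value] = grow_tree([rows[i] for i in idx],
                                    [labels[i] for i in idx],
                                    remaining, min_size)
    return {"split_on": best_feat, "children": children}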



Decision Trees
• Ex: decide whether to stay home or play outside, based on weather conditions.
• Label: ToDo (play / home)
• Features: Weather (sunny / rainy), Temperature (warm / cold)

[Figure: decision tree rooted at a weather test with branches sunny, rainy and overcast;
the sunny branch leads to a temperature test (warm → play, cold → home), and the
remaining branches end in leaf nodes labeled play / home.]

1 An internal node is a test on an attribute
2 A branch represents an outcome of the test, e.g. Weather = sunny
3 A leaf node represents a class label or class distribution
4 At each node, one attribute is chosen to split the training examples into distinct classes
5 A new case is classified by following a matching path to a leaf node
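A small scikit-learn sketch of this toy example (not part of the original slides; the encoded data, labels and parameters are illustrative assumptions, and OneHotEncoder's sparse_output flag requires scikit-learn >= 1.2):

from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.preprocessing import OneHotEncoder

# Toy training data: (weather, temperature) -> ToDo label.
X_raw = [["sunny", "warm"], ["sunny", "cold"], ["rainy", "warm"], ["rainy", "cold"]]
y = ["play", "home", "home", "home"]

# One-hot encode the categorical features before fitting the tree.
enc = OneHotEncoder(sparse_output=False)
X = enc.fit_transform(X_raw)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Classify a new case by following a path from the root to a leaf.
new_case = enc.transform([["sunny", "warm"]])
print(clf.predict(new_case))   # ['play']
print(export_text(clf, feature_names=list(enc.get_feature_names_out())))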



Decision Trees
• Splitting attribute
A goodness function is used to evaluate attributes for splitting.
Typical goodness functions:

• error rate
Let p be the fraction of instances in a set of data points S that belong to the
dominant class label; the error rate is er = 1 – p.
Lower values of the error rate are better.
The quality of a split is the weighted average of the error rates of the individual
attribute-value subsets Si (a short code sketch follows below).

• gini index (CART / IBM Intelligent Miner)
• Developed by Corrado Gini, published in his 1912 paper "Variability and Mutability"
• Measures the discriminative power of a particular feature

The weighted Gini index of a split of S into subsets S1…Sr is

    Gini(S ⇒ S1…Sr) = Σ (i=1..r) (|Si| / |S|) · G(Si)

where G(Si) = 1 – Σj pj² is the Gini index of subset Si (pj being the fraction of
points in Si that belong to class j).

Calculate the overall Gini index based on the target attribute, G(Stg);
calculate the Gini index for each individual attribute/value, G(Si);
calculate the gain for attribute Si as G(Stg) – G(Si), and choose the attribute with the largest gain.
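A minimal Python sketch of these two split criteria (illustrative; the function names and toy split are assumptions, not from the slides):

from collections import Counter

def error_rate(labels):
    # er = 1 - fraction of the dominant class in this subset.
    return 1 - Counter(labels).most_common(1)[0][1] / len(labels)

def gini(labels):
    # G(S) = 1 - sum_j p_j^2 over the class fractions p_j.
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_split_score(subsets, criterion):
    # Weighted average of the criterion over the subsets S1..Sr of a split.
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * criterion(s) for s in subsets)

# Example: splitting on Weather groups the ToDo labels as below.
split = [["play", "play", "home"],   # e.g. Weather = sunny
         ["home", "home"]]           # e.g. Weather = rainy
print(weighted_split_score(split, error_rate))  # 0.2
print(weighted_split_score(split, gini))        # ~0.27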
Decision Trees

• Splitting attribute

• information gain, entropy (ID3 / C4.5):

Let pj be the fraction of data points in class j among the points with attribute value vi;
then the class entropy E(vi) is defined as follows:

    E(vi) = – Σ (j=1..k) pj log2(pj)

Lower values of entropy are better (a value of 0 implies a perfect separation).
Here pj plays the role of the posterior distribution of the class given the attribute value vi.
E(vi) lies in the range [0, log2(k)].
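A small Python sketch of entropy and information gain (illustrative, not from the slides):

import math
from collections import Counter

def entropy(labels):
    # E = -sum_j p_j log2(p_j), where p_j is the fraction of points in class j.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, subsets):
    # Gain = E(parent) - weighted average entropy of the child subsets.
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(s) / n * entropy(s) for s in subsets)

print(entropy(["play", "home"]))           # 1.0  (maximum for k = 2 classes)
print(entropy(["play", "play", "play"]))   # -0.0, i.e. 0 (perfect separation)
print(information_gain(["play", "play", "home", "home"],
                       [["play", "play"], ["home", "home"]]))   # 1.0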



Decision Trees

• Splitting attribute

• ReliefF / Fisher linear discrimination

Let µj and σj be the mean and standard deviation of the data points belonging to
class j for a feature n, and pj the fraction of data points belonging to class j.
Let µ be the global mean of the data on feature n.
The Fisher score F for feature n is defined as:

    Fn = Σ (j=1..k) pj (µj – µ)² ⁄ Σ (j=1..k) pj σj²

The numerator quantifies the average interclass separation, and the denominator
quantifies the average intraclass separation.
Attributes with higher values of the Fisher score may be selected as predictors
for classification algorithms.
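A small numpy sketch of the Fisher score for a single numeric feature (illustrative; the function name and data are assumptions):

import numpy as np

def fisher_score(feature, labels):
    # F = sum_j p_j (mu_j - mu)^2 / sum_j p_j sigma_j^2 for one feature column.
    feature, labels = np.asarray(feature, dtype=float), np.asarray(labels)
    mu = feature.mean()
    num = den = 0.0
    for cls in np.unique(labels):
        vals = feature[labels == cls]
        p = len(vals) / len(feature)
        num += p * (vals.mean() - mu) ** 2
        den += p * vals.var()          # population variance = sigma_j^2
    return num / den

# A feature that separates the two classes well gets a high score.
x = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
y = ["a", "a", "a", "b", "b", "b"]
print(fisher_score(x, y))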



Decision Trees

• Algorithms

• ID3, Iterative Dichotomiser 3 (Ross Quinlan)
• CHAID, CHi-squared Automatic Interaction Detection (Gordon V. Kass)
• C4.5, an extension of the basic ID3 algorithm (Ross Quinlan)
  • addresses some issues not dealt with by ID3:
    Overfitting the data
    Setting how deeply to grow a decision tree
    Reduced-error pruning
    Handling continuous attributes
    Handling training data with missing attribute values
• C4.8 (J4.8 in Weka)
• C5.0 (Ross Quinlan)
• CART, Classification and Regression Trees (Breiman, Friedman, Olshen, Stone, 1984)




Association Rules

I is a set of n binary attributes called items, I = {I1, I2, ... , In}

An association rule is an implication X ⇒ Y, where X, Y ⊂ I and X ∩ Y = Ø.

D = {T1, T2, ..., Tm} is a set of distinct transactions, where each transaction
Ti = {Ii1, Ii2, ..., Iik} is a set of items, with Iij ∈ I and Ti ⊆ I.

X is called the antecedent (left-hand side, LHS).
Y is called the consequent (right-hand side, RHS).



Association Rules
The strength of an association rule can be measured in terms of
its support s and confidence c.

Support of an itemset is defined as the proportion of transactions in the database
which contain the itemset: how often a rule is applicable to a given data set.

The rule X ⇒ Y has support s if s% of the transactions in D contain X ∪ Y:

s(X ⇒ Y) = σ(X ∪ Y) / N, where σ(X ∪ Y) is the number of transactions containing X ∪ Y
and N is the total number of transactions.

Confidence is the ratio of the number of transactions that contain X ∪ Y to the number of
transactions that contain X, given by the following expression:

c(X ⇒ Y) = support(X ∪ Y) / support(X)

How frequently items in Y appear in transactions that contain X.



Association Rules

TID  Bread  Milk  Coffee  Beer  Eggs
 1     1     1      0      0     0
 2     1     0      1      1     1
 3     0     1      1      1     0
 4     1     1      1      1     0
 5     1     1      1      0     0

Itemset: {Milk, Coffee, Beer}

Rule: {Milk, Coffee} ⇒ {Beer}

s({Milk, Coffee, Beer}) = (transactions containing the itemset) / N = 2/5 = 0.4

c({Milk, Coffee} ⇒ {Beer}) = s({Milk, Coffee, Beer}) / s({Milk, Coffee}) = 2/3 ≈ 0.67
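A small Python sketch that checks these numbers (not part of the original slides; the function names are illustrative):

# Transactions from the table above, as sets of items.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Coffee", "Beer", "Eggs"},
    {"Milk", "Coffee", "Beer"},
    {"Bread", "Milk", "Coffee", "Beer"},
    {"Bread", "Milk", "Coffee"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    # c(X => Y) = support(X ∪ Y) / support(X)
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"Milk", "Coffee", "Beer"}, transactions))        # 0.4
print(confidence({"Milk", "Coffee"}, {"Beer"}, transactions))    # 0.666...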



Association Rules
A common strategy adopted by many association rule mining
algorithms is to decompose the problem into two major subtasks:

1. Frequent Itemset Generation, whose objective is to find all the itemsets that
satisfy the minsup threshold. These itemsets are called frequent itemsets.

2. Rule Generation, whose objective is to extract all the high-confidence rules from
the frequent itemsets found in the previous step. These rules are called strong rules.
(A small Apriori-style sketch of the first subtask follows the algorithm list below.)

Some AR generator algorithms:

• AIS (Agrawal, Imielinski and Swami, 1993)
• SETM (Houtsma and Swami, 1993)
• DHP (Park, Chen and Yu, 1995)
• AprioriTid (Agrawal et al., 1998)
• Apriori (Agrawal et al., 1996).
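A compact, illustrative sketch of the first subtask (Apriori-style frequent itemset generation); the function name and the simplified candidate step are assumptions, not the exact published algorithm:

def frequent_itemsets(transactions, minsup):
    # Level-wise (Apriori-style) search: a k-itemset can only be frequent if its
    # (k-1)-subsets are frequent, so each level is built from the previous one.
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    level = [s for s in level if support(s) >= minsup]
    result, k = {}, 2
    while level:
        result.update({s: support(s) for s in level})
        # Simplified candidate generation: unions of frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if support(c) >= minsup]
        k += 1
    return result

transactions = [{"Bread", "Milk"}, {"Bread", "Coffee", "Beer", "Eggs"},
                {"Milk", "Coffee", "Beer"}, {"Bread", "Milk", "Coffee", "Beer"},
                {"Bread", "Milk", "Coffee"}]
for itemset, s in frequent_itemsets(transactions, minsup=0.4).items():
    print(sorted(itemset), round(s, 2))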
