Adane L. Mamuye
June 2020
Outline
• Missing values
• Duplication
• Inconsistent data
• Outliers
• Data cleaning:
– Filling in missing values
– Numerosity reduction
• Parametric (regression and log-linear models) and
nonparametric (histograms, clustering, sampling, and data
cube aggregation)
– Data compression
• Lossless and lossy data compression techniques
Data Transformation and Discretization
• Given:
– A set of classes
– Instances (examples) of each class, each described as a set
of features (attributes) and their values
• Generate: a method (aka model) that, when given a new
instance, determines its class
Supervised Learning
• Classification
– Output type: discrete (binary/multi-classes)
– Trying to find: a boundary
– Evaluation: accuracy
• Regression
– Output type: continuous
– Trying to find: best fit line
– Evaluation: sum of squared errors
Source: SlideShare
Classification Techniques
x = w0 + w1·a1 + w2·a2 + … + wk·ak
where:
• x is the class
• a1 to ak are the attribute values
• w0 to wk are weights, calculated from the training data
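The weighted-sum model above can be sketched in a few lines of Python (the weight and attribute values here are illustrative placeholders, not values learned from training data):

```python
def predict(weights, attributes):
    """weights = [w0, w1, ..., wk], attributes = [a1, ..., ak]."""
    x = weights[0]  # w0, the bias weight
    for w, a in zip(weights[1:], attributes):
        x += w * a  # add each weighted attribute value
    return x

print(predict([1.0, 0.5, -0.25], [4.0, 2.0]))  # 1 + 0.5*4 - 0.25*2 = 2.5
```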
Linear Regression
• The predicted value (not the actual value) for the first instance's
class can be written as:
w0·a0(1) + w1·a1(1) + w2·a2(1) + … + wk·ak(1) = Σj wj·aj(1)
(where a0 = 1 and the superscript (1) denotes the first instance)
Logistic Regression
• From linear to logistic regression: apply the sigmoid function to
the linear combination
P(y = 1 | x) = 1 / (1 + e^−(β0 + β1x1 + β2x2 + … + βnxn))
• Decision Boundary
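A minimal sketch of the sigmoid mapping and the resulting decision boundary (the β coefficients here are hypothetical placeholders, not fitted values):

```python
import math

def sigmoid(z):
    # squashes the linear combination into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(betas, xs):
    # z = beta0 + beta1*x1 + ... + betan*xn
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return sigmoid(z)

def classify(betas, xs):
    # decision boundary: P = 0.5, which is exactly where z = 0
    return 1 if predict_proba(betas, xs) >= 0.5 else 0

print(sigmoid(0.0))                  # 0.5, the decision boundary
print(classify([0.0, 1.0], [2.0]))   # 1
print(classify([0.0, 1.0], [-2.0]))  # 0
```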
Decision Tree Induction
1. Select the attribute that performs best and use it as the root of the
tree; rank the attributes from highest to lowest information gain
(i.e., by decreasing entropy reduction).
2. To decide the descendant node down each branch of the root
(parent node), sort the training examples according to the value
on the current branch and repeat steps 1 and 2; the procedure
is recursive.
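The two recursive steps can be sketched as a minimal ID3-style tree builder (a sketch only, assuming categorical attributes stored as dicts and scoring splits by information gain):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # expected entropy reduction after splitting on attr
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[attr] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    if len(set(labels)) == 1:        # all examples in the same class: leaf
        return labels[0]
    if not attrs:                    # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # step 1
    tree = {best: {}}
    for value in set(row[best] for row in rows):                 # step 2
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return tree
```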
What Is a Good Attribute?
• In other words:
– We want a measure that prefers attributes that impose a high
degree of "order" on the examples
• Maximum order: all examples are of the same class
• Minimum order: all classes are equally likely
• This needs a measure of impurity
Measures of Node Impurity
• Information Gain
– Determine how informative an attribute is
– Attributes are assumed to be categorical
• Gini Index
– Attributes are assumed to be continuous
– Assume there exist several possible split values for each
attribute
Information Gain
• Example:

Name     Acidity   Strength   Durability (class)
Type-1   7         7          Bad
Type-2   7         4          Bad
Type-3   3         1          Good
Type-4   1         4          Good
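Using the example table, the information gain of each attribute can be computed directly (a small sketch; per the categorical assumption above, it treats each distinct numeric value as its own category):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # gain = entropy of the class minus expected entropy after the split
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

durability = ["Bad", "Bad", "Good", "Good"]   # the class column
acidity    = [7, 7, 3, 1]
strength   = [7, 4, 1, 4]

print(info_gain(acidity, durability))   # 1.0, a perfect split
print(info_gain(strength, durability))  # 0.5
```

Acidity separates the classes perfectly (every value is pure), so it would be chosen as the root.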
• Advantages
– Conceptually simple, easy to understand and explain
– Very flexible decision boundaries
– Not much learning at all
• Disadvantages
– It can be hard to find a good distance measure
– Irrelevant features and noise can be very detrimental
– Typically can not handle more than a few dozen attributes
– Computational cost: requires a lot of computation and memory
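The advantages and disadvantages above describe instance-based learners such as k-nearest neighbours; a minimal sketch follows (Euclidean distance is just one possible choice of distance measure, and the training pairs are illustrative):

```python
from collections import Counter
import math

def euclidean(a, b):
    # one possible distance measure; finding a good one is the hard part
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, class) pairs."""
    # "not much learning at all": just store the examples and, at
    # prediction time, let the k nearest ones vote
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(cls for _, cls in neighbours)
    return votes.most_common(1)[0][0]

train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 1), "Good"), ((1, 4), "Good")]
print(knn_classify(train, (2, 2), k=3))  # "Good": the nearest points are mostly Good
```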
Thank You
Supervised Learning 3
Outline
– How a neuron receives and combines inputs
– Activation functions
How ANN works
Activation functions
Perceptron
[Figure: multilayer feed-forward network with inputs x1, x2, x3
(units 1–3), hidden units 4 and 5, output unit 6, and weights wij on
the connections (w15, w24, w25, w34, w35, w46, w56, …)]
θ6 = θ6 + Δθ6 = 0.1 + (0.9)(0.1311) = 0.218
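The update above is the standard bias adjustment θj ← θj + (l)(Errj); taking the slide's figures as learning rate l = 0.9, output-unit error Err6 = 0.1311, and current bias θ6 = 0.1, the arithmetic is:

```python
l = 0.9          # learning rate
err6 = 0.1311    # error at output unit 6
theta6 = 0.1     # current bias of unit 6

theta6 = theta6 + l * err6   # theta_j <- theta_j + (l)(Err_j)
print(round(theta6, 4))      # 0.218
```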
• Given:
– A doctor knows that meningitis causes a stiff neck 50% of the
time
– The prior probability of any patient having meningitis is 1/50,000
– The prior probability of any patient having a stiff neck is 1/20
• Belief Measure:
P(M | S) = P(S | M) · P(M) / P(S) = (0.5 × 1/50,000) / (1/20) = 0.0002
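The posterior P(meningitis | stiff neck) follows directly from Bayes' theorem and the probabilities given above:

```python
p_s_given_m = 0.5        # P(stiff neck | meningitis)
p_m = 1 / 50000          # prior P(meningitis)
p_s = 1 / 20             # prior P(stiff neck)

# Bayes' theorem: P(M | S) = P(S | M) * P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))  # 0.0002
```

Even though meningitis explains a stiff neck half the time, its low prior keeps the posterior very small.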
• Bayesian networks implicitly encode the Markov assumption:
each variable is conditionally independent of its non-descendants
given its parents. The joint probability becomes:
P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
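The factored joint probability can be sketched for a toy chain network A → B → C (the conditional probability table values below are hypothetical, for illustration only):

```python
# P(A, B, C) = P(A) * P(B | A) * P(C | B)
p_a = {True: 0.2, False: 0.8}
p_b_given_a = {True: {True: 0.7, False: 0.3}, False: {True: 0.1, False: 0.9}}
p_c_given_b = {True: {True: 0.6, False: 0.4}, False: {True: 0.2, False: 0.8}}

def joint(a, b, c):
    # each factor conditions only on the node's parents
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Sanity check: all eight joint probabilities must sum to 1.
total = sum(joint(a, b, c) for a in (True, False)
            for b in (True, False) for c in (True, False))
print(round(total, 10))  # 1.0
```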
Training Bayesian Belief Networks
Thank you
Supervised Learning 5
Introduction to SVM
• Output: a set of weights w (or wi), one for each feature, whose
linear combination predicts the value of y (just like neural nets).
SVM: Mathematical Concepts
• Samples are represented geometrically, as vectors.
Purpose of vector representation
• Representing each sample/patient as a vector allows us to
represent geometrically the decision surface that separates the
two groups of samples/patients.
Basic Operations on Vectors
• Addition
• Subtraction
• Dot product
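The three operations can be sketched directly on plain Python lists:

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    # the dot product is the core of the SVM decision function
    return sum(a * b for a, b in zip(u, v))

print(add([1, 2], [3, 4]))       # [4, 6]
print(subtract([1, 2], [3, 4]))  # [-2, -2]
print(dot([1, 2], [3, 4]))       # 1*3 + 2*4 = 11
```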
The separating hyperplane is defined by
W · X + b = 0
where:
– W = {w1, w2, …, wd} is the weight vector;
– b is the bias
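The hyperplane yields a simple decision rule: the sign of W · X + b tells us on which side of the surface a sample falls (the weights and bias below are illustrative, not trained values):

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, b, x):
    # the hyperplane w.x + b = 0 separates the two classes;
    # the sign of w.x + b gives the predicted side
    return 1 if dot(w, x) + b >= 0 else -1

w, b = [1.0, -1.0], 0.5
print(classify(w, b, [2.0, 1.0]))  # 1   (2 - 1 + 0.5 = 1.5 >= 0)
print(classify(w, b, [0.0, 2.0]))  # -1  (0 - 2 + 0.5 = -1.5 < 0)
```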
Linear and non-linear separable data
• The genes of an individual are represented as a string of binary
values, i.e., as a string over a two-symbol alphabet
• Simplest approach:
1. Generate multiple classification models
2. Each model votes on the test instance
3. Take the majority vote as the classification
Ensemble Method
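The three steps above can be sketched as a majority-vote ensemble (the three "models" here are hypothetical threshold rules standing in for trained classifiers):

```python
from collections import Counter

def majority_vote(models, instance):
    # each "model" is just a callable returning a class label
    votes = [model(instance) for model in models]   # step 2: each votes
    return Counter(votes).most_common(1)[0][0]      # step 3: take majority

# Step 1: generate multiple classifiers (illustrative rules, not trained)
models = [
    lambda x: "Good" if x[0] < 5 else "Bad",
    lambda x: "Good" if x[1] < 3 else "Bad",
    lambda x: "Good" if x[0] + x[1] < 8 else "Bad",
]
print(majority_vote(models, (2, 2)))  # "Good": all three rules agree
print(majority_vote(models, (7, 4)))  # "Bad"
```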