Professional Documents
Culture Documents
Univariate Outliers
Bivariate Outliers
X
Detecting Univariate Outliers
If data is normally distributed, outliers can be identified with respect to
spread of mean with confidence intervals (usually 95% or higher)
Box and Whisker plot is most popular in this regard to detect univariate
outliers
Box and Whisker Plot
(i) for at least k objects o’∈D \ {p} it holds that d(p,o’) ≤ d(p,o), and
(ii) for at most k-1 objects o’∈D \ {p} it holds that d(p,o’) < d(p,o).
lrd MinPts(o)
lrd
MinPts ( p )
LOFMinPts( p )
oN MinPts ( p )
N MinPts( p )
It is the average of the ratio of the local reachability density of p and
those of p’s MinPts-nearest neighbors
Mahalanobis Distance Measure
Mahalanobis distance measures the distance between a point and a
distribution
This is a multi variate generalization of the concept that tries to find out
how many std deviation away a data point is lying from the mean of the
distribution
Mathematically,
DM x T S 1 x
Mahalanobis Distance Measure
“The Mahalanobis distance is simply the distance of the test point from
the center of mass divided by the width of the ellipsoid in the direction
of the test point” Source: wikipedia