Data Mining and Machine Learning
Chapter 2: Numeric Attributes

Zaki & Meira Jr.
1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
2 Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Univariate Analysis
In the vector view, we treat the sample as an n-dimensional vector, and write X ∈ ℝⁿ.
Empirical Probability Mass Function
The empirical probability mass function (PMF) of X is given as

    f̂(x) = P(X = x) = (1/n) ∑_{i=1}^{n} I(xi = x)

where the indicator variable I takes on the value 1 when its argument is true, and 0 otherwise. The empirical PMF puts a probability mass of 1/n at each point xi.

The empirical cumulative distribution function (CDF) of X is given as

    F̂(x) = (1/n) ∑_{i=1}^{n} I(xi ≤ x)
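As a quick sketch (not from the text; the sample values below are invented for illustration), the empirical PMF and CDF reduce to simple counting with NumPy:

```python
import numpy as np

def empirical_pmf(X, x):
    """f-hat(x): fraction of sample points exactly equal to x."""
    return np.mean(np.asarray(X) == x)

def empirical_cdf(X, x):
    """F-hat(x): fraction of sample points less than or equal to x."""
    return np.mean(np.asarray(X) <= x)

X = [4.9, 5.0, 5.0, 5.8, 6.1]   # toy sample, n = 5
print(empirical_pmf(X, 5.0))    # 0.4 (probability mass 1/n at each of two points)
print(empirical_cdf(X, 5.8))    # 0.8
```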
Mean

The mean or expected value of a random variable X gives its central value. If X is discrete, it is defined as

    µ = E[X] = ∑_x x · f(x)

If X is continuous, it is defined as

    µ = E[X] = ∫_{−∞}^{∞} x · f(x) dx

The sample mean is given as

    µ̂ = (1/n) ∑_{i=1}^{n} xi
[Figure: frequency histogram of sepal length X1 over the range 4.0–8.0, with the sample mean marked at µ̂ = 5.843.]
Median

The median of a random variable X is a value m for which

    P(X ≤ m) ≥ 1/2  and  P(X ≥ m) ≥ 1/2

In terms of the (inverse) cumulative distribution function, the median is the value m for which

    F(m) = 0.5,  that is,  m = F⁻¹(0.5)
Mode

The mode of a random variable X is the value at which the probability mass function or the probability density function attains its maximum value, depending on whether X is discrete or continuous, respectively.

The sample mode is a value for which the empirical probability mass function attains its maximum, given as

    mode(X) = argmax_x f̂(x)
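A minimal stdlib sketch of both statistics (the sample values are invented for illustration):

```python
import statistics

X = [4.9, 5.0, 5.0, 5.8, 6.1, 5.0, 5.8]   # toy sample

# Sample median: middle value of the sorted sample, i.e. m = F^-1(0.5).
print(statistics.median(X))   # 5.0

# Sample mode: the value maximizing the empirical PMF (5.0 occurs 3 times).
print(statistics.mode(X))     # 5.0
```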
Empirical CDF: sepal length

[Figure: empirical CDF F̂(x) for sepal length; x ranges over 4.0–8.0, with F̂(x) increasing in steps from 0 to 1.]
Empirical Inverse CDF: sepal length

[Figure: empirical inverse CDF F̂⁻¹(q) for q ∈ [0, 1]; the quantile values range from 4.0 to 8.0.]

The median is 5.8, since F̂⁻¹(0.5) = 5.8.
Range
The value range or simply range of a random variable X is the difference between
the maximum and minimum values of X , given as
r = max{X } − min{X }
Variance and Standard Deviation

The variance of a random variable X provides a measure of how much the values of X deviate from the mean or expected value of X:

    σ² = var(X) = E[(X − µ)²] =
        ∑_x (x − µ)² f(x)              if X is discrete
        ∫_{−∞}^{∞} (x − µ)² f(x) dx    if X is continuous

The standard deviation σ is the positive square root of the variance. The sample variance is given as

    σ̂² = (1/n) ∑_{i=1}^{n} (xi − µ̂)²

The sample values for X comprise a vector in n-dimensional space, where n is the sample size. Let Z denote the centered sample

    Z = X − 1 · µ̂ = (x1 − µ̂, x2 − µ̂, …, xn − µ̂)ᵀ

The sample variance is then the squared norm of the centered vector, scaled by the sample size:

    σ̂² = (1/n) ‖Z‖² = (1/n) ZᵀZ = (1/n) ∑_{i=1}^{n} (xi − µ̂)²
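The centered-vector form can be sketched directly in NumPy (the sample values are invented for illustration):

```python
import numpy as np

X = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # toy sample, n = 8

mu_hat = X.mean()
Z = X - mu_hat                # centered sample Z = X - 1·µ̂

# Biased sample variance: σ̂² = (1/n) ZᵀZ = (1/n) ‖Z‖²
n = len(X)
var_hat = (Z @ Z) / n
print(var_hat)                # 4.0

# Matches NumPy's default variance (ddof=0, i.e. the 1/n convention).
assert np.isclose(var_hat, X.var())
```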
Variance of the Sample Mean and Bias

The sample mean µ̂ is itself a statistic. We can compute its mean value and variance:

    E[µ̂] = µ

    var(µ̂) = E[(µ̂ − µ)²] = σ²/n

The sample mean µ̂ varies or deviates from the mean µ in proportion to the population variance σ². However, the deviation can be made smaller by considering a larger sample size n.

The sample variance is a biased estimator for the true population variance, since

    E[σ̂²] = ((n − 1)/n) σ²

The bias vanishes asymptotically: E[σ̂²] → σ² as n → ∞.
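The bias factor (n − 1)/n can be seen in a small simulation (a sketch, not from the text; the distribution and seed are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
# True distribution: N(0, 4), so the population variance is σ² = 4.

samples = rng.normal(0.0, 2.0, size=(trials, n))
biased = samples.var(axis=1, ddof=0)    # σ̂² with the 1/n factor

# Average over many samples approximates E[σ̂²] = (n-1)/n · σ² = 0.8 · 4 = 3.2,
# noticeably below the true variance 4 for this small n.
print(biased.mean())
```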
Bivariate Analysis
In bivariate analysis, we consider two attributes at the same time. The data D comprises an n × 2 matrix, with columns X1 and X2:

    D = ( x11  x12
          x21  x22
           ⋮    ⋮
          xn1  xn2 )

Geometrically, D comprises n points or vectors in 2-dimensional space:

    xi = (xi1, xi2)ᵀ ∈ ℝ²

D can also be viewed as two points or vectors in an n-dimensional space:

    X1 = (x11, x21, …, xn1)ᵀ
    X2 = (x12, x22, …, xn2)ᵀ

The bivariate mean is defined as the expected value of the vector random variable X:

    µ = E[X] = ( E[X1], E[X2] )ᵀ = ( µ1, µ2 )ᵀ

The sample mean is given as

    µ̂ = ∑_x x f̂(x) = ∑_x x · (1/n) ∑_{i=1}^{n} I(xi = x) = (1/n) ∑_{i=1}^{n} xi
Covariance

The covariance between two attributes X1 and X2 provides a measure of the association or linear dependence between them, and is defined as

    σ12 = E[(X1 − µ1)(X2 − µ2)]

The sample covariance is given as

    σ̂12 = (1/n) ∑_{i=1}^{n} (xi1 − µ̂1)(xi2 − µ̂2)
Correlation

The correlation between X1 and X2 is the standardized covariance. The sample correlation is given as

    ρ̂12 = σ̂12 / (σ̂1 σ̂2)
         = ∑_{i=1}^{n} (xi1 − µ̂1)(xi2 − µ̂2) / √( ∑_{i=1}^{n} (xi1 − µ̂1)² · ∑_{i=1}^{n} (xi2 − µ̂2)² )
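A sketch of both estimates on invented data, checked against NumPy's built-in correlation:

```python
import numpy as np

X1 = np.array([5.1, 4.9, 6.3, 5.8, 7.0])   # toy attribute values
X2 = np.array([3.5, 3.0, 3.3, 2.7, 3.2])

n = len(X1)
Z1, Z2 = X1 - X1.mean(), X2 - X2.mean()    # centered attribute vectors

cov12 = (Z1 @ Z2) / n                      # σ̂12, with the 1/n convention
rho12 = (Z1 @ Z2) / np.sqrt((Z1 @ Z1) * (Z2 @ Z2))   # ρ̂12 = cos θ
print(rho12)

# The normalizing 1/n factors cancel, so ρ̂12 agrees with np.corrcoef.
assert np.isclose(rho12, np.corrcoef(X1, X2)[0, 1])
```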
Geometric Interpretation of Sample Covariance and
Correlation
Let Z1 and Z2 denote the centered attribute vectors in ℝⁿ:

    Z1 = X1 − 1 · µ̂1 = (x11 − µ̂1, x21 − µ̂1, …, xn1 − µ̂1)ᵀ
    Z2 = X2 − 1 · µ̂2 = (x12 − µ̂2, x22 − µ̂2, …, xn2 − µ̂2)ᵀ

The sample covariance and the sample correlation are given as

    σ̂12 = Z1ᵀZ2 / n

    ρ̂12 = Z1ᵀZ2 / ( √(Z1ᵀZ1) · √(Z2ᵀZ2) ) = (Z1/‖Z1‖)ᵀ (Z2/‖Z2‖) = cos θ

The correlation coefficient is simply the cosine of the angle θ between the two centered attribute vectors.
Geometric Interpretation of Covariance and Correlation

[Figure: the centered attribute vectors Z1 and Z2 in the n-dimensional space with axes x1, x2, …, xn; the angle between them determines the correlation.]
Covariance Matrix

The variance–covariance information for the two attributes X1 and X2 can be summarized in the square 2 × 2 covariance matrix

    Σ = ( σ1²  σ12
          σ21  σ2² )

Example (Iris: X1 = sepal length, X2 = sepal width):

[Figure: scatter plot of sepal width versus sepal length, with X1 ∈ [4.0, 8.0] and X2 ∈ [2.0, 3.5].]

    Σ̂ = (  0.681  −0.039
           −0.039   0.187 )

The sample correlation is

    ρ̂12 = −0.039 / √(0.681 · 0.187) = −0.109
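The correlation can be read straight off the covariance matrix; a sketch using the values on the slide:

```python
import numpy as np

# Sample covariance matrix for sepal length and sepal width (from the slide).
Sigma = np.array([[0.681, -0.039],
                  [-0.039, 0.187]])

# ρ̂12 = σ̂12 / (σ̂1 σ̂2), where σ̂i is the square root of the i-th diagonal entry.
rho = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(round(rho, 3))   # -0.109
```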
Multivariate Analysis
In the row view, the data is a set of n points or vectors in the d-dimensional attribute space:

    xi = (xi1, xi2, …, xid)ᵀ ∈ ℝᵈ

In the column view, the data is a set of d points or vectors in the n-dimensional space spanned by the data points.
Mean and Covariance

In the probabilistic view, the d attributes are modeled as a vector random variable, X = (X1, X2, …, Xd)ᵀ, and the points xi are considered to be a random sample drawn from X, i.e., IID with X.

The multivariate mean vector is

    µ = E[X] = (µ1, µ2, …, µd)ᵀ
Covariance Matrix is Positive Semidefinite

The covariance matrix Σ is positive semidefinite, i.e., aᵀΣa ≥ 0 for any d-dimensional vector a. Because Σ is also symmetric, this implies that all the eigenvalues of Σ are real and non-negative, and they can be arranged from the largest to the smallest as follows: λ1 ≥ λ2 ≥ · · · ≥ λd ≥ 0.

The total variance is given as: var(D) = ∑_{i=1}^{d} σi²

The generalized variance is det(Σ) = ∏_{i=1}^{d} λi ≥ 0
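These properties are easy to check numerically; a sketch on arbitrary random data (the data and seed are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(100, 3))                 # toy data: n = 100 points, d = 3

Sigma = np.cov(D, rowvar=False, bias=True)    # sample covariance matrix (1/n)
lam = np.linalg.eigvalsh(Sigma)               # real eigenvalues (Sigma symmetric)

assert np.all(lam >= -1e-12)                         # positive semidefinite
assert np.isclose(lam.sum(), np.trace(Sigma))        # total variance = sum of σi²
assert np.isclose(lam.prod(), np.linalg.det(Sigma))  # generalized variance
print(lam)
```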
Sample Covariance Matrix: Inner and Outer Product

Let D̄ represent the centered data matrix, whose i-th row is the centered point (xi − µ̂)ᵀ:

    D̄ = D − 1 · µ̂ᵀ

Inner product and outer product forms for the sample covariance matrix, where Zi = Xi − 1 · µ̂i denotes the i-th centered attribute vector:

    Σ̂ = (1/n) D̄ᵀD̄ = (1/n) ( Z1ᵀZ1  Z1ᵀZ2  ···  Z1ᵀZd
                              Z2ᵀZ1  Z2ᵀZ2  ···  Z2ᵀZd
                               ⋮      ⋮     ⋱     ⋮
                              ZdᵀZ1  ZdᵀZ2  ···  ZdᵀZd )

    Σ̂ = (1/n) ∑_{i=1}^{n} (xi − µ̂)(xi − µ̂)ᵀ

That is, Σ̂ is given as the pairwise inner or dot products of the centered attribute vectors, normalized by the sample size, or as a sum of rank-one matrices obtained as the outer product of each centered point.
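The two forms can be verified against each other; a sketch on invented random data:

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.normal(size=(50, 2))          # toy data matrix: n = 50, d = 2
n = len(D)

Dc = D - D.mean(axis=0)               # centered data matrix

inner = (Dc.T @ Dc) / n               # (1/n) D̄ᵀD̄: pairwise inner products
outer = sum(np.outer(row, row) for row in Dc) / n   # sum of rank-one outer products

assert np.allclose(inner, outer)      # both forms give the same Σ̂
print(inner)
```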
Data Normalization

Range Normalization: each value is scaled by the sample range r̂ = maxi{xi} − mini{xi}:

    xi′ = (xi − mini{xi}) / r̂

After transformation the new attribute takes on values in the range [0, 1].

Standard Score Normalization: also called z-normalization; each value is replaced by its z-score:

    xi′ = (xi − µ̂) / σ̂

where µ̂ is the sample mean and σ̂² is the sample variance of X. After transformation, the new attribute has mean µ̂′ = 0 and standard deviation σ̂′ = 1.
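Both transformations in a few lines of NumPy (the attribute values are invented for illustration):

```python
import numpy as np

X = np.array([2.0, 5.0, 5.0, 8.0, 10.0])     # toy attribute

# Range normalization: new values lie in [0, 1].
X_range = (X - X.min()) / (X.max() - X.min())
assert X_range.min() == 0.0 and X_range.max() == 1.0

# z-normalization: zero mean and unit standard deviation.
X_z = (X - X.mean()) / X.std()               # std() uses the 1/n convention
assert np.isclose(X_z.mean(), 0.0) and np.isclose(X_z.std(), 1.0)
print(X_range, X_z)
```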
Univariate Normal Distribution

A random variable X has a normal distribution, with the parameters mean µ and variance σ², if the probability density function of X is given as follows:

    f(x | µ, σ²) = (1 / √(2πσ²)) · exp{ −(x − µ)² / (2σ²) }

The term (x − µ)² measures the distance of a value x from the mean µ of the distribution, and thus the probability density decreases exponentially as a function of the distance from the mean.

The maximum value of the density occurs at the mean value x = µ, given as f(µ) = 1/√(2πσ²), which is inversely proportional to the standard deviation σ of the distribution.
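A stdlib-only sketch of the density, confirming that it peaks at x = µ:

```python
import math

def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The density is maximal at the mean and decays with distance from it.
assert normal_pdf(0.0) > normal_pdf(0.5) > normal_pdf(1.0)
print(normal_pdf(0.0))   # 1/sqrt(2π) ≈ 0.3989
```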
Normal Distribution: µ = 0, and Different Variances

[Figure: normal densities f(x) with µ = 0 for σ = 0.5, σ = 1, and σ = 2, over x ∈ [−6, 5]; smaller σ gives a taller, narrower peak.]
Multivariate Normal Distribution

The probability density function of the d-dimensional vector random variable X is given as

    f(x | µ, Σ) = (1 / ((√(2π))ᵈ √|Σ|)) · exp{ −(x − µ)ᵀ Σ⁻¹ (x − µ) / 2 }

The term

    (x − µ)ᵀ Σ⁻¹ (x − µ)

measures the distance, called the Mahalanobis distance, of the point x from the mean µ of the distribution, taking into account all of the variance–covariance information between the attributes.
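A sketch of the (squared) Mahalanobis distance on an invented diagonal covariance matrix, showing how it rescales distance along each axis by that axis's variance:

```python
import numpy as np

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.0],    # high variance along x1
                  [0.0, 0.5]])   # low variance along x2

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)

# Equal Euclidean offsets get different Mahalanobis distances:
print(mahalanobis_sq(np.array([1.0, 0.0]), mu, Sigma))   # 0.5 (high-variance axis)
print(mahalanobis_sq(np.array([0.0, 1.0]), mu, Sigma))   # 2.0 (low-variance axis)
```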
Standard Bivariate Normal Density

Parameters:

    µ = (0, 0)ᵀ,   Σ = ( 1  0
                          0  1 )

[Figure: surface plot of the standard bivariate normal density f(x) over X1, X2 ∈ [−4, 4].]
Geometry of the Multivariate Normal

Compared to the standard multivariate normal, the mean µ translates the center of the distribution, whereas the covariance matrix Σ scales and rotates the distribution. The eigen-decomposition of Σ is given as

    Σui = λiui

where λ1 ≥ λ2 ≥ · · · ≥ λd ≥ 0 are the eigenvalues and ui the corresponding eigenvectors. This can be expressed compactly as follows:

    Σ = UΛUᵀ

where Λ = diag(λ1, λ2, …, λd) and U = (u1  u2  ···  ud) is the matrix whose columns are the eigenvectors.

The eigenvectors represent the new basis vectors, with the covariance matrix in this basis given by Λ (all covariances become zero). Since the trace of a square matrix is invariant to similarity transformations, such as a change of basis, we have

    var(D) = tr(Σ) = ∑_{i=1}^{d} σi² = ∑_{i=1}^{d} λi = tr(Λ)
Bivariate Normal for Iris: sepal length and sepal width

    µ̂ = (5.843, 3.054)ᵀ

    Σ̂ = (  0.681  −0.039
           −0.039   0.187 )

We have Σ̂ = UΛUᵀ, with

    U = ( −0.997  −0.078
           0.078  −0.997 )

    Λ = ( 0.684    0
            0    0.184 )

The angle of rotation is

    cos θ = e1ᵀu1 = −0.997,   or θ = 175.5°

[Figure: scatter of the iris points X1 (sepal length) versus X2 (sepal width) with the fitted bivariate normal density; the eigenvectors u1 and u2 form the new basis.]
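The eigen-decomposition above can be reproduced from the slide's Σ̂ (a sketch; note that the sign of an eigenvector is arbitrary, so the slide's θ = 175.5° corresponds to 180° − 4.5°):

```python
import numpy as np

# Sample covariance matrix from the slide.
Sigma = np.array([[0.681, -0.039],
                  [-0.039, 0.187]])

lam, U = np.linalg.eigh(Sigma)     # eigenvalues in ascending order
lam, U = lam[::-1], U[:, ::-1]     # reorder so that lambda1 >= lambda2

print(np.round(lam, 3))            # [0.684 0.184], matching Λ on the slide

# |cos θ| = |e1ᵀ u1|; taking the absolute value removes the sign ambiguity.
cos_theta = abs(U[0, 0])
print(round(float(np.degrees(np.arccos(cos_theta))), 1))   # 4.5
```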
Data Mining and Machine Learning: Fundamental Concepts and Algorithms

dataminingbook.info