Univariate and Multivariate Density

AIPC404 Fundamentals of Machine Learning
UNIT 1
Bayesian Decision Theory and Normal Distribution
Staff In-charge: Dr. M. Kalaiselvi Geetha, Professor, Department of CSE
Univariate and Multivariate Normal Densities
Dr. M. Kalaiselvi Geetha, Professor, Dept. of CSE, AU
Normal Distribution
The normal distribution, or Gaussian distribution, is a continuous
probability distribution that describes data spread symmetrically
around a mean μ with variance σ².
A variable x that is distributed normally with mean μ and variance σ²
is written x ~ N(μ, σ²).
It can be categorized into:
• Univariate Density:
It involves a single variable (one dimension), i.e., the distribution of one
single variable. Example: height of a person
• Multivariate Density:
It involves more than one variable (two or more dimensions).
Example: height and weight of a person

Mean of The Dataset
• Consider the example data set
X = 1, 4, 2, 12, 15, 25, 67, 65, 6, 98
• The centroid of the points is defined by the
mean of each variable
• The mean is measured as $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$,
where n is the number of data points
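As a quick check of the formula, the mean of the example data set X can be computed in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from statistics import mean

# Example data set X from the slide
X = [1, 4, 2, 12, 15, 25, 67, 65, 6, 98]

# Mean = sum of the values divided by the number of values
mean_manual = sum(X) / len(X)
mean_lib = mean(X)

print(mean_manual, mean_lib)
```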
Drawbacks of Mean
• Mean does not give information about the
spread of the data
• Two data sets having different spread may
have same mean
• Example:
a = [3, 1, 24, 12] and b = [11, 9, 7, 13]
Both have mean = 10, although a is spread much more widely than b

Standard Deviation
• The Standard Deviation (SD) of a data set is a
measure of the spread of the data
• SD can be thought of as the typical (root-mean-square)
distance of a data point from the mean of the data set

Standard Deviation
• High standard deviation
– data are spread over a large range of values
• Low standard deviation
– data points are very close to the mean

Standard Deviation – Example 1
a = [3, 1, 24, 12], SD = 9.08
b = [11, 9, 7, 13], SD = 2.236
Both have mean = 10
a has a larger SD than b
[Figure: number line from 1 to 25 with the mean marked]
Reason:
The data in a are spread further from the mean
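The two SD values can be reproduced with Python's standard library. This sketch assumes the population form of the standard deviation (dividing by n), which is the convention that matches the numbers quoted in this example.

```python
from statistics import mean, pstdev

a = [3, 1, 24, 12]
b = [11, 9, 7, 13]

# pstdev is the population standard deviation (divides by n)
sd_a = pstdev(a)
sd_b = pstdev(b)

print(mean(a), mean(b))                  # the two means are equal
print(round(sd_a, 2), round(sd_b, 3))    # a is far more spread out than b
```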
Standard Deviation – Example 2
c = [15, 15, 15, 15, 15], SD = 0
Reason:
• All values are identical
• None of the data points deviate from the mean

Variance
• The spread of data can also be measured using
another measure called variance.
• The variance of a variable is the average squared
deviation of its n values around the mean of that
variable: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
• SD is the square root of the variance

Drawbacks
• Standard deviation and variance apply only to
one-dimensional data sets
• A common goal of statistical analysis is to find the relationship
between the dimensions of the data.
Example:
does a student who has attended more classes have a
high attendance percentage or not?

Covariance
• A measure is needed to see how two
different dimensions are related to each other
• The degree to which the data points are linearly
correlated is represented by their covariance
• Covariance (σ) for two-dimensional data (x_1, x_2):
$\sigma_{12} = \frac{1}{n}\sum (x_1 - \mu_1)(x_2 - \mu_2)$, summed over the n data points


Mean Vector and Covariance
Mean vector and covariance matrix for more than one dimension:
$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} & \cdots & \sigma_{2n} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} & \cdots & \sigma_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \sigma_{n3} & \cdots & \sigma_{nn} \end{bmatrix}$

Covariance Matrix
• For n-dimensional data, the covariance matrix
has (n × n) elements
• If the data is three-dimensional, the covariance
matrix has (3 × 3) elements
• For 2-dimensional data, $\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}$

Significance of Covariance
• If an off-diagonal entry of the covariance matrix is
– positive: both dimensions increase together
– negative: as one dimension increases, the other decreases
– zero: the two dimensions are uncorrelated (no linear relationship
between them)
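The three cases can be illustrated with a small sketch. The data below are hypothetical (not from the slides) and the helper `covariance` is an illustrative name; it uses the 1/n divisor. The last case shows why a zero covariance is best read as "no linear relationship": y = x² is completely determined by x, yet its covariance with x is zero.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Hypothetical illustrative data
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # increases with x
falling = [10, 8, 6, 4, 2]    # decreases as x increases

cov_rising = covariance(x, rising)      # positive
cov_falling = covariance(x, falling)    # negative

# y = x**2 depends on x, but not linearly, so the covariance vanishes
x_sym = [-2, -1, 0, 1, 2]
squares = [v * v for v in x_sym]
cov_squares = covariance(x_sym, squares)

print(cov_rising, cov_falling, cov_squares)
```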
Univariate Density
Probability density function for univariate density is written as
$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$
where μ is the mean and σ² is the variance.
• Mean (μ):
It is the average of the given feature x and it is given by
$\mu = \frac{1}{n}\sum x$
• Variance (σ²):
The spread of the data can be measured using variance.
$\sigma^2 = \frac{1}{n}\sum (x-\mu)^2$
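The density formula translates directly into code. The following is a minimal sketch; the function name `normal_pdf` and the parameter values are assumptions chosen for illustration. It also checks two properties of the curve: the value at x = μ is 1/(√(2π)σ), and one standard deviation away the density drops to about 0.607 of that value.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density p(x) for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

# Illustrative parameters (assumed for this sketch)
mu, sigma = 0.0, 1.0

peak = normal_pdf(mu, mu, sigma)                 # value at x = mu
one_sigma = normal_pdf(mu + sigma, mu, sigma)    # value one SD from the mean

print(peak, one_sigma / peak)
```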
Univariate Density (contd….)
The normal distribution is symmetric about the mean and it
is a “bell-shaped curve”.
The peak of the univariate normal distribution occurs at x = µ and
its value is $\frac{1}{\sqrt{2\pi}\,\sigma}$.
The width of the univariate normal distribution is proportional to the
standard deviation (σ); at x = µ ± σ the density falls to $\frac{0.607}{\sqrt{2\pi}\,\sigma}$.
Fig. 1 : Univariate normal distribution (P(x) against x, with ticks at µ-2σ, µ-σ, µ, µ+σ, µ+2σ)
Example for univariate density :
Height (h) of males (adult): 165 170 160 154 175 155 167 177 158 178
Fit a univariate normal distribution (Gaussian) for h.
Mean: $\mu = \frac{1}{n}\sum h = \frac{1}{10}(165 + 170 + \cdots + 178) = \frac{1659}{10} = 165.9$
Variance: $\sigma^2 = \frac{1}{n}\sum (h-\mu)^2 = \frac{1}{10}\left[(165-165.9)^2 + \cdots + (178-165.9)^2\right] = 72.89$
Fig. 2 : Univariate normal distribution for height of males (adult)
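The fit can be reproduced with the standard library. This sketch assumes the 1/n (population) form of the variance, which is the one used in the hand calculation.

```python
import math
from statistics import fmean, pvariance

heights = [165, 170, 160, 154, 175, 155, 167, 177, 158, 178]

mu = fmean(heights)          # mean of the heights
var = pvariance(heights)     # population variance (divides by n)
sigma = math.sqrt(var)       # standard deviation

print(round(mu, 1), round(var, 2), round(sigma, 2))
```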
Univariate Density (contd….)
Test data :
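i) Height (h) = 100
Find the probability density P(100) of being an adult, with
threshold (T) = 0.00005. Here σ = √72.89 = 8.54.
$P(100) = \frac{1}{\sqrt{2 \times 3.14} \times 8.54} \exp\left[-\frac{1}{2}\left(\frac{100-165.9}{8.54}\right)^2\right] = 5.485 \times 10^{-15} \approx 0$
Result: The density at height 100 is below the threshold, so this height is
not consistent with the fitted normal distribution and the person is
classified as not an adult (P(100) < T).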
ii) Height (h) = 160
Find the probability density P(160) of being an adult, with
threshold (T) = 0.00005.
$P(160) = \frac{1}{\sqrt{2 \times 3.14} \times 8.54} \exp\left[-\frac{1}{2}\left(\frac{160-165.9}{8.54}\right)^2\right] \approx 0.037$
Result: The density at height 160 is above the threshold, so this height is
consistent with the fitted normal distribution and the person is
classified as an adult (P(160) > T).
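The threshold test can be sketched as follows. The helper `is_adult` is a hypothetical name introduced for illustration, and the code uses `math.pi` rather than the rounded value 3.14, so the densities differ very slightly from a hand calculation.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)

MU, SIGMA = 165.9, 8.54     # fitted values from the height example
T = 0.00005                 # threshold from the slide

def is_adult(height):
    """Hypothetical helper: accept when the density reaches the threshold."""
    return normal_pdf(height, MU, SIGMA) >= T

for h in (100, 160):
    print(h, normal_pdf(h, MU, SIGMA), is_adult(h))
```

The decision depends entirely on the choice of T; a different threshold would move the boundary between accepted and rejected heights.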
Multivariate Density
The general multivariate normal density in d dimensions is written as
$P(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^t\, \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$
where x is a d-dimensional column vector,
μ is the d-dimensional mean vector,
Σ is the d-by-d covariance matrix,
|Σ| is the determinant of the covariance matrix,
Σ⁻¹ is the inverse of the covariance matrix,
(x − μ)ᵗ is the transpose of (x − μ).
Mean Vector (µ) and Covariance Matrix (∑)
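$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_d \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_{dd} \end{bmatrix}$
$\sigma_{ij} = \sigma_{ji} = \frac{1}{n}\sum (x_i - \mu_i)(x_j - \mu_j)$, summed over the n data points
σij is the covariance between xi and xj (the variance of xi when i = j),
i, j = 1…d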
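As a sanity check on the general d-dimensional density, setting d = 1 should recover the univariate density. The short derivation below uses only the symbols already defined; for a single variable the covariance matrix reduces to the scalar variance.

```latex
\begin{align*}
d = 1:\quad \Sigma &= \sigma^2, \qquad |\Sigma|^{1/2} = \sigma, \qquad \Sigma^{-1} = \frac{1}{\sigma^2} \\
(x-\mu)^t\, \Sigma^{-1} (x-\mu) &= \frac{(x-\mu)^2}{\sigma^2} \\
P(x) &= \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]
\end{align*}
```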
Multivariate Density (contd…)
The covariance matrix (Σ) is a symmetric matrix. Its diagonal elements are the
variances of the individual components of x, which cannot be negative.
The off-diagonal elements are the covariances, which can be +ve or –ve.
Statistically Dependent Variables:
Variables whose values are related, for example because they are causally
linked, are called statistically dependent variables.
Example: engine temperature and oil temperature
Statistically Independent Variables:
Variables whose values are not related, so that knowing one gives no
information about the other, are called statistically independent variables.
Example: oil pressure in engine and air pressure in tire
Multivariate Density (contd…)
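If the variables are statistically independent, the covariances are zero and
the covariance matrix is a diagonal matrix.
$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_d^2 \end{bmatrix}, \qquad \Sigma^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_d^2} \end{bmatrix}, \qquad |\Sigma| = \sigma_1^2\, \sigma_2^2 \cdots \sigma_d^2$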
Multivariate Density ( Bivariate density ):
P(x) for two-dimensional (bivariate) data is a bell- or hill-shaped
surface over the two-dimensional plane (x1, x2).
The peak of the bivariate normal distribution occurs at (x1, x2) = (μ1, μ2)
and its value is $\frac{1}{2\pi\,|\Sigma|^{1/2}}$.
The shape of the hump depends on the two variances and on the
correlation coefficient (ρ), given by $\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$
Example for Multivariate Density ( Bivariate density ):
Height (h) of males: 165 170 160 154 175 155 167 177 158 178
Weight (w) of males: 78 71 60 53 72 51 64 65 55 69
Fit a bivariate normal distribution (Gaussian) for h and w.
Mean Vector (µ)
$\mu_1 = \frac{1}{n}\sum h = \frac{1}{10}(165 + 170 + \cdots + 178) = 165.9$
$\mu_2 = \frac{1}{n}\sum w = \frac{1}{10}(78 + 71 + \cdots + 69) = 63.8$
$\boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \begin{bmatrix} 165.9 \\ 63.8 \end{bmatrix}$
Bivariate Density (contd…)
Covariance Matrix (∑)
$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}, \qquad \sigma_{ij} = \sigma_{ji} = \frac{1}{n}\sum (x_i - \mu_i)(x_j - \mu_j), \quad i, j = 1 \text{ to } 2$
$\sigma_{11} = \frac{1}{10}\left[(165-165.9)^2 + (170-165.9)^2 + \cdots + (178-165.9)^2\right] = 72.89$
$\sigma_{12} = \frac{1}{10}\left[(165-165.9)(78-63.8) + (170-165.9)(71-63.8) + \cdots + (178-165.9)(69-63.8)\right] = 52.78$
$\sigma_{21} = \frac{1}{10}\left[(78-63.8)(165-165.9) + (71-63.8)(170-165.9) + \cdots + (69-63.8)(178-165.9)\right] = 52.78$
$\sigma_{22} = \frac{1}{10}\left[(78-63.8)^2 + (71-63.8)^2 + \cdots + (69-63.8)^2\right] = 72.16$
$\Sigma = \begin{bmatrix} 72.89 & 52.78 \\ 52.78 & 72.16 \end{bmatrix}$
Bivariate Density (contd…)
Fig. 3 : Bivariate normal distribution for height and weight of males
Bivariate Density (contd…)
Test data:
i) Height, weight = 75, 25
Find the probability density P(75, 25) of being an adult, with
threshold (T) = 0.000005.
P(75, 25) = 1.26e-29 ≈ 0
Result: The density at this height and weight is below the threshold, so the
point is not consistent with the fitted bivariate normal distribution and
the person is classified as not an adult (P(75, 25) < T).
ii) Height, weight = 160, 60
Find the probability density P(160, 60) of being an adult, with
threshold (T) = 0.000005.
P(160, 60) = 0.0023
Result: The density at this height and weight is above the threshold, so the
point is consistent with the fitted bivariate normal distribution and
the person is classified as an adult (P(160, 60) > T).
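The two test points can be evaluated with a small sketch that uses the closed-form inverse of a 2×2 matrix; `bivariate_pdf` is an illustrative name. The quoted values 1.26e-29 and 0.0023 appear to correspond to a covariance estimated with the 1/(n−1) divisor rather than the 1/n matrix computed earlier. That is an inference from the numbers, not something the slides state, so the code evaluates both. With the 1/n matrix the density at (160, 60) comes out near 0.0025 and the density at (75, 25) is smaller still; the decision is the same in both cases.

```python
import math

def bivariate_pdf(x, mu, cov):
    """Bivariate normal density for a 2x2 covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^t Sigma^-1 (x - mu), using the closed-form 2x2 inverse
    quad = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

mu = (165.9, 63.8)
cov_n = [[72.89, 52.78], [52.78, 72.16]]                     # 1/n divisor
n = 10
cov_n1 = [[v * n / (n - 1) for v in row] for row in cov_n]   # 1/(n-1) divisor
T = 0.000005                                                 # threshold

for point in ((75, 25), (160, 60)):
    p_n = bivariate_pdf(point, mu, cov_n)
    p_n1 = bivariate_pdf(point, mu, cov_n1)
    print(point, p_n, p_n1, p_n1 > T)
```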
