
PRNN ASSIGNMENT – 2

M. Renuka, Sr. No.: 14794, MTech (Res), EE


Q1:
This problem involves estimating a mixture density using the EM algorithm. Four data sets are given, each
of which is drawn from a mixture of four one-dimensional Gaussians.
Details of data:
I. Means and variances: μ1 = 0, μ2 = 4, μ3 = 8, μ4 = 12, σi² = 2
(a) Mixture coefficients: λ = (0.25, 0.25, 0.25, 0.25)
(b) Mixture coefficients: λ = (0.1, 0.3, 0.4, 0.2)
II. Means and variances: μ1 = 0, μ2 = 2, μ3 = 4, μ4 = 6, σi² = 2
(a) Mixture coefficients: λ = (0.25, 0.25, 0.25, 0.25)
(b) Mixture coefficients: λ = (0.1, 0.3, 0.4, 0.2)
Explanation:
The Expectation-Maximization (EM) algorithm is used to estimate the mixture density model. The EM
update equations are applied iteratively to estimate the means, variances, mixture coefficients and
responsibilities. Iteration stops when the absolute difference between the current and previous
log-likelihood values is less than 0.01.
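As a rough sketch of this procedure (one-dimensional data, a fixed number of components, and the 0.01 log-likelihood stopping rule described above), the EM updates could be implemented as follows. The function and variable names are illustrative; this is not necessarily the exact code behind the reported numbers.

```python
import numpy as np

def em_gmm_1d(x, means, variances, weights, tol=0.01, max_iter=500):
    """Fit a 1-D Gaussian mixture by EM; stop when the log-likelihood
    changes by less than tol (0.01 in this assignment)."""
    x = np.asarray(x, dtype=float)
    means = np.array(means, dtype=float)
    variances = np.array(variances, dtype=float)
    weights = np.array(weights, dtype=float)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: responsibilities gamma[n, k] = p(component k | x_n)
        dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances) \
               / np.sqrt(2.0 * np.pi * variances)
        ll = np.sum(np.log(dens.sum(axis=1)))
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities
        Nk = gamma.sum(axis=0)
        means = (gamma * x[:, None]).sum(axis=0) / Nk
        variances = (gamma * (x[:, None] - means) ** 2).sum(axis=0) / Nk
        weights = Nk / x.size
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return means, variances, weights, ll

# Hypothetical usage with the random initial values reported below:
# params = em_gmm_1d(data, means=[1.5, 2.8, 9.9, 11.45],
#                    variances=[0.9, 2.7, 3.4, 2.2], weights=[0.25] * 4)
```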
Results:
Means and variances: μ1 = 0, μ2 = 4, μ3 = 8, μ4 = 12, σi² = 2
(a) Mixture coefficients: λ = (0.25, 0.25, 0.25, 0.25)
Initialization using k-means clustering algorithm:
Means: [ 0.392 8.570 4.75 12.485]
Variances: [ 1.661 1.175 1.448 1.443 ]
Convex coefficients: [0.25, 0.25, 0.25, 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  1000 samples:
    Means: 3.864, 11.967, 7.981, 0.236
    Variances: 2.058, 2.239, 2.384, 1.971
    Mixture coefficients: 0.244, 0.235, 0.267, 0.252
  500 samples:
    Means: 0.393, 12.621, 4.79, 8.814
    Variances: 1.956, 1.609, 3.705, 2.75
    Mixture coefficients: 0.286, 0.164, 0.284, 0.264

Initialization using random values:


Means: [ 1.5 2.8 9.9 11.45]
variances: [ 0.9 2.7 3.4 2.2]
convex coefficients: [ 0.25 0.25 0.25 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  Means: 12.25829065, 3.6784469, 9.05761227, 12.93881526, 0.37893025, 7.85906749, 0.25113002, 4.34910373
  Variances: 1.92048256, 1.25302852, 5.64241879, 1.08190963, 2.1401904, 4.8833603, 1.75769022, 5.90657845
  Mixture coefficients: 0.18528773, 0.15717964, 0.35232415, 0.10171543, 0.27488474, 0.38264789, 0.24240155, 0.30355887
(b) Mixture coefficients: λ = (0.1, 0.3, 0.4, 0.2)
Initialization using k-means clustering algorithm:
Means: [ 0.19369554 7.94349252 4.14737299 12.03507276]
Variances: [ 1.62768382 1.2086989 1.00322176 1.67682529]
Convex coefficients: [0.25, 0.25, 0.25, 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  1000 samples:
    Means: 0.41243477, 8.04201751, 3.89869537, 12.60059602
    Variances: 2.27914732, 2.31114475, 1.0858704, 1.52660472
    Mixture coefficients: 0.14780525, 0.44358003, 0.22794603, 0.18066869
  500 samples:
    Means: 7.76352293, 0.03975008, 12.13636008, 3.89781969
    Variances: 2.89059773, 2.28833371, 1.95865604, 1.58020699
    Mixture coefficients: 0.43560657, 0.12994418, 0.15565546, 0.27879380

Initialization using random values:


Means: [ 1.5 2.8 9.9 11.45]
Variances: [ 0.9 2.7 3.4 2.2]
Convex coefficients: [ 0.25 0.25 0.25 0.25]

Number of data samples: 1000 and 500
Estimated parameters:
  Means: 0.65373244, 4.43670409, 8.99909169, 3.9833741, 8.47397735, 12.16314289, 2.38597548, -1.22288273
  Variances: 1.19151201, 5.41281454, 8.17225195, 5.2586788, 1.80223968, 1.86780262, 4.51487774, 0.68610243
  Mixture coefficients: 0.06191664, 0.49390164, 0.58070149, 0.31464642, 0.26222952, 0.1819522, 0.05685935, 0.04779275

Means and variances: μ1 = 0, μ2 = 2, μ3 = 4, μ4 = 6, σi² = 2


a) Mixture coefficients: λ = (0.25, 0.25, 0.25, 0.25)
Initialization using k-means clustering algorithm:
Means:[ 4.15122552 -0.52552276 6.59405083 1.8548552 ]
Variances:[ 0.41437168 0.84328951 0.89881232 0.44422335]
convex coefficients:[0.25, 0.25, 0.25, 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  1000 samples:
    Means: 3.98196298, 0.84150743, 6.07766755, 5.98199189
    Variances: 0.91949574, 0.76572659, 1.75023034, 1.99001761
    Mixture coefficients: 0.29003727, 0.20928112, 0.24976450, 0.25091710
  500 samples:
    Means: -0.2309566, 3.61432439, 1.68508014, -1.57751029
    Variances: 1.53849932, 0.91925715, 1.11978284, 0.40930648
    Mixture coefficients: 0.29956457, 0.33595483, 0.28199504, 0.08248556

Initialization using random values:


Means:[ 0.5 2.8 5.1 5.8]
Variances:[ 0.92 2.3 3.46 2.24]
convex coefficients:[ 0.25 0.25 0.25 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  Means: 3.92409613, 5.87708489, -4.78542496, 3.11354441, 0.03004335, 2.15423479, 0.04556595, 9.80261266
  Variances: 2.2577949, 2.21361, 3.47412482, 4.5591313, 1.68231266, 3.26459354, 1.82047346, 0.02900859
  Mixture coefficients: 0.25845686, 0.22958064, 0.3769312, 0.38075661, 0.19181533, 0.32014717, 0.23997685, 0.00233534
b) Mixture coefficients: λ = (0.1, 0.3, 0.4, 0.2)
Initialization using k-means clustering algorithm:
Means: [ 4.32616535 2.2879091 6.71675145 -0.15379188]
Variances: [ 0.3988787 0.42015898 0.94862613 1.09906159]
Convex coefficients: [0.25, 0.25, 0.25, 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  1000 samples:
    Means: 4.18442402, 3.73301298, 6.27277664, 2.03378372
    Variances: 1.33864308, 1.52026073, 1.93084925, 1.42999243
    Mixture coefficients: 0.30380456, 0.29670582, 0.18984304, 0.20964658
  500 samples:
    Means: 2.36756763, 5.69157774, 0.968141454, 0.43883713
    Variances: 1.89928169, 2.34973117, 3.27172711, 3.57233625
    Mixture coefficients: 0.24378932, 0.33626076, 0.29028521, 0.12966471

Initialization using random values:


Means:[ 0.5 2.8 5.1 5.8]
Variances:[ 0.92 2.3 3.46 2.24]
convex coefficients:[ 0.25 0.25 0.25 0.25]
Number of data samples: 1000 and 500
Estimated parameters:
  Means: 4.42623909, 1.49262621, 3.02602852, 9.27485898, 2.33093529, 7.87752826, 5.99020967, 2.50752538
  Variances: 3.44446978, 4.31674734, 2.0821693, 0.22802885, 3.1350812, 0.53218903, 1.52620706, 5.30478667
  Mixture coefficients: 0.53267028, 0.19072128, 0.32092395, 0.00676934, 0.26211833, 0.01449011, 0.2157721, 0.4565346
Observations:
1. As the sample size increases, the magnitude of the log-likelihood increases (it is a sum over
more samples) and the EM algorithm gives better parameter estimates, since EM maximizes the
expectation of the complete-data log-likelihood (equivalently, minimizes the negative log-likelihood).
2. As the iteration number increases, the log-likelihood increases (the negative log-likelihood
decreases), which validates the convergence of the EM algorithm.
3. The EM algorithm is very sensitive to initialization. It performs better when the initial values
are generated by the k-means clustering algorithm (k = 4), since k-means partitions the data samples
into k clusters in which each sample belongs to the cluster with the nearest mean, and the cluster
means serve as prototypes. Randomly initialized EM does not perform as well as k-means-initialized
EM; in particular, if the randomly initialized means of the densities are very close to each other,
EM does not give accurate estimates. (A sketch of such an initialization follows this list.)
4. The accuracy of the EM algorithm depends on the number of mixture components k. With k = 4, the
estimated density fits the data histogram well, because the true model has 4 mixture components.
With k = 2 (k < 4), the estimated density does not fit the data histogram, since the true model has
4 components. With k = 6 (k > 4), the estimated density overfits the data histogram.
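To illustrate the k-means-based initialization mentioned in observation 3, a minimal sketch is given below. It assumes scikit-learn's KMeans is available and, as in the reported runs, initializes the mixture coefficients uniformly; the function name and details are illustrative rather than the exact code used.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_init(x, k=4):
    """Initial GMM parameters derived from a k-means partition of 1-D data."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(x)
    means = np.array([x[labels == j].mean() for j in range(k)])
    variances = np.array([x[labels == j].var() for j in range(k)])
    weights = np.full(k, 1.0 / k)   # uniform initial mixture coefficients, as reported
    return means, variances, weights
```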
Q2:
Results:
(a): Considering the full data size
Means:
  Component 1: [ 0.0084, 0.0385, 0.0401, 0.0914, 0.2851699, 0.007, 0.060, 0.0475, 0.003, 0.092]
  Component 2: [ 1.802, 1.985, 1.837, 1.712, 1.927, 1.7469, 2.013, 1.465, 2.071, 1.919]
Variances (diagonal elements of the covariance matrices):
  Component 1: [ 2.805, 1.984, 1.367, 2.712, 1.925, 2.464, 2.216, 2.465, 2.452, 2.019]
  Component 2: [ 2.678, 2.345, 1.736, 2.687, 2.223, 1.371, 2.516, 2.896, 1.937, 1.959]
Mixture coefficients: [0.456, 0.544]

(b): Considering the full data size

Means:
  Component 1: [ 0.0145, 0.0293, 0.0145, 0.0457, 0.3857, 0.0165, 0.2604, 0.0537, 0.0433, 0.0527]
  Component 2: [ 1.9245, 0.9586, 1.3456, 1.9211, 2.2751, 2.6759, 2.0135, 2.0567, 1.8956, 2.3034]
Variances (diagonal elements of the covariance matrices):
  Component 1: [ 1.342, 2.345, 2.567, 1.854, 1.456, 2.456, 1.789, 2.344, 1.355, 2.012]
  Component 2: [ 1.834, 2.445, 2.836, 1.687, 2.356, 2.171, 1.576, 2.578, 1.872, 2.192]
Mixture coefficients: [0.259, 0.741]

Observations:
1. As the sample size increases, the magnitude of the log-likelihood increases (it is a sum over
more samples) and the EM algorithm gives better estimates, since EM maximizes the expectation of
the complete-data log-likelihood (equivalently, minimizes the negative log-likelihood).
2. As the iteration number increases, the log-likelihood increases (the negative log-likelihood
decreases), which validates the convergence of the EM algorithm; the log-likelihood can be
monitored as in the sketch after this list.
3. The EM algorithm is very sensitive to initialization. It performs better when the initial values
are generated by the k-means clustering algorithm (with k equal to the number of mixture
components), since k-means partitions the data samples into k clusters in which each sample belongs
to the cluster with the nearest mean, and the cluster means serve as prototypes. Randomly
initialized EM does not perform as well as k-means-initialized EM; if the randomly initialized
means are very close to each other, EM does not give accurate estimates.
4. The accuracy of the EM algorithm depends on the number of mixture components k: the estimated
density fits the data histogram well when k matches the number of components in the true model,
underfits the histogram for smaller k, and overfits it for larger k.
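The log-likelihood behaviour noted in observations 1 and 2 can be checked by evaluating the mixture log-likelihood after every EM iteration. A minimal sketch for the multivariate case is shown below, assuming diagonal covariance matrices (only the diagonal elements are reported above); the function name and argument layout are illustrative.

```python
import numpy as np

def diag_gmm_log_likelihood(X, means, variances, weights):
    """Log-likelihood of X (N x d) under a Gaussian mixture with diagonal
    covariances; EM should increase this value at every iteration."""
    N, d = X.shape
    mixture = np.zeros(N)
    for mu, var, w in zip(means, variances, weights):
        diff = X - mu                                              # (N, d)
        log_norm = -0.5 * (d * np.log(2.0 * np.pi) + np.sum(np.log(var)))
        mixture += w * np.exp(log_norm - 0.5 * np.sum(diff ** 2 / var, axis=1))
    return np.sum(np.log(mixture))
```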
Q3:
Explanation:
This is a 2-class classification problem in which the class-conditional densities are mixtures of
Gaussians. Bayes classifiers are implemented.
The class-conditional densities are estimated in two ways:
(i) each class-conditional density is modelled as a mixture of two Gaussians and estimated using
the EM algorithm;
(ii) each class-conditional density is modelled as a single Gaussian and estimated using the
maximum-likelihood (ML) method.
The accuracies of these two Bayes classifiers are compared, and both are also compared with the
nearest-neighbour classifier.
Given data: component densities: f1 – N(0,2), f2 – N(2,2), f3 – N(4,2), f4 – N(6,2)
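A minimal sketch of the ML (single-Gaussian) density estimate and the Bayes decision rule is shown below, assuming one-dimensional data and equal class priors; the names are illustrative, and the EM-estimated mixture densities can be plugged in in the same way.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_ml_gaussian(x):
    """ML estimates for a single Gaussian: sample mean and sample variance."""
    return np.mean(x), np.var(x)

def bayes_classify(x, density1, density2, prior1=0.5, prior2=0.5):
    """Assign class 1 where p(x|C1)P(C1) >= p(x|C2)P(C2), else class 2."""
    return np.where(density1(x) * prior1 >= density2(x) * prior2, 1, 2)

# Hypothetical usage with the single-Gaussian (ML) class-conditional densities:
# m1, v1 = fit_ml_gaussian(train_class1); m2, v2 = fit_ml_gaussian(train_class2)
# labels = bayes_classify(test_x,
#                         lambda x: gaussian_pdf(x, m1, v1),
#                         lambda x: gaussian_pdf(x, m2, v2))
```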
DataSet1:
Histogram:

Class 1: 0.5f1 + 0.5f2; Class 2: 0.5f3 + 0.5f4


Results:
Initialized values for EM algorithm for class one:
means:[0.9, 3]
variances:[1.5, 3]
convex coefficients:[0.5, 0.5]
Initialized values for EM algorithm for class two:
means:[3.2, 7.8]
variances:[1.5, 3.4]
convex coefficients:[0.5, 0.5]
For full size of training data,
Estimated parameters using ML:
Class1: mean = 0.92773412 variance = 3.04458713
Class2: mean = 5.10724124 variance = 2.97988952
Estimated parameters using EM:
Class 1:
Means: 0.45327577, 1.79083831
Variances: 2.68604845, 2.54236184
Convex coefficients: 0.6452, 0.3547
Class 2:
Means: 4.48803422, 5.89692537
Variances: 2.66078359, 2.2742712
Convex coefficients: 0.560, 0.4394
DataSet2:
Histogram:

Class 1: 0.5f1 + 0.5f3; Class 2: 0.5f2 + 0.5f4


Results:
Initialized values for EM algorithm for class one:
means:[0.9, 3]
variances:[1.5, 3]
convex coefficients:[0.5, 0.5]
Initialized values for EM algorithm for class two:
means:[3.2, 7.8]
variances:[1.5, 3.4]
convex coefficients:[0.5, 0.5]
For full size of training data,
Estimated parameters using ML:
Class1: mean = 1.8530873 variance = 5.65446024
Class2: mean = 4.01297498 variance = 5.97318128
Estimated parameters using EM:
Class 1:
Means: -0.12535948, 3.71909734
Variances: 1.68473773, 2.22478517
Convex coefficients: 0.4853, 0.51467
Class 2:
Means: 2.43829025, 6.18947128
Variances: 2.7963326, 2.19972346
Convex coefficients: 0.5802, 0.4197

Observations:
1. In all cases, the Bayes classifier, using either the ML estimates or the EM estimates of the
class-conditional densities, outperforms the nearest-neighbour classifier (a sketch of the
nearest-neighbour baseline follows this list).
2. The accuracy for dataset 1 is higher than for dataset 2. In dataset 1 the means of the
class-conditional densities are well separated, so the overlap between the class 1 and class 2
densities is larger in dataset 2 than in dataset 1; consequently the Bayes classifier with these
class-conditional densities performs worse on dataset 2. The same reasoning holds for the ML-based
classifier.
3. The EM algorithm performs better when the initial values are generated by the k-means clustering
algorithm (k = 2), since k-means partitions the data samples into k clusters in which each sample
belongs to the cluster with the nearest mean, and the cluster means serve as prototypes.
4. The accuracy of the EM algorithm depends on the number of mixture components k. With k = 2, the
estimated density fits the data histogram well, because each class-conditional density truly has 2
mixture components. With k = 1 (k < 2), the estimated density does not fit the data histogram, so a
classifier designed with these class-conditional densities has lower accuracy. With k = 4 (k > 2),
the estimated density overfits the data histogram; the classifier may give good accuracy on the
training data but lacks generalization ability.
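The nearest-neighbour baseline mentioned in observation 1 is not fully specified in the report; a minimal 1-nearest-neighbour sketch for 1-D data (illustrative names, assuming labelled training samples are available) could look as follows.

```python
import numpy as np

def nn_classify(test_x, train_x, train_y):
    """1-nearest-neighbour classifier: each test point receives the label of
    the closest training point (absolute distance in 1-D)."""
    test_x = np.asarray(test_x, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    nearest = np.abs(test_x[:, None] - train_x[None, :]).argmin(axis=1)
    return np.asarray(train_y)[nearest]
```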
Q4:
Explanation:
This is a 2-class classification problem in which the class-conditional densities are mixtures of
Gaussians. Bayes classifiers are implemented.
The class-conditional densities are estimated in two ways:
(i) each class-conditional density is modelled as a mixture of two Gaussians and estimated using
the EM algorithm;
(ii) each class-conditional density is modelled as a single Gaussian and estimated using the
maximum-likelihood (ML) method.
The accuracies of these two Bayes classifiers are compared, and both are also compared with the
nearest-neighbour classifier.
Given data: component densities: f1 – N(0,2), f2 – N(4,2), f3 – N(8,2), f4 – N(12,2)
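For the comparison described above, the EM-estimated class-conditional mixture density can be evaluated directly and the classifier accuracy computed as the fraction of correctly labelled test samples. A minimal sketch (1-D data; illustrative names; the parameter arrays are those produced by an EM fit):

```python
import numpy as np

def gmm_density(x, means, variances, weights):
    """Evaluate a 1-D Gaussian-mixture density (e.g. an EM-estimated
    class-conditional density) at the points x."""
    x = np.asarray(x, dtype=float)[:, None]                        # (N, 1)
    comp = np.exp(-0.5 * (x - np.asarray(means)) ** 2 / np.asarray(variances)) \
           / np.sqrt(2.0 * np.pi * np.asarray(variances))          # (N, K)
    return comp @ np.asarray(weights, dtype=float)                 # (N,)

def accuracy(predicted, true_labels):
    """Fraction of test samples assigned to their true class."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true_labels)))
```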
DataSet1:
Histogram:
Class 1: 0.5f1 + 0.5f2; Class 2: 0.5f3 + 0.5f4
Results:
Initialized values for EM algorithm for class one:
means:[0.9, 3]
variances:[1.5, 3]
convex coefficients:[0.5, 0.5]
Initialized values for EM algorithm for class two:
means:[6.9, 11.6]
variances:[1.5, 3.4]
convex coefficients:[0.5, 0.5]
For full size of training data,
Estimated parameters using ML:
Class1: mean = 1.98873822 variance = 10.03301297
Class2: mean = 6.06924616 variance = 5.73212566
Estimated parameters using EM:
Class 1:
Means: 1.732479, 2.11602963
Variances: 1.732479, 2.11602963
Convex coefficients: 0.4545, 0.5454
Class 2:
Means: 7.6593934, 11.43740568
Variances: 1.44899449, 2.96050072
Convex coefficients: 0.3717, 0.6282
DataSet2:
Histogram:

Class 1: 0.5f1 + 0.5f3; Class 2: 0.5f2 + 0.5f4


Results:
Initialized values for EM algorithm for class one:
means:[0.9, 6.5]
variances:[1.5, 3]
convex coefficients:[0.5, 0.5]
Initialized values for EM algorithm for class two:
means:[3.5, 11.0]
variances:[1.5, 3.4]
convex coefficients:[0.5, 0.5]
For full size of training data,
Estimated parameters using ML:
Class1: mean = 3.97720365 variance = 17.64253797
Class2: mean = 8.09626954 variance = 18.53789343
Estimated parameters using EM:
Class 1:
Means: 0.13237427, 8.07812975
Variances: 1.71688626, 2.04401085
Convex coefficients: 0.5161, 0.4838
Class 2:
Means: 12.03635727, 1.80758142
Variances: 1.80758142, 2.09601184
Convex coefficients: 0.483, 0.5162
Observations:
1. In all cases, the Bayes classifier, using either the ML estimates or the EM estimates of the
class-conditional densities, outperforms the nearest-neighbour classifier.
2. The EM algorithm performs better when the initial values are generated by the k-means clustering
algorithm (k = 2), since k-means partitions the data samples into k clusters in which each sample
belongs to the cluster with the nearest mean, and the cluster means serve as prototypes.
3. For both dataset 1 and dataset 2, the EM-based classifier performs better than the
nearest-neighbour and ML-based classifiers. The ML-based classifier performs worse on dataset 2
than on dataset 1: the histograms of the class 1 and class 2 data show that a single Gaussian
cannot fit them, so the Bayes classifier built from the ML-estimated single-Gaussian
class-conditional densities does not give good accuracy.
4. The accuracy of the EM algorithm depends on the number of mixture components k. With k = 2, the
estimated density fits the data histogram well, because each class-conditional density truly has 2
mixture components. With k = 1 (k < 2), the estimated density does not fit the data histogram, so a
classifier designed with these class-conditional densities has lower accuracy. With k = 4 (k > 2),
the estimated density overfits the data histogram; the classifier may achieve good accuracy on the
training data but lacks generalization ability.
