Electrical & Electronics Engineering Department

EE546 – Pattern Recognition – Homework # 3
Deadline: December 16, 2015

Important Note: Please include all the code you wrote. Do not forget to add your comments about the method and the
results for each question. You may submit your answers by e-mail or as a hard copy.
Question:
The iris data set, one of the best-known data sets, has been used in the pattern recognition literature to evaluate the
performance of various classification and clustering algorithms. This data set consists of 150 4-dimensional patterns
belonging to three types of iris flowers (setosa, versicolor, and virginica). There are 50 patterns per class. The 4 features
are sepal length, sepal width, petal length, and petal width, all in cm. The data can be accessed at
http://www.cse.msu.edu/~cse802/iris.data. The class label is given at the end of every pattern.
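
As a starting point, the data file can be grouped by class before any splitting is done. The sketch below is in Python rather than MATLAB and assumes that each line holds four comma-separated numbers followed by the class label; this layout should be checked against the downloaded file. The function name parse_iris and the two sample rows are illustrative only.

```python
from collections import defaultdict

def parse_iris(text):
    """Group feature vectors by class label.

    Assumed layout (check against the real file): four comma-separated
    numbers followed by the class label on every non-blank line.
    """
    by_class = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # skip blank lines
            continue
        *features, label = line.split(",")
        by_class[label.strip()].append([float(v) for v in features])
    return dict(by_class)

# Two illustrative rows in the assumed layout
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
data = parse_iris(sample)
```

With the full file read into a string, the same call would return one list of 50 patterns per class.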
Randomly choose two thirds of the data (from each class) for training the classifier and the remaining one third for testing
it. Find the non-parametric density estimates of the three classes using the Parzen window density estimation method.
Assume a 4-dimensional Gaussian kernel when estimating the density and try window widths of 0.01, 0.5, and 10.0. For the
Gaussian kernel, assume that the covariance matrix is diagonal, with the sample variances as its diagonal entries. Repeat this
data splitting fifteen times (that is, run the experiment 15 times, randomly choosing the training and test data each time).
For each of the above window widths, report the average and the variance of the Parzen classifier's error rate on the test
set. Plot the average error rate as a function of window width.
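
The experiment described above can be sketched as follows. This is a minimal Python outline under stated assumptions, not the attached MATLAB program: the kernel covariance is taken to be h^2 times the diagonal matrix of per-class sample variances, the class priors are treated as equal (each class has the same number of patterns), and every feature is assumed to have non-zero variance in the training set. All function names are hypothetical, and data is expected to be a dictionary mapping each class label to its list of 4-dimensional patterns.

```python
import math
import random
import statistics

def split_class(samples, rng, train_fraction=2 / 3):
    """Randomly split one class into training and test lists."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def log_parzen_density(x, train, variances, h):
    """Log of the Parzen estimate at x with a diagonal Gaussian kernel.

    Assumption: the kernel covariance is h**2 * diag(variances).
    """
    d = len(x)
    log_norm = (-0.5 * d * math.log(2 * math.pi) - d * math.log(h)
                - 0.5 * sum(math.log(v) for v in variances))
    exponents = [
        -0.5 * sum((xj - tj) ** 2 / v
                   for xj, tj, v in zip(x, t, variances)) / h ** 2
        for t in train
    ]
    m = max(exponents)  # log-sum-exp keeps small h from underflowing
    total = sum(math.exp(e - m) for e in exponents)
    return log_norm + m + math.log(total) - math.log(len(train))

def one_trial(data, widths, rng):
    """One random split; returns {window width: test error rate}."""
    train, test = {}, []
    for label, samples in data.items():
        tr, te = split_class(samples, rng)
        variances = [statistics.variance(col) for col in zip(*tr)]
        train[label] = (tr, variances)
        test += [(x, label) for x in te]
    rates = {}
    for h in widths:
        wrong = 0
        for x, label in test:
            # Equal priors assumed: pick the class with the largest density
            predicted = max(
                train, key=lambda c: log_parzen_density(x, *train[c], h))
            wrong += predicted != label
        rates[h] = wrong / len(test)
    return rates

def run_experiment(data, widths=(0.01, 0.5, 10.0), repeats=15, seed=0):
    """Repeat the split; returns {width: (mean error, variance of error)}."""
    rng = random.Random(seed)
    trials = [one_trial(data, widths, rng) for _ in range(repeats)]
    return {
        h: (statistics.mean([t[h] for t in trials]),
            statistics.variance([t[h] for t in trials]))
        for h in widths
    }
```

In this sketch the same fifteen splits are reused for all three window widths, so the reported averages are directly comparable. The returned averages can then be plotted against the window width with any plotting tool.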
A sample MATLAB program for Parzen window density estimation with a Gaussian kernel is provided in the attachment, in
a file named parzenWindowDensityEstimator.m. The covariance matrix of the Gaussian kernel is assumed to be
diagonal, with the sample variances as its diagonal entries.
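
For reference, one common way to write the class-conditional Parzen estimate with such a kernel is given below. It assumes that the window width h scales the kernel covariance as h^2 times the diagonal matrix of sample variances, and that the variances are computed per class from the training patterns; the attached program may use a different scaling, so the formula should be checked against it. Here d = 4, n_c is the number of training patterns in class c, x_{ij} is feature j of training pattern i, and \sigma_{cj}^2 is the sample variance of feature j in class c.

```latex
\hat{p}(\mathbf{x} \mid \omega_c)
  = \frac{1}{n_c} \sum_{i=1}^{n_c}
    \frac{1}{(2\pi)^{d/2}\, h^{d} \prod_{j=1}^{d} \sigma_{cj}}
    \exp\!\left( -\frac{1}{2h^{2}} \sum_{j=1}^{d}
      \frac{(x_j - x_{ij})^{2}}{\sigma_{cj}^{2}} \right)
```

A test pattern is then assigned to the class with the largest estimated density, which corresponds to the Bayes decision rule when the three classes have equal priors.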