Topics in Pattern Recognition

2nd Semester, 2013

1. Lecturer
- Soo-Hyung Kim, Prof., Dept. of ECE, Chonnam National University
- E-mail: shkim@jnu.ac.kr
- Homepage: http://pr.jnu.ac.kr/shkim
- Phone: 062-530-3430 (office), 010-2687-3430 (mobile)

2. Textbooks
(1) S. Theodoridis, A. Pikrakis, K. Koutroumbas, and D. Cavouras, Introduction to Pattern Recognition: A MATLAB Approach, Academic Press, 2010. (A Korean translation was published in 2013.)
(2) S. Theodoridis and K. Koutroumbas, Pattern Recognition, 4th ed., Academic Press, 2009.

3. Course Schedule
- Introduction to Pattern Recognition
- Introduction to MATLAB
- Chapter 1  Classifiers Based on Bayes Decision Theory
- Chapter 2  Classifiers Based on Cost Function Optimization
- Chapter 3  Feature Generation and Dimensionality Reduction
- Chapter 4  Feature Selection
- Chapter 5  Template Matching
- Chapter 6  Hidden Markov Models
- Chapter 7  Clustering

Chapter        1    2    3    4    5    6    7   Total
#-pages       28   50   28   30   10   12   50    208
#-sections    10    8    6    8    4    3    7     46
#-examples    13   12    8   13    4    5   12     67
#-exercises   13    5    3    7    1    2   13     44

4. Grading Criteria
- Presentation quality for the assigned topic
- Homework: exercises from every chapter
- Final exam at the end of the semester
- Term project (optional): only for students who do not give a presentation in class
- Class attendance

5. e-Class (via the CNU portal)
- All class materials can be downloaded from the e-Class.
- Homework should be uploaded to the e-Class.
- All announcements will be posted on the e-Class.

6. Topic Assignment
No.  Chapter  Contents                                     Speaker
 1   -        Introduction to PR                           S.H. Kim, Prof.
 2   -        Introduction to MATLAB                       S.H. Kim, Prof.
 3   1        1.1 ~ 1.5 (parametric methods)
 4   1        1.6 ~ 1.10 (nonparametric methods)
 5   1        Exercises (Homework)
 6   2        2.1 ~ 2.3 (least error methods)
 7   2        2.4 ~ 2.5 (support vector machine)
 8   2        2.7 ~ 2.8 (AdaBoost & MLP)
 9   2        Exercises (Homework)
10   3        3.1 ~ 3.3 (PCA & SVD)
11   3        3.4 ~ 3.5 (Fisher's LDA & kernel PCA)
12   3        Exercises (Homework)
13   4        4.1 ~ 4.6 (data normalization)
14   4        4.7 ~ 4.8 (feature selection)
15   4        Exercises (Homework)
16   5        5.1 ~ 5.3 (matching sequences) & Exercises
17   6        6.1 ~ 6.3 (HMM)
18   6        Exercises (Homework)
19   7        7.1 ~ 7.4 (sequential clustering)
20   7        7.5 (cost optimization clustering)
21   7        7.7 (hierarchical clustering)
22   7        Exercises (Homework)

CHAPTER 1  Classifiers Based on Bayes Decision Theory  (p. 1)

1.1 Introduction (p. 1)
1.2 Bayes Decision Theory (p. 1)
1.3 Gaussian Probability Density Function (p. 2)
    Example 1.3.1: Compute the value of a Gaussian PDF
    Example 1.3.2: Given two PDFs for w1 & w2, classify a pattern x
    (Ex 1.3.1) Repeat Example 1.3.2 with different prior probabilities
    Example 1.3.3: Generate N=500 samples according to given parameters
1.4 Minimum Distance Classifiers (p. 6)
    1.4.1 Euclidean Distance Classifier
    1.4.2 Mahalanobis Distance Classifier
    Example 1.4.1: Given two PDFs for w1 & w2 in 3-D, classify a pattern x by a Euclidean distance classifier and a Mahalanobis distance classifier
    1.4.3 ML Parameter Estimation of Gaussian PDFs
    Example 1.4.2: Generate N=50 samples according to given m & S, and compute the ML estimates of m & S from the samples
    (Ex 1.4.1) Repeat Example 1.4.2 with N=500 and N=5000
    Example 1.4.3: Generate 1,000 train and test samples for 3 classes, compute the ML estimates of m & S for each class, and classify a pattern x
    (Ex 1.4.2) Repeat Example 1.4.3 using a different covariance S
    (Ex 1.4.3) Repeat Example 1.4.3 with different prior probabilities
    (Ex 1.4.4) Repeat Example 1.4.3 with different covariances S1, S2, and S3
1.5 Mixture Models (p. 11)
    Example 1.5.1: Generate and plot 500 points from a Gaussian mixture, with different mixing probabilities
1.6 The Expectation-Maximization Algorithm (p. 13)
    Example 1.6.1: Generate 500 points from a Gaussian mixture, and estimate the parameters using the EM algorithm with different initializations
    Example 1.6.2: Given two-class samples, 500 for each, estimate Gaussian mixtures for w1 and w2 respectively, and test the classification accuracy of a Bayesian classifier using another 1,000 samples
    (Ex 1.6.1) Repeat Example 1.6.2 with different initial parameters
1.7 Parzen Windows (p. 19)
    Example 1.7.1: Generate N=1,000 points from a Gaussian mixture, and estimate the PDF using the Parzen window method with h=0.1
    (Ex 1.7.1) Repeat Example 1.7.1 with different N & h
    (Ex 1.7.2) Repeat Example 1.7.1 with a 2-D PDF
    (Ex 1.7.3) Repeat Example 1.4.3 with the Parzen window estimator
1.8 k-NN Density Estimation (p. 21)
    Example 1.8.1: Repeat Example 1.7.1 with the k-NN estimator with k=21
    (Ex 1.8.1) Repeat Example 1.8.1 with different k and N
    (Ex 1.8.2) Repeat Example 1.4.3 with the k-NN density estimator
1.9 Naive Bayesian Classifier (p. 22)
    Example 1.9.1: Generate 50 5-D data points from 2 classes, estimate the two PDFs for a naive Bayesian classifier, and then test with 10,000 samples
    (Ex 1.9.1) Classify the data in Example 1.9.1 with the original PDF, and repeat Example 1.9.1 with 1,000 training data
1.10 Nearest Neighbor Rule (p. 25)
    Example 1.10.1: Generate 1,000 data points from two equiprobable classes, and classify another 5,000 samples with a k-NN classifier with k=3, adopting the squared Euclidean distance
    (Ex 1.10.1) Repeat Example 1.10.1 for k=1, 7, 15
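For orientation, the core computation behind Examples 1.3.1 through 1.4.3 is evaluating Gaussian class-conditional PDFs and comparing prior-weighted likelihoods. The following is a minimal base-MATLAB sketch of a two-class Bayesian classifier; the means, covariances, priors, and test pattern are illustrative values, not the ones used in the book.

    % Two-class Bayesian classifier with Gaussian class-conditional PDFs
    m1 = [0; 0];  S1 = eye(2);          % class w1: mean and covariance (illustrative)
    m2 = [3; 3];  S2 = eye(2);          % class w2: mean and covariance (illustrative)
    P1 = 0.5;  P2 = 0.5;                % prior probabilities
    x  = [1.0; 2.2];                    % pattern to classify

    % Gaussian PDF value at x for mean m and covariance S
    gpdf = @(x, m, S) exp(-0.5*(x-m)'*(S\(x-m))) / sqrt((2*pi)^length(m)*det(S));

    p1 = P1 * gpdf(x, m1, S1);          % class scores (posteriors up to a common factor)
    p2 = P2 * gpdf(x, m2, S2);
    if p1 > p2, label = 1; else label = 2; end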

CHAPTER 2  Classifiers Based on Cost Function Optimization  (p. 29)

2.1 Introduction (p. 29)
2.2 Perceptron Algorithm (p. 30)
    Example 2.2.1: Generate 4 different data sets containing -1 & +1 classes; apply the perceptron algorithm to get the separating line
    2.2.1 Online Form of the Perceptron Algorithm
    Example 2.2.2: Repeat Example 2.2.1 with the online version of the algorithm
2.3 Sum of Squared Errors (SSE) Classifier (p. 35)
    Example 2.3.1: Generate train and test data, each with 200 samples from two Gaussians having the same covariance; apply the SSE method to get the separating line; repeat with 100,000 samples; compare with the optimal Bayesian classifier
    Example 2.3.2: skip
    2.3.1 Multi-class LS Classifier
    Example 2.3.3: Generate train data with 1,000 samples and test data with 10,000 samples from three Gaussians having the same covariance; apply SSE to get the three discriminant functions; show that the values of these functions correspond to posterior probabilities; compare with the optimal Bayesian classifier
    (Ex 2.3.1) Repeat Example 2.3.3 in a less separable situation
2.4 SVM: The Linear Case (p. 43)
    Example 2.4.1: Given 400 points of two classes in a 2-D space, apply SVM with 6 different values of C; compute the accuracy, count the support vectors, compute the margin, and plot the separating line
    (Ex 2.4.1) Repeat Example 2.4.1 with a different data distribution
    2.4.1 Multi-class Generalization
    Example 2.4.2: Given 3-class samples as in Example 2.3.3, get the three SVM classifiers
2.5 SVM: Nonlinear Case (p. 50)
    Example 2.5.1: Generate a set of N=150 training samples in the 2-D region [-5, 5] x [-5, 5], belonging to the +1 or -1 class, with the classes nonlinearly separable; apply a linear SVM with C=2, tol=0.001; apply a nonlinear SVM using an RBF kernel; apply a nonlinear SVM using a polynomial kernel
    Example 2.5.2: Generate a set of 270 training samples in a 3 x 3 grid, or 9 cells, where the +1 and -1 classes alternate; apply a linear SVM with C=200, tol=0.001; apply a nonlinear SVM using an RBF kernel; apply a nonlinear SVM using a polynomial kernel
2.6 Kernel Perceptron Algorithm (p. 58)
2.7 AdaBoost Algorithm (p. 63)
    Example 2.7.1: Apply the AdaBoost algorithm to build a classifier between two classes in 2-D, where each class is a Gaussian mixture; use 100 samples, 50 for class +1 and the other 50 for class -1; observe the error rates as a function of the number of base classifiers
    (Ex 2.7.1) Repeat Example 2.7.1 where the two classes are described by normal distributions with means (1, 1) and (s, s), s=2, 3, 4, 6
2.8 Multi-Layer Perceptron (p. 66)
    Example 2.8.1: Generate two-class samples in 2-D, where class +1 comes from a mixture of 3 Gaussians and class -1 from another mixture of 4 Gaussians; train a 2-layer feedforward NN with 2 hidden nodes and another 2-layer feedforward NN with 4 hidden nodes using standard BP with lr=0.01; repeat standard BP with lr=0.0001; train the same NNs with adaptive BP
    (Ex 2.8.1) Repeat Example 2.8.1 where the two classes are more spread out, with larger covariance values
    Example 2.8.2: Generate two-class samples in 2-D, where class +1 comes from a mixture of 4 Gaussians and class -1 from another mixture of 5 Gaussians; train 2-layer feedforward NNs with 3, 4, and 10 hidden nodes using standard BP with lr=0.01; train the same NNs with adaptive BP
    (Ex 2.8.2) Generate two-class samples in 2-D, where class +1 comes from a mixture of 8 Gaussians and class -1 from another mixture of 8 Gaussians; train a 2-layer FNN with 7, 8, 10, 14, 16, 20, 32, 40 hidden nodes using an adaptive BP algorithm; repeat with more spread-out samples (skip)
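As a reminder of what Section 2.2 asks for, the batch perceptron is a short weight-update loop over the misclassified samples. A minimal base-MATLAB sketch follows; the data, learning rate rho, and iteration limit are illustrative assumptions, not the book's settings.

    % Batch perceptron on linearly separable 2-D data (illustrative)
    X  = [randn(2,50)-2, randn(2,50)+2];     % 100 points, two clusters
    y  = [-ones(1,50), ones(1,50)];          % labels -1 / +1
    Xa = [X; ones(1,100)];                   % augment with a bias component
    w  = zeros(3,1);  rho = 0.1;             % weight vector, learning rate

    for iter = 1:1000
        mis = find(y .* (w' * Xa) <= 0);     % currently misclassified samples
        if isempty(mis), break; end          % converged: no misclassifications
        w = w + rho * Xa(:,mis) * y(mis)';   % batch update over the misclassified set
    end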

CHAPTER 3  Data Transformation: Feature Generation and Dimensionality Reduction

3.1 Introduction (p. 79)
3.2 Principal Component Analysis (PCA) (p. 79)
    Example 3.2.1: Generate a set of 500 samples from a Gaussian distribution, perform PCA, and get the two eigenvalue/eigenvector pairs; repeat the same procedure with a different distribution
    Example 3.2.2: Generate two data sets X1 and X2; perform PCA on X1 and project the data on the first principal component; repeat with X2
    (Ex 3.2.1) Generate two data sets X1 and X2 having different mean points in 3-D; perform PCA on X1 and project the data on the first two principal components; repeat the same procedure with X2
3.3 Singular Value Decomposition Method (p. 84)
    (Ex 3.3.1) Perform SVD on the data X1 and X2 in Ex 3.2.1 and compare
    Example 3.3.1: Generate a set of 100 samples from a Gaussian distribution in a 2000-dimensional space; apply PCA and SVD and compare the results
3.4 Fisher's Linear Discriminant Analysis (p. 87)
    Example 3.4.1: Apply Fisher's LDA to the data X2 in Example 3.2.2
    Example 3.4.2: Generate 900 3-D samples of two classes: the first 100 samples from a zero-mean Gaussian distribution, and the rest from 8 groups of Gaussian distributions with different means; plot the data; perform Fisher's LDA and project the data; repeat for a 3-class problem where the last group of 100 samples is labeled class 3
3.5 Kernel PCA (p. 92)
    Example 3.5.1: Data set in the original space; data set in the transformed space; perform PCA in the transformed space; map back to the original space
    Example 3.5.2:
    Example 3.5.3:
    (Ex 3.5.1)
3.6 Eigenmap (p. 101): skip
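The PCA examples of Section 3.2 reduce to an eigen-decomposition of the sample covariance matrix followed by a projection. Below is a minimal base-MATLAB sketch on an illustrative 3-D data set (not the book's X1/X2).

    % PCA via eigen-decomposition of the sample covariance matrix
    X  = randn(500, 3) * [2 0 0; 0 1 0; 0 0 0.1];   % 500 illustrative samples in 3-D
    Xc = X - repmat(mean(X), size(X,1), 1);          % center the data
    [V, D] = eig(cov(Xc));                           % eigenvectors / eigenvalues
    [lambda, idx] = sort(diag(D), 'descend');        % order by decreasing eigenvalue
    V = V(:, idx);                                   % principal directions, largest first
    Y = Xc * V(:, 1:2);                              % project onto the first two PCs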

CHAPTER 4  Feature Selection  (p. 107)

4.1 Introduction (p. 107)
4.2 Outlier Removal (p. 107)
    Example 4.2.1: Generate N=100 points from a 1-D Gaussian distribution and then add 6 outliers; use a library function for outlier detection
4.3 Data Normalization (p. 108)
    Example 4.3.1: Normalize two different data sets using three normalization methods
4.4 Hypothesis Testing: t-Test (p. 111)
    Example 4.4.1: Given two-class data from equiprobable Gaussian distributions with m1=8.75 and m2=9 (variance=4), test whether the two mean values differ significantly at significance levels of 0.05 and 0.001
    (Ex 4.4.1) Repeat the t-test with different variances of 1 and 16
4.5 Receiver Operating Characteristic (ROC) Curve (p. 113)
    Example 4.5.1: Consider two Gaussians with m1=2, s1=1 and m2=0, s2=0; plot the respective histograms; get the AUC using an ROC function
    (Ex 4.5.1) Repeat Example 4.5.1 with { m1=m2=0 } and { m1=5, m2=0 }
4.6 Fisher's Discriminant Ratio (p. 114)
    Example 4.6.1: Given 200 samples of two Gaussians in 5-D, compute the FDR values for the five features
    Example 4.6.2: Compute the FDR values for the discriminatory power of the four features from ultrasonic images
4.7 Class Separability Measures (p. 117)
    4.7.1 Divergence
    Example 4.7.1: Generate 100 normally distributed samples for two different classes and compute the divergence between the two classes
    (Ex 4.7.1) Repeat Example 4.7.1 with three different situations
    4.7.2 Bhattacharyya Distance and Chernoff Bound
    Example 4.7.2: Compute the Bhattacharyya distance and the Chernoff bound for the same data as in Example 4.7.1
    (Ex 4.7.2) Compute the Bhattacharyya distance and the Chernoff bound for the same data as in Ex 4.7.1
    4.7.3 Scatter Matrices
    Example 4.7.3: For the data in Example 4.6.2, select three features out of the four based on the J3 measure
    (Ex 4.7.3) Compare the 3-feature combination obtained in Example 4.7.3 with all other 3-feature combinations
4.8 Feature Subset Selection (p. 122)
    4.8.1 Scalar Feature Selection
    Example 4.8.1: Normalize the features of the data in Example 4.6.2 and use the FDR to rank the four features; use the scalar feature selection technique to rank the features
    4.8.2 Feature Vector Selection
    Example 4.8.2: Among the four features from a mammography data set, apply the exhaustive search method to select the best combination of three features according to the divergence, the Bhattacharyya distance, and the J3 measure
    (Ex 4.8.2) Compute the J3 measure for mean, skewness, and kurtosis in Example 4.8.2 and compare the results
    Example 4.8.3: Suboptimal searching
    (Ex 4.8.3) Repeat Example 4.8.3 with a different number of selected features
    Example 4.8.4: Design a classification system via data collection, feature generation, feature selection, classifier design, and evaluation
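Fisher's Discriminant Ratio in Section 4.6 scores each feature by the squared difference of the two class means divided by the sum of the two class variances, computed feature by feature. A minimal base-MATLAB sketch on illustrative two-class 5-D data (not the ultrasonic or mammography data of the examples):

    % Per-feature Fisher's Discriminant Ratio for a two-class problem
    X1 = randn(200, 5) + repmat([2 0 1 0 0], 200, 1);   % class 1 samples (200 x 5), illustrative
    X2 = randn(200, 5);                                  % class 2 samples (200 x 5)

    m1 = mean(X1);  m2 = mean(X2);        % per-feature means (1 x 5)
    v1 = var(X1);   v2 = var(X2);         % per-feature variances (1 x 5)
    FDR = (m1 - m2).^2 ./ (v1 + v2);      % larger FDR = more discriminative feature
    [~, rank_order] = sort(FDR, 'descend');   % feature indices ranked by FDR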

CHAPTER 5  Template Matching  (p. 137)

5.1 Introduction (p. 137)
5.2 Edit (Levenshtein) Distance (p. 137)
    Example 5.2.1: Compute the edit distance between "book" and "bokks"; repeat for "template" and "replatte"
    (Ex 5.2.1) Find the most likely word for "igposre" among "impose", "ignore", and "restore" in terms of edit distance
5.3 Matching Sequences of Real Numbers (p. 139)
    Example 5.3.1: Compute the matching cost between P={-1, -2, 0, 2} and T={-1, -2, -2, 0, 2} using the Sakoe-Chiba local constraints
    Example 5.3.2: Compute the matching costs between P and T1 and between P and T2, where P={1, 0, 1}, T1={1, 1, 0, 0, 0, 1, 1, 1}, and T2={1, 1, 0, 0, 1}, using the standard Itakura constraints
    Example 5.3.3: Compute the matching cost between P={-8, -4, 0, 4, 0, -4} and T={0, -8, -4, 0, 4, 0, -4, 0, 0} using the Sakoe-Chiba local constraints; repeat with endpoint constraints
5.4 Dynamic Time Warping in Speech Recognition (p. 143): skip
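The edit distance in Example 5.2.1 is a standard dynamic program over an (|a|+1) x (|b|+1) cost table. A minimal base-MATLAB sketch, assuming unit costs for insertion, deletion, and substitution:

    % Edit (Levenshtein) distance between two strings by dynamic programming
    a = 'book';  b = 'bokks';
    D = zeros(length(a)+1, length(b)+1);
    D(:,1) = 0:length(a);                   % cost of deleting a prefix of a
    D(1,:) = 0:length(b);                   % cost of inserting a prefix of b
    for i = 2:length(a)+1
        for j = 2:length(b)+1
            cost = 1 - (a(i-1) == b(j-1));  % 0 if the characters match, else 1
            D(i,j) = min([D(i-1,j)+1, D(i,j-1)+1, D(i-1,j-1)+cost]);
        end
    end
    edit_distance = D(end, end);            % 2 for 'book' vs 'bokks'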

CHAPTER 6  Hidden Markov Models  (p. 147)

6.1 Introduction (p. 147)
6.2 Modeling (p. 147)
6.3 Recognition and Training (p. 148)
    Example 6.3.1: Given a sequence of head-tail observations, O=HHHTHHHHTHHHHTTHH, and two models M1 and M2 with different transition matrices, compute the recognition probabilities P(O|M1) and P(O|M2)
    Example 6.3.2: For the setting of Example 6.3.1, compute the Viterbi score and the respective best-state sequence for M1 and M2
    Example 6.3.3: Train the HMM with a set of 70 training samples using the Baum-Welch algorithm; use two different initializations
    (Ex 6.3.1) Repeat Example 6.3.3 with a different initialization
    Example 6.3.4: Repeat Example 6.3.3 using Viterbi training
    Example 6.3.5: Compute the Viterbi score for a set of 30 test samples
    (Ex 6.3.2) Compare the two results in Example 6.3.5
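The recognition probability P(O|M) in Example 6.3.1 is obtained with the forward algorithm. Below is a minimal base-MATLAB sketch for a two-state discrete HMM; the transition matrix A, emission matrix B, and initial probabilities are illustrative placeholders, not the example's M1 or M2.

    % Forward algorithm for a discrete two-state HMM (coin model)
    A   = [0.7 0.3; 0.4 0.6];     % A(i,j) = P(next state j | current state i), illustrative
    B   = [0.9 0.1; 0.2 0.8];     % B(i,k) = P(observation k | state i), columns: H, T
    pi0 = [0.5; 0.5];             % initial state probabilities, illustrative
    O   = [1 1 1 2 1 1 1 1 2 1 1 1 1 2 2 1 1];   % HHHTHHHHTHHHHTTHH with H=1, T=2

    alpha = pi0 .* B(:, O(1));                 % initialization
    for t = 2:length(O)
        alpha = (A' * alpha) .* B(:, O(t));    % induction step of the forward recursion
    end
    P_O_given_M = sum(alpha);                  % recognition probability P(O|M)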

CHAPTER 7  Clustering  (p. 159)

7.1 Introduction (p. 159)
7.2 Basic Concepts and Definitions (p. 159)
    Example 7.2.1: Two clusterings for 7 samples in a 2-D space
7.3 Clustering Algorithms (p. 160)
7.4 Sequential Algorithms (p. 161)
    7.4.1 BSAS Algorithm
    7.4.2 Clustering Refinement
    Example 7.4.1: Apply the BSAS algorithm on 15 samples with variations in the presentation order, the threshold Θ, and the q values
    Example 7.4.2: Generate 400 samples from 4 different Gaussians, and then apply the BSAS algorithm, estimating the number of compact clusters
    (Ex 7.4.1) Repeat step 1 of Example 7.4.2 with a set of 300 samples from a zero-mean Gaussian with identity covariance matrix
7.5 Cost Function Optimization Clustering Algorithms (p. 168)
    7.5.1 Hard Clustering Algorithms
    Example 7.5.1: Generate 400 samples of 4 groups in 2-D; apply the k-means algorithm for m=4; repeat for m=3; repeat for m=5; repeat for m=4 with specific initializations
    (Ex 7.5.1) Apply the k-means algorithm for m=2, 3 on the data in Ex 7.4.1
    Example 7.5.2: Generate 500 samples, where the first 400 are as in Example 7.5.1 and the other 100 come from a uniform distribution in [-1, 12] x [-2, 12]; apply k-means for m=4 and compare the results
    Example 7.5.3: Apply the k-means algorithm (m=2) to a set of 515 samples, where the first 500 stem from a zero-mean normal distribution and the other 15 stem from a normal distribution centered at [5, 5]
    Example 7.5.4: Apply the k-means algorithm for m=4 on a data set with 4 groups as in Figure 7.5
    Example 7.5.5: Run the k-means algorithm for each value of m in a range, and find the significant knee on the various data in Example 7.5.1, Example 7.5.2, Example 7.4.2, and Ex 7.5.1
    Example 7.5.6: Generate a set of 216 samples, where the first 100 stem from a zero-mean Gaussian, the next 100 stem from a Gaussian centered at [12, 13], and the other two groups of 8 samples lie around [0, -40] and [-30, -30], respectively; apply the k-means and PAM algorithms for m=2
    (Ex 7.5.2) Repeat Example 7.5.1 for the PAM algorithm
    (Ex 7.5.3) Repeat Example 7.5.5 for the PAM algorithm
    (Ex 7.5.4) Repeat Example 7.5.1 using the GMDAS algorithm
    (Ex 7.5.5) Repeat Example 7.5.5 using the GMDAS algorithm
    7.5.2 Nonhard Clustering Algorithms
    (Ex 7.5.6) Repeat Example 7.5.1 using FCM with q=2
    (Ex 7.5.7) Repeat Example 7.5.3 using FCM with q=2
    (Ex 7.5.8) Repeat Example 7.5.5 using FCM with q=2
    (Ex 7.5.9) Repeat Example 7.5.1 using FCM with q=2, 10, 25, and compare
    Example 7.5.7: Apply the FCM algorithm on the data in Example 7.5.6, and compare with the results from k-means and PAM
    (Ex 7.5.10) Apply the PCM algorithm on the data in Example 7.5.1 for m=4, 6, 3; use q=2, n=3; repeat with a different initialization
    (Ex 7.5.11) Apply PCM on the data in Example 7.5.3 for m=2, q=2
7.6 Miscellaneous Clustering Algorithms (p. 189)
7.7 Hierarchical Clustering Algorithms (p. 198)
    7.7.1 Generalized Agglomerative Scheme
    7.7.2 Specific Agglomerative Clustering Algorithms
    Example 7.7.1: Apply the single-link and complete-link algorithms on a set of six points
    7.7.3 Choosing the Best Clustering
    Example 7.7.2: Generate a set of 40 samples of 4 groups in 2-D; apply the single-link and complete-link algorithms; determine the best clusterings
    (Ex 7.7.1) Repeat Example 7.7.2 on data with 4 clusters of 30, 20, 10, 51 points respectively (skip)
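Most of the Section 7.5 examples revolve around the k-means iteration: assign each sample to its nearest center, then recompute each center as the mean of its members. A minimal base-MATLAB sketch on illustrative 2-D data (not the exact data sets of Examples 7.5.1 through 7.5.6):

    % k-means clustering with random initialization (illustrative data)
    X = [randn(100,2); randn(100,2)+5; randn(100,2)*0.5 + repmat([0 8],100,1)];
    m = 3;                                   % number of clusters
    idx = randperm(size(X,1));
    C = X(idx(1:m), :);                      % random initial cluster centers

    for iter = 1:100
        % assignment step: nearest center by squared Euclidean distance
        d = zeros(size(X,1), m);
        for j = 1:m
            d(:,j) = sum((X - repmat(C(j,:), size(X,1), 1)).^2, 2);
        end
        [~, labels] = min(d, [], 2);
        % update step: each center becomes the mean of its assigned samples
        for j = 1:m
            if any(labels == j)
                C(j,:) = mean(X(labels == j, :), 1);
            end
        end
    end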
