
4. Pattern Recognition

E-mail: hogijung@hanyang.ac.kr
http://web.yonsei.ac.kr/hgjung
▷ Introduction to Pattern Recognition System

▷ Efficient Feature Extraction: Haar-like Features and the Integral Image

▷ Dimension Reduction: PCA

▷ Bayesian Decision Theory

▷ Bayesian Discriminant Function for Normal Density

▷ Linear Discriminant Analysis

▷ Linear Discriminant Functions

▷ Support Vector Machine

▷ k Nearest Neighbor

▷ Statistical Clustering

Introduction

Machine Perception [2]
• Build a machine that can recognize patterns:

– Speech recognition

– Fingerprint identification

– OCR (Optical Character Recognition)

– DNA sequence identification

Components of Pattern Classification System [6]

Types of Prediction Problems [6]

Feature and Pattern [6]

Classifier [6]

Pattern Recognition Approaches [6]

Machine Perception [2]
(Example)
“Sorting incoming fish on a conveyor according to species using optical sensing”

Species: sea bass or salmon

Machine Perception [2]
• Problem analysis: set up a camera and take some sample images from which to extract features.

• Preprocessing: use a segmentation operation to isolate fishes from one another and from the background.

• Feature extraction: the information from a single fish is sent to a feature extractor, whose purpose is to reduce the data by measuring certain features.

• The features are passed to a classifier.

Feature Selection [2]
The length of the fish as a possible feature for discrimination

→ The length is a poor feature alone!

Feature Selection [2]
The lightness of the fish as a possible feature for discrimination

Feature Selection [2]
• Adopt the lightness and add the width of the fish

Fish: x^T = [x1, x2], where x1 = lightness and x2 = width

Generalization [2]

The central aim of designing a classifier is to correctly classify novel input.



Generalization [3]
Polynomial Curve Fitting

Generalization: Model Selection [3]
Polynomial Curve Fitting
Panels: 0th-, 1st-, 3rd-, and 9th-order polynomial fits

Generalization: Model Selection [3]
Polynomial Curve Fitting, Over-fitting

Root-Mean-Square (RMS) error: $E_{\mathrm{RMS}} = \sqrt{2E(\mathbf{w}^{*})/N}$

Generalization: Sample Size [3]
Polynomial Curve Fitting
9th Order Polynomial N=15

Generalization: Sample Size [3]
Polynomial Curve Fitting
9th Order Polynomial N=100

Generalization: Regularization [3]
Polynomial Curve Fitting
Regularization: Penalize large coefficient values
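
The effect described above can be reproduced with a few lines of NumPy. The sketch below is only an illustration of the idea (it is not code from reference [3]): it fits a 9th-order polynomial to noisy samples of sin(2πx) with an L2 (ridge) penalty on the coefficients, where the weight lam plays the role of λ and its values are chosen purely for the demo.

```python
import numpy as np

def fit_polynomial_ridge(x, t, order=9, lam=1e-3):
    """Least-squares polynomial fit with an L2 (ridge) penalty on the coefficients."""
    X = np.vander(x, order + 1, increasing=True)          # columns x^0 ... x^order
    # Regularized normal equations: (X^T X + lam*I) w = X^T t
    return np.linalg.solve(X.T @ X + lam * np.eye(order + 1), X.T @ t)

def rms_error(w, x, t):
    X = np.vander(x, len(w), increasing=True)
    return np.sqrt(np.mean((X @ w - t) ** 2))

# Toy data: noisy samples of sin(2*pi*x), as in the curve-fitting example
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + 0.3 * rng.standard_normal(10)

for lam in (1e-12, 1e-3):        # (almost) no penalty vs. a modest ridge penalty
    w = fit_polynomial_ridge(x_train, t_train, order=9, lam=lam)
    print(f"lambda={lam:g}  max|w|={np.abs(w).max():.1f}  "
          f"train RMS={rms_error(w, x_train, t_train):.3f}")
```

With an (almost) zero penalty the coefficients blow up and the training error collapses (over-fitting); the modest penalty keeps the coefficients small at a slightly higher training error.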

Learning and Adaptation [2]

• Supervised learning: a teacher provides a category label or cost for each pattern in a training set, and the goal is to reduce the sum of the costs over these patterns.

• Unsupervised learning: there is no explicit teacher, and the system forms clusters or “natural groupings” of the input patterns.

• Reinforcement learning: no desired category signal is given; the only teaching feedback is whether the tentative category is right or wrong.

Linear Discriminant Functions [6]

Efficient Feature Extraction:
Haar-like Features and the Integral Image

Haar-like Feature [7]
The simple features used are reminiscent of Haar basis functions, which have been used by Papageorgiou et al. (1998).
Three kinds of features are used: the two-rectangle feature, the three-rectangle feature, and the four-rectangle feature.
Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large: about 160,000.

Haar-like Feature: Integral Image [7]
Rectangle features can be computed very rapidly using an intermediate representation of the image which we call the integral image.
The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$

where ii(x, y) is the integral image and i(x, y) is the original image (see Fig. 2). Using the following pair of recurrences:

$$s(x, y) = s(x, y - 1) + i(x, y)$$
$$ii(x, y) = ii(x - 1, y) + s(x, y)$$

(where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0), the integral image can be computed in one pass over the original image.

Haar-like Feature: Integral Image [7]
Using the integral image, any rectangular sum can be computed in four array references (see Fig. 3).

Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.
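
Here is a minimal NumPy sketch of the two ideas above (an illustration, not the authors' code): the integral image is built in one pass with the recurrences, and a rectangle sum is then read off with four array references. The 24x24 dummy image and the window coordinates are arbitrary values chosen only for the demo.

```python
import numpy as np

def integral_image(img):
    """One-pass integral image: ii[y, x] = sum of img[:y+1, :x+1]."""
    ii = np.zeros_like(img, dtype=np.int64)
    s = np.zeros_like(img, dtype=np.int64)          # cumulative row sums
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            s[y, x] = (s[y, x - 1] if x > 0 else 0) + img[y, x]
            ii[y, x] = (ii[y - 1, x] if y > 0 else 0) + s[y, x]
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] from four references: D - B - C + A."""
    A = ii[top - 1, left - 1] if (top > 0 and left > 0) else 0
    B = ii[top - 1, right] if top > 0 else 0
    C = ii[bottom, left - 1] if left > 0 else 0
    D = ii[bottom, right]
    return D - B - C + A

img = np.arange(24 * 24).reshape(24, 24)            # dummy 24x24 "image"
ii = integral_image(img)
# A two-rectangle (edge-like) feature: left half minus right half of a window
left = rect_sum(ii, 4, 4, 11, 7)
right = rect_sum(ii, 4, 8, 11, 11)
print(left - right, img[4:12, 4:8].sum() - img[4:12, 8:12].sum())  # the two values should match
```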

Dimension
Reduction: PCA

Abstract [1]
Principal component analysis (PCA) is a technique that is useful for the compression and
classification of data. The purpose is to reduce the dimensionality of a data set (sample)
by finding a new set of variables, smaller than the original set of variables, that
nonetheless retains most of the sample's information.

By information we mean the variation present in the sample, given by the correlations
between the original variables. The new variables, called principal components (PCs),
are uncorrelated, and are ordered by the fraction of the total information each retains.

Geometric Picture of Principal Components [1]

A sample of n observations in the 2-D space

Goal: to account for the variation in a sample in as few variables as possible, to some accuracy.

Geometric Picture of Principal Components [1]

• The 1st PC is a minimum-distance fit to a line in X space.

• The 2nd PC is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.

PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.

Usage of PCA: Data Compression [1]
Because the kth PC retains the kth greatest fraction of the variation, we can approximate each observation by truncating the expansion at the first m < p PCs.

Usage of PCA: Data Compression [1]

Reduce the dimensionality of the data from p to m < p by approximating each observation with its first m PCs: the reduced score matrix is the n × m portion of the full score matrix, and the reduced loading (eigenvector) matrix is the p × m portion of the full loading matrix.
n: number of samples, p: number of original variables

Derivation of PCA using the Covariance Method [8]
Let X be a d-dimensional random vector expressed as a column vector. Without loss of generality, assume X has zero mean. We want to find an orthonormal transformation matrix P such that

$$Y = PX$$

with the constraint that

$$\operatorname{cov}(Y)$$

is a diagonal matrix, i.e., PX is a random vector with all its distinct components pairwise uncorrelated.

By substitution and matrix algebra, we obtain:

$$\operatorname{cov}(Y) = P\,\operatorname{cov}(X)\,P^{T}$$

Derivation of PCA using the Covariance Method [8]
We now have (using the orthonormality of P):

$$\operatorname{cov}(X)\,P^{T} = P^{T} D, \qquad D = \operatorname{cov}(Y) \text{ diagonal}$$

Rewrite P^T as d column vectors, so $P^{T} = [P_1\; P_2\; \cdots\; P_d]$, and D as $\operatorname{diag}(\lambda_1, \ldots, \lambda_d)$.

Substituting into the equation above, we obtain:

$$\operatorname{cov}(X)\,P_i = \lambda_i P_i, \qquad i = 1, \ldots, d$$

Notice that in $\operatorname{cov}(X)\,P_i = \lambda_i P_i$, P_i is an eigenvector of the covariance matrix of X. Therefore, by finding the eigenvectors of the covariance matrix of X, we find a projection matrix P that satisfies the original constraints.
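
The covariance-method derivation above translates directly into a few lines of NumPy. The following sketch (my own illustration, not code from reference [8]) centers the data, eigendecomposes the sample covariance matrix, and keeps the m leading eigenvectors as the projection; the data matrix X_data and the choice m = 2 are assumptions made only for the example.

```python
import numpy as np

def pca(X, m):
    """Project an n x d data matrix X onto its m leading principal components."""
    Xc = X - X.mean(axis=0)                      # zero-mean, as the derivation assumes
    C = np.cov(Xc, rowvar=False)                 # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]            # sort PCs by decreasing variance
    P = eigvecs[:, order[:m]]                    # d x m projection matrix (columns = PCs)
    scores = Xc @ P                              # n x m reduced representation
    return scores, P, eigvals[order]

rng = np.random.default_rng(1)
X_data = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated toy data
scores, P, variances = pca(X_data, m=2)
print(scores.shape, variances[:2].sum() / variances.sum())  # fraction of variation retained
```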
Bayesian Decision
Theory

State of Nature [2]

We let ω denote the state of nature, with ω = ω1 for sea bass and ω = ω2 for salmon. Because the state of nature is so unpredictable, we consider ω to be a variable that must be described probabilistically.

If the two kinds of fish were equally likely, we would have P(ω1) = P(ω2) (uniform priors), with P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity).

More generally, we assume that there is some a priori probability (or simply prior) P(ω1) that the next fish is sea bass, and some prior probability P(ω2) that it is salmon, with P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity).

Decision rule with only the prior information:
Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2.

Class-Conditional Probability Density [2]

In most circumstances we are not asked to make decisions with so little information. In our example, we might for instance use a lightness measurement x to improve our classifier.

We consider x to be a continuous random variable whose distribution depends on the state of nature and is expressed as p(x|ω). This is the class-conditional probability density function: the probability density function for x given that the state of nature is ω.

Hypothetical class-conditional probability density functions show the probability density of measuring a particular feature value x given that the pattern is in category ωi.

Posterior, likelihood, evidence [2]

Suppose that we know both the prior probabilities P(ωj) and the conditional densities p(x|ωj) for j = 1, 2. Suppose further that we measure the lightness of a fish and discover that its value is x.

How does this measurement influence our attitude concerning the true state of nature, that is, the category of the fish?

Posterior, likelihood, evidence [2]
Bayes formula:

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)}$$

where, in the case of two categories,

$$p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\,P(\omega_j)$$

Then,

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)} = \frac{p(x \mid \omega_j)\,P(\omega_j)}{\sum_{j=1}^{2} p(x \mid \omega_j)\,P(\omega_j)}$$

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
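
To make the formula concrete, the short sketch below evaluates the posteriors numerically for a two-category problem. The Gaussian class-conditional densities are illustrative stand-ins (the actual densities of Fig. 2.1 are not reproduced here, so the printed numbers will differ from those quoted for that figure); only the priors 2/3 and 1/3 come from the text.

```python
import numpy as np
from scipy.stats import norm

priors = np.array([2/3, 1/3])                    # P(w1), P(w2)

def likelihoods(x):
    """Illustrative class-conditional densities p(x|w1), p(x|w2)."""
    return np.array([norm.pdf(x, loc=11.0, scale=2.0),
                     norm.pdf(x, loc=14.5, scale=1.5)])

def posteriors(x):
    joint = likelihoods(x) * priors              # p(x|wj) * P(wj)
    evidence = joint.sum()                       # p(x) = sum_j p(x|wj) P(wj)
    return joint / evidence                      # P(wj|x); sums to 1 at every x

post = posteriors(12.0)
print(post, "-> decide w%d" % (post.argmax() + 1))   # Bayes decision: largest posterior
```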

Posterior, likelihood, evidence [2]

Posterior probabilities for the particular priors P(ω1) = 2/3 and P(ω2)= 1/3
for the class-conditional probability densities shown in Fig. 2.1.

Thus in this case, given that a pattern is measured to have feature value x
= 14, the probability it is in category ω2 is roughly 0.08, and that it is in ω1
is 0.92.
At every x, the posteriors sum to 1.0.
Decision given the Posterior Probabilities [2]

Let x be an observation. The natural rule is:
if P(ω1|x) > P(ω2|x), decide that the true state of nature is ω1;
if P(ω1|x) < P(ω2|x), decide that the true state of nature is ω2.

Whenever we observe a particular x, the probability of error is:
P(error|x) = P(ω1|x) if we decide ω2
P(error|x) = P(ω2|x) if we decide ω1

Decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2.

Therefore:
P(error|x) = min[P(ω1|x), P(ω2|x)]
(Bayes decision)
Bayesian Decision Theory : Risk Minimization [2]
Generalization of the preceding ideas:

- Use of more than one feature
- Use of more than two states of nature
- Allowing actions, and not only deciding on the state of nature
- Introduction of a loss function, which is more general than the probability of error

Feature vector and feature space: the feature vector x lies in a d-dimensional Euclidean space R^d, called the feature space.

Risk Minimization: Loss Function [2]
Formally, the loss function states how costly each action taken is, and is used to convert a probability determination into a decision.

Let {ω1, ω2, …, ωc} be the set of c states of nature (or “categories”).
Let {α1, α2, …, αa} be the set of possible actions.
Let λ(αi|ωj) be the loss incurred for taking action αi when the state of nature is ωj.

Conditional risk:

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid \mathbf{x}), \qquad i = 1, \ldots, a$$

Overall risk:

$$R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$

Minimizing R is achieved by minimizing R(αi|x) for every x, i.e., for each x choosing the action αi, i = 1, …, a, with the smallest conditional risk.

Risk Minimization [2]
Two-category classification
α1: deciding ω1
α2: deciding ω2
λij = λ(αi|ωj): the loss incurred for deciding ωi when the true state of nature is ωj

Conditional risk:

R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x)
R(α2|x) = λ21 P(ω1|x) + λ22 P(ω2|x)

Risk Minimization [2]

Two-category classification

Our rule is the following: if R(α1|x) < R(α2|x), take action α1, i.e., decide ω1.

This results in the equivalent rule: decide ω1 if

(λ21 − λ11) p(x|ω1) P(ω1) > (λ12 − λ22) p(x|ω2) P(ω2)

and decide ω2 otherwise.

Risk Minimization [2]

Two-category classification

Likelihood ratio: the preceding rule is equivalent to the following. If

$$\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)},$$

then take action α1 (decide ω1); otherwise take action α2 (decide ω2).

Optimal decision property: “If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions.”
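
As a small worked example (with an assumed loss matrix, not one taken from the text), let λ11 = λ22 = 0, λ21 = 1, λ12 = 2, and priors P(ω1) = 2/3, P(ω2) = 1/3. The threshold is then

$$\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)} = \frac{2 - 0}{1 - 0} \cdot \frac{1/3}{2/3} = 1,$$

so we decide ω1 exactly when p(x|ω1) > p(x|ω2): here the doubled cost of misclassifying a true ω2 is exactly offset by its smaller prior.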

Minimum Error Rate Classification [2]
Actions are decisions on classes: if action αi is taken and the true state of nature is ωj, then the decision is correct if i = j and in error if i ≠ j.

We seek a decision rule that minimizes the probability of error, i.e., the error rate.

Zero-one loss function:

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid \mathbf{x}) = \sum_{j \neq i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x})$$

“The risk corresponding to this loss function is the average probability of error.”
Bayesian Decision Theory : Continuous Features [2]

Generalization of the preceding ideas:

- Use of more than one feature
- Use of more than two states of nature
- Allowing actions, and not only deciding on the state of nature
- Introduction of a loss function, which is more general than the probability of error

Classifier, Discriminant Functions, and Decision Surface [2]

Set of discriminant functions gi(x), i = 1, …, c.
The classifier assigns a feature vector x to class ωi if gi(x) > gj(x) for all j ≠ i.

The functional structure of a general statistical pattern classifier includes d inputs and c discriminant functions gi(x). A subsequent step determines which of the discriminant values is the maximum and categorizes the input pattern accordingly. The arrows show the direction of the flow of information, though frequently the arrows are omitted when the direction of flow is self-evident.

Classifier, Discriminant Functions, and Decision Surface [2]
The multi-category case

Let gi(x) = −R(αi|x)
(the maximum discriminant corresponds to the minimum risk!)

For the minimum error rate, we take
gi(x) = P(ωi|x)
(the maximum discriminant corresponds to the maximum posterior!)

Equivalently,
gi(x) ∝ p(x|ωi) P(ωi)
gi(x) = ln p(x|ωi) + ln P(ωi)
(ln: natural logarithm)

The feature space is divided into c decision regions:
if gi(x) > gj(x) for all j ≠ i, then x is in Ri
(Ri means: assign x to ωi)

Classifier, Discriminant Functions, and Decision Surface [2]
The two-category case

A classifier is a “dichotomizer” that has two discriminant functions g1 and g2.

Let g(x) ≡ g1(x) − g2(x).
Decide ω1 if g(x) > 0; otherwise decide ω2.

The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

or, equivalently in sign,

$$g(x) = \ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$

Classifier, Discriminant Functions, and Decision Surface [2]
The two-category case

In this two-dimensional two-category classifier, the probability densities are Gaussian, the decision boundary consists of two hyperbolas, and thus the decision region R2 is not simply connected. The ellipses mark where the density is 1/e times that at the peak of the distribution.
Bayesian
Discriminant
Function for Normal
Density

The Normal Density [2]

Univariate density

The normal density is analytically tractable and continuous, and many processes are asymptotically Gaussian: handwritten characters or speech sounds can be viewed as an ideal prototype corrupted by random processes (central limit theorem).

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

where:
μ = mean (or expected value) of x
σ² = expected squared deviation, or variance

The Normal Density [2]

A univariate normal distribution has roughly 95% of its area in the range |x − μ| ≤ 2σ, as shown. The peak of the distribution has value $p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}$.

The Normal Density [2]

Multivariate density

The multivariate normal density in d dimensions is:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}}\,\exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{t}\,\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

where:
x = (x1, x2, …, xd)^t (t stands for the transpose vector form)
μ = (μ1, μ2, …, μd)^t is the mean vector
Σ is the d×d covariance matrix
|Σ| and Σ⁻¹ are its determinant and inverse, respectively

Discriminant Function for the Normal Density [2]

We saw that minimum error-rate classification can be achieved by the discriminant function

gi(x) = ln p(x|ωi) + ln P(ωi)

Case of a multivariate normal density:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{t}\,\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

Dropping the class-independent constant term,

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{t}\,\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

This is a quadratic discriminant function. [6]
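
A direct NumPy rendering of this quadratic discriminant is sketched below; the two class means, covariance matrices, and priors are invented toy values, and the classifier simply assigns x to the class with the largest gi(x).

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -0.5 (x-mu)^T Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(w_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Toy two-class problem with unequal covariance matrices
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]]), 0.5),  # class w1
    (np.array([2.0, 1.0]), np.array([[2.0, 0.0], [0.0, 0.5]]), 0.5),  # class w2
]

def classify(x):
    g = [quadratic_discriminant(x, mu, S, p) for mu, S, p in params]
    return int(np.argmax(g)) + 1          # assign x to the class with the largest g_i

print(classify(np.array([0.2, -0.1])), classify(np.array([2.3, 1.2])))  # -> 1 2
```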

Covariance Matrix [6]

Discriminant Function for the Normal Density [6]

Discriminant Functions for the Normal Density [2]

If the covariance matrices for two distributions are equal and proportional to the identity matrix, then the distributions are spherical in d dimensions, and the boundary is a generalized hyperplane of d − 1 dimensions, perpendicular to the line separating the means. In these one-, two-, and three-dimensional examples, we indicate p(x|ωi) and the boundaries for the case P(ω1) = P(ω2). In the three-dimensional case, the grid plane separates R1 from R2.

Discriminant Functions for the Normal Density [6]

Discriminant Functions for the Normal Density [2]

Probability densities (indicated by the surfaces in two dimensions and ellipsoidal surfaces in three dimensions) and decision regions for equal but asymmetric Gaussian distributions. The decision hyperplanes need not be perpendicular to the line connecting the means.

Discriminant Functions for the Normal Density [6]

Discriminant Functions for the Normal Density [2]

Arbitrary Gaussian distributions lead to Bayes decision boundaries that are general hyperquadrics. Conversely, given any hyperquadric, one can find two Gaussian distributions whose Bayes decision boundary is that hyperquadric. The variances are indicated by the contours of constant probability density.

Discriminant Functions for the Normal Density [2]

Arbitrary three-dimensional Gaussian distributions yield Bayes decision boundaries that are two-dimensional hyperquadrics. There are even degenerate cases in which the decision boundary is a line.

Discriminant Functions for the Normal Density [6]

Linear Discriminant
Analysis

LDA, Two-Classes [6]

LDA, Multi-Classes [6]

LDA vs. PCA [6]

Limitations of LDA [6]

Linear Discriminant
Functions

Linear Discriminant Functions [6]

Gradient Descent [6]

Perceptron Learning [6]

Minimum Squared Error Solution [6]

The Pseudo-Inverse Solution [6]

Least-Mean-Squares Solution [6]

Summary: Perceptron vs. MSE Procedures [6]

The Ho-Kashyap Procedure [6]

Support Vector
Machine

Optimal Separating Hyperplanes [6]
Distance between a plane and a point

Lagrange Multipliers [9]

Consider the two-dimensional optimization problem: find x and y to maximize f(x, y) subject to a constraint (shown in red) g(x, y) = c.

We can visualize contours of f, given by f(x, y) = d for various values of d, together with the contour of g given by g(x, y) = c.
Lagrange Multipliers [9]
When f(x, y) attains a maximum along the path g(x, y) = c, the contour line for g = c meets the contour lines of f tangentially. Since the gradient of a function is perpendicular to its contour lines, this is the same as saying that the gradients of f and g are parallel.

Contour map. The red line shows the constraint g(x,y) = c. The
blue lines are contours of f(x,y). The point where the red line
tangentially touches a blue contour is our solution.

Lagrange Multipliers [9]
To incorporate these conditions into one equation, we introduce an
auxiliary function

and solve
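
A small worked example (my own illustration, not one from reference [9]): maximize f(x, y) = x + y subject to g(x, y) = x² + y² = 1. Setting the gradient of Λ(x, y, λ) = x + y + λ(x² + y² − 1) to zero gives

$$1 + 2\lambda x = 0, \qquad 1 + 2\lambda y = 0, \qquad x^2 + y^2 = 1,$$

so x = y = ±1/√2; the constrained maximum is at (1/√2, 1/√2), where f = √2 and ∇f = (1, 1) is indeed parallel to ∇g = (2x, 2y).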

Kuhn-Tucker Theorem [6]

The Lagrangian Dual Problem [6]

Dual Problem [10]
Minimize (in w, b)

$$\tfrac{1}{2}\|\mathbf{w}\|^{2}$$

subject to (for any i = 1, …, n)

$$y_i\,(\mathbf{w}\cdot\mathbf{x}_i - b) \ge 1.$$

One could be tempted to express the previous problem by means of non-negative Lagrange multipliers αi as the unconstrained minimization

$$\min_{\mathbf{w}, b, \boldsymbol{\alpha}} \left\{ \tfrac{1}{2}\|\mathbf{w}\|^{2} - \sum_{i=1}^{n} \alpha_i\,\bigl[y_i(\mathbf{w}\cdot\mathbf{x}_i - b) - 1\bigr] \right\},$$

but then we could find the minimum by sending all αi to ∞. Nevertheless, the previous constrained problem can be expressed as

$$\min_{\mathbf{w}, b}\; \max_{\alpha_i \ge 0} \left\{ \tfrac{1}{2}\|\mathbf{w}\|^{2} - \sum_{i=1}^{n} \alpha_i\,\bigl[y_i(\mathbf{w}\cdot\mathbf{x}_i - b) - 1\bigr] \right\};$$

that is, we look for a saddle point.
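
To see the saddle-point machinery in action, here is a small sketch (not code from reference [10]) that solves the dual of the hard-margin problem — maximize Σαi − ½ΣΣ αiαj yiyj xi·xj subject to αi ≥ 0 and Σαi yi = 0 — with scipy's SLSQP solver on a linearly separable toy set, and then recovers w and b from the support vectors. The data points and the large upper bound on αi are assumptions made only for the demo.

```python
import numpy as np
from scipy.optimize import minimize

# Linearly separable toy data, labels y in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 3.0],
              [0.0, 0.5], [-1.0, 0.0], [0.5, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
n = len(y)
G = (y[:, None] * X) @ (y[:, None] * X).T        # G_ij = y_i y_j x_i . x_j

def neg_dual(a):
    """Negative of the dual objective: 1/2 a^T G a - sum(a)."""
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(n), method="SLSQP",
               bounds=[(0.0, 1e6)] * n,                              # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0
alpha = res.x

w = ((alpha * y)[:, None] * X).sum(axis=0)       # w = sum_i alpha_i y_i x_i
sv = int(np.argmax(alpha))                       # a support vector (largest alpha_i)
b = X[sv] @ w - y[sv]                            # from y_sv (w . x_sv - b) = 1
print("w =", w, " b =", b, " margins:", y * (X @ w - b))  # margins should all be >= 1
```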


Support Vectors [6]

Non-separable Case [6]

Non-linear SVMs [6]

Implicit Mappings: An Example [6]

Kernel Methods [6]
Kernel Functions

Architecture of an SVM [6]

Case Study: XOR [6]

k Nearest Neighbor

The k Nearest Neighbor Classification Rule [6]

Statistical Clustering

Non-parametric Unsupervised Learning [6]

Proximity Measures [6]

Criterion Function for Clustering [6]

Cluster Validity [6]

Iterative Optimization [6]

The k-means Algorithm [6]

The k-means Algorithm [4]

References
1. Frank Masci, “An Introduction to Principal Component Analysis,”
http://web.ipac.caltech.edu/staff/fmasci/home/statistics_refs/PrincipalC
omponentAnalysis.pdf
2. Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification,
second edition, John Wiley & Sons, Inc., 2001.
3. Christopher M. Bishop, Pattern Recognition and Machine Learning,
Springer, 2007.
4. Sergios Theodoridis, Konstantinos Koutroumbas, Pattern Recognition,
Academic Press, 2006.
5. Ho Gi Jung, Yun Hee Lee, Pal Joo Yoon, In Yong Hwang, and Jaihie Kim, “Sensor Fusion Based Obstacle Detection/Classification for Active Pedestrian Protection System,” Lecture Notes in Computer Science, Vol. 4292, pp. 294-305.
6. Ricardo Gutierrez-Osuna, “Pattern Recognition, Lecture Notes,” available
at http://research.cs.tamu.edu/prism/lectures.htm
7. Paul Viola, Michael Jones, “Robust Real-Time Face Detection,” International Journal of Computer Vision, 57(2), 2004, pp. 137-154.
8. Wikipedia, “Principal component analysis,” available at
http://en.wikipedia.org/wiki/Principal_component_analysis
References
9. Wikipedia, “Lagrange multipliers,” available at http://en.wikipedia.org/wiki/Lagrange_multipliers.
10. Wikipedia, “Support Vector Machine,” available at http://en.wikipedia.org/wiki/Support_vector_machine.
