
Example 2.3 from Pattern Recognition/4e by Theodoridis and Koutroumbas

Heath Hunnicutt

23 October 2012

The following notes are my elaboration of the steps presented in Example 2.3 from Pattern Recognition/4e by Theodoridis and Koutroumbas. Text presented with a gray background is quoted from that example. Equation numbers such as (2.55) match the numbers used in the text. Equation numbers such as (H.3) do not appear in the text.

1 Example 2.3

Example 2.3 is contained in section 2.5.1, Maximum Likelihood Parameter Estimation:

Assume that $N$ data points, $x_1, x_2, \ldots, x_N$, have been generated by a one-dimensional Gaussian pdf of known mean, $\mu$, but of unknown variance. Derive the ML estimate of the variance.

The technique of Maximum Likelihood Parameter Estimation (MLPE) requires the user to assume a probability density function (pdf) for the population from which the sample data points have been drawn.

Example 2.3 states that the assumed distribution of the population data is a Gaussian pdf. The mean, $\mu$, is assumed to be known.

As stated in the example, the parameter being estimated, $\theta$, is the variance: $\theta = \sigma^2$.

The technique of MLPE is to define a likelihood function, $p(X; \theta)$, where $X = x_1, x_2, \ldots, x_N$, and to find the maximum of this function with respect to the varying parameter $\theta$ and the fixed data points $X$. The likelihood function is the probability that all of the measurements $x_k$ could occur given the unknown parameter $\theta$. If the samples $x_k$ are believed to be causally independent of each other, then their probabilities under the parameter are considered independent. In this case, the probability that all measurements occurred is the product of the probabilities of the individual measurements, or equation (2.55) from the text:

$$p(X; \theta) = \prod_{k=1}^{N} p(x_k; \theta) \qquad (2.55)$$
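As a concrete illustration of (2.55), here is a minimal Python sketch. It is my addition rather than part of the text, and the sample values, the known mean, and the function name gaussian_pdf are invented for the example:

    import numpy as np

    def gaussian_pdf(x, mu, sigma2):
        # Gaussian pdf with mean mu and variance sigma2.
        return np.exp(-(x - mu)**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

    x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])  # hypothetical sample x_1, ..., x_N
    mu = 1.0                                 # known mean
    theta = 0.5                              # one candidate value of theta = sigma^2

    # Equation (2.55): the likelihood is the product of the individual pdfs.
    likelihood = np.prod(gaussian_pdf(x, mu, theta))
    print(likelihood)

For even moderately large $N$, this product of small numbers underflows to zero in floating point, which is one practical reason to prefer the log-likelihood introduced next.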


If it can be shown that the likelihood must be non-zero for any value of the parameter, the log-likelihood function can be used. Solving for the maximum may be easier when using the log-likelihood, depending on the pdf assumed. In the case of the Gaussian, each probability $p(x_k; \theta)$ is non-zero for all $\theta$, because the Gaussian pdf never vanishes. Therefore use of the log-likelihood is mathematically valid for the Gaussian pdf. We will also see that the log-likelihood is convenient for the Gaussian pdf. The log-likelihood may be written $L(X; \theta)$, or $L(\theta)$ when the context allows such brevity:

The log-likelihood function for this case is given by

$$L(\theta) = \ln \prod_{k=1}^{N} p(x_k; \sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)$$

The left equation is a reference to equation (2.58) from the text, rewritten in terms of $\theta = \sigma^2$:

$$L(\theta) = \ln \prod_{k=1}^{N} p(x_k; \theta) \qquad (2.58)$$

Substituting $\theta = \sigma^2$, we derive the left equation from the example:

$$L(\theta) = L(\sigma^2) = \ln \prod_{k=1}^{N} p(x_k; \sigma^2)$$

To obtain the equation on the right, we expand $p(x_k; \sigma^2)$ as a Gaussian pdf. Any Gaussian pdf is defined by the probability function:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

so that:

$$p(x_k; \sigma^2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)$$

and therefore:

$$L(\sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)$$

as given by the text in the equation shown above on the right.


Example 2.3 continues with:

or

$$L(\sigma^2) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k - \mu)^2$$

which indicates an application of the logarithmic identity:

$$\ln \prod_{k} a_k = \sum_{k} \ln a_k \qquad (H.1)$$

Resuming our derivation of the log-likelihood function, we most recently obtained:

$$L(\sigma^2) = \ln \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right) \qquad (H.2)$$

We now apply the logarithmic identity (H.1), twice:

$$L(\sigma^2) = \sum_{k=1}^{N} \ln\left[\frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)\right] \qquad (H.3)$$

$$= \sum_{k=1}^{N} \left[\ln\frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} + \ln \exp\left(-\frac{(x_k - \mu)^2}{2\sigma^2}\right)\right] \qquad (H.4)$$

Considering that $\ln$ is the inverse of $\exp$, so that $\ln \exp(y) = y$, we rewrite this as:

$$= \sum_{k=1}^{N} \left[\ln\frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} - \frac{(x_k - \mu)^2}{2\sigma^2}\right] \qquad (H.5)$$

We apply the identity $\ln a^y = y \ln a$, noting that $\frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} = \left(2\pi\sigma^2\right)^{-1/2}$:

$$= \sum_{k=1}^{N} \left[-\frac{1}{2}\ln\left(2\pi\sigma^2\right) - \frac{(x_k - \mu)^2}{2\sigma^2}\right]$$

and distribute the summation notation:

$$= \sum_{k=1}^{N} \left[-\frac{1}{2}\ln\left(2\pi\sigma^2\right)\right] - \sum_{k=1}^{N} \left[\frac{(x_k - \mu)^2}{2\sigma^2}\right]$$

The left term does not depend on $k$ and can be simplified as:

$$= -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \sum_{k=1}^{N} \left[\frac{(x_k - \mu)^2}{2\sigma^2}\right]$$


The right term contains a factor which does not depend on $k$:

$$L(\sigma^2) = -\frac{N}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{k=1}^{N}(x_k - \mu)^2$$

This restates the log-likelihood function in the form given by the text.
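As a sanity check on this algebra, the following sketch (my addition; the sample and parameter values are invented) compares the closed form above against a direct evaluation of the sum of the logarithms of the individual Gaussian pdfs:

    import numpy as np

    x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])  # hypothetical sample
    mu, sigma2 = 1.0, 0.5                    # known mean, candidate variance
    N = len(x)

    # Closed form: -N/2 * ln(2*pi*sigma^2) - (1/(2*sigma^2)) * sum_k (x_k - mu)^2
    closed_form = (-N / 2.0) * np.log(2.0 * np.pi * sigma2) \
                  - np.sum((x - mu)**2) / (2.0 * sigma2)

    # Direct evaluation: ln of the product equals the sum of the ln of each pdf.
    log_pdfs = -0.5 * np.log(2.0 * np.pi * sigma2) - (x - mu)**2 / (2.0 * sigma2)
    direct = np.sum(log_pdfs)

    assert np.isclose(closed_form, direct)

The two values agree to floating-point precision, confirming the simplification.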

Now that we have derived the log-likelihood function, our remaining task is to find a maximum of this function in terms of $\theta$. We will proceed as usual, by finding an expression for the derivative $\frac{dL(\theta)}{d\theta}$ and solving for the values of $\theta$ at which the derivative is zero. It will be helpful to rewrite our log-likelihood function as:

$$L(\sigma^2) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\left(\sigma^2\right) - \frac{1}{2}\,\sigma^{-2}\sum_{k=1}^{N}(x_k - \mu)^2$$

and to substitute $\theta = \sigma^2$:

$$L(\theta) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\theta) - \frac{1}{2}\,\theta^{-1}\sum_{k=1}^{N}(x_k - \mu)^2$$
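Readers who have SymPy installed can check the upcoming differentiation symbolically. In this sketch (my addition), the sum $\sum_{k=1}^{N}(x_k - \mu)^2$ is abbreviated by a symbol $S$, since it does not depend on $\theta$:

    import sympy as sp

    theta, N, S = sp.symbols('theta N S', positive=True)

    # L(theta) = -N/2 ln(2 pi) - N/2 ln(theta) - (1/2) theta^(-1) S,
    # where S stands for the theta-independent sum of squared deviations.
    L = -N / 2 * sp.log(2 * sp.pi) - N / 2 * sp.log(theta) - S / (2 * theta)

    dL = sp.diff(L, theta)
    print(dL)  # -N/(2*theta) + S/(2*theta**2), possibly printed in another order

which agrees with the derivative derived below.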

Continuing with the example as presented in the text:

Taking the derivative of the above with respect to $\sigma^2$ and equating to zero, we obtain

$$-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2 = 0$$

We will equivalently find the derivative $\frac{dL}{d\theta}$, remembering that $\frac{d \ln x}{dx} = \frac{1}{x}$:

$$\frac{dL(\theta)}{d\theta} = -\frac{N}{2\theta} + \frac{1}{2}\,\theta^{-2}\sum_{k=1}^{N}(x_k - \mu)^2$$

It is now convenient to reverse our substitution $\theta = \sigma^2$:

$$\frac{dL(\sigma^2)}{d\sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2}\,\sigma^{-4}\sum_{k=1}^{N}(x_k - \mu)^2$$

and rewrite the second term:

$$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2$$

Now our expression for the derivative matches the text.
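A quick finite-difference check (again my addition, with an invented sample and step size) provides numerical evidence for this expression:

    import numpy as np

    x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])  # hypothetical sample
    mu = 1.0                                 # known mean
    N = len(x)
    S = np.sum((x - mu)**2)                  # sum of squared deviations

    def L(sigma2):
        # Log-likelihood of the sample as a function of sigma^2.
        return (-N / 2.0) * np.log(2.0 * np.pi * sigma2) - S / (2.0 * sigma2)

    s2, h = 0.5, 1e-6
    numeric = (L(s2 + h) - L(s2 - h)) / (2.0 * h)    # central difference
    analytic = -N / (2.0 * s2) + S / (2.0 * s2**2)   # the derived derivative
    assert np.isclose(numeric, analytic, rtol=1e-4)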


To find an extremum of $L(\sigma^2)$, we will solve for an expression of $\sigma^2$ at which the derivative is zero:

$$-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{k=1}^{N}(x_k - \mu)^2 = 0$$

Multiplying by $2\sigma^4$ gives:

$$-N\sigma^2 + \sum_{k=1}^{N}(x_k - \mu)^2 = 0$$

or the equivalent equation:

$$\sum_{k=1}^{N}(x_k - \mu)^2 = N\sigma^2$$

Dividing by $N$:

$$\frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2 = \sigma^2$$

We have obtained the same result as the text, which makes a change of notation at this point, writing $\hat{\sigma}^2_{ML}$ instead of $\sigma^2$:

and finally the ML estimate of $\sigma^2$ results as the solution of the above,

$$\hat{\sigma}^2_{ML} = \frac{1}{N}\sum_{k=1}^{N}(x_k - \mu)^2 \qquad (2.63)$$

Strictly speaking, a zero derivative identifies only a stationary point; here the second derivative of $L$ is negative at this value of $\sigma^2$, so the stationary point is indeed a maximum.
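To close the loop numerically, this final sketch (my addition, reusing the same invented sample) computes $\hat{\sigma}^2_{ML}$ directly from (2.63) and verifies that the derivative vanishes there:

    import numpy as np

    x = np.array([1.2, 0.7, 1.9, 1.1, 0.4])  # hypothetical sample
    mu = 1.0                                 # known mean
    N = len(x)

    # Equation (2.63): ML estimate of the variance when the mean is known.
    sigma2_ml = np.mean((x - mu)**2)

    # The derivative -N/(2 sigma^2) + (1/(2 sigma^4)) sum_k (x_k - mu)^2
    # must vanish at the ML estimate.
    deriv = -N / (2.0 * sigma2_ml) \
            + np.sum((x - mu)**2) / (2.0 * sigma2_ml**2)
    assert np.isclose(deriv, 0.0)

Note that np.mean divides by $N$, not $N - 1$; because the mean here is known rather than estimated from the sample, this estimator is unbiased despite the division by $N$.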

2 Commentary

Reviewing our derivation, we can consider the convenience of the log-likelihood $L(X; \theta)$, as compared to the likelihood $p(X; \theta)$.

The presence of the logarithm in equation (H.2) allowed us to rewrite its product notation, $\prod$, as summation notation, $\sum$, in (H.3). This is helpful later when we find the derivative: taking the derivative of a summation is easier than taking the derivative of a product.

In equation (H.4), the presence of the logarithm function was conveniently able to cancel the exponential factor introduced by the Gaussian pdf, yielding (H.5).


3 About Heath Hunnicutt

Heath Hunnicutt is a non-credentialed amateur mathematician and professional software developer. Via the Internet, Heath offers online videoconference training in a variety of areas of amateur mathematics. These areas include: signal processing, pattern recognition, information theory, thermodynamics, statistics, abstract algebra, formal correctness of software, and software development.

Heath’s email address is not written plainly, to avoid computer programs which find addresses and send advertising. To contact Heath, you may calculate his email address: starting with his full name, convert every letter to lower-case. Concatenate the first and last name with a dot between them, as in “john.doe”; this is the email address at gmail.com.
