

Chapter 2 Bayesian statistics


0 Introduction

In Subject CT3 we looked at the classical approach to statistical estimation, using the method of maximum likelihood estimation and the method of moments.


In this chapter we will look at Bayesian methods, which provide an alternative approach to statistics. The Bayesian version of estimation is considered here for the basic situation concerning the estimation of a parameter given a random sample from a particular distribution. The fundamental difference between Bayesian and classical methods is that in Bayesian methods the parameter θ is treated as a random variable, whereas in classical statistics θ is a fixed but unknown quantity.


An advantage of Bayesian statistics is that it enables us to make use of any information that we already have about the situation under investigation.

1 Bayes' Theorem

If B1, B2, ..., Bk constitute a partition of a sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A in S such that P(A) ≠ 0:

P(Br|A) = P(A|Br) P(Br) / P(A)   for r = 1, 2, ..., k

where

P(A) = Σ_{i=1}^{k} P(A|Bi) P(Bi)

Conditional probability:

P(Br|A) = P(Br ∩ A) / P(A)

Law of total probability:

P(A) = P(A ∩ B1) + ... + P(A ∩ Bk) = P(A|B1) P(B1) + ... + P(A|Bk) P(Bk)

Bayes' theorem allows us to turn round a conditional probability.


1.1 An example

Three manufacturers supply clothing to a retailer. 60% of the stock comes from manufacturer 1, 30% from manufacturer 2 and 10% from manufacturer 3. 10% of the clothing from manufacturer 1 is faulty, 5% from manufacturer 2 is faulty and 15% from manufacturer 3 is faulty. What is the probability that a faulty garment comes from manufacturer 3? What are the probabilities that a faulty garment comes from each of the other manufacturers?
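As a quick check of the arithmetic, here is a minimal Python sketch of the calculation, using only the figures given in the question:

    # Bayes' theorem for the clothing example: P(manufacturer r | faulty)
    p_m = [0.60, 0.30, 0.10]          # P(manufacturer r), r = 1, 2, 3
    p_f_given_m = [0.10, 0.05, 0.15]  # P(faulty | manufacturer r)

    # law of total probability: P(faulty)
    p_f = sum(pm * pf for pm, pf in zip(p_m, p_f_given_m))           # 0.09

    # Bayes' theorem: P(manufacturer r | faulty)
    posterior = [pm * pf / p_f for pm, pf in zip(p_m, p_f_given_m)]
    print(posterior)                  # approximately [0.667, 0.167, 0.167]

So a faulty garment comes from manufacturer 3 with probability 1/6, and from manufacturers 1 and 2 with probabilities 2/3 and 1/6 respectively.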


The example on pages 7-8 shows that we can also apply Bayes' theorem when the event A corresponds to a continuous random variable taking a specified value.

2 Prior and posterior distributions

Suppose X = (X1, X2, ..., Xn) is a random sample from a population specified by the density or probability function f(x; θ) and it is required to estimate θ. As the parameter θ is a random variable, it has a distribution. This allows the use of any knowledge available about possible values for θ before the collection of any data. This knowledge is quantified by expressing it as the prior distribution of θ. Then, after collecting appropriate data, the posterior distribution of θ is determined, and this forms the basis of all inference concerning θ.


The information from the random sample is contained in the likelihood function for that sample. The Bayesian approach therefore combines the information in the likelihood function with the information in the prior distribution to obtain the posterior distribution, from which an estimate of the required population parameter is derived.


2.1 Notation

As θ is a random variable, it should really be denoted by the capital Θ and its prior density written as f_Θ(θ). For simplicity we just denote the density by f(θ). Note that referring to a density here implies that θ is continuous; in most applications this will be the case, even when X itself is discrete. The range of values taken by the prior distribution should also reflect the possible parameter values. Also, the population density or probability function will be denoted by f(x|θ) rather than f(x; θ), as it represents the conditional distribution of X given θ.


2.2 Determining the posterior density

Suppose that X is a random sample from a population specified by f(x|θ) and that θ has the prior density f(θ). The posterior density of θ|X is determined by applying the basic definition of a conditional density:

f(θ|X) = f(θ, X) / f(X) = f(X|θ) f(θ) / f(X)

Note that f(X) = ∫ f(X|θ) f(θ) dθ.

This result is like a continuous version of Bayes' theorem from basic probability.


It is often convenient to express this result in terms of the value of a statistic, such as X̄, rather than the sample values X. So, for example:

f(θ|X̄) = f(X̄|θ) f(θ) / f(X̄)


A useful way of expressing the posterior density is to use proportionality. f(X) does not involve θ, because θ was integrated out in calculating f(X). So:

f(θ|X) ∝ f(X|θ) f(θ)

Also note that f(X|θ), being the joint density of the sample values, is none other than the likelihood. So the posterior is proportional to the product of the likelihood and the prior.
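To see the proportionality idea numerically, here is a minimal Python sketch; the data and the prior are made up purely for illustration. It evaluates likelihood × prior over a grid of θ values and then normalizes by dividing by an approximation to f(X):

    import numpy as np
    from scipy import stats

    # hypothetical setup: Exp(theta) data (rate theta) with a Gamma(2, 1) prior for theta
    x = np.array([1.2, 0.7, 2.3, 1.8, 0.5])
    theta = np.linspace(0.01, 5, 1000)              # grid of parameter values

    prior = stats.gamma.pdf(theta, a=2, scale=1.0)  # f(theta)
    lik = np.array([np.prod(stats.expon.pdf(x, scale=1/t)) for t in theta])  # f(x|theta)

    unnorm = lik * prior                            # proportional to the posterior
    post = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # divide by (approximate) f(x)

The constant f(X) never has to be evaluated analytically; it is simply whatever makes the posterior density integrate to 1.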


The idea of proportionality is important. It enables us to do many questions on Bayesian methods relatively easily. Example on pages 11-13.


2.3 Continuous prior distributions

The same logic underlying the proportional method applies in the case where the unknown parameter is assumed to have a continuous distribution.


The steps involved in finding the posterior distribution are:

STEP 1 (selecting a prior distribution)

STEP 2 (determining the likelihood function)

STEP 3 (determining the posterior parameter distribution): multiply the prior parameter distribution and the likelihood function to find the form of the posterior parameter distribution.


STEP 4 (identifying the posterior parameter distribution): EITHER look for a standard distribution that has a PDF with the same algebraic form and range of values as the posterior distribution you have found (e.g. by comparing with the PDFs in the Tables), OR (if your posterior distribution doesn't match any of the standard distributions) integrate out (or sum out) the unknown parameter to find the normalization constant in the PDF (or PF) of the posterior distribution.


Question 2.4 on page 14: If x1, ..., xn is a random sample from an Exp(λ) distribution, where λ is an unknown parameter, find the posterior distribution for λ, assuming the prior distribution of λ is Exp(μ) with μ known.
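A sketch of the working (μ here is just our label for the known prior parameter): the likelihood is proportional to λ^n e^(−λ Σxᵢ) and the prior density is proportional to e^(−μλ), so

    f(λ|x) ∝ λ^n e^(−λ Σxᵢ) × e^(−μλ) = λ^n e^(−λ(Σxᵢ + μ))

which has the algebraic form of a gamma density, i.e. the posterior distribution of λ is Gamma(n + 1, Σxᵢ + μ).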


2.4 Conjugate priors

For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior distribution, then this prior is called the conjugate prior for this likelihood. The likelihood function determines which family of distributions will lead to a conjugate pair. Conjugate distributions can be found by selecting a family of distributions that has the same algebraic form as the likelihood function, treating the unknown parameter as the random variable.
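For example (anticipating the binomial case referred to below), the binomial likelihood, viewed as a function of the probability parameter p, is proportional to p^x (1 − p)^(n−x). A Beta(α, β) prior has the same algebraic form, f(p) ∝ p^(α−1) (1 − p)^(β−1), so the posterior is proportional to p^(x+α−1) (1 − p)^(n−x+β−1), which is again a beta density. The beta family is therefore the conjugate prior for a binomial likelihood.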


Example on page 15. Using conjugate distributions often makes Bayesian calculations simpler. They may be appropriate to use where there is a family of distributions that might be expected to provide a natural model for the unknown parameter, e.g. in the previous example, where the probability parameter p had to lie in the range 0 < p < 1 (which is the range of values over which the beta distribution is defined).


2.5 Improper prior distributions

Sometimes it is useful to use an uninformative prior distribution, which assumes that an unknown parameter is equally likely to take any value. For example, we might have a sample from a normal population with mean μ where we know nothing at all about μ. This leads to a problem because we would need to assume a U(−∞, ∞) distribution for μ, which doesn't make sense, since the PDF of this distribution would be 0 everywhere. We can easily get round this problem by using the distribution U(−N, N), where N is a very large number, and then letting N tend to infinity.
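As a brief sketch of the normal example above (assuming, for simplicity, that the population variance σ² is known): with a flat prior the posterior is proportional to the likelihood alone, so f(μ|x) ∝ exp(−Σ(xᵢ − μ)²/(2σ²)) ∝ exp(−n(μ − x̄)²/(2σ²)), i.e. the posterior distribution of μ is N(x̄, σ²/n). The improper prior simply reproduces the normalized likelihood.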

3 The loss function

To obtain an estimator of θ, a loss function must first be specified. This is a measure of the loss incurred when g(X) is used as an estimator of θ. Three loss functions:

quadratic (squared error) loss function (cf. mean square error):
L(g(x), θ) = [g(x) − θ]²

absolute error loss function:
L(g(x), θ) = |g(x) − θ|

all-or-nothing (zero-one) loss function:
L(g(x), θ) = 0 if g(x) = θ, and 1 if g(x) ≠ θ


The Bayesian estimator that arises by minimizing the expected loss for each of these loss functions in turn is the mean, median and mode, respectively, of the posterior distribution, each of which is a measure of location of the posterior distribution. The expected posterior loss is:

EPL = E[L(g(x), θ)] = ∫ L(g(x), θ) f(θ|x) dθ
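These three results are derived in the following subsections. As a numerical illustration, the following Python sketch (the Gamma(3, 2) posterior is a hypothetical choice, not taken from the notes) minimizes each expected posterior loss over a grid of values of g and recovers the posterior mean, median and mode:

    import numpy as np
    from scipy import stats

    # hypothetical posterior, chosen only for illustration: Gamma(3, rate 2)
    post = stats.gamma(a=3, scale=0.5)
    theta = np.linspace(0, 12, 24001)
    dtheta = theta[1] - theta[0]
    dens = post.pdf(theta)

    g_grid = np.linspace(0, 6, 1201)
    quad = [np.sum((g - theta) ** 2 * dens) * dtheta for g in g_grid]   # E[(g - theta)^2 | x]
    absl = [np.sum(np.abs(g - theta) * dens) * dtheta for g in g_grid]  # E[|g - theta| | x]

    print(g_grid[np.argmin(quad)], post.mean())    # both about 1.5  (posterior mean)
    print(g_grid[np.argmin(absl)], post.median())  # both about 1.34 (posterior median)
    print(theta[np.argmax(dens)])                  # about 1.0       (posterior mode)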


3.1 Quadratic loss

For brevity, g will be written instead of g(x). The expected posterior loss is:

EPL = ∫ (g − θ)² f(θ|x) dθ

Differentiating with respect to g:

d/dg EPL = 2 ∫ (g − θ) f(θ|x) dθ = 0

so that

g ∫ f(θ|x) dθ = ∫ θ f(θ|x) dθ

and, since ∫ f(θ|x) dθ = 1,

g = ∫ θ f(θ|x) dθ = E(θ|x)

i.e. the Bayesian estimator under quadratic loss is the mean of the posterior distribution.

Question 2.6 on page 19: Ten IID observations from a Poisson(λ) distribution gave the values 3, 4, 3, 1, 5, 5, 2, 3, 3, 2. Assuming an Exp(0.2) prior distribution for λ, find the Bayesian estimator of λ under squared error loss.
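A quick numerical check of this question (treating the Exp(0.2) prior as a Gamma(1, 0.2) distribution):

    x = [3, 4, 3, 1, 5, 5, 2, 3, 3, 2]
    n, s = len(x), sum(x)                # n = 10, sum of observations = 31

    # the Poisson likelihood is proportional to lambda^s * exp(-n*lambda), so combining
    # it with the Gamma(1, 0.2) prior gives a Gamma(1 + s, 0.2 + n) posterior
    a_post, rate_post = 1 + s, 0.2 + n   # Gamma(32, 10.2)

    # under squared error loss the Bayesian estimator is the posterior mean
    print(a_post / rate_post)            # approximately 3.14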


3.2 Absolute error loss

The expected posterior loss is:

EPL = ∫ |g − θ| f(θ|x) dθ = ∫_{−∞}^{g} (g − θ) f(θ|x) dθ + ∫_{g}^{+∞} (θ − g) f(θ|x) dθ

Because, by the rule for differentiating an integral with variable limits,

d/dy ∫_{a(y)}^{b(y)} f(x, y) dx = ∫_{a(y)}^{b(y)} ∂f(x, y)/∂y dx + b′(y) f(b(y), y) − a′(y) f(a(y), y)

we have

d/dg EPL = ∫_{−∞}^{g} f(θ|x) dθ − ∫_{g}^{+∞} f(θ|x) dθ = 0

thus

∫_{−∞}^{g} f(θ|x) dθ = ∫_{g}^{+∞} f(θ|x) dθ

that is, P(θ ≤ g) = P(θ ≥ g), which specifies the median of the posterior distribution.


3.3 All-or-nothing loss

The differentiation approach cannot be used here. Instead a direct approach will be used, with a limiting argument. Consider:

L(g(x), θ) = 0 if g − ε < θ < g + ε, and 1 otherwise

so that, in the limit as ε → 0, this tends to the required loss function.

The expected posterior loss is:

EPL = 1 − ∫_{g−ε}^{g+ε} f(θ|x) dθ ≈ 1 − 2ε f(g|x) for small ε

This is minimized by taking g to be the mode of f(θ|x), i.e. the value of θ at which f(θ|x) is maximized.


Question 2.7 on page 21: x1, x2, ..., xn are IID observations from a Gamma(α, λ) distribution, where λ is unknown but α is known. The prior distribution of λ is Exp(β), where β is a known constant. Find the Bayesian estimator of λ under zero-one (all-or-nothing) error loss.
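A sketch of the working: the likelihood is proportional to λ^(nα) e^(−λ Σxᵢ) and the prior density is proportional to e^(−βλ), so the posterior is proportional to λ^(nα) e^(−λ(Σxᵢ + β)), i.e. the posterior distribution of λ is Gamma(nα + 1, Σxᵢ + β). Under zero-one loss the Bayesian estimate is the posterior mode, which for this gamma distribution is nα / (Σxᵢ + β).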


3.4 An example

For the estimation of a binomial probability θ from a single observation X, with the prior distribution of θ being beta with parameters α and β, investigate the form of the posterior distribution of θ and determine the Bayesian estimator of θ under quadratic loss.

Question 2.8 on page 22: What would be the estimate using all-or-nothing loss?
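Writing X ~ Binomial(n, θ) with n known (this labelling is our assumption about the setup), the conjugacy argument of Section 2.4 gives a Beta(α + x, β + n − x) posterior. A minimal Python sketch with hypothetical values α = 2, β = 3, n = 10, x = 4:

    from scipy import stats

    alpha, beta_ = 2, 3      # hypothetical prior parameters
    n, x = 10, 4             # hypothetical observation: x successes out of n trials

    a_post, b_post = alpha + x, beta_ + n - x         # posterior is Beta(6, 9)

    post_mean = a_post / (a_post + b_post)            # quadratic loss -> posterior mean = 0.4
    post_mode = (a_post - 1) / (a_post + b_post - 2)  # all-or-nothing loss -> posterior mode ~ 0.385

    print(post_mean, stats.beta(a_post, b_post).mean())  # check the mean against scipy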


3.5 Some Bayesian posterior distributions

A table giving the likelihood function together with the corresponding prior and posterior distributions can be found on page 24.
