In Subject CT3 we looked at the classical approach to statistical estimation, using the method of maximum likelihood estimation and the method of moments.
0 INTRODUCTION
In this chapter we will look at Bayesian methods, which provide an alternative approach. The Bayesian philosophy involves a completely different approach to statistics. The Bayesian version of estimation is considered here for the basic situation concerning the estimation of a parameter given a random sample from a particular distribution. The fundamental difference between Bayesian and classical methods is that in Bayesian methods the parameter θ is considered to be a random variable. In classical statistics, θ is a fixed but unknown quantity.
Another advantage of Bayesian statistics is that it enables us to make use of any information that we already have about the situation under investigation.
1 BAYES THEOREM
Bayes' Theorem
If B1, B2, ..., Bk constitute a partition of a sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A in S such that P(A) ≠ 0:

P(Br|A) = P(A|Br) P(Br) / P(A)    for r = 1, 2, ..., k

where

P(A) = Σ (i = 1 to k) P(A|Bi) P(Bi)

The two ingredients are:

Conditional probability: P(Br|A) = P(Br ∩ A) / P(A)

Law of total probability: P(A) = P(A ∩ B1) + ... + P(A ∩ Bk) = P(A|B1) P(B1) + ... + P(A|Bk) P(Bk)

Bayes' theorem allows us to turn round a conditional probability.
1.1
An example
Three manufacturers supply clothing to a retailer. 60% of the stock comes from manufacturer 1, 30% from manufacturer 2 and 10% from manufacturer 3. 10% of the clothing from manufacturer 1 is faulty, 5% from manufacturer 2 is faulty and 15% from manufacturer 3 is faulty. What is the probability that a faulty garment comes from manufacturer 3? What are the probabilities that a faulty garment comes from each of the other manufacturers?
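The arithmetic for this example can be checked with a few lines of Python (a sketch, not part of the original notes):

```python
# Bayes' theorem for the clothing example: P(manufacturer | faulty).
priors = [0.60, 0.30, 0.10]            # P(M1), P(M2), P(M3)
p_faulty = [0.10, 0.05, 0.15]          # P(faulty | Mi)

# Law of total probability: overall probability of a faulty garment.
p_f = sum(p * q for p, q in zip(priors, p_faulty))

# Bayes' theorem: posterior probability for each manufacturer.
posteriors = [p * q / p_f for p, q in zip(priors, p_faulty)]

print(round(p_f, 2))                       # 0.09
print([round(p, 4) for p in posteriors])   # [0.6667, 0.1667, 0.1667]
```

So a faulty garment comes from manufacturer 3 with probability 1/6, from manufacturer 2 with probability 1/6, and from manufacturer 1 with probability 2/3.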
The example on pages 7-8 shows that we can also apply Bayes' theorem when the event A corresponds to a continuous random variable taking a specified value.
Suppose X = (X1, X2, ..., Xn) is a random sample from a population specified by the density or probability function f(x; θ) and it is required to estimate θ. As the parameter θ is a random variable, it will have a distribution. This allows the use of any knowledge available about possible values of θ before the collection of any data. This knowledge is quantified by expressing it as the prior distribution of θ. Then, after collecting appropriate data, the posterior distribution of θ is determined, and this forms the basis of all inference concerning θ.
The information from the random sample is contained in the likelihood function for that sample. So the Bayesian approach combines the information obtained from the likelihood function with the information in the prior distribution. Both sources of information are combined to obtain a posterior estimate for the required population parameter.
2.1
Notation
As θ is a random variable, it should really be denoted by the capital Θ and its prior density written as fΘ(θ). For simplicity we just denote the density by f(θ). Note that referring to a density here implies that θ is continuous; in most applications this will be the case, even when X itself is discrete. The range of the prior distribution should also reflect the possible parameter values. Also the population density or probability function will be denoted by f(x|θ) rather than f(x; θ), as it represents the conditional distribution of X given θ.
2.2
The posterior distribution
Suppose that X is a random sample from a population specified by f(x|θ) and that θ has the prior density f(θ). The posterior density of θ|X is determined by applying the basic definition of a conditional density:

f(θ|X) = f(θ, X) / f(X) = f(X|θ) f(θ) / f(X)

Note that f(X) = ∫ f(X|θ) f(θ) dθ. This result is like a continuous version of Bayes' theorem from basic probability.
It is often convenient to express this result in terms of the value of a statistic, such as the sample mean X̄, rather than the full set of sample values X. So, for example,

f(θ|x̄) = f(x̄|θ) f(θ) / f(x̄)
A useful way of expressing the posterior density is to use proportionality. f(X) does not involve θ, because θ was integrated out in the calculation of f(X). So

f(θ|X) ∝ f(X|θ) f(θ)

Also note that f(X|θ), being the joint density of the sample values, is none other than the likelihood. So the posterior is proportional to the product of the likelihood and the prior.
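A quick numerical illustration of this proportionality (a sketch with made-up data, not from the notes): evaluate prior × likelihood on a grid of parameter values and normalize, so the constant f(X) never needs to be found analytically.

```python
# Posterior ∝ likelihood × prior, computed on a grid.
# Illustration (made-up data): 7 successes in 10 Bernoulli trials,
# flat prior on the success probability p.
from math import comb

grid = [i / 1000 for i in range(1, 1000)]               # candidate values of p
prior = [1.0] * len(grid)                               # flat prior
like = [comb(10, 7) * p**7 * (1 - p)**3 for p in grid]  # binomial likelihood

unnorm = [l * q for l, q in zip(like, prior)]           # likelihood × prior
total = sum(unnorm)
posterior = [u / total for u in unnorm]                 # normalizing replaces f(X)

# The exact posterior here is Beta(8, 4), whose mean is 8/12.
post_mean = sum(p * w for p, w in zip(grid, posterior))
print(round(post_mean, 3))
```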
The idea of proportionality is important: it enables us to answer many questions on Bayesian methods relatively easily. Example on pages 11-13.
2.3
Determining the posterior distribution
The same logic underlying the proportional method applies in the case where the unknown parameter is assumed to have a continuous distribution.
The steps involved in finding the posterior distribution are:

STEP 1 (selecting a prior distribution)

STEP 2 (determining the likelihood function)

STEP 3 (determining the posterior parameter distribution): multiply the prior parameter distribution and the likelihood function to find the form of the posterior parameter distribution.
STEP 4 (identifying the posterior parameter distribution): EITHER look for a standard distribution that has a PDF with the same algebraic form and range of values as the posterior distribution you have found (e.g. by comparing with the PDFs in the Tables), OR (if your posterior distribution doesn't match any of the standard distributions) integrate out (or sum out) the unknown parameter to find the normalization constant in the PDF (or PF) of the posterior distribution.
Question 2.4 on page 14. If x1, ..., xn is a random sample from an Exp(λ) distribution, where λ is an unknown parameter, find the posterior distribution of λ, assuming that the prior distribution of λ is also exponential with a known parameter.
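A sketch of the algebra, writing the prior's known parameter as c: the likelihood is λ^n e^(−λΣxᵢ) and the prior density is proportional to e^(−cλ), so the posterior is proportional to λ^n e^(−λ(Σxᵢ + c)), i.e. a Gamma(n + 1, Σxᵢ + c) distribution. The check below (with made-up data and an illustrative c) compares the grid-normalized posterior against that gamma density:

```python
# Numeric check: Exp(λ) likelihood with an Exp(c) prior gives a
# Gamma(n + 1, sum(x) + c) posterior.  Data and c are illustrative.
from math import exp, gamma

data = [0.8, 1.3, 0.4, 2.1, 0.6]
n, s, c = len(data), sum(data), 2.0

def unnorm_post(lam):
    return lam**n * exp(-lam * s) * exp(-c * lam)   # likelihood × prior

def gamma_pdf(lam, a, b):
    return b**a * lam**(a - 1) * exp(-b * lam) / gamma(a)

# Normalize numerically and compare with the Gamma(n + 1, s + c) density.
h = 0.001
z = sum(unnorm_post(i * h) for i in range(1, 20000)) * h
for lam in (0.5, 1.0, 2.0):
    print(round(unnorm_post(lam) / z, 4), round(gamma_pdf(lam, n + 1, s + c), 4))
```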
2.4
Conjugate priors
For a given likelihood, if the prior distribution leads to a posterior distribution belonging to the same family as the prior distribution, then this prior is called the conjugate prior for this likelihood. The likelihood function determines which family of distributions will lead to a conjugate pair. Conjugate distributions can be found by selecting a family of distributions that has the same algebraic form as the likelihood function, treating the unknown parameter as the random variable.
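For instance, the gamma family is conjugate to the Poisson likelihood: combining a Gamma(a, b) prior with n Poisson observations totalling Σxᵢ gives a Gamma(a + Σxᵢ, b + n) posterior, so the update is just arithmetic on the parameters. A minimal sketch (the prior parameters and data are illustrative):

```python
# Conjugate (gamma-Poisson) update: the posterior stays in the gamma family.
def gamma_poisson_update(a, b, counts):
    """Posterior Gamma parameters after observing Poisson counts."""
    return a + sum(counts), b + len(counts)

a_post, b_post = gamma_poisson_update(2.0, 1.0, [4, 2, 3])
print(a_post, b_post)   # 11.0 4.0 -> posterior is Gamma(11, 4)
```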
Example on page 15. Using conjugate distributions often makes Bayesian calculations simpler. They may be appropriate to use where there is a family of distributions that might be expected to provide a natural model for the unknown parameter, e.g. in the previous example where the probability parameter p had to lie in the range 0 < p < 1 (which is the range of values over which the beta distribution is defined).
2.5
Uninformative prior distributions
Sometimes it is useful to use an uninformative prior distribution, which assumes that the unknown parameter is equally likely to take any value. For example, we might have a sample from a normal population with mean θ, where we know nothing at all about θ. This leads to a problem in this example, because we would need to assume a U(−∞, ∞) distribution for θ, which doesn't make sense, since the PDF of this distribution would be 0 everywhere. We can easily get round this problem by using the distribution U(−N, N), where N is a very large number, and then letting N tend to infinity.
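The limiting argument can be illustrated numerically: for large N the U(−N, N) prior density is constant over the region where the likelihood is non-negligible, so it cancels in the proportionality and the posterior is just the normalized likelihood. A sketch with made-up normal data (known standard deviation 1):

```python
# Flat prior over a wide range: the posterior is the normalized likelihood.
from math import exp

data = [4.9, 5.3, 5.1, 4.7, 5.0]
xbar = sum(data) / len(data)

def likelihood(mu):                       # normal likelihood, sigma = 1
    return exp(-sum((x - mu) ** 2 for x in data) / 2)

N, h = 50, 0.001                          # U(-N, N) prior, N large
grid = [-N + i * h for i in range(int(2 * N / h))]
w = [likelihood(mu) for mu in grid]       # constant prior cancels
total = sum(w)

post_mean = sum(mu * wi for mu, wi in zip(grid, w)) / total
print(round(post_mean, 3), round(xbar, 3))   # both ≈ 5.0
```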
To obtain an estimator of θ, a loss function must first be specified. This is a measure of the loss incurred when g(X) is used as an estimator of θ. Three loss functions:

quadratic loss (cf. mean square error): L(g(x), θ) = [g(x) − θ]²

absolute error loss: L(g(x), θ) = |g(x) − θ|

all-or-nothing loss: L(g(x), θ) = 0 if g(x) = θ, 1 if g(x) ≠ θ
The Bayesian estimator that arises by minimizing the expected loss for each of these loss functions in turn is the mean, median and mode, respectively, of the posterior distribution, each of which is a measure of location of the posterior distribution. The expected posterior loss is

EPL = E[L(g(x), θ)] = ∫ L(g(x), θ) f(θ|x) dθ
3.1
Quadratic loss
For brevity, g will be written instead of g(x). The expected posterior loss is

EPL = ∫ (g − θ)² f(θ|x) dθ

Differentiating with respect to g:

d(EPL)/dg = 2 ∫ (g − θ) f(θ|x) dθ = 0

so that

g ∫ f(θ|x) dθ = ∫ θ f(θ|x) dθ

Since ∫ f(θ|x) dθ = 1, this gives

g = ∫ θ f(θ|x) dθ = E(θ|x)

the posterior mean.
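This can be verified numerically: on a discretized posterior, the expected quadratic loss as a function of g is minimized at (the grid point closest to) the posterior mean. A sketch using an illustrative Beta(3, 2) posterior:

```python
# Expected quadratic loss is minimized at the posterior mean.
grid = [i / 1000 for i in range(1, 1000)]
w = [t**2 * (1 - t) for t in grid]        # Beta(3, 2) density, up to a constant
total = sum(w)
post = [wi / total for wi in w]           # discretized posterior

def expected_loss(g):
    return sum((g - t) ** 2 * p for t, p in zip(grid, post))

best = min(grid, key=expected_loss)                  # minimizer of EPL
post_mean = sum(t * p for t, p in zip(grid, post))   # E(θ|x) = 3/5
print(best, round(post_mean, 3))   # 0.6 0.6
```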
Question 2.6 on page 19. Ten IID observations from a Poisson(μ) distribution gave 3, 4, 3, 1, 5, 5, 2, 3, 3, 2. Assuming an Exp(0.2) prior distribution for μ, find the Bayesian estimator of μ under squared error loss.
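A sketch of the arithmetic for this question: the Exp(0.2) prior is the same distribution as Gamma(1, 0.2), and combining it with the Poisson likelihood gives a Gamma(1 + Σxᵢ, 0.2 + n) posterior, whose mean is the estimator under squared error loss.

```python
# Question 2.6 sketch: Poisson counts with an Exp(0.2) = Gamma(1, 0.2) prior.
data = [3, 4, 3, 1, 5, 5, 2, 3, 3, 2]

a_post = 1 + sum(data)       # posterior shape: 1 + 31 = 32
b_post = 0.2 + len(data)     # posterior rate:  0.2 + 10 = 10.2

estimate = a_post / b_post   # posterior mean
print(a_post, b_post, round(estimate, 3))   # 32 10.2 3.137
```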
3.2
Absolute error loss
The expected posterior loss is

EPL = ∫ |g − θ| f(θ|x) dθ = ∫ (−∞ to g) (g − θ) f(θ|x) dθ + ∫ (g to ∞) (θ − g) f(θ|x) dθ

Because

d/dy ∫ (a(y) to b(y)) f(θ, y) dθ = f(b(y), y) b′(y) − f(a(y), y) a′(y) + ∫ (a(y) to b(y)) ∂f/∂y dθ

differentiating with respect to g gives

d(EPL)/dg = ∫ (−∞ to g) f(θ|x) dθ − ∫ (g to ∞) f(θ|x) dθ = 0

thus

∫ (−∞ to g) f(θ|x) dθ = ∫ (g to ∞) f(θ|x) dθ

Each integral must therefore equal ½, so g is the median of the posterior distribution.
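A numerical check of this result (a sketch with an illustrative skewed posterior, Gamma(2, 2)): the minimizer of the expected absolute loss coincides with the posterior median, to within the grid spacing.

```python
# Expected absolute loss is minimized at the posterior median.
from math import exp

grid = [i / 100 for i in range(1, 1000)]
w = [t * exp(-2 * t) for t in grid]       # Gamma(2, 2) density, up to a constant
total = sum(w)
post = [wi / total for wi in w]           # discretized posterior

def expected_abs_loss(g):
    return sum(abs(g - t) * p for t, p in zip(grid, post))

# Median: first grid point where the cumulative probability reaches 1/2.
cum, median = 0.0, None
for t, p in zip(grid, post):
    cum += p
    if cum >= 0.5:
        median = t
        break

best = min(grid, key=expected_abs_loss)
print(round(best, 2), round(median, 2))   # both near 0.84
```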
3.3
All-or-nothing loss
The differentiation approach cannot be used here. Instead a direct approach with a limiting argument will be used. Consider

L(g(x), θ) = 0 if g − ε < θ < g + ε, 1 otherwise

As ε → 0, this tends to the all-or-nothing loss function. The expected posterior loss is

EPL = 1 − ∫ (g − ε to g + ε) f(θ|x) dθ

which, for small ε, is minimized by choosing g to maximize f(g|x), that is, by taking g to be the mode of the posterior distribution.
Question 2.7 on page 21. x1, x2, ..., xn are IID observations from a Gamma(α, λ) distribution, where λ is unknown but α is known. The prior distribution of λ is exponential with a known parameter. Find the Bayesian estimator of λ under zero-one (all-or-nothing) error loss.
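A sketch of the algebra, writing the prior's known parameter as c: keeping only the factors involving λ, the likelihood contributes λ^(nα) e^(−λΣxᵢ) and the prior contributes e^(−cλ), so the posterior is Gamma(nα + 1, Σxᵢ + c). Under all-or-nothing loss the estimator is the posterior mode, which for a Gamma(a, b) distribution with a > 1 is (a − 1)/b. With illustrative numbers:

```python
# Question 2.7 sketch: Gamma(alpha, λ) likelihood in λ, exponential prior.
alpha, c = 3.0, 1.5                # alpha known; c is the prior's parameter
data = [2.2, 1.8, 3.1, 2.5]
n, s = len(data), sum(data)

a_post = n * alpha + 1             # posterior Gamma shape
b_post = s + c                     # posterior Gamma rate

mode = (a_post - 1) / b_post       # posterior mode: n*alpha / (s + c)
print(round(mode, 4))
```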
3.4
An example
For the estimation of a binomial probability p from a single observation x, with the prior distribution of p being beta with parameters α and β, investigate the form of the posterior distribution of p and determine the Bayesian estimator of p under quadratic loss. Question 2.8 on page 22: what would be the estimate using all-or-nothing loss?
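A sketch of the conjugate update behind this example: a Beta(α, β) prior combined with an observation of x successes in n trials gives a Beta(α + x, β + n − x) posterior; the quadratic-loss estimator is its mean, and the all-or-nothing estimator is its mode. Illustrative numbers:

```python
# Beta-binomial update: prior Beta(alpha, beta), observation x out of n.
alpha, beta, n, x = 2.0, 3.0, 10, 7   # illustrative values

a_post = alpha + x                    # posterior Beta parameters
b_post = beta + n - x

post_mean = a_post / (a_post + b_post)             # quadratic loss estimate
post_mode = (a_post - 1) / (a_post + b_post - 2)   # all-or-nothing estimate
print(round(post_mean, 3), round(post_mode, 3))    # 0.6 0.615
```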
3.5
Summary
A table giving the likelihood function, together with the corresponding prior and posterior distributions, is on page 24.