
The Bayesian Estimation of Common Parameters from Several Responses
Author(s): George E. P. Box and Norman R. Draper
Source: Biometrika, Vol. 52, No. 3/4 (Dec., 1965), pp. 355-365
Published by: Biometrika Trust

Biometrika (1965), 52, 3 and 4, p. 355. With 4 text-figures. Printed in Great Britain


The Bayesian estimation of common parameters from several responses†

By GEORGE E. P. BOX AND NORMAN R. DRAPER
University of Wisconsin

1. SUMMARY

Suppose that n multivariate observations are available, with the uth such observation y_u = (y_1u, y_2u, ..., y_ku) recording the values of k different responses, such as the yields of k different products of a chemical reaction at the uth set of experimental conditions. Suppose that the k expected values E(y_1u), E(y_2u), ..., E(y_ku) can be represented by k known functions involving m common parameters θ_1, θ_2, ..., θ_m. For example, the yields of the k products might be expressed by k different functions of the basic rate constants of the chemical system. In attempting to estimate the θ's and to make inferences concerning them we naturally wish to combine appropriately the relevant information from the different responses. In this paper the appropriate Bayesian techniques are worked out and examples are given.

2. FORMULATION
Let y_iu (i = 1, 2, ..., k; u = 1, 2, ..., n) represent n sets of observations on each of k responses. Let x_iqu (i = 1, 2, ..., k; u = 1, 2, ..., n; q = 1, 2, ..., r) represent n sets of observations on each of k sets each of r independent variables. Consider the k models

    y_iu = f_i(x_i1u, x_i2u, ..., x_iru; θ_1, θ_2, ..., θ_m) + ε_iu,

which we can denote more conveniently by the abbreviated form

    y_iu = f_i(x_iqu, θ_h) + ε_iu,        (2.1)

for each i = 1, 2, ..., k, where the f_i are k response functions of known form, where the θ_h (h = 1, 2, ..., m) are m unknown parameters, and where the ε_iu denote random errors such that

    E(ε_iu) = 0            (all i, u);
    E(ε_iu ε_ju) = σ_ij    (all i, j, u);
    E(ε_iu ε_jv) = 0       (all i, j; u ≠ v).

(These are very general models. In cases met in practice, considerable simplification may occur. For example, it may happen that x_iu = x_ju. In the examples given, x_iu = t_u, the time at which the uth experiment was run, which is in fact independent of i. Another possible simplification is that not all of the θ's may appear in all of the models.)

This implies that if we write y_u = (y_1u, y_2u, ..., y_ku) to represent the vector of uth observations (u = 1, 2, ..., n) on each of the k variables, then the variance-covariance matrix of y_u is Σ, where

    Σ = [ σ_11  σ_12  ...  σ_1k
          σ_21  σ_22  ...  σ_2k
           ...               ...
          σ_k1  σ_k2  ...  σ_kk ],
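The error structure just described can be made concrete in a short numerical sketch (added here for illustration, not part of the paper): within-run errors with E(ε_1u ε_2u) = σ_12 are generated from a Cholesky factor of the 2 × 2 covariance matrix. The covariance values are those quoted later for Example 1, and the response functions are the first-order decay model used in the later examples.

```python
import math, random

# Illustrative sketch: simulate y_iu = f_i(t_u, phi) + e_iu for k = 2
# responses with within-run correlated errors, E(e_1u e_2u) = sigma12.
random.seed(1)
sigma11, sigma22, sigma12 = 0.0004, 0.0016, 0.0004

# Cholesky factor of the covariance matrix [[sigma11, sigma12], [sigma12, sigma22]]
l11 = math.sqrt(sigma11)
l21 = sigma12 / l11
l22 = math.sqrt(sigma22 - l21 ** 2)

def correlated_errors():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return l11 * z1, l21 * z1 + l22 * z2

phi = 0.1
data = []
for t in (1, 2, 4, 8, 16):
    e1, e2 = correlated_errors()
    data.append((t, math.exp(-phi * t) + e1, 1.0 - math.exp(-phi * t) + e2))
```

By construction the factor reproduces the stated second moments: l11·l21 = σ_12 and l21² + l22² = σ_22.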







† Research supported by the United States Navy through the Office of Naval Research.







The elements of y_u and y_v, u ≠ v, are uncorrelated, and σ_ij = σ_ji. Let Λ = Σ⁻¹ = {σ^ij}. Let θ' = (θ_1, θ_2, ..., θ_m) represent the vector of the m parameters in equations (2.1). Suppose y_1u, y_2u, ..., y_ku follow a multivariate normal distribution. What can be deduced about the possible values of the elements of θ? We define the quantities

    v_ij = Σ_{u=1}^{n} {y_iu − f_i(x_iqu, θ)} {y_ju − f_j(x_jqu, θ)},

which are the sums of squares and sums of products of the deviations of the observed y_iu's from their respective models.

Now if the variances and covariances σ_ij were known, the likelihood would be a monotonic function of the quadratic form

    z = Σ_{i=1}^{k} Σ_{j=1}^{k} σ^ij v_ij.

Minimization of this quadratic form provides the generalization of the method of least squares due to Aitken (1935). A further simplification would result if it were known that the elements of the observation vector were uncorrelated (σ_ij = 0, i ≠ j). In this case, minimization of z would correspond to standard weighted least squares, using the reciprocals of the variances as the weights. In most practical problems the σ_ij are unknown, and it is this case we shall discuss.

3. BAYESIAN SOLUTION

Since the n sets of observations y_u = (y_1u, y_2u, ..., y_ku) are independent, the likelihood is

    p(y | θ, σ^ij) = (2π)^{−½kn} |Λ|^{½n} exp{ −½ Σ_i Σ_j σ^ij v_ij },

where we denote the data by y' = (y'_1, y'_2, ..., y'_n).

We shall apply the Bayes theorem using 'non-informative' prior distributions for θ and the σ^ij. We suppose that little is known ab initio about the values of the constants θ. Specifically we suppose that, when suitably parametrized, the prior distribution of the θ's will not change very much over a region in which the likelihood is appreciable, i.e. we take a locally uniform prior distribution

    p(θ) ∝ dθ.

The invariance theory of Jeffreys (1961) leads to p(σ^ij) = J^½, where J is the Jacobian |D(σ_ij)/D(σ^ij)|. As shown in the Appendix, J = |Λ|^{−(k+1)}, so that (see also Savage, 1961; Geisser & Cornfield, 1963; Tiao & Zellner, 1964)

    p(σ^ij) ∝ |Λ|^{−½(k+1)}.

Combining the prior distributions for θ and the σ^ij with the likelihood,

    p(θ, σ^ij | y) dθ Π dσ^ij ∝ (2π)^{−½kn} |Λ|^{½(n−k−1)} exp{ −½ Σ_i Σ_j σ^ij v_ij } dθ Π dσ^ij.

To find the marginal distribution of θ we can integrate out the σ^ij by comparing the right-hand side of the joint posterior distribution with the Wishart distribution (Wilks, 1962, page 551, equation 18.2.27), noting that, while Wilks's variables are the v_ij, ours are the σ^ij. It follows that

    p(θ | y) = C |v_ij|^{−½n},

where C is the normalizing constant.





We find, therefore, a remarkably simple form for the posterior density involving, apart from a constant, only a power of the determinant of the dispersion matrix, which is, of course, a function of θ only. In particular, we notice that an appropriate generalization of least squares is immediately provided. If we wish to obtain point estimates θ̂_h for the θ_h we should minimize the value of the determinant |v_ij|.

It is of interest to note the relation between this criterion and the one that would be appropriate if the variance-covariance matrix were known. If V_ij is the co-factor of v_ij in |v_ij|, we can write

    |v_ij| = Σ_{i=1}^{k} Σ_{j=1}^{k} v_ij V_ij / k.

Thus z_0 = Σ_i Σ_j v_ij V_ij / k is of the same form as z = Σ_i Σ_j σ^ij v_ij, but with quantities proportional to the co-factors V_ij, maximum-likelihood estimates of the σ^ij, replacing the unknown weights σ^ij.

We now consider this general result in the light of some familiar special cases.
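The determinant criterion can be sketched in modern computational terms. The following illustration (added here; the two-response linear model and its data are invented, not an example from the paper) builds the matrix {v_ij(θ)} from the residuals and chooses θ to minimize its determinant, here by a crude grid search.

```python
# Sketch of the criterion p(theta | y) ∝ |v_ij|^(-n/2): minimize |V(theta)|,
#   v_ij(theta) = sum_u (y_iu - f_i(x_u, theta)) * (y_ju - f_j(x_u, theta)).
# Illustrative model (invented): two responses sharing one slope theta,
#   f_1 = theta * x,  f_2 = 2 * theta * x.

def v_matrix(theta, x, y1, y2):
    r1 = [a - theta * b for a, b in zip(y1, x)]
    r2 = [a - 2.0 * theta * b for a, b in zip(y2, x)]
    v11 = sum(a * a for a in r1)
    v22 = sum(a * a for a in r2)
    v12 = sum(a * b for a, b in zip(r1, r2))
    return v11, v22, v12

def det_v(theta, x, y1, y2):
    v11, v22, v12 = v_matrix(theta, x, y1, y2)
    return v11 * v22 - v12 ** 2          # |V| for k = 2

x  = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [0.9, 2.1, 2.9, 4.2, 4.8]           # roughly theta = 1
y2 = [2.1, 3.8, 6.2, 7.9, 10.1]          # roughly 2 * theta = 2

grid = [0.5 + 0.001 * i for i in range(1001)]
theta_hat = min(grid, key=lambda t: det_v(t, x, y1, y2))
```

In practice one would replace the grid search by a general-purpose minimizer, but the criterion itself is just the determinant above.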



4.1. Univariate linear regression model

When k = 1 and f_1(x_qu, θ) = Σ_{q=1}^{m} θ_q x_qu, the posterior distribution is readily shown to be

    p(θ) = constant × {νs² + (θ − θ̂)' X'X (θ − θ̂)}^{−½n},

where ν = n − m and x'_u = (x_1u, x_2u, ..., x_mu) is the uth row of X, i.e. the well-known multivariate t-distribution (Cornish, 1954; Dunnett & Sobel, 1954).

4.2. Multivariate linear regression models with independent observations

Suppose that for the ith element of the observation vector there is a linear model, so that
    f_i(x_iqu, θ) = Σ_{q=1}^{m} θ_q x_iqu    (i = 1, 2, ..., k).

Suppose further that the k elements of the observation vector (y_1u, y_2u, ..., y_ku) are independently distributed, so that σ_ij = 0, i ≠ j. Then the posterior distribution of the common θ is given by the product

    p(θ) = constant × Π_{i=1}^{k} {ν s_i² + (θ − θ̂_i)' X_i'X_i (θ − θ̂_i)}^{−½n},

where ν = n − m, x'_iu = (x_i1u, x_i2u, ..., x_imu) is the uth row (u = 1, 2, ..., n) of X_i, and θ̂_i is the maximum posterior estimate of θ from the ith set of observations, i = 1, 2, ..., k; i.e. the posterior is given by the product of k independent multivariate t-distributions.






4.3. Distribution of the weighted mean

Suppose in the previous case k = 2, m = 1 and the models are given by f_1 = f_2 = θ. The posterior distribution of θ is now the product of two univariate t-distributions,

    p(θ) = constant × Π_{i=1}^{2} {ν s_i² + n(θ − θ̂_i)²}^{−½n},

where θ̂_i represents the maximum posterior estimate of θ from the ith set of observations, i = 1, 2. This corresponds to the fiducial distribution of the weighted mean given by Fisher (1961). The weighted mean problem is the simplest example of the more general problem of combining information from several responses. (See also Sukhatme, 1938; Yates, 1939; James, 1959.)
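The weighted-mean combination above can be sketched numerically (an illustration added here; the sample summaries are invented, and the kernel is the product of t-kernels given above, with each sample of size n and ν = n − 1):

```python
# Posterior for a common mean theta from two independent samples, as the
# product of two t-kernels (illustrative sketch; data invented):
#   p(theta) ∝ prod_i {nu_i * s_i^2 + n_i * (theta - ybar_i)^2}^(-n_i / 2).

def t_kernel(theta, ybar, s2, n):
    nu = n - 1
    return (nu * s2 + n * (theta - ybar) ** 2) ** (-n / 2)

def posterior(theta, samples):
    p = 1.0
    for ybar, s2, n in samples:
        p *= t_kernel(theta, ybar, s2, n)
    return p

# hypothetical samples: (mean, variance, size)
samples = [(10.0, 4.0, 8), (10.6, 1.0, 8)]

grid = [9.0 + 0.002 * i for i in range(1001)]
theta_hat = max(grid, key=lambda t: posterior(t, samples))
```

As one would expect, the mode falls between the two sample means and closer to the more precise sample.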



Consider a chemical reaction in which a product A is decomposing to form B in such a way that the rate of reaction is proportional to the proportion Y_1 of A left unchanged. With Y_2 representing the proportion of A which has reacted to form B, the system is described by the differential equations

    Ẏ_1 = −φ Y_1,   Ẏ_2 = φ Y_1    (φ > 0),        (5.1)

where the dot denotes differentiation with respect to time t, with boundary conditions Y_1 = 1, Y_2 = 0 at t = 0 and Y_1 = 0, Y_2 = 1 at t = ∞. The equations (5.1) have solution

    Y_1 = e^{−φt},   Y_2 = 1 − e^{−φt}.

We observe y_1 and y_2, where

    y_iu = Y_iu + ε_iu    (u = 1, 2, ..., n; i = 1, 2),

and where E(ε_iu) = 0, E(ε²_iu) = σ_i², and E(ε_1u ε_2u) = ρσ_1σ_2 = σ_12, where σ_1², σ_2² and ρ are unknown. It should be noted that y_1u + y_2u ≠ 1 when the responses are observed separately.

In considering a parameter like the specific rate φ, which is essentially positive, it is probably most realistic to take θ = ln φ, −∞ < θ < ∞, as locally uniform a priori. This would mean, for example, that having guessed a value for φ, an experimenter would be about equally prepared to accept a value twice as big as he would to accept one half as big.

Suppose two observations are taken at each of five distinct values of the time t. If a single response, y_1 alone or y_2 alone, is available, the posterior distribution of θ will be given by

    p(θ | y_i) ∝ v_ii^{−5}    (i = 1, 2).

If, however, both y_1 and y_2 are observed, the posterior distribution, which now makes use of information from both sources, will be

    p(θ | y) ∝ (v_11 v_22 − v²_12)^{−5}.

A set of manufactureddata for this type of example is given in the second and third columnsofTable 1,labelled Example 1. These data wereobtained by addingrandomnormal deviates to calculated values and taking lrll = 0 0004, 0r22 = 00016 and o-12= 00004 so was P12 = 0 5. Fig. 1 shows the posterior betweenthe errors that the correlationcoefficient for distributions Yi alone, Y2 alone and forYi and Y2 taken together. fromYi and Y2 separately,we see that they we VVhen considerthe posteriordistributions 0. presentconsistentevidence concerning As is to be expected,the precisionofthe estimate



from y_1 (which has the smaller variance) is greater than that from y_2. Also, even though y_1 and y_2 are correlated to some extent, the two responses taken together provide a distribution which is sharper than either of those from the individual responses.

Table 1. Data for examples, k = 2, m = 1

                 Example 1            Example 2
    t_u      y_1u     y_2u       y_1u     y_2u
     1      0.907    0.127      0.907    0.142
     1      0.915    0.064      0.915    0.079
     2      0.801    0.134      0.801    0.160
     2      0.825    0.200      0.825    0.188
     4      0.649    0.274      0.649    0.315
     4      0.675    0.375      0.675    0.416
     8      0.446    0.570      0.446    0.624
     8      0.468    0.535      0.468    0.589
    16      0.233    0.792      0.233    0.838
    16      0.187    0.803      0.187    0.849
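The joint posterior for Example 1 can be reproduced numerically from these data (a sketch added for illustration; the grid of θ values and its range are arbitrary choices):

```python
import math

# Posterior for Example 1 (data from Table 1), with theta = ln(phi),
#   Y_1 = exp(-phi t),  Y_2 = 1 - exp(-phi t),
# and joint posterior p(theta | y) ∝ (v11*v22 - v12^2)^(-5).
t  = [1, 1, 2, 2, 4, 4, 8, 8, 16, 16]
y1 = [0.907, 0.915, 0.801, 0.825, 0.649, 0.675, 0.446, 0.468, 0.233, 0.187]
y2 = [0.127, 0.064, 0.134, 0.200, 0.274, 0.375, 0.570, 0.535, 0.792, 0.803]

def v_elements(theta):
    phi = math.exp(theta)
    r1 = [a - math.exp(-phi * tt) for a, tt in zip(y1, t)]
    r2 = [a - (1.0 - math.exp(-phi * tt)) for a, tt in zip(y2, t)]
    v11 = sum(a * a for a in r1)
    v22 = sum(a * a for a in r2)
    v12 = sum(a * b for a, b in zip(r1, r2))
    return v11, v22, v12

def joint_posterior(theta):              # unnormalized
    v11, v22, v12 = v_elements(theta)
    return (v11 * v22 - v12 ** 2) ** -5

grid = [-3.0 + 0.001 * i for i in range(1501)]    # theta in [-3.0, -1.5]
theta_hat = max(grid, key=joint_posterior)
```

The single-response posteriors of Fig. 1 follow in the same way from v_11^{−5} and v_22^{−5}.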











Fig. 1. Posterior distributions. Example 1.

Lack of fit

It is important to notice how the overall criterion may be upset by lack of fit in a particular response. Suppose the tth function fits badly. Then the residual quantities y_tu − f_t(x_tqu, θ) may become excessively large in magnitude even for the indicated 'best' value of θ. Thus the v_tj (= v_jt) (j = 1, 2, ..., k) will be affected, and so will the co-factors V_tj (= V_jt) (j = 1, 2, ..., k). We recall that

    z_0 = Σ_{i=1}^{k} Σ_{j=1}^{k} v_ij V_ij / k.

Since the co-factors act as estimated weights, we see that lack of fit of one response can affect the weight given to another in the overall criterion.
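The role of the co-factors as weights is easily seen for k = 2 (a tiny sketch with illustrative numbers, added here):

```python
# For k = 2 the co-factor of v11 in |V| is V11 = v22 and vice versa, so lack
# of fit that inflates v22 also inflates the weight V11 attached to the
# response-1 sum of squares in z0 = sum_ij v_ij V_ij / k (which equals |V|).
# Illustrative numbers only.
def cofactors_2x2(v11, v22, v12):
    return v22, v11, -v12                # V11, V22, V12

def z0_2x2(v11, v22, v12):
    V11, V22, V12 = cofactors_2x2(v11, v22, v12)
    return (v11 * V11 + v22 * V22 + 2.0 * v12 * V12) / 2.0

V11_good, _, _ = cofactors_2x2(1.0, 1.0, 0.2)   # response 2 fits well
V11_bad,  _, _ = cofactors_2x2(1.0, 9.0, 0.2)   # response 2 fits badly
z0_bad = z0_2x2(1.0, 9.0, 0.2)
```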







A second example (Example 2, Table 1) will serve to demonstrate the situation which may arise when lack of fit exists. In generating these data, the y_1 column was taken as before but a different value of θ was used to generate the y_2 column. On inspection of Fig. 2, we notice immediately that the evidence about θ from y_1 alone contradicts the evidence from y_2 alone. This of itself clearly indicates that either one or both of the models are inappropriate. We notice that lack of fit makes it appear that y_1 and y_2 together provide less precise evidence about θ than does y_1 alone.
Fig. 2. Posterior distributions. Example 2.

We are rarely in the position that we can safely assume a mathematical model to be adequate. Rather, our attitude to the model should be that it is to be 'tentatively entertained', and provision should be made to check it. In multivariate problems it is not only necessary to check the adequacy of each response model individually but also to check their mutual consistency. In the practical analysis of problems of this sort it is important that the investigator should not resort immediately to the joint analysis of responses. Rather he should:
(i) check the individual fit of each response by analysing residuals;
(ii) consider the consistency of the information from the various responses by comparing the posterior distributions.
In those cases where he is satisfied with the overall fit he can then proceed with the joint analysis. A formal multivariate lack of fit test (which we hope to discuss in more detail at a later time) could also be used in conjunction with the less formal analysis mentioned above.



The informative power of this multivariate approach is most striking when there is more than one parameter. This will be illustrated by a further example. In the study of kinetic mechanisms, a chemist will often be concerned with estimating more than one parameter, for example, the rate constants for the system, and will wish to



do so using measurements of several chemical products. Perhaps the simplest example which can illustrate this situation is a reaction of the type A → B → C. If Y_1, Y_2 and Y_3 represent the proportions of reactants A, B and C present at a particular time t, then the system may be described by the differential equations

    Ẏ_1 = −φ_1 Y_1,   Ẏ_2 = φ_1 Y_1 − φ_2 Y_2,   Ẏ_3 = φ_2 Y_2,        (6.1)

with boundary conditions Y_1 = 1, Y_2 = Y_3 = 0 at t = 0, where the dot denotes differentiation with respect to time t, and φ_1, φ_2 are unknown rate constants. Equations (6.1) have solution

    Y_1 = e^{−φ_1 t},
    Y_2 = φ_1 (e^{−φ_1 t} − e^{−φ_2 t}) / (φ_2 − φ_1),        (6.2)
    Y_3 = 1 + (−φ_2 e^{−φ_1 t} + φ_1 e^{−φ_2 t}) / (φ_2 − φ_1).

As before, it is probably most natural for the experimenter to think in terms of the logarithms of the rate constants, θ_i = ln φ_i (i = 1, 2), and to regard these as locally uniformly distributed a priori. Suppose observations y_1, y_2 and y_3 of all three products were available, two independent multivariate observations being made at each of six distinct values of time t, as shown in Table 2. We shall suppose these observations have arisen from 12 independent experimental runs, as would be the case if each run were carried out in a sealed tube, reaction being terminated at the appropriate time by sudden cooling. There are now three variances and three covariances, all unknown.

Table 2. Data for example, k = 3, m = 2

    t_u      y_1u     y_2u     y_3u
     ½      0.959    0.025    0.028
     ½      0.914    0.061    0.000
     1      0.855    0.152    0.068
     1      0.785    0.197    0.096
     2      0.628    0.130    0.090
     2      0.617    0.249    0.118
     4      0.480    0.184    0.374
     4      0.423    0.298    0.358
     8      0.166    0.147    0.651
     8      0.205    0.050    0.684
    16      0.034    0.000    0.899
    16      0.054    0.047    0.991
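The joint analysis for this example can be sketched numerically (an illustration added here; the determinant criterion gives p(θ_1, θ_2 | y_1, y_2, y_3) ∝ |V|^{−6} for n = 12). Note two assumptions: the time values for the first pair of runs are taken here to be ½, and the grid ranges are arbitrary.

```python
import math

# Joint posterior sketch for the A -> B -> C system: minimize log|V(theta1,
# theta2)| over a grid, equivalent to maximizing |V|^(-6) for n = 12 runs.
t  = [0.5, 0.5, 1, 1, 2, 2, 4, 4, 8, 8, 16, 16]   # first pair assumed 1/2
y1 = [0.959, 0.914, 0.855, 0.785, 0.628, 0.617, 0.480, 0.423, 0.166, 0.205, 0.034, 0.054]
y2 = [0.025, 0.061, 0.152, 0.197, 0.130, 0.249, 0.184, 0.298, 0.147, 0.050, 0.000, 0.047]
y3 = [0.028, 0.000, 0.068, 0.096, 0.090, 0.118, 0.374, 0.358, 0.651, 0.684, 0.899, 0.991]

def model(phi1, phi2, tt):
    e1, e2 = math.exp(-phi1 * tt), math.exp(-phi2 * tt)
    return (e1,
            phi1 * (e1 - e2) / (phi2 - phi1),
            1.0 + (-phi2 * e1 + phi1 * e2) / (phi2 - phi1))

def log_det_v(th1, th2):
    phi1, phi2 = math.exp(th1), math.exp(th2)
    r = [[], [], []]
    for tt, a, b, c in zip(t, y1, y2, y3):
        Y1, Y2, Y3 = model(phi1, phi2, tt)
        r[0].append(a - Y1); r[1].append(b - Y2); r[2].append(c - Y3)
    v = [[sum(p * q for p, q in zip(r[i], r[j])) for j in range(3)]
         for i in range(3)]
    det = (v[0][0] * (v[1][1] * v[2][2] - v[1][2] ** 2)
           - v[0][1] * (v[0][1] * v[2][2] - v[1][2] * v[0][2])
           + v[0][2] * (v[0][1] * v[1][2] - v[1][1] * v[0][2]))
    return math.log(det)

best = min(((th1, th2)
            for th1 in [-2.0 + 0.01 * i for i in range(81)]
            for th2 in [-1.1 + 0.01 * j for j in range(81)]),
           key=lambda p: log_det_v(*p))
```

The grids for θ_1 and θ_2 are kept disjoint so that φ_1 ≠ φ_2 and the solution (6.2) is never evaluated at its removable singularity.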

The estimation situation is completely portrayed by the posterior distributions. Analysis of y_1 supplies information only on the parameter θ_1. However, joint posterior distributions for θ_1 and θ_2 may be calculated which reflect the information about θ_1 and θ_2 coming from (a) y_2 alone, (b) y_3 alone, (c) y_2 and y_3 jointly, and (d) y_1, y_2 and y_3 jointly. It is particularly instructive to compare these in the form of superimposed contour diagrams. For clarity, we show in Fig. 3 only a single contour for each distribution. In each case, this is the contour which includes approximately 99.75% of the posterior distribution. This approximate Bayesian region is given by that contour for which (Box & Cox, 1964)

    log p(θ̂) − log p(θ) = ½ χ²_2(1 − α),

where α = 0.0025.







It is easy to see that this would give the precise region if the joint distribution were multivariate normal. Fig. 3 provides a very good illustration of the value of considering the whole posterior distribution rather than a single point estimate. Essentially the same point has been repeatedly made by Barnard in discussing the likelihood principle.
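For two parameters the χ² quantile in the contour rule has a closed form, so the log-density drop defining the 99.75% region is easily computed (a small check added here):

```python
import math

# chi-squared with 2 d.f. has CDF 1 - exp(-x/2), so its (1 - alpha) quantile
# is -2*ln(alpha); the contour rule log p(theta_hat) - log p(theta) then
# equals half of that, i.e. -ln(alpha).
alpha = 0.0025
quantile = -2.0 * math.log(alpha)   # chi2_2(1 - alpha)
drop = 0.5 * quantile               # log-density drop at the contour, ~5.99
```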
Fig. 3. 99.75% Bayesian regions for θ_1 and θ_2 for Example 3 (N = 12).

In any sequential reaction A → B → C ..., etc., we should expect that observation on only the end-product (C in our example) would provide accurate information on only the sum, or some such function, of the specific rates. That this is the case for this example is shown by the oblique 'north-west to south-east' orientation of the crescent-shaped contour for p(θ_1, θ_2 | y_3). In this particular example we see, by inspection of the solution for y_3 in equations (6.2), that Y_3 is symmetric in φ_1 and φ_2 and hence in θ_1 and θ_2. It follows that, when y_3 alone is considered, if any point (θ_1, θ_2) is included in the Bayesian region, so must be its mirror image (θ_2, θ_1) in the θ_1 = θ_2 axis. (This can lead, as in this example, to a double maximum.)

The information supplied by the intermediate product y_2 is of essentially different character. From it we obtain information on both specific rates, but principally on the difference or ratio of the rates. It is noticeable that this region is obliquely oriented in a direction approximately at right angles to that of p(θ_1, θ_2 | y_3). When information from y_2 and y_3 is combined, we find, as would be expected, a much smaller region in the neighbourhood of the intersection of the regions for y_2 alone and y_3 alone. Finally, we can consider the effect of adding y_1, which provides information on θ_1 only. Study of the probability contour from p(θ_1, θ_2 | y_1, y_2, y_3) shows, again as would be expected, that the region is changed principally in being narrowed in the θ_1 direction. Point estimates may be obtained corresponding to the maximum posterior densities. The estimates available from the various sources are shown in Table 3.

This example perhaps serves to illustrate the caution and common sense which ought to be applied in analysing this type of problem, and the necessity for considering each problem individually and not hoping for 'automatic' answers from a computer program. The authors know of one case where an elaborate consecutive mechanism involving ten constants was postulated, but observations were taken on the single end-product only and on no intermediate product. An iterative non-linear estimation routine run on a computer converged only very slowly on to what appeared to be nonsensical answers. This was undoubtedly due to the peculiarities of the ten-dimensional likelihood surface, which probably contained multidimensional ridges and multiple maxima. Observations on additional responses, and application of the theory of this paper, would have eliminated many, if not all, of the ambiguities.

In the complete set of data just used the reaction is almost complete when the last observation is taken. In practice, for one reason or another, data of this kind occur in which the available observations trace only part of the course of the reaction. We have illustrated the effect of this by omitting the last four observations on y_1, y_2 and y_3 and repeating the analysis. As shown in Fig. 4, over the ranges studied, the contours for p(θ|y_2) and p(θ|y_3) do not close. Nevertheless quite precise estimation is possible using y_2 and y_3 together, and the addition of y_1 improves the estimation further. Table 3 shows the final estimates remarkably close to those obtained before.

Table 3. Estimates for θ_1 and θ_2 from various sources
                         When all 12             When the last four
                     observations are used    observations are omitted
    Responses used      θ̂_1      θ̂_2             θ̂_1      θ̂_2
    y_2               −1.561   −0.685           −1.715   −0.942
    y_3               −1.619   −0.635           −1.273   −1.022
    y_2, y_3          −1.585   −0.707           −1.561     —
    y_1, y_2, y_3     −1.570     —              −1.565     —

Note. For y_3 alone, interchange of the coordinates for θ̂_1 and θ̂_2 provides alternative estimates; see text.

7. APPENDIX (see § 3)

We use results due to Hsu, given by Deemer & Olkin (1951); the full notation of that paper is not required, however.

Result. If Σ = {σ_ij} and Λ = {σ^ij} are two symmetric matrices of order k, with Λ = Σ⁻¹, then the Jacobian

    J = |∂(σ_ij)/∂(σ^ij)| = D(Λ, Σ)

has value |Λ|^{−(k+1)}.

Proof. By Property 5B.3, page 366, of Deemer & Olkin, D(Λ, Σ) = D(Λ*, Σ*), where Λ* = dΛ and Σ* = dΣ are the matrices of differentials. Taking differentials of Σ = Λ⁻¹,

    Σ* = −Λ⁻¹ Λ* Λ⁻¹,

by Theorem 4.4, page 357, of Deemer & Olkin. By Theorem 3.7 and Corollary 3.7, page 348, of Deemer & Olkin, and noting that, since Λ is symmetric, Λ⁻¹ = (Λ')⁻¹, we see that D(Λ*, Σ*) = |Λ|^{−(k+1)}, whence the result follows.







Fig. 4. 99.75% Bayesian regions for θ_1 and θ_2; for Example 4 (N = 8).


REFERENCES

AITKEN, A. C. (1935). On least squares and linear combination of observations. Proc. Roy. Soc. Edin. 55, 42.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. B, 26, 211-43.
CORNISH, E. A. (1954). The multivariate t-distribution associated with a set of normal sample deviates. Austr. J. Phys. 7, 531-42.
DEEMER, W. L. & OLKIN, I. (1951). The Jacobians of certain matrix transformations useful in multivariate analysis, based on lectures by P. L. Hsu. Biometrika, 38, 345-67.
DUNNETT, C. W. & SOBEL, M. (1954). A bivariate generalization of Student's t-distribution, with tables for special cases. Biometrika, 41, 153-69.
FISHER, R. A. (1961). The weighted mean of two Normal samples with unknown variance ratio. Sankhyā, 23 A, 103-14.
GEISSER, S. & CORNFIELD, J. (1963). Posterior distributions for multivariate normal parameters. J. R. Statist. Soc. B, 25, 368-76.
JAMES, G. S. (1959). The Behrens-Fisher distribution and weighted means. J. R. Statist. Soc. B, 21.
JEFFREYS, H. (1961). Theory of Probability (3rd edition). Oxford: Clarendon Press.
SAVAGE, L. J. (1961). The Subjective Basis of Statistical Practice. (Mimeographed manuscript.) University of Michigan.
SUKHATME, P. V. (1938). On Fisher and Behrens' test of significance for the difference in means of two normal samples. Sankhyā, 4, 39-48.
TIAO, G. C. & ZELLNER, A. (1964). On the Bayesian estimation of multivariate regression. J. R. Statist. Soc. B, 26, 277-85.
WILKS, S. S. (1962). Mathematical Statistics. New York: John Wiley and Sons, Inc.
YATES, F. (1939). An apparent inconsistency arising from tests of significance based on fiducial distributions of unknown parameters. Proc. Camb. Phil. Soc. 35, 579-91.