revised for Human and Ecological Risk Assessment, 97-01

Fitting Second-Order Parametric Distributions to Data Using Maximum Likelihood Estimation

David E. Burmaster
Alceon Corporation
PO Box 382669, Harvard Square Station
Cambridge, MA 02238-2669
tel: 617-864-4300; fax: 617-864-9954
deb@Alceon.com

Kimberly M. Thompson
Harvard University School of Public Health
718 Huntington Avenue
Boston, MA 02115
tel: 617-432-4345; fax: 617-432-0190
KimT@hsph.harvard.edu

Keywords
variability, uncertainty, second-order random variables, maximum likelihood estimation

Abstract
Second-order random variables, i.e., parametric random variables with uncertain parameters, give risk assessors a way to distinguish and represent both the variability and the uncertainty in an exposure variable. In this manuscript, we explore ways to fit second-order random variables to data using maximum likelihood estimation (MLE).

1.0 Introduction

When estimating exposures to chemicals in the environment, risk assessors need a way to represent both the variability and the uncertainty in the exposure variables (see, e.g., NAS, 1994; NCRP, 1996; Burmaster & Wilson, 1996). In a probabilistic exposure assessment, a risk assessor may use a second-order probability distribution to represent the variability and the uncertainty in one or more of the exposure variables (Bogen, 1990; MacIntosh et al., 1994; McKone, 1994; Frey & Rhodes, 1996; Hattis & Barlow, 1996; Price et al., 1996). Although other authors have used either professional judgment (e.g., Hoffman & Hammonds, 1994; NCRP, 1996; Barry, 1996; Cohen et al., 1996) or the bootstrap method (e.g., Frey, 1996) to develop second-order random variables used in calculations and in "two-dimensional" Monte Carlo simulations, we show how to use the method of maximum likelihood estimation (MLE) to fit second-order distributions to data. In a "two-dimensional" Monte Carlo simulation, some or all of the input variables and all of the output variables are represented by second-order random variables (Burmaster & Wilson, 1996).

31 May 1997


© Alceon ®, 1997


Sir Ronald A. Fisher developed the method of maximum likelihood estimation (MLE) as a powerful, general-purpose method for fitting a parametric distribution to data. The general idea is to choose an estimator for the parameter(s) in a distribution as that function of the sample observations (i.e., data) which will, when substituted into the distribution, make the probability of the sample a maximum (paraphrased from Keeping, 1995). He later generalized the idea to develop joint confidence regions for the parameter(s), an idea that was further generalized to the profile likelihood method for marginal distributions for parameter(s).

In this manuscript, we apply some of the standard MLE techniques to an original and perturbed version of each of three data sets to show how to fit a second-order random variable to data. Before using the MLE methods, we first select an appropriate family of distributions through exploratory data analysis. In each of these six cases, MLE methods readily determine the joint and marginal distributions for the data sets in a way that can be easily used in "two-dimensional" Monte Carlo simulations.

2.0 Three Data Sets, Each with a Perturbation

In this manuscript, we consider three data sets, each in its original form and also in a perturbed form. The first two data sets are synthetic (and unitless), but the third data set comprises clinical measurements of body weights (kg) for a random sample of adult women between the ages of 18 and 40 years. For each of the three data sets, we investigate the original data and the perturbed data to see how the perturbation changes the second-order distributions fit by maximum likelihood estimation (MLE).

Data Set 1 Original (DS1o in Table 1), a synthetic data set, contains 19 positive values drawn randomly from a LogNormal distribution of the form exp[Normal(µ, σ)] with µ = 2 and σ = 1 and then rounded to the nearest integer. The arithmetic mean of the parent distribution equals exp[µ + 0.5 σ²] = 12.2, approximately, and the arithmetic mean of this sample equals 14, exactly. When tested by the Wilk-Shapiro (W-S) test for Normality (Madansky, 1988), the natural logarithms of these 19 data points do not fail using standard criteria (p-value = 0.1492).

Data Set 1 Perturbed (DS1p in Table 1), also a synthetic data set, contains 19 positive values. The first 18 values are the same as those in DS1o, but we have perturbed the largest value to change it from 101 in DS1o to 301 in DS1p. The arithmetic mean of these 19 values equals 24.5, approximately. When tested by the Wilk-Shapiro (W-S) test for Normality, the natural logarithms of these 19 data points do fail using standard criteria (p-value = 0.0096).

Data Set 2 Original (DS2o in Table 2), a synthetic data set, contains 25 positive values randomly drawn from a Beta distribution of the form Beta[α, β] with α = 2 and β = 6 and then rounded to two decimal places. The arithmetic mean of the parent distribution equals α / (α + β) = 0.25, exactly, and the arithmetic mean of this sample equals approximately 0.28.

Data Set 2 Perturbed (DS2p in Table 2), also a synthetic data set, contains 25 positive values. The first 24 values are the same as those in DS2o, but we have perturbed the largest value to change it from 0.61 in DS2o to 0.75 in DS2p. The arithmetic mean of these 25 values equals approximately 0.28.

Data Set 3 Original (DS3o in Table 3), a real data set, contains values of body weight (kg) measured for a random sample of 1,958 adult women in the United States between the ages of 18 and 40 years (Stern, 1996). In this instance, the original researchers reported only key percentiles of the data, not all the individual values. Since the values in DS3o come from empirical measurements for a random sample of adult women, we do not know a priori if they can be fit by a parametric distribution; however, we do know that one or two LogNormal distributions have been fit successfully to empirical measurements for the body weights of another (larger) random sample of adult women between the ages of 18 and 75 years (Brainard & Burmaster, 1992). Since we know selected percentiles for DS3o and do not know the individual measurements in the data set, we will use a more general form of MLE methods to fit a second-order distribution to these "binned" data.

Data Set 3 Perturbed (DS3p in Table 3) differs from the original measurements in the sense that we consider how our inferences would change if the same percentile statistics arose from measuring the body weights of 979 women (i.e., the same percentiles for body weight, but from half as many subjects).

3.0 Overview of the MLE Approach

As detailed in the next sections, we fit second-order random variables to the original and the perturbed versions of each of the three data sets in five steps. In the first step, we use graphical methods from exploratory data analysis to see if a parametric distribution may reasonably fit the data. In the second step, we fit a first-order random variable, i.e., an ordinary random variable represented by a parametric distribution with fixed parameters, to the data. In the third step, we develop and explore the likelihood function (and the loglikelihood function) for the data (see, for example: Mood et al., 1974; Edwards, 1992; Keeping, 1995). In the final step, we differentiate the loglikelihood function to develop and fit second-order random variables to the data (Cox & Snell, 1989; Ross, 1990). Although the MLE method is quite general, it is important to check the intermediate and final results using graphs and plots.

In the next sections, we denote (i) real variables and real functions with plain letters, (ii) first-order random variables with a single underscore, (iii) second-order random variables with a double underscore, and (iv) vectors or matrices in bold.

4.0 The Method Applied to Data Set 1 -- Original and Perturbed

We know a priori that these synthetic data come from a LogNormal distribution. Combining the first and second steps, we use LogNormal probability plots (Burmaster & Hull, 1996; D'Agostino & Stephens, 1986) to fit this LogNormal distribution to the original and the perturbed versions of DS1 (Aitchison & Brown, 1957; Crow & Shimizu, 1988):

$$\ln[X] \sim \mathrm{Normal}[\mu, \sigma] \qquad \text{(Eqn 1)}$$

which is equivalent to:

$$X \sim \exp[\,\mathrm{Normal}[\mu, \sigma]\,] \qquad \text{(Eqn 1')}$$

where ln[ • ] represents the Napierian (or natural) logarithm function, exp[ • ] represents the exponential function, and Normal[ µ, σ ] represents the Normal or Gaussian distribution with mean µ and standard deviation σ (with σ > 0). From the probability plots shown in Figures 1.1-O and -P, we find these point values for $\hat{\mu}$ and $\hat{\sigma}$ from the intercept and the slope, respectively, of the straight line fit to the plot using ordinary least-squares regression:

                    Data Set 1 Original    Data Set 1 Perturbed
$\hat{\mu}$         2.014                  2.072
$\hat{\sigma}$      0.992                  1.110
adjR²               0.922                  0.845
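The probability-plot fit described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name, the Hazen plotting positions (i - 0.5)/N, and the synthetic stand-in sample are all assumptions, since Table 1 and the plotting convention are not reproduced here.

```python
import math
import random
from statistics import NormalDist


def lognormal_probability_plot_fit(xs):
    """Regress sorted ln(x) on standard Normal quantiles by ordinary least squares.

    Returns (intercept, slope, adj_r2); the intercept estimates mu and the
    slope estimates sigma in ln[X] ~ Normal[mu, sigma].
    """
    ys = sorted(math.log(x) for x in xs)
    n = len(ys)
    # Hazen plotting positions; this particular choice is an assumption.
    zs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    z_bar = sum(zs) / n
    y_bar = sum(ys) / n
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * z_bar
    ss_res = sum((y - (intercept + slope * z)) ** 2 for z, y in zip(zs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor
    return intercept, slope, adj_r2


# Synthetic stand-in for DS1o (NOT the values in Table 1).
rng = random.Random(0)
sample = [rng.lognormvariate(2.0, 1.0) for _ in range(19)]
mu_hat, sigma_hat, adj_r2 = lognormal_probability_plot_fit(sample)
print(mu_hat, sigma_hat, adj_r2)
```

Because these plotting positions are symmetric about zero, the fitted intercept coincides with the arithmetic mean of the log-transformed data.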

Based on standard methods in logarithmic space, the probability distribution for drawing a single random value from the model in Eqn 1 is (Evans et al., 1993):

$$p[\,\ln[X] \mid \mu, \sigma\,] = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}\,\frac{(\ln(x) - \mu)^2}{\sigma^2}\right] \qquad \text{(Eqn 2)}$$

and the likelihood function for a single, randomly drawn sample, $x_i$, is:

$$p[\,\mu, \sigma \mid \ln(x_i)\,] = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}\,\frac{(\ln(x_i) - \mu)^2}{\sigma^2}\right] \qquad \text{(Eqn 3)}$$

In this framework, the probability of drawing N independent random samples is:

$$\text{Probability} = \prod_{i=1}^{N} p[\,\ln(x_i) \mid \mu, \sigma\,] \qquad \text{(Eqn 4)}$$

and the likelihood function for the N independent random samples is:

$$\text{Likelihood} = \prod_{i=1}^{N} p[\,\ln(x_i) \mid \mu, \sigma\,] \qquad \text{(Eqn 5)}$$

The loglikelihood function for the N independent random samples is a function of µ and σ:

$$J[\mu, \sigma] = \sum_{i=1}^{N} \ln\!\big[\,p[\,\mu, \sigma \mid \ln(x_i)\,]\,\big] = \sum_{i=1}^{N}\left[-\frac{1}{2}\ln[2\pi] - \ln[\sigma] - \frac{1}{2}\,\frac{(\ln(x_i) - \mu)^2}{\sigma^2}\right] = -\frac{N}{2}\ln[2\pi] - N\ln[\sigma] - \frac{1}{2}\sum_{i=1}^{N}\frac{(\ln(x_i) - \mu)^2}{\sigma^2} \qquad \text{(Eqn 6)}$$

The values of µ and σ that maximize the loglikelihood function for the sample are called the MLE estimates $\hat{\mu}$ and $\hat{\sigma}$; each is a point value.

                    Data Set 1 Original    Data Set 1 Perturbed
$\hat{\mu}$         2.014                  2.072
$\hat{\sigma}$      0.997                  1.163
max J               -26.9                  -29.8

In this example, the loglikelihood function has a single maximum at { $\hat{\mu}$, $\hat{\sigma}$ }. In Figures 1.2-O and -P, the dots near the center of the plot show the locations of { $\hat{\mu}$, $\hat{\sigma}$ }. Again using standard methods (Mood et al., 1974; Cox & Snell, 1989; Edwards, 1992; Keeping, 1995), certain contours of the loglikelihood function define the joint confidence region for { µ, σ }. For example, the 95-percent joint confidence region is defined by this contour:

$$J[\mu, \sigma] = J[\hat{\mu}, \hat{\sigma}] - \frac{\chi^2_{0.05}}{2} = J[\hat{\mu}, \hat{\sigma}] - \frac{5.991}{2} \quad (\text{with df} = 2) \qquad \text{(Eqn 7)}$$

where $\chi^2_{0.05}$ refers to the ChiSquared distribution with two degrees of freedom (df = 2). Similarly, the 90-percent and 50-percent joint confidence regions follow similar contours with $\chi^2_{0.10}$ and $\chi^2_{0.50}$, respectively, substituted into Eqn 7 (each with df = 2). The solid lines in Figures 1.2-O and -P show the 95-, 90-, and 50-percent joint confidence regions as the largest, intermediate, and smallest ovals, respectively (Wolfram, 1991; Wickham-Jones, 1994). Box & Tiao (1973, Chapter 2) present and discuss similar plots (and their corresponding marginal distributions) in an illuminating way.

Again using standard methods (Mood et al., 1974; Cox & Snell, 1989; Edwards, 1992; Keeping, 1995), the observed information matrix for the sample equals:

$$\mathrm{ObsInfo} = -\begin{bmatrix} \dfrac{\partial^2 J}{\partial \mu^2} & \dfrac{\partial^2 J}{\partial \mu\,\partial \sigma} \\[2ex] \dfrac{\partial^2 J}{\partial \sigma\,\partial \mu} & \dfrac{\partial^2 J}{\partial \sigma^2} \end{bmatrix}_{\hat{\mu},\,\hat{\sigma}} \qquad \text{(Eqn 8)}$$

and, under the standard Taylor series approximation and the standard regularity conditions (both met in this example), $\hat{\mu}$ and $\hat{\sigma}$ are distributed according to a MultiVariate Normal (MVN) distribution with this variance-covariance matrix [EndNote 1]:

$$\Sigma = \mathrm{Inverse}[\,\mathrm{ObsInfo}\,] \qquad \text{(Eqn 9)}$$
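Eqns 6 through 9 can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names and the small sample are hypothetical, and the covariance uses the closed form that follows from differentiating Eqn 6 twice at the maximum, where ∂²J/∂µ² = -N/σ², ∂²J/∂σ² = -2N/σ², and the mixed derivative vanishes.

```python
import math


def loglik(mu, sigma, logs):
    """Eqn 6: loglikelihood J[mu, sigma] for the log-transformed data."""
    n = len(logs)
    ss = sum((y - mu) ** 2 for y in logs)
    return -0.5 * n * math.log(2.0 * math.pi) - n * math.log(sigma) - 0.5 * ss / sigma ** 2


def lognormal_mle(xs):
    """Return (mu_hat, sigma_hat, logs); the closed-form maximizers of Eqn 6."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu_hat = sum(logs) / n
    # The MLE of sigma divides by n, not n - 1.
    sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in logs) / n)
    return mu_hat, sigma_hat, logs


def in_joint_region(mu, sigma, xs, chi2_quantile=5.991):
    """Eqn 7: True if (mu, sigma) lies on or inside the likelihood contour.

    The default 5.991 is the 95-percent ChiSquared value with df = 2.
    """
    mu_hat, sigma_hat, logs = lognormal_mle(xs)
    return loglik(mu, sigma, logs) >= loglik(mu_hat, sigma_hat, logs) - chi2_quantile / 2.0


def covariance_at_mle(xs):
    """Eqns 8-9: inverse of the observed information, in closed form."""
    _, sigma_hat, logs = lognormal_mle(xs)
    n = len(logs)
    return [[sigma_hat ** 2 / n, 0.0],
            [0.0, sigma_hat ** 2 / (2.0 * n)]]


# Hypothetical positive sample (NOT the values in Table 1).
sample = [2.0, 5.0, 7.0, 9.0, 12.0, 20.0, 33.0]
mu_hat, sigma_hat, logs = lognormal_mle(sample)
print(mu_hat, sigma_hat, loglik(mu_hat, sigma_hat, logs))
print(covariance_at_mle(sample))
```

Under this closed form the two standard deviations are σ̂/√N and σ̂/√(2N); with N = 19 and σ̂ = 0.997 these evaluate to about 0.229 and 0.162.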

In expanded form, the variance-covariance matrix in Eqn 9 is:

$$\Sigma = \begin{bmatrix} \mathrm{Var}[\mu] & \mathrm{Cov}[\mu, \sigma] \\ \mathrm{Cov}[\sigma, \mu] & \mathrm{Var}[\sigma] \end{bmatrix}$$

With the Taylor series approximation to the loglikelihood function, the approximations to the joint confidence regions are ellipses. For example, the ellipse that approximates the 95-percent joint confidence region for { µ, σ } is this contour of the MultiVariate Normal distribution (MVN) with df = 2:

$$\mathrm{MVN}[\mu, \sigma] = \frac{1}{2\pi\,\sqrt{\mathrm{Var}(\mu)}\,\sqrt{\mathrm{Var}(\sigma)}\,\sqrt{1 - \dfrac{\mathrm{Cov}^2(\mu, \sigma)}{\mathrm{Var}(\mu)\,\mathrm{Var}(\sigma)}}}\;\exp\!\left[-\frac{\chi^2_{0.05}}{2}\right] \qquad \text{(Eqn 10)}$$

In Eqn 10, $\chi^2_{0.05}$ refers to the ChiSquared distribution with two degrees of freedom (df = 2). Similarly, the 90-percent and 50-percent joint confidence regions follow similar ellipses with $\chi^2_{0.10}$ and $\chi^2_{0.50}$, respectively, substituted into Eqn 10 (each with df = 2).

Applying these methods to the original and the perturbed versions of DS1, we find that $\hat{\mu}$ and $\hat{\sigma}$ are each well approximated by Normal distributions with vanishing correlation.

                                        Data Set 1 Original    Data Set 1 Perturbed
$\hat{\mu}$                             N(2.014, 0.229)        N(2.072, 0.267)
$\hat{\sigma}$                          N(0.997, 0.162)        N(1.163, 0.189)
Corr[ $\hat{\mu}$, $\hat{\sigma}$ ]     0                      0

and with the constraint $\hat{\sigma}$ > 0. Thus, we have now fit this second-order random variable to the data:

$$\ln[X] \sim \mathrm{Normal}[\mu, \sigma] \qquad \text{(Eqn 11)}$$

which is equivalent to:

$$X \sim \exp[\,\mathrm{Normal}[\mu, \sigma]\,] \qquad \text{(Eqn 11')}$$
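One way to use a second-order random variable like Eqn 11 in a "two-dimensional" Monte Carlo simulation is sketched below in Python. The function name, the loop sizes, the illustrative parameter values, and the redraw rule used to enforce σ > 0 are assumptions for illustration; the sketch also assumes uncorrelated Normal uncertainty in µ and σ, as reported above.

```python
import math
import random


def sample_second_order_lognormal(mu_mean, mu_sd, sigma_mean, sigma_sd,
                                  n_outer, n_inner, seed=0):
    """Nested sampling from ln[X] ~ Normal[mu, sigma] with uncertain mu, sigma.

    Outer loop (uncertainty): draw mu and sigma from independent Normals.
    Inner loop (variability): draw X from the LogNormal with those parameters.
    Returns a list of (mu, sigma, xs) tuples, one per outer iteration.
    """
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_outer):
        mu = rng.gauss(mu_mean, mu_sd)
        sigma = rng.gauss(sigma_mean, sigma_sd)
        while sigma <= 0.0:  # enforce the constraint sigma > 0 by redrawing
            sigma = rng.gauss(sigma_mean, sigma_sd)
        xs = [math.exp(rng.gauss(mu, sigma)) for _ in range(n_inner)]
        realizations.append((mu, sigma, xs))
    return realizations


# Illustrative parameter values only.
runs = sample_second_order_lognormal(2.0, 0.23, 1.0, 0.16, n_outer=50, n_inner=200)
print(len(runs), runs[0][0], runs[0][1])
```

Each outer iteration yields one plausible first-order LogNormal distribution; the spread across outer iterations represents uncertainty, and the spread within an iteration represents variability.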

The dashed lines in Figures 1.2-O and -P show the 95-, 90-, and 50-percent joint confidence regions as the largest, intermediate, and smallest ellipses, respectively. Overall, the joint confidence regions developed from the Taylor series approximation to the loglikelihood function (the ellipses shown with dashed lines) are similar to the joint confidence regions developed directly from the loglikelihood function (the ovals shown with solid lines). As the number of data points increases, the ellipses (dashed lines) and the ovals (solid lines) will converge. Finally, in Figures 1.3-O and -P, the lines show the 95-percent confidence bands on the probability plot using the isopleths developed in Burmaster & Wilson (1996) [EndNote 2].

Discussion: As expected, the 19 data points in DS1o pass the Wilk-Shapiro test for Normality, are well fit on the probability plot in Figure 1.1-O, have small joint confidence regions (ellipses and ovals) in Figure 1.2-O, and all fall within the 95-percent confidence band in Figure 1.3-O. As estimated by different methods, the estimates { $\hat{\mu}$, $\hat{\sigma}$ } are close to { µ = 2, σ = 1 }, the values used to synthesize the 19 original data points. In contrast, as expected, the 19 data points in DS1p do not pass the Wilk-Shapiro test for Normality, have a good-sized outlier on the probability plot in Figure 1.1-P, have larger joint confidence regions (ellipses and ovals) in Figure 1.2-P, and do not all fall within the 95-percent confidence band in Figure 1.3-P. As estimated by different methods, the estimates { $\hat{\mu}$, $\hat{\sigma}$ } differ from { µ = 2, σ = 1 } by about 4 percent and 20 percent, respectively. In these figures, the perturbation applied to the largest datum propagates into larger relative changes in the estimates of σ and into smaller relative changes in the estimates of µ.

5.0 The Method Applied to Data Set 2 -- Original and Perturbed

We know a priori that these synthetic data come from a Beta distribution. Combining the first and second steps, we use empirical cumulative distributions (CDFs) to fit this Beta distribution to the original and the perturbed versions of DS2:

$$X \sim \mathrm{Beta}[\alpha, \beta] \qquad \text{(Eqn 12)}$$

On the empirical CDF plots shown in Figures 2.1-O and -P, we could have estimated $\hat{\alpha}$ and $\hat{\beta}$ by nonlinear regression.

Working in linear space, the probability distribution for drawing a single random value from the model in Eqn 12 is (Evans et al., 1993):

$$p[\,X \mid \alpha, \beta\,] = \frac{x^{\alpha - 1}\,(1 - x)^{\beta - 1}}{\mathrm{BetaFn}(\alpha, \beta)} \qquad \text{(Eqn 13)}$$

where $\mathrm{BetaFn}(\alpha, \beta) = \int_0^1 u^{\alpha - 1}\,(1 - u)^{\beta - 1}\,du$. The likelihood function for a single, randomly drawn sample, $x_i$, is:

$$p[\,\alpha, \beta \mid x_i\,] = \frac{x_i^{\alpha - 1}\,(1 - x_i)^{\beta - 1}}{\mathrm{BetaFn}(\alpha, \beta)} \qquad \text{(Eqn 14)}$$

In this framework, the loglikelihood function for N independent random samples is:

$$J[\alpha, \beta] = \sum_{i=1}^{N} \ln\!\big[\,p[\,\alpha, \beta \mid x_i\,]\,\big] = \sum_{i=1}^{N}\Big[(\alpha - 1)\ln[x_i] + (\beta - 1)\ln[1 - x_i] - \ln[\mathrm{BetaFn}(\alpha, \beta)]\Big] \qquad \text{(Eqn 15)}$$

The values of α and β that maximize the loglikelihood function for the sample are called the MLE estimates $\hat{\alpha}$ and $\hat{\beta}$; each is a point value.

                    Data Set 2 Original    Data Set 2 Perturbed
$\hat{\alpha}$      2.099                  1.874
$\hat{\beta}$       5.479                  4.686
max J               13.2                   11.6

In this example, the loglikelihood function has a single maximum at { $\hat{\alpha}$, $\hat{\beta}$ }. In Figures 2.2-O and -P, the dots near the center of the plot show the locations of { $\hat{\alpha}$, $\hat{\beta}$ }. Using the same standard methods, certain contours of the loglikelihood function define the joint confidence region for { α, β }. For example, the 95-percent joint confidence region is defined by this contour:

$$J[\alpha, \beta] = J[\hat{\alpha}, \hat{\beta}] - \frac{\chi^2_{0.05}}{2} = J[\hat{\alpha}, \hat{\beta}] - \frac{5.991}{2} \quad (\text{with df} = 2) \qquad \text{(Eqn 16)}$$

and the 90- and 50-percent joint confidence regions follow similar contours with appropriate values of the ChiSquared distribution substituted in Eqn 16. Under the same assumptions as the first example, $\hat{\alpha}$ and $\hat{\beta}$ are distributed according to a MultiVariate Normal distribution (MVN) with the variance-covariance matrix equal to the inverse of the observed information matrix for the sample:

$$\Sigma = \begin{bmatrix} \mathrm{Var}[\alpha] & \mathrm{Cov}[\alpha, \beta] \\ \mathrm{Cov}[\alpha, \beta] & \mathrm{Var}[\beta] \end{bmatrix} \qquad \text{(Eqn 17)}$$

$$\Sigma = \mathrm{Inverse}\!\left[-\begin{bmatrix} \dfrac{\partial^2 J}{\partial \alpha^2} & \dfrac{\partial^2 J}{\partial \alpha\,\partial \beta} \\[2ex] \dfrac{\partial^2 J}{\partial \beta\,\partial \alpha} & \dfrac{\partial^2 J}{\partial \beta^2} \end{bmatrix}_{\hat{\alpha},\,\hat{\beta}}\right] \qquad \text{(Eqn 18)}$$

With the Taylor series approximation to the loglikelihood function, the approximations to the joint confidence regions are ellipses. For example, the ellipse that approximates the 95-percent joint confidence region for { α, β } is this contour of the MultiVariate Normal distribution (MVN) with df = 2:

$$\mathrm{MVN}[\alpha, \beta] = \frac{1}{2\pi\,\sqrt{\mathrm{Var}(\alpha)}\,\sqrt{\mathrm{Var}(\beta)}\,\sqrt{1 - \dfrac{\mathrm{Cov}^2(\alpha, \beta)}{\mathrm{Var}(\alpha)\,\mathrm{Var}(\beta)}}}\;\exp\!\left[-\frac{\chi^2_{0.05}}{2}\right] \qquad \text{(Eqn 19)}$$

Applying these methods to the original and the perturbed versions of DS2, we find that $\hat{\alpha}$ and $\hat{\beta}$ are each approximated by Normal distributions with large, positive correlation.

                    Data Set 2 Original    Data Set 2 Perturbed
$\hat{\alpha}$      N(2.099, 0.555)        N(1.874, 0.493)
$\hat{\beta}$       N(5.479, 1.559)        N(4.686, 1.333)
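The Beta fit has no closed-form maximizer, so Eqns 15, 17, and 18 are usually evaluated numerically. The Python sketch below is one way to do that with only the standard library; it is not the authors' implementation. The compass (pattern) search, the method-of-moments starting point, the finite-difference step, the function names, and the simulated data are all assumptions chosen for brevity.

```python
import math
import random


def beta_loglik(a, b, xs):
    """Eqn 15: loglikelihood J[alpha, beta] for data in (0, 1)."""
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return sum((a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x) - log_beta_fn
               for x in xs)


def beta_mle(xs, tol=1e-6):
    """Maximize Eqn 15 by a simple compass search; returns (a_hat, b_hat, max_j)."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    k = m * (1.0 - m) / v - 1.0          # method-of-moments starting point
    a, b = m * k, (1.0 - m) * k
    best = beta_loglik(a, b, xs)
    step = 0.5
    while step > tol:
        improved = False
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            na, nb = a + da, b + db
            if na > 0.0 and nb > 0.0:
                val = beta_loglik(na, nb, xs)
                if val > best:
                    a, b, best, improved = na, nb, val, True
        if not improved:
            step /= 2.0                   # shrink the stencil when stuck
    return a, b, best


def beta_covariance(a, b, xs, h=1e-3):
    """Eqns 17-18: invert the observed information from finite differences."""
    j0 = beta_loglik(a, b, xs)
    d_aa = (beta_loglik(a + h, b, xs) - 2.0 * j0 + beta_loglik(a - h, b, xs)) / h ** 2
    d_bb = (beta_loglik(a, b + h, xs) - 2.0 * j0 + beta_loglik(a, b - h, xs)) / h ** 2
    d_ab = (beta_loglik(a + h, b + h, xs) - beta_loglik(a + h, b - h, xs)
            - beta_loglik(a - h, b + h, xs) + beta_loglik(a - h, b - h, xs)) / (4.0 * h ** 2)
    i_aa, i_bb, i_ab = -d_aa, -d_bb, -d_ab   # observed information = -Hessian
    det = i_aa * i_bb - i_ab ** 2
    return [[i_bb / det, -i_ab / det], [-i_ab / det, i_aa / det]]


# Simulated stand-in data (NOT the values in Table 2).
rng = random.Random(1)
data = [rng.betavariate(2.0, 6.0) for _ in range(500)]
a_hat, b_hat, max_j = beta_mle(data)
cov = beta_covariance(a_hat, b_hat, data)
corr = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
print(a_hat, b_hat, max_j, corr)
```

A production analysis would more likely use a library optimizer and analytic second derivatives; the compass search is used here only because the Beta loglikelihood is concave in (α, β) and the search needs no dependencies.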

Corr( $\hat{\alpha}$, $\hat{\beta}$ )     0.850 (Data Set 2 Original)     0.832 (Data Set 2 Perturbed)

and with the two constraints $\hat{\alpha}$ > 1 and $\hat{\beta}$ > 1. Note that Corr( $\hat{\alpha}$, $\hat{\beta}$ ) is the correlation between $\hat{\alpha}$ and $\hat{\beta}$. The results here correspond to this second-order random variable:

$$X \sim \mathrm{Beta}[\alpha, \beta] \qquad \text{(Eqn 20)}$$

The dashed lines in Figures 2.2-O and -P show the 95-, 90-, and 50-percent joint confidence regions as the largest, intermediate, and smallest ellipses, respectively. The joint confidence regions developed from the Taylor series approximation to the loglikelihood function (the ellipses shown with dashed lines) are similar to the joint confidence regions developed directly from the loglikelihood function (the ovals shown in solid lines). As the number of data points increases, the ellipses and the ovals will again converge, as expected. Finally, in Figures 2.3-O and -P, we approximate the 95-percent confidence bands on the empirical CDFs by plotting four Beta distributions in each with parameters chosen from the ends of the major and minor axes of the corresponding 95-percent ellipses.

Discussion: As expected, the 25 data points in DS2o are reasonably fit on the empirical CDF plot in Figure 2.1-O, but they have relatively large joint confidence regions (ellipses and ovals) in Figure 2.2-O. As estimated by different methods, the estimates { $\hat{\alpha}$, $\hat{\beta}$ } are reasonably close to { α = 2, β = 6 }, the values used to synthesize the 25 original data points. In contrast, the 25 data points in DS2p have a better fit on the empirical CDF in Figure 2.1-P, have smaller joint confidence regions (ellipses and ovals) in Figure 2.2-P, and also fall within the 95-percent confidence band in Figure 2.3-P. In this example, the perturbation applied to the largest datum propagates into a smaller joint confidence region for $\hat{\alpha}$ and $\hat{\beta}$.

6.0 The Method Applied to Data Set 3 -- Original and Perturbed

We do not know a priori that these measured data come from a parametric distribution. Letting BW denote the distribution of body weight, we follow the lead of Brainard and Burmaster (1992) by using LogNormal probability plots (Burmaster & Hull, 1996; D'Agostino & Stephens, 1986) to see if a LogNormal distribution can model the original and the perturbed versions of DS3:

$$\ln[BW] \sim \mathrm{Normal}[\mu, \sigma] \qquad \text{(Eqn 21)}$$

As shown in Figures 3.1-O and -P, we estimate $\hat{\mu}$ and $\hat{\sigma}$ in Eqn 21 by fitting a straight line to the plot using ordinary least-squares regression:

                    Data Set 3 Original    Data Set 3 Perturbed
$\hat{\mu}$         4.171                  same
$\hat{\sigma}$      0.222                  same
adjR²               0.967                  same

Since the points plotted in Figures 3.1-O and -P fall in a pattern with positive curvature, we immediately see that we may need to consider a more complex model, perhaps a mixture model (Brainard & Burmaster, 1992). However, since the curvature in each plot is small, we proceed to fit a single LogNormal distribution to the data.

Since we do not have the individual measured data, we must use a cumulative distribution function (CDF) to develop the loglikelihood function for these "binned" data (e.g., Tanner, 1996, p. 15). The CDF for the univariate Normal distribution is:

$$\mathrm{CDF}(\mathrm{Normal}(x \mid \mu, \sigma)) = \frac{1}{2}\left[1 + \mathrm{Erf}\!\left[\frac{x - \mu}{\sigma\sqrt{2}}\right]\right] \qquad \text{(Eqn 22)}$$

where Erf[ • ] denotes the Error Function (Abramowitz & Stegun, 1964). Given that the probability for one of the measurements for body weight (BW in kg) in the bin between { BW_lo, p_lo } and { BW_hi, p_hi } is:

$$\mathrm{CDF}[\,\ln[BW_{hi}] \mid \mu, \sigma\,] - \mathrm{CDF}[\,\ln[BW_{lo}] \mid \mu, \sigma\,]$$

the probability for all of the measurements in the bin between { BW_lo, p_lo } and { BW_hi, p_hi } is:

$$\Big[\,\mathrm{CDF}[\,\ln[BW_{hi}] \mid \mu, \sigma\,] - \mathrm{CDF}[\,\ln[BW_{lo}] \mid \mu, \sigma\,]\,\Big]^{\,N\,(p_{hi} - p_{lo})}$$

and the loglikelihood function for all of the measurements is:

$$J[\mu, \sigma] = N \sum_{\text{bins}} (p_{hi} - p_{lo}) \cdot \ln\!\Big[\,\mathrm{CDF}[\,\mu, \sigma \mid \ln[BW_{hi}]\,] - \mathrm{CDF}[\,\mu, \sigma \mid \ln[BW_{lo}]\,]\,\Big] \qquad \text{(Eqn 23)}$$

The MLE estimates $\hat{\mu}$ and $\hat{\sigma}$ are:

                    Data Set 3 Original    Data Set 3 Perturbed
$\hat{\mu}$         4.171                  same
$\hat{\sigma}$      0.226                  same
max J               -7853                  -3926

As expected, for these data, the marginal distributions are uncorrelated, and, by taking the inverse of the observed information matrix, the parameters for the fitted second-order distributions are:

                                        Data Set 3 Original    Data Set 3 Perturbed
$\hat{\mu}$                             N(4.171, 0.0051)       N(4.171, 0.0072)
$\hat{\sigma}$                          N(0.226, 0.0037)       N(0.226, 0.0052)
Corr[ $\hat{\mu}$, $\hat{\sigma}$ ]     0                      0

with the constraint $\hat{\sigma}$ > 0.

Discussion: Even though Figures 3.1-O and -P suggest the need for a mixture model (see Figure 4 and related text in Brainard & Burmaster, 1992), we use MLE to fit second-order random variables to the original and perturbed data. As N decreases from 1,958 to 979, the joint confidence regions (in Figures 3.2-O and -P) grow in size and the 95-percent confidence bands (in Figures 3.3-O and -P) widen but do not enclose all the points in either plot.

At this point, we have two options to proceed. As a first option, we could acknowledge that the second-order LogNormal distributions just fit to the data do not capture the variability and the uncertainty in the data. We could then consider other parametric distributions or mixture models (Brainard & Burmaster, 1992). As a second option, we could use professional judgment to increase the amount of uncertainty in the second-order distribution. For example, when we increase the uncertainty in $\hat{\sigma}$ to Normal( 0.226, 0.0037 ), the 95-percent confidence bands in Figures 3.3-O and -P spread to enclose all the data points in each figure.

7.0 Discussion and Conclusions

The MLE method has many strengths: First, it works with every type of parametric distribution, including mixtures of parametric distributions. Second, it works with truncated distributions. Third, it works with censored and/or binned data, e.g., measurements reported as "nondetect" with a stated detection limit. Fourth, it produces joint confidence regions with the proper correlations among the parameters being estimated. Fifth, as the number of data points grows large, it converges asymptotically to Normal theory and produces joint confidence regions as ellipses. Sixth, with one, two, or three fitted parameters, it produces results that are easily visualized and used in "two-dimensional" Monte Carlo simulations.

The MLE method also has some limitations: First, the analyst must choose a family of (parametric) probability distributions. If the analyst chooses an inappropriate family of distributions, the MLE method will lead to suboptimal or erroneous results. If several families of distributions appear to fit the data reasonably well, the analyst must discriminate carefully among them. Thus, when using the MLE method, the analyst must use many lines of reasoning, from computer visualizations to professional judgment, when selecting the family of distributions. Second, the fitting procedures often require more power than is typically built into commercial spreadsheet programs. Third, it can be difficult to visualize the results if a distribution (or mixture of distributions) has more than three parameters.

Finally, we recognize the need to explore other approaches to fitting second-order random variables to data as well, especially: (i) methods based on Kolmogorov-Smirnov techniques (Bickel & Doksum, 1977), (ii) methods based on the bootstrap technique (Efron & Tibshirani, 1991 and 1993; Frey & Burmaster, 1997), and (iii) methods based on Bayesian techniques (Box & Tiao, 1973; Gelman et al., 1995; O'Ruanaidh & Fitzgerald, 1996; Sivia, 1996).
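Returning to the "binned" body-weight data of Section 6.0, the loglikelihood in Eqn 23 can be sketched in Python as follows. The function names and the bin layout are assumptions, and the bin edges below are hypothetical values chosen for illustration, not the percentiles in Table 3. The sketch also assumes the lowest and highest bins extend to 0 and to infinity so that the bin probabilities cover the whole distribution.

```python
import math


def normal_cdf(x, mu, sigma):
    """Eqn 22: CDF of the univariate Normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def lognormal_cdf(bw, mu, sigma):
    """CDF of BW when ln[BW] ~ Normal[mu, sigma], with open-ended tails."""
    if bw <= 0.0:
        return 0.0
    if math.isinf(bw):
        return 1.0
    return normal_cdf(math.log(bw), mu, sigma)


def binned_loglik(mu, sigma, bins, n):
    """Eqn 23: loglikelihood for N measurements reported only as percentiles.

    bins is a list of (bw_lo, p_lo, bw_hi, p_hi) tuples.
    """
    total = 0.0
    for bw_lo, p_lo, bw_hi, p_hi in bins:
        prob = lognormal_cdf(bw_hi, mu, sigma) - lognormal_cdf(bw_lo, mu, sigma)
        total += (p_hi - p_lo) * math.log(prob)
    return n * total


# Hypothetical percentile table (kg), NOT the values in Table 3.
bins = [
    (0.0, 0.00, 44.3, 0.05),
    (44.3, 0.05, 55.4, 0.25),
    (55.4, 0.25, 64.7, 0.50),
    (64.7, 0.50, 75.6, 0.75),
    (75.6, 0.75, 94.5, 0.95),
    (94.5, 0.95, math.inf, 1.00),
]
j_full = binned_loglik(4.17, 0.23, bins, n=1958)
j_half = binned_loglik(4.17, 0.23, bins, n=979)
print(j_full, j_half)
```

Because N multiplies the whole sum, halving the number of subjects while keeping the same percentiles halves the loglikelihood at every { µ, σ } and leaves the location of the maximum unchanged.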

Acknowledgments

We thank Ronald J. Bosch, Louis A. Cox, Jr., H. Christopher Frey, and Murray D. Smith for helpful suggestions during this research, and we thank Ronald J. Bosch for doing the Wilk-Shapiro tests (SAS, 1990). We also thank two anonymous reviewers for excellent suggestions for improving the manuscript. Alceon Corporation funded this research.

Dedication

We dedicate this manuscript to Frederick Mosteller.

Trademarks

Mathematica ® is a registered trademark of Wolfram Research, Inc: http://www.wri.com
Alceon ® is a registered trademark of Alceon Corporation: http://www.alceon.com

EndNotes

1. With ξ as a k × 1 vector, the MultiVariate Normal distribution has this probability density function (Anderson, 1958; Rose & Smith, 1996):

$$\xi \sim \mathrm{MVN}[\,\mu, \Sigma\,]$$

$$p[\xi] = \frac{1}{(2\pi)^{k/2}\,|\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2}\,(\xi - \mu)^{T}\,\Sigma^{-1}\,(\xi - \mu)\right]$$

2. The isopleths for the confidence bands are (Burmaster & Wilson, 1996):

$$\ln[X]\,[\,z_U \mid z_V\,] \approx \mu_\mu + z_V \cdot \mu_\sigma + z_U \cdot \sqrt{(\sigma_\mu)^2 + (z_V \cdot \sigma_\sigma)^2}$$

where ln[X] [ z_U | z_V ] denotes the point value at z_U conditional on z_V. In this equation, the symbol "≈" denotes "is approximately equal to." Thus, to approximate the point value for the 67th percentile of uncertainty on the 95th percentile of variability of the LogNormal distribution, first evaluate this equation with z_V = 1.645 and z_U = 0.440 and then exponentiate the result.
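The isopleth formula in EndNote 2 is simple enough to evaluate directly. The Python sketch below is illustrative only: the function and argument names are assumptions, with mu_mu and sd_mu standing for the mean and standard deviation of the uncertain µ, and mu_sigma and sd_sigma for those of the uncertain σ. The parameter values in the example are hypothetical round numbers, not fitted results.

```python
import math


def isopleth_ln_x(z_u, z_v, mu_mu, sd_mu, mu_sigma, sd_sigma):
    """Approximate ln[X] at uncertainty z-score z_u and variability z-score z_v."""
    return mu_mu + z_v * mu_sigma + z_u * math.sqrt(sd_mu ** 2 + (z_v * sd_sigma) ** 2)


# 67th percentile of uncertainty on the 95th percentile of variability,
# for hypothetical second-order parameters.
ln_x = isopleth_ln_x(z_u=0.440, z_v=1.645,
                     mu_mu=2.0, sd_mu=0.2, mu_sigma=1.0, sd_sigma=0.15)
x = math.exp(ln_x)  # exponentiate to return to the original units
print(ln_x, x)
```

Setting z_u = 0 recovers the central estimate of the chosen percentile of variability, µ_µ + z_V • µ_σ.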

References

Abramowitz & Stegun, 1964
  Abramowitz, M. and I.A. Stegun, Eds., 1964, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, National Bureau of Standards, Applied Mathematics Series Number 55, Issued June 1964, Tenth Printing with corrections in December 1972, US Government Printing Office, Washington, DC

Aitchison & Brown, 1957
  Aitchison, J. and J.A.C. Brown, 1957, The Lognormal Distribution, Cambridge University Press, Cambridge, UK

Anderson, 1958
  Anderson, T.W., 1958, An Introduction to Multivariate Statistical Analysis, Wiley & Sons, New York, NY

Barry, 1996
  Barry, T.M., 1996, Distributions on a Budget, presented at US EPA's Workshop on Monte Carlo Analysis, 14 May 1996, New York, NY

Bickel & Doksum, 1977
  Bickel, P.J. and K.A. Doksum, 1977, Mathematical Statistics, Basic Ideas and Selected Topics, Holden-Day, San Francisco, CA

Bogen, 1990
  Bogen, K.T., 1990, Uncertainty in Environmental Risk Assessment, Garland Publishing, New York, NY

Box & Tiao, 1973
  Box, G.E.P. and G.C. Tiao, 1973, Bayesian Inference in Statistical Analysis, Wiley, New York, NY

Brainard & Burmaster, 1992
  Brainard, J. and D.E. Burmaster, 1992, Bivariate Distributions for Height and Weight of Men and Women in the United States, Risk Analysis, Volume 12, Number 2, pp 267 - 275

Burmaster & Hull, 1996
  Burmaster, D.E. and D.A. Hull, 1996, Using LogNormal Distributions and LogNormal Probability Plots in Probabilistic Risk Assessment, Human and Ecological Risk Assessment, in press

Burmaster & Wilson, 1996
  Burmaster, D.E. and A.M. Wilson, 1996, An Introduction to Second-Order Random Variables in Human Health Risk Assessment, Human and Ecological Risk Assessment, Volume 2, Number 4, pp 892 - 919

Cohen et al., 1996
  Cohen, J.T., M.A. Lampson, and T.S. Bowers, 1996, The Use of Two-Stage Monte Carlo Simulation Techniques to Characterize Uncertainty and Variability in Risk Analysis, Human and Ecological Risk Assessment, Volume 2, Number 4, pp 939 - 971

Cox & Snell, 1989
  Cox, D.R. and E.J. Snell, 1989, Analysis of Binary Data, Second Edition, Chapman & Hall, London, UK

Crow & Shimizu, 1988
  Crow, E.L. and K. Shimizu, Eds., 1988, Lognormal Distributions, Theory and Applications, Marcel Dekker, New York, NY

D'Agostino & Stephens, 1986
  D'Agostino, R.B. and M.A. Stephens, 1986, Goodness-of-Fit Techniques, Marcel Dekker, New York, NY

Edwards, 1992
  Edwards, A.W.F., 1992, Likelihood, Johns Hopkins University Press, Baltimore, MD

Efron & Tibshirani, 1991
  Efron, B. and R. Tibshirani, 1991, Statistical Data Analysis in the Computer Age, Science, Volume 253, pp 390 - 395, 26 July 1991

Efron & Tibshirani, 1993
  Efron, B. and R.J. Tibshirani, 1993, An Introduction to the Bootstrap, Monographs on Statistics and Applied Probability 57, Chapman & Hall, New York, NY

Evans et al., 1993
  Evans, M., N. Hastings, and B. Peacock, 1993, Statistical Distributions, Second Edition, John Wiley & Sons, New York, NY

Frey, 1996
  Frey, H.C., 1996, Quantitative Techniques for Analysis of Variability and Uncertainty in Exposure Assessment, presented at US EPA's Workshop on Monte Carlo Analysis, 14 May 1996, New York, NY

Frey & Burmaster, 1997
  Frey, H.C. and D.E. Burmaster, 1997, Methods for Characterizing Variability and Uncertainty: Comparison of Bootstrap Simulation and Likelihood-Based Approaches, in preparation

Frey & Rhodes, 1996
  Frey, H.C. and D.S. Rhodes, 1996, Characterizing, Simulating, and Analyzing Variability and Uncertainty: An Illustration of Methods Using an Air Toxics Emissions Example, Human and Ecological Risk Assessment, Volume 2, Number 4, pp 762 - 797

Gelman et al., 1995
  Gelman, A., J.B. Carlin, H.S. Stern, and D.B. Rubin, 1995, Bayesian Data Analysis, Chapman & Hall, London, UK

Hattis & Barlow, 1996
  Hattis, D. and K. Barlow, 1996, Human Interindividual Variability in Cancer Risks -- Technical and Management Challenges, Human and Ecological Risk Assessment, Volume 2, Number 1, pp 194 - 220

Hoffman & Hammonds, 1994
  Hoffman, F.O. and J.S. Hammonds, 1994, Propagation of Uncertainty in Risk Assessments: The Need to Distinguish Between Uncertainty due to Lack of Knowledge and Uncertainty due to Variability, Risk Analysis, Volume 14, Number 5, pp 707 - 712

Keeping, 1995
  Keeping, E.S., 1995, Introduction to Statistical Inference, Dover, New York, NY

MacIntosh et al., 1994
  MacIntosh, D.L., G.W. Suter, and F.O. Hoffman, 1994, Uses of Probabilistic Exposure Models in Ecological Risk Assessments of Contaminated Sites, Risk Analysis, Volume 14, Number 4, pp 405 - 420

Madansky, 1988
  Madansky, A., 1988, Prescriptions for Working Statisticians, Springer-Verlag, New York, NY

McKone, 1994
  McKone, T.E., 1994, Uncertainty and Variability in Human Exposures to Soil Contaminants through Home-Grown Food: A Monte Carlo Assessment, Risk Analysis, Volume 14, Number 4, pp 449 - 463

Mood et al., 1974
  Mood, A.M., F.A. Graybill, and D.C. Boes, 1974, Introduction to the Theory of Statistics, Third Edition, McGraw Hill, New York, NY

NAS, 1994
  National Academy of Sciences, 1994, Science and Judgment in Risk Assessment, National Academy Press, Washington, DC

NCRP, 1996
  National Council on Radiation Protection and Measurement, 1996, A Guide for Uncertainty Analysis in Dose and Risk Assessments Related to Environmental Contamination, NCRP Commentary Number 14, Bethesda, MD, 10 May 1996

O'Ruanaidh & Fitzgerald, 1996
  O'Ruanaidh, J.J.K. and W.J. Fitzgerald, 1996, Numerical Bayesian Methods Applied to Signal Processing, Springer-Verlag, New York, NY

Price et al., 1996
  Price, P.S., S.H. Su, J.R. Harrington, and R.E. Keenan, 1996, Uncertainty and Variability in Indirect Exposure Assessments: An Analysis of Exposure to Tetrachlorodibenzo-p-dioxin from a Beef Consumption Pathway, Risk Analysis, Volume 16, Number 2, pp 263 - 277

Rose & Smith, 1996
  Rose, C. and M.D. Smith, 1996, The Multivariate Normal Distribution, Mathematica Journal, Volume 6, pp 32 - 37

Ross, 1990
  Ross, G.J.S., 1990, Nonlinear Estimation, Springer-Verlag, New York, NY

SAS, 1990
  SAS Institute, 1990, SAS/STAT User's Guide, Version 6, 4th Edition, Cary, NC

Sivia, 1996
  Sivia, D.S., 1996, Data Analysis, A Bayesian Tutorial, Clarendon Press, Oxford, UK

Stern, 1996
  Stern, A.H., 1996, NJ Department of Environmental Protection, personal communication with D.E. Burmaster

Tanner, 1996
  Tanner, M.A., 1996, Tools for Statistical Inference, Springer-Verlag, New York, NY

Wickham-Jones, 1994
  Wickham-Jones, T., 1994, Mathematica Graphics, Techniques & Applications, Telos, Springer-Verlag, Santa Clara, CA

Wolfram, 1991
  Wolfram, S., 1991, Mathematica®, A System for Doing Mathematics by Computer, Second Edition, Addison-Wesley, Redwood City, CA
Keenan. UK Stern. CA 31 May 1997 18 © Alceon ®. New York. Winter 1996 Ross. Bethesda. 1990 SAS Institute. Springer-Verlag.463 Mood et al.

Figure 1.1-O: LogNormal Probability Plot, Original (ln[X] versus z)
Figure 1.1-P: LogNormal Probability Plot, Perturbed (ln[X] versus z)
Figure 1.2-O: Joint Confidence Regions, Original (σ versus µ, with contours labeled E(X) = 5, 10, 20, 40, and 80 across the two panels)
Figure 1.2-P: Joint Confidence Regions, Perturbed (σ versus µ)
Figure 1.3-O: Confidence Bands, Original (ln[X] versus z)
Figure 1.3-P: Confidence Bands, Perturbed (ln[X] versus z)

Figure 2.1-O: Cumulative Distribution, Original (cumulative probability versus X)
Figure 2.1-P: Cumulative Distribution, Perturbed (cumulative probability versus X)
Figure 2.2-O: Joint Confidence Regions, Original (β versus α, with contours of constant E(X))
Figure 2.2-P: Joint Confidence Regions, Perturbed (β versus α, with contours of constant E(X))
Figure 2.3-O: Confidence Bands, Original (cumulative probability versus X)
Figure 2.3-P: Confidence Bands, Perturbed (cumulative probability versus X)

Figure 3.1-O: LogNormal Probability Plot, n = 1,958 (ln[BW] versus z)
Figure 3.1-P: LogNormal Probability Plot, n = 979 (ln[BW] versus z)
Figure 3.2-O: Joint Confidence Regions, n = 1,958 (σ versus µ, with contours labeled E(BW) = 65, 66, 67, and 68 across the two panels)
Figure 3.2-P: Joint Confidence Regions, n = 979 (σ versus µ)
Figure 3.3-O: Confidence Bands, n = 1,958 (ln[BW] versus z)
Figure 3.3-P: Confidence Bands, n = 979 (ln[BW] versus z)

Table 1
Data Set 1
exp[N[2, 1]]

           Data Set 1   Data Set 1   Data Set 1   Data Set 1
           Original     Original     Perturbed    Perturbed
index      xi           ln(xi)       x'i          ln(x'i)
•••••      •••••        •••••        •••••        •••••
1            2          0.69           2          0.69
2            2          0.69           2          0.69
3            3          1.10           3          1.10
4            3          1.10           3          1.10
5            4          1.39           4          1.39
6            4          1.39           4          1.39
7            4          1.39           4          1.39
8            4          1.39           4          1.39
9            6          1.79           6          1.79
10           6          1.79           6          1.79
11           7          1.95           7          1.95
12           8          2.08           8          2.08
13           8          2.08           8          2.08
14          11          2.40          11          2.40
15          15          2.71          15          2.71
16          23          3.14          23          3.14
17          23          3.14          23          3.14
18          32          3.47          32          3.47
19         101          4.62         301          5.71
•••••      •••••        •••••        •••••        •••••
AMean       14.00       2.01          24.53       2.07
AStdDev     22.66       1.02          67.47       1.19

29 December 1996 © Alceon ®, 1996
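As a rough illustration of how the values in Table 1 feed a maximum likelihood fit, the sketch below computes the closed-form LogNormal estimates for the original and perturbed versions of Data Set 1. It assumes the model ln X ~ Normal(µ, σ); the function names `lognormal_mle` and `lognormal_mean` are ours, introduced only for this example, and the code is not the authors' implementation.

```python
import math

# Data Set 1 as listed in Table 1 (19 values; only the largest value differs).
original = [2, 2, 3, 3, 4, 4, 4, 4, 6, 6, 7, 8, 8, 11, 15, 23, 23, 32, 101]
perturbed = original[:-1] + [301]


def lognormal_mle(xs):
    """Return (mu_hat, sigma_hat), the MLEs when ln X ~ Normal(mu, sigma)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu_hat = sum(logs) / n
    # The MLE of sigma divides by n; the usual sample standard deviation
    # divides by n - 1, so the two differ slightly for small samples.
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in logs) / n)
    return mu_hat, sigma_hat


def lognormal_mean(mu, sigma):
    """E(X) = exp(mu + sigma^2 / 2) for a LogNormal variable."""
    return math.exp(mu + 0.5 * sigma ** 2)


for label, xs in (("original", original), ("perturbed", perturbed)):
    mu_hat, sigma_hat = lognormal_mle(xs)
    print(f"{label}: mu_hat={mu_hat:.2f} sigma_hat={sigma_hat:.2f} "
          f"E(X)={lognormal_mean(mu_hat, sigma_hat):.1f}")
```

Running the loop on both versions of the data shows how far a single changed observation moves the point estimates, which is the comparison the Original and Perturbed figure panels are built around.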

Table 2
Data Set 2
Beta[2, 6]

           Data Set 2   Data Set 2
           Original     Perturbed
index      xi           x'i
•••••      •••••        •••••
1          0.04         0.04
2          0.06         0.06
3          0.09         0.09
4          0.10         0.10
5          0.12         0.12
6          0.14         0.14
7          0.15         0.15
8          0.18         0.18
9          0.20         0.20
10         0.21         0.21
11         0.22         0.22
12         0.26         0.26
13         0.29         0.29
14         0.30         0.30
15         0.31         0.31
16         0.32         0.32
17         0.32         0.32
18         0.33         0.33
19         0.38         0.38
20         0.40         0.40
21         0.41         0.41
22         0.42         0.42
23         0.52         0.52
24         0.57         0.57
25         0.61         0.75
•••••      •••••        •••••
AMean      0.28         0.28
AStdDev    0.16         0.17
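The Beta distribution has no closed-form maximum likelihood estimates, so a fit to Data Set 2 needs a numerical search. The sketch below is one minimal way to do that with the Python standard library only: it starts from method-of-moments values and climbs the log-likelihood with a simple pattern search in (ln α, ln β). The optimizer, the function names, and the ascending ordering of the data values are our assumptions for illustration; the manuscript's own computations may well have used a different routine.

```python
import math

# Data Set 2, original column of Table 2 (25 values, listed in ascending order).
data = [0.04, 0.06, 0.09, 0.10, 0.12, 0.14, 0.15, 0.18, 0.20, 0.21,
        0.22, 0.26, 0.29, 0.30, 0.31, 0.32, 0.32, 0.33, 0.38, 0.40,
        0.41, 0.42, 0.52, 0.57, 0.61]


def beta_loglik(alpha, beta, xs):
    """Log-likelihood of an i.i.d. Beta(alpha, beta) sample."""
    n = len(xs)
    log_norm = (math.lgamma(alpha + beta)
                - math.lgamma(alpha) - math.lgamma(beta))
    return (n * log_norm
            + (alpha - 1.0) * sum(math.log(x) for x in xs)
            + (beta - 1.0) * sum(math.log(1.0 - x) for x in xs))


def moment_start(xs):
    """Method-of-moments estimates, used only as a starting point."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    k = m * (1.0 - m) / v - 1.0
    return m * k, (1.0 - m) * k


def beta_mle(xs, tol=1e-6):
    """Maximize the log-likelihood by pattern search in (ln alpha, ln beta)."""
    a, b = (math.log(p) for p in moment_start(xs))
    best = beta_loglik(math.exp(a), math.exp(b), xs)
    step = 0.5
    while step > tol:
        improved = False
        # Try one step up and down in each log-parameter; keep any gain.
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand = beta_loglik(math.exp(a + da), math.exp(b + db), xs)
            if cand > best:
                a, b, best, improved = a + da, b + db, cand, True
        if not improved:
            step /= 2.0  # no move helped, so refine the search scale
    return math.exp(a), math.exp(b), best


alpha_hat, beta_hat, loglik = beta_mle(data)
print(f"alpha_hat={alpha_hat:.2f} beta_hat={beta_hat:.2f} "
      f"E(X)={alpha_hat / (alpha_hat + beta_hat):.3f}")
```

Searching in log-parameters keeps α and β positive without explicit constraints. Replacing the last data value with 0.75 and rerunning gives the corresponding fit for the perturbed column.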

Table 3
Data Set 3

Columns: Body Weight (kg), Percentile
Percentiles: 0.01 to 0.99 in steps of 0.02 (50 rows, printed as two side-by-side blocks)
No = 1,958
Np = 979

[The individual body-weight entries are scrambled in this copy and cannot be matched to their percentiles; they range from about 43 kg to about 120 kg.]
