W6. 2 Probability Plots
4 Variables and
Probability Distributions
Probability Plots
For example, the article “Toothpaste Detergents: A
Potential Source of Oral Soft Tissue Damage” (Intl. J. of
Dental Hygiene, 2008: 193–198) contains the following
statement:
As justification for this leap of faith, the authors wrote that
“Descriptive statistics showed standard deviations that
suggested a normal distribution to be highly likely.” Note:
This argument is not very persuasive.
The essence of such a plot is that if the distribution on
which the plot is based is correct, the points in the plot
should fall close to a straight line.
Sample Percentiles
The details involved in constructing probability plots differ a
bit from source to source. The basis for our construction is
a comparison between percentiles of the sample data and
the corresponding percentiles of the distribution under
consideration.
Table A.3
Thus the 50th percentile, denoted η(.5), satisfies F(η(.5)) = .5,
and the 90th percentile η(.9) satisfies F(η(.9)) = .9. Consider
as an example the standard normal distribution, for which we
have denoted the cdf by Φ(·).
Since .2005 appears at the intersection of the –.8 row and
the .04 column, the 20th percentile is approximately –.84.
Similarly, the 25th percentile of the standard normal
distribution is (using linear interpolation) approximately
–.675.
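The table lookups above can be cross-checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original slides.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

# inv_cdf(p) returns the value eta(p) for which Phi(eta(p)) = p.
p20 = std_normal.inv_cdf(0.20)
p25 = std_normal.inv_cdf(0.25)

print(f"20th percentile of the standard normal: {p20:.4f}")
print(f"25th percentile of the standard normal: {p25:.4f}")
```

Both results should agree with the table-based values to about two decimal places; small differences come from rounding and linear interpolation in the table.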
The 50th sample percentile should separate the smallest
50% of the sample from the largest 50%, the 90th
percentile should be such that 90% of the sample lies
below that value and 10% lies above, and so on.
To proceed further, we need an operational definition of
sample percentiles (this is one place where different people
do slightly different things). Recall that when n is odd, the
sample median or 50th sample percentile is the middle
value in the ordered list, for example, the sixth-largest
value when n = 11.
Suppose n = 10. Then if we call the third-smallest value the
25th percentile, we are regarding that value as being half in
the lower group (consisting of the two smallest observations)
and half in the upper group (the seven largest observations),
so that 2.5 of the 10 observations, or 25%, lie below it.
Ordered positions: 1 2 3 4 5 6 7 8 9 10
This leads to the following general definition of sample
percentiles.
Definition
Order the n sample observations from smallest to largest.
Then the ith smallest observation in the list is taken to be
the [100(i – .5)/n]th sample percentile.
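As a quick illustration of this definition, the sketch below lists the percentage attached to each ordered observation. The function name `sample_percentages` is a hypothetical helper introduced here, not something from the slides.

```python
def sample_percentages(n):
    """Return the percentages 100(i - .5)/n for i = 1, ..., n.

    The ith smallest of n observations is taken to be the sample
    percentile with the ith percentage in this list.
    """
    return [100 * (i - 0.5) / n for i in range(1, n + 1)]


# With n = 10 the third-smallest value is the 25th sample percentile.
percentages = sample_percentages(10)
print(percentages)

# With n = 11 the sixth value (the median) is the 50th sample percentile.
print(sample_percentages(11)[5])
```

Note that under this definition no observation is the 0th or 100th percentile; the percentages always lie strictly between 0 and 100.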
A Probability Plot
Suppose now that for percentages 100(i – .5)/n (i = 1,…, n)
the percentiles are determined for a specified population
distribution whose plausibility is being investigated.
That is, for i = 1, 2,…, n there should be reasonable
agreement between the ith smallest sample observation
and the [100(i – .5)/n]th percentile for the specified
distribution. Let’s consider the (population percentile,
sample percentile) pairs, that is, the pairs
([100(i – .5)/n]th percentile of the distribution, ith smallest sample observation), i = 1,…, n.
If the sample percentiles are close to the corresponding
population distribution percentiles, the first number in each
pair will be roughly equal to the second number. The
plotted points will then fall close to a 45° line.
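The construction of these pairs can be sketched in a few lines. This is an illustrative sketch only: `probability_plot_pairs` is a hypothetical helper, and the ten sample values are made up for the demonstration (they are not the data of any example in these slides). The specified distribution is passed in as its inverse cdf, here the standard normal.

```python
from statistics import NormalDist


def probability_plot_pairs(sample, inv_cdf):
    """Pair each ordered observation with the matching population percentile.

    inv_cdf(p) must return the (100p)th percentile of the distribution
    whose plausibility is being checked.
    """
    ordered = sorted(sample)
    n = len(ordered)
    pairs = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # the ith smallest value is the 100(i - .5)/n th percentile
        pairs.append((inv_cdf(p), x))
    return pairs


# Hypothetical sample, invented for illustration only.
sample = [0.31, -1.20, 0.05, 1.48, -0.47, 0.92, -0.11, -1.95, 0.58, 1.77]

pairs = probability_plot_pairs(sample, NormalDist().inv_cdf)
for population_percentile, observation in pairs:
    print(f"{population_percentile:7.3f}  {observation:6.2f}")
```

If the specified distribution is a reasonable model, the two columns should be roughly equal row by row, which is what falling close to the 45° line means.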
Example 29
The value of a certain physical constant is known to an
experimenter. The experimenter makes n = 10 independent
measurements of this value using a particular
measurement device and records the resulting
measurement errors (error = observed value – true value).
These observations appear in the accompanying table.
Figure 4.33 Plots of pairs (z percentile, observed value) for the data of Example 29
An investigator is typically not interested in knowing just
whether a specified probability distribution, such as the
standard normal distribution (normal with μ = 0 and σ = 1)
or the exponential distribution with λ = .1, is a plausible
model for the population distribution from which the sample
was selected.
The values of the parameters of a distribution are usually
not specified at the outset. If the family of Weibull
distributions is under consideration as a model for lifetime
data, are there any values of the parameters α and β for
which the corresponding Weibull distribution gives a good
fit to the data?
If the plot deviates substantially from a straight line, no
member of the family is plausible. When the plot is quite
straight, further work is necessary to estimate values of the
parameters that yield the most reasonable distribution of
the specified type.
These procedures should generally not be used if the
normal probability plot shows a very pronounced departure
from linearity. The key to constructing an omnibus normal
probability plot is the relationship between standard normal
(z) percentiles and those for any other normal distribution:
percentile for a normal (μ, σ) distribution = μ + σ · (corresponding z percentile)
If each observation is exactly equal to the corresponding
normal percentile for some value of σ (taking μ = 0), the pairs
(σ · [z percentile], observation) fall on a 45° line, which has
slope 1.
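A plot of the n pairs
([100(i – .5)/n]th z percentile, ith smallest observation)
is called a normal probability plot. If the sample comes from a
normal distribution with mean μ and standard deviation σ, the
points should fall close to a straight line with slope σ and
intercept μ.
Thus a plot for which the points fall close to some straight
line suggests that the assumption of a normal population
distribution is plausible.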
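The link between the straight line and the normal parameters can be checked with a small sketch. Here the "observations" are set exactly equal to the percentiles of a normal distribution with assumed values μ = 27.0 and σ = 1.5 (hypothetical numbers chosen for illustration), and an ordinary least squares line is fitted to the (z percentile, observation) pairs. The helper names are illustrative.

```python
from statistics import NormalDist, mean


def z_percentiles(n):
    """Standard normal percentiles for percentages 100(i - .5)/n, i = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical parameter values, assumed for illustration only.
mu, sigma = 27.0, 1.5

z = z_percentiles(20)
# Idealized sample: each observation equals mu + sigma * (z percentile).
observations = [mu + sigma * zi for zi in z]

slope, intercept = fit_line(z, observations)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

With real data the points scatter around the line rather than lying on it, so the fitted slope and intercept give only rough values for σ and μ.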
Example 30
The accompanying sample consisting of n = 20
observations on dielectric breakdown voltage of a piece of
epoxy resin appeared in the article “Maximum Likelihood
Estimation in the 3-Parameter Weibull Distribution” (IEEE
Trans. on Dielectrics and Elec. Insul., 1996: 43–55).
There is an alternative version of a normal probability plot
in which the z percentile axis is replaced by a nonlinear
probability axis. The scaling on this axis is constructed so
that plotted points should again fall close to a line when the
sampled distribution is normal. Figure 4.36 shows such a
plot from Minitab for the breakdown voltage data of
Example 30.
3. It is skewed.
A uniform distribution is light-tailed, since its density
function drops to zero outside a finite interval.
The result is an S-shaped pattern of the type pictured in
Figure 4.34.
Figure 4.34
A sample from a heavy-tailed distribution also tends to
produce an S-shaped plot. However, in contrast to the
light-tailed case, the left end of the plot curves downward
(observed value < z percentile), as shown in Figure 4.37(a).
Figure 4.37(a)
If the underlying distribution is positively skewed (a short
left tail and a long right tail), the smallest sample
observations will be larger than expected from a normal
sample and so will the largest observations.
In this case, points on both ends of the plot will fall above a
straight line through the middle part, yielding a curved
pattern, as illustrated in Figure 4.37(b).
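The curved pattern for a positively skewed distribution can be reproduced without any random sampling. In the sketch below, the "sample" is an idealized one whose ith value is exactly the 100(i – .5)/n th percentile of an exponential distribution with λ = 1 (an assumption made purely for illustration), and the reference line is drawn through two points in the middle of the plot.

```python
from math import log
from statistics import NormalDist

n = 20
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Horizontal coordinates: standard normal (z) percentiles.
z = [NormalDist().inv_cdf(p) for p in probs]

# Idealized positively skewed sample: exponential percentiles with
# lambda = 1 (assumed for illustration), i.e. -ln(1 - p).
observed = [-log(1 - p) for p in probs]

# Straight line through two points in the middle part of the plot
# (the 6th and 15th smallest observations).
i_lo, i_hi = 5, 14
slope = (observed[i_hi] - observed[i_lo]) / (z[i_hi] - z[i_lo])
intercept = observed[i_lo] - slope * z[i_lo]

# Vertical distance of each point above (+) or below (-) that line.
residuals = [y - (intercept + slope * x) for x, y in zip(z, observed)]

print(f"left end:  {residuals[0]:+.3f}")
print(f"middle:    {residuals[9]:+.3f}")
print(f"right end: {residuals[-1]:+.3f}")
```

Both end residuals come out positive while the middle one is negative, which is the bowed shape described above.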
Beyond Normality
Consider a family of probability distributions involving two
parameters, θ1 and θ2, and let F(x; θ1, θ2) denote the
corresponding cdf’s. The family of normal distributions is
one such family, with θ1 = μ, θ2 = σ, and
F(x; μ, σ) = Φ[(x – μ)/σ]. Another example is the Weibull
family, with θ1 = α, θ2 = β, and
F(x; α, β) = 1 – exp[–(x/β)^α]
Instead, P(X ≤ θ1) = F(θ1; θ1, θ2) = 1 – e^(–1) = .632, and the
density function f(x; θ1, θ2) = F′(x; θ1, θ2) is negatively
skewed (a long lower tail).
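The value .632 can be verified directly if the distribution meant here is the extreme value distribution, as the later slide on ln(X) suggests. The cdf is not shown in this section, so the form below is an assumption taken from the standard definition of that distribution.

```latex
\[
F(x;\theta_1,\theta_2) = 1 - e^{-e^{(x-\theta_1)/\theta_2}},
\qquad -\infty < x < \infty
\]
\[
P(X \le \theta_1) = F(\theta_1;\theta_1,\theta_2)
  = 1 - e^{-e^{0}} = 1 - e^{-1} \approx .632
\]
```

Since this probability exceeds .5, θ1 lies above the median, so θ1 is not the point of symmetry of the density.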
Similarly, the scale parameter θ2 is not the standard
deviation (μ = θ1 – .5772θ2 and σ = 1.283θ2). However,
changing the value of θ1 does change the location of the
density curve, whereas a change in θ2 rescales the
measurement axis.
In the usual form, the density function for any member of
either the gamma or Weibull family is positive for x > 0
and zero otherwise. A location parameter γ can be
introduced as a third parameter (we did this for the Weibull
distribution) to shift the density function so that it is positive
if x > γ and zero otherwise.
One first obtains the percentiles of the standard
distribution, the one with θ1 = 0 and θ2 = 1, for percentages
100(i – .5)/n (i = 1,…, n).
The key result is that if X has a Weibull distribution with
shape parameter α and scale parameter β, then the
transformed variable ln(X) has an extreme value
distribution with location parameter θ1 = ln(β) and scale
parameter θ2 = 1/α.
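This result can be checked numerically, and it also yields the percentiles needed for a Weibull probability plot. The sketch below assumes the standard forms of the two cdf's (the extreme value cdf does not appear in this section) and uses hypothetical parameter values α = 2 and β = 100. All function names are illustrative.

```python
from math import exp, log


def weibull_cdf(x, alpha, beta):
    """Weibull cdf with shape alpha and scale beta (zero for x <= 0)."""
    return 1 - exp(-((x / beta) ** alpha)) if x > 0 else 0.0


def extreme_value_cdf(y, theta1, theta2):
    """Extreme value cdf with location theta1 and scale theta2 (assumed form)."""
    return 1 - exp(-exp((y - theta1) / theta2))


def standard_extreme_value_percentile(p):
    """Solve 1 - exp(-exp(eta)) = p for eta (the case theta1 = 0, theta2 = 1)."""
    return log(-log(1 - p))


# Hypothetical Weibull parameters, chosen only for illustration.
alpha, beta = 2.0, 100.0
theta1, theta2 = log(beta), 1 / alpha

# P(X <= x) for Weibull X should equal P(ln X <= ln x) for the extreme value law.
for x in (25.0, 100.0, 250.0):
    lhs = weibull_cdf(x, alpha, beta)
    rhs = extreme_value_cdf(log(x), theta1, theta2)
    print(f"x = {x:6.1f}: Weibull {lhs:.6f}, extreme value {rhs:.6f}")

# Standard extreme value percentiles for a sample of size n = 10.
n = 10
ev_percentiles = [standard_extreme_value_percentile((i - 0.5) / n)
                  for i in range(1, n + 1)]
```

Plotting the pairs (standard extreme value percentile, ln of the ith smallest observation) then plays the same role for the Weibull family that the (z percentile, observation) plot plays for the normal family.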
Example 31
The accompanying observations are on lifetime (in hours)
of power apparatus insulation when thermal and electrical
stress acceleration were fixed at particular values (“On the
Estimation of Life of Power Apparatus Insulation Under
Combined Electrical and Thermal Stress,” IEEE Trans. on
Electrical Insulation, 1985: 70–78).