
Advanced Statistics

for
Environmental
Professionals

Bernard J. Morzuch
Department of Resource Economics
University of Massachusetts
Amherst, Massachusetts
morzuch@resecon.umass.edu

May 2005
Table of Contents
TOPIC

How Does A Statistic Like A Sample Mean Behave?
The Central Limit Theorem
The Standard Normal Distribution
Statistical Estimation
The t-distribution
Appearance Of The t-distribution
Situation Where We Use t In Place Of z: Confidence Intervals
t-table
An Upper One-Sided (1 − α) Confidence Interval For µ
Another Confidence Interval Example
Summary And Words Of Caution When Using t Or z
Treatment Of Outliers And Testing Suggestions
A Simple Approach For Assessing Data Distribution And The Possibility Of Outliers
A Data Set’s Five-Number Summary And Box-And-Whisker Diagram (Or Boxplot)
Interquartile Range (IQR) And Outliers
Examples
Hypothesis Testing: The Classical Approach (Test Of One Mean)
    Step 1: State the null and alternative hypotheses.
    Step 2: Decide upon a tail probability associated with the null hypothesis being true.
    Step 3: Establish a decision rule to assist in choosing between hypotheses.
    Step 4: Generate your samples. Calculate the test statistic.
    Step 5: Apply the decision rule. Make a decision. State your conclusion in words.
The P-Value Approach To Hypothesis Testing
Complementarity Between Hypothesis Testing And Confidence Interval Construction
Testing For Normality: The Shapiro-Wilk Test
Hypothesis Testing: Comparison Between Two Means
    Step 1: State the null and alternative hypotheses.
    Step 2: Decide upon a tail probability associated with the null hypothesis being true.
    Step 3: Establish a decision rule to assist in choosing between hypotheses.
    Step 4: Generate your samples. Calculate the test statistic.
    Step 5: Apply the decision rule. Make a decision. State your conclusion in words.
Incorrect Decisions In Hypothesis Testing
A Calculation For β And 1 − β
Sample Size Issues
Behavior Of Observations Having A Lognormal Distribution
Small Sample Sizes And Parent Distribution Departing From Normality
An Experimental Design: Set-Up For Generating Lognormal Parameter Estimates
Parameter Estimators For A Lognormal Distribution
Getting Parameter Estimates: Probability Plotting
Land’s Approach To Get A Confidence Interval
Dealing With Censored Data Sets
Getting Parameter Estimates: Censored Data And Probability Plotting
Strategies To Determine The Proper Number Of Samples
    Sample Size Based on Variance of the Sample Mean
    Sample Size Based on Margin of Error of the Sample Mean
    Sample Size Based on Relative Error of the Sample Mean
Nonparametric Statistical Tests
The Mann-Whitney Test
Summary
References
How Does A Statistic Like A Sample Mean Behave?

Motivation: You are at a wetland site, and you would like to get an estimate of the true mean level of lead
concentration in the soil. (Unknown to you, suppose that the population mean -- the true overall mean
level of lead concentration in the soil -- is 40 mg/kg. Suppose also that the standard deviation of an
infinite number of measures is 15 mg/kg. And suppose that the distribution of this infinite number of
measures is not normal but skewed to the right).

Q: How do you proceed in generating your estimate?

A: You might rely on an experimental design whereby you walk in a straight line across the site and take
a new soil sample every so many meters. You repeat this process for lines that are parallel to the
original. When sufficient parallel lines are walked off, the process is repeated in the perpendicular
direction. Ultimately, you generate a “sufficient” number of samples that you believe characterize the
soil conditions for this particular wetland site. Suppose that the number of samples that you take is
n=15. Here are their ordered measurements, in mg/kg:

17.1 21.4 23.3 23.4 24.7 25.6 26.9 26.9 27.3 29.2 33.1 37.3 44.9 49.8 58.8

Q: For these 15 samples, what would be an overall representative measure of lead concentration in the
soil?

A: The sample mean X̄. For these 15 observations, X̄ = 31.32 mg/kg.

Now, suppose a colleague was requested to generate 15 soil samples at this same site using the same
experimental design. Assume that she does not know where you walked off your first line, so that she
starts her walk at a different spot than you.

Q: Would you expect her to get the same 15 numbers for her soil samples as you?

A: No.

Q: Would you expect her sample mean -- based upon her 15 samples -- to be the same as your sample
mean?

A: No.

Q: Why not?

A: Because sampling involves error; i.e., we never incorporate all aspects of the phenomenon that we are
attempting to measure.

Suppose an additional 498 of your colleagues were asked to repeat this experiment.

Q: How many sample means will have been generated in total, beginning with yours?

A: 500

Suppose you were asked to construct a histogram for these 500 sample means. Notice that you are
being asked to construct a histogram for sample means, not for individual observations.
Q: Where would you expect the histogram to be centered?

A: Around µ =40 mg/kg, which is the true mean.

Notice that these 500 sample means will have a spread, i.e., a standard deviation.

Q: Will the standard deviation of the sample means be related to the standard deviation of the individual
observations, i.e., to σ =15 mg/kg?

A: This is hard to tell, but the answer is “yes”.

Recall that the distribution of the individual observations was said to be skewed.

Q: What will the shape of the histogram for the sample means look like? Will it be skewed as it is for the
individual observations?

A: You are inclined to say that it will be skewed, because the individual observations upon which it is
based have a skewed distribution. But this is not correct! It will look more normally distributed than
skewed! We will demonstrate this shortly.

Suppose we return to the beginning of the experimental design. Rather than generating 15 soil samples,
each of the 500 individuals is asked to generate 30 soil samples and to calculate sample means, each
based upon 30 samples rather than 15.

Q: Where would the histogram be centered?

A: Around µ =40 mg/kg.

Q: What would be the spread of these 500 new sample means, where each sample mean is based upon a
larger number of observations?

A: Smaller than the spread of the previous 500 sample means, each of which was based on fewer observations.

Q: Why is this so?

A: Because the sample means that we now calculate are each based on twice the amount of information.
Since each uses more information, each should be closer to the true mean, the item which each is
designed to represent. If they are all collectively closer to µ , they have less spread around µ; i.e.,
they have a smaller standard deviation.

Q: What will be the shape of the histogram for these 500 sample means, each of which is based upon 30
observations? Will the histogram be skewed?

A: No, it will be (approximately) normal! This is guaranteed by something called the Central Limit
Theorem (CLT).

Are you skeptical? We will demonstrate with a computer simulation. But first, a summary.

The Central Limit Theorem

Begin with a definition:

Sampling distribution: the probability distribution of a sample statistic like the sample mean.

Central Limit Theorem: If all possible random samples, each of size n, are taken from any population
with a mean µ and standard deviation σ, the sampling distribution of sample means will:

1. Have a mean (µ_X̄) equal to µ. Note the new notation.

2. Have a standard deviation (σ_X̄) equal to σ/√n. From here on, the standard deviation of the
   sample mean will go by the special name standard error of the mean.

3. Be normally distributed when the parent population is normally distributed, or be approximately
   normally distributed for samples of size 30 or more when the parent population is not
   normally distributed. The approximation to the normal distribution improves with samples of
   larger size.

In short, the Central Limit Theorem states the following:

1. µ_X̄ = µ; the mean of the X̄s equals the mean of the Xs.

2. σ_X̄ = σ/√n; the standard error of the mean equals the standard deviation of the population
   divided by the square root of the sample size.

3. The sample means are:

   — normally distributed if the parent population is normal.

   — approximately normally distributed regardless of the shape of the parent population if
     n ≥ 30, and the approximation improves as n gets larger.

NOTE: The n referred to in the Central Limit Theorem is the number of items sampled (or the
number of samples taken). It is commonly referred to as sample size.

Let's look at the behavior of X̄ using a computer simulation. A visual example like this provides insight
into what the Central Limit Theorem accomplishes (without having to rely on mathematical proofs).

Consider a parent population consisting of items from an exponential distribution. The mean of all of the
observations is µ_X = 4, and the standard deviation is σ_X = 4.

The simulation will be done using Minitab and is represented with the following schematic diagram.

How The Sample Mean Behaves: An Experiment Using Minitab

[Schematic: the parent population is exponential with µ_X = 4 and σ_X = 4. Three sampling experiments
are run through the truly wondrous Central Limit Theorem, each producing a spreadsheet of 10,000
sample means.]

Sampling Experiment 1 (spreadsheet XBAR4): 10,000 samples, each of size n = 4 (10,000 rows and
4 columns, C11-C14). Examine three items: the shape of the distribution; µ_X̄, which should be around 4;
and σ_X̄, which should be around σ_X/√n = 4/√4 = 2. Record the range of XBAR4: from ____ to ____.

Sampling Experiment 2 (spreadsheet XBAR40): 10,000 samples, each of size n = 40 (10,000 rows and
40 columns, C101-C140). Again examine the three items: the shape of the distribution; µ_X̄, which should
be around 4; and σ_X̄, which should be around σ_X/√n = 4/√40 = 0.632. Record the range of XBAR40:
from ____ to ____.

Sampling Experiment 3 (spreadsheet XBAR400): 10,000 samples, each of size n = 400 (10,000 rows and
400 columns, C201-C600). Again examine the three items: the shape of the distribution; µ_X̄, which should
be around 4; and σ_X̄, which should be around σ_X/√n = 4/√400 = 4/20 = 0.2. Record the range of
XBAR400: from ____ to ____.
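The experiments sketched above can also be reproduced outside Minitab. Here is a minimal sketch in Python (assuming NumPy is available; the notes themselves use Minitab) that draws 10,000 samples of each size from an exponential parent with mean 4 and checks the items the experiment asks about:

```python
# Sketch of the sampling experiments above, assuming NumPy in place of Minitab.
# Parent population: exponential with mean 4 (so its standard deviation is also 4).
import numpy as np

rng = np.random.default_rng(42)

for n in (4, 40, 400):
    # 10,000 rows (samples), each row holding n observations
    samples = rng.exponential(scale=4.0, size=(10_000, n))
    xbars = samples.mean(axis=1)          # one sample mean per row
    print(f"n = {n:3d}: mean of sample means = {xbars.mean():.3f}, "
          f"std of sample means = {xbars.std(ddof=1):.3f} "
          f"(CLT predicts {4 / np.sqrt(n):.3f})")
```

A histogram of `xbars` for n = 4 is still visibly right-skewed; by n = 40 it already looks close to normal, which is exactly the behavior the Central Limit Theorem promises.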

The Standard Normal Distribution

Any normally distributed random variable (like X̄) can be transformed into another normally distributed
random variable that always has a mean of zero and a standard deviation of 1. This is called
standardizing, and the transformed variable is called standard normal z. Given a value for the population
mean (µ), the population standard deviation (σ), a value for the sample mean (X̄), and the sample size
(n) used in calculating X̄, we are able to calculate z as:

    z = (X̄ − µ) / (σ/√n)

The reason for the transformation is that probabilities have been calculated for all possible values of z,
and these are presented in the z-table. So, if one desires to find the probability associated with values of
X̄, simply transform to z, and use the z-table.

From another perspective, look closely at the numerator of the transformation. It measures the distance
between our calculated sample mean and a (hypothetical or otherwise) population mean. It seeks an
answer to the eventual question: Is our sample statistic close to or far from the universal norm µ ? This is
an issue of utmost concern in applied work.

Once the numerator is calculated, the next question ought to be: Is this calculated distance a big number
or is it a small number? The answer to this question is -- “It’s all relative.”

The numerator is large or small relative to some standard. The standard presents itself in the denominator
of z. The numerator is large or small relative to a measure of the spread of all of the data. This measure
is the standard deviation.

So, for example, a large z-value translates into the sample mean being “far” from the population mean.
This has genuine implications for decision making. It may suggest that some sort of corrective action be
taken.

The initial requirement of statistical estimation and statistical inference is a firm handle on all the pieces
to the puzzle -- sample mean, population mean, population standard deviation, z-transformation,
calculating probabilities from the z-table using the z-transformation -- and how they relate to each other.
The next thing to do is to manipulate the value of one of these items and observe what happens to the
values of the rest. Statistical estimation is concerned with these manipulations.
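As a small illustration of how these pieces fit together, here is the z-transformation applied to the lead example from the first section (a sketch using Python's standard library in place of a z-table; the specific numbers are the ones given earlier in the notes):

```python
# Standardize the sample mean from the lead example and look up a probability.
# mu = 40 and sigma = 15 are the (normally unknown) population values given
# earlier; xbar = 31.32 is the sample mean of the 15 soil samples.
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40.0, 15.0, 15
xbar = 31.32

z = (xbar - mu) / (sigma / sqrt(n))   # the z-transformation
p = NormalDist().cdf(z)               # P(Z <= z): the "z-table" lookup
print(f"z = {z:.3f}, P(Z <= z) = {p:.4f}")
```

(Strictly, the z calculation applies when X̄ is normally distributed; with a skewed parent and n = 15, it is only an approximation.)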

Statistical Estimation
Overview:

We are now in the first phase of statistical inference. The setting for the problems that we solve is as
follows:
⋅ You are given an X̄, n, and σ, but you do not know (i.e., are not given) µ.
⋅ You want a reliable estimate of µ . (You want this estimate because important decisions are
going to be made on the basis of your result).

⋅ What should you do?

Begin by exploring the behavior of X̄. Recall that its behavior is explained by the Central Limit
Theorem.

What is X̄? It is:
⋅ a sample statistic;
⋅ a random variable; i.e., it can take on any value since it is sensitive to sampling;
⋅ an estimator of µ.

Note the picture below. It is a sampling distribution of X̄, where each of the millions of possible X̄s is
based on a sample size of n. (The sample size is the same for each X̄, but the observations comprising
each X̄ are most likely different). Suppose that the observations that we draw lead to the particular X̄ on
the horizontal axis below.

[Figure: the sampling distribution of X̄, centered at µ, with our particular X̄ marked on the horizontal
axis some distance from µ.]

Unfortunately, we rarely know the location of µ . It is placed in the picture above simply as a point of
reference. Given the probable distance between the unknown population parameter (e.g., µ ) and its
estimator (e.g., X ), it makes sense to distinguish two types of estimators:

⋅ point estimator: one value is specified as the estimate of the population parameter;
  e.g., X̄ is a point estimator for µ; the specific value of X̄, e.g., 31.32 mg/kg, is a point
  estimate. Also, s is a point estimator for σ. The specific value of s, e.g., 11.59 mg/kg, is a
  point estimate.

⋅ interval estimator: a range of values that conceivably contains the true population parameter.

The motivation for an interval estimator is as follows. Consider the original picture above. It is risky to
assert with 100% certainty that a particular value of X̄ (a point estimate) will equal µ.

Why not admit that:

    our point estimator (X̄)  ±  some distance from the point estimator  contains (or equals)  the population parameter
Q: What is a good candidate for “some distance”?

A: An excellent candidate is derived as follows:

⋅ Begin with the z-transformation formula:  z = (X̄ − µ)/(σ/√n) = (X̄ − µ)/σ_X̄

⋅ Multiply both sides by σ/√n = σ_X̄:  z · σ_X̄ = X̄ − µ

⋅ (Recall that z itself is simply a number from the z-table, typically 1, 2, or 3, or any number
  between 0 and 3.90).

The result directly above says that the distance between the sample statistic and the population parameter
can be characterized as an arbitrarily selected number (e.g., z = 1 or perhaps 2 or perhaps 3 or any number
between 0 and 3.90) of standard errors (σ_X̄), so that the interval presented above, which is:

    point estimator  ±  some distance  contains  the population parameter

takes the form:

    X̄ ± z · σ_X̄ = µ

Thus, the item that brings X̄ into equality with µ is z · σ_X̄. This quantity is typically referred to as the
margin of error due to sampling or the maximum error of the estimate.

A more formal way of expressing the interval mathematically is:

    X̄ − z · σ_X̄ < µ < X̄ + z · σ_X̄.

In words, the formula above says that µ is contained in the interval bounded on the left by the calculated
sample mean minus a number (z) of standard errors and on the right by the calculated sample mean plus a
number (z) of standard errors.

An important issue is the following:

Q: Once X̄ and σ_X̄ are specified, what determines the size or width of the interval?

A: The z-value.

Q: Who controls the magnitude of z?

A: You--the decision maker--control z.

Q: What does a large, relative to a small, value of z do to the width of the interval?

A: It increases the width of the interval.

Q: What is it about a larger z-value that causes the interval to become wider?

A: A larger positive quantity is subtracted from the left lower bound (thus expanding its limit to the
left) and added to the right upper bound (thus expanding its limit to the right).

Q: How would an interval estimate look for a given X̄, a given σ_X̄, and a small versus large z-value?

A: [Figure: the sampling distribution of X̄, centered at µ, with our X̄ marked. A short band around X̄
   depicts the interval with a small z (note: µ is not contained in this interval); a longer band depicts
   the interval with a large z (note: µ is contained in this interval).]

The importance of z:

⋅ For a given X̄ and σ_X̄, interval width depends on the value of z selected.
⋅ z itself is obtained from the z-table; so it can range from 0 to 3.9.
⋅ As z gets larger, the interval becomes wider.
⋅ Compare the drawings of the two intervals in the previous picture:
⋅ The wider the interval becomes, the more confident we are that the interval contains µ .
⋅ The narrower the interval becomes, the less confident we are that the interval contains µ .
⋅ Beliefs--weak or strong--that the interval contains µ suggest a name for the interval itself:
confidence interval.

A confidence interval represents our belief about the plausible values that the unknown population mean
can have.

An important issue: the trade-off between confidence and precision:

Q: If we desire to be totally confident that the interval contains µ , why not make the interval as wide
as possible; i.e., use a z-value of 3.9?

A: Notice that a confidence interval is our statement about conceivable values for the unknown
population parameter. A wide interval permits extreme values as estimates for µ . These extreme
values may be terrible estimates. Terrible estimates do not introduce helpful information to the
decision making process.

Notice that, with a wide confidence interval, we are assuring ourselves that the interval contains
µ , but we are not narrowing down--not pinpointing--reasonable estimates. So, increased
confidence comes at the cost of decreased precision.

Also, with a narrow confidence interval, we become less assured that the interval contains µ , but
we are narrowing down--pinpointing--reasonable estimates. So, increased precision comes at the
cost of decreased confidence.

Another important issue: translating z into a confidence level:

⋅ z is an integral part of the confidence interval formula. Confidence increases as z increases.


⋅ z-values are associated with probabilities.
⋅ This relationship between z and probabilities now transfers to confidence intervals.
⋅ Preview: Larger z’s result in both larger probabilities (from the z-table) and larger or wider
confidence intervals. These probabilities are referred to as levels of confidence.
So, a level of confidence is a probability assessment. It is a probability assessment
of our confidence that the interval contains µ .
⋅ Look at the relationship: (a) between positive z and an area under the standard normal curve;
  (b) between ±z and its corresponding symmetrical, interior area; and (c) between this
  symmetrical interior area and its equivalent level of confidence.

    area from 0 to z    symmetrical interior area    ±z          level of confidence
    0.45                0.90                         ±1.645      0.90
    0.475               0.95                         ±1.96       0.95
    0.495               0.99                         ±2.575      0.99

⋅ Thus, a 90% confidence interval uses 1.645 as the z-value in the confidence interval formula, a
  95% confidence interval uses 1.96, and a 99% confidence interval uses 2.575.

⋅ After we construct a 90% confidence interval, the proper interpretation is: “We can be 90%
confident that the true mean is within this interval.”
Look at the previous diagram, and note how the normal curves display symmetrical interior areas.
We will now generalize these pictures with the notational convention used by statisticians. We begin by
recognizing that the total area under the normal curve is 1, or 100%.

For each component of the normal curve, the corresponding notation is as follows:

    component                                   notational convention and explanation

    ⋅ symmetrical interior area                 ⋅ 1 − α: level of confidence (or confidence level)
    ⋅ total area in both tails combined         ⋅ α: tail area = total area − interior area;
                                                  α = 1 − (1 − α)
    ⋅ area in right tail                        ⋅ α/2: tails are symmetric, so each is α/2
    ⋅ area in left tail                         ⋅ α/2: areas are positive; each α/2 is positive
    ⋅ z-value separating the right-half         ⋅ z_{α/2}: subscript matches right tail area
      symmetric area and the right tail           (z is positive because it is to the right of z = 0)
    ⋅ z-value separating the left-half          ⋅ −z_{α/2}: subscript matches left tail area
      symmetric area and the left tail            (z is negative because it is to the left of z = 0)

The picture that captures everything above is

[Figure: a standard normal curve with interior area 1 − α between −z_{α/2} and +z_{α/2}, and tail area
α/2 on each side.]

and the way to rewrite the original confidence interval formula for a (1 − α) level of confidence is

    X̄ − z_{α/2} · σ_X̄ ≤ µ ≤ X̄ + z_{α/2} · σ_X̄.

The difference between this rewrite and the original confidence interval is the α/2 subscript on z. Don’t
let this confuse you. Nothing different has been done in the construction of the interval. The α/2
subscript on z is presented to accompany the level of confidence, 1 − α. These are nothing more than
notational matches. After all, if 1 − α (i.e., the symmetric interior area or the level of confidence)
changes, the corresponding z-values (which are ±z_{α/2}) likewise must change.

Example: Find ±z_{α/2} for a 90% confidence interval.

Solution: Follow the method under “notational convention and explanation” above.

(1) This is your start. You are given 1 − α = 0.90. Now (2) to (5) below are straightforward.
(2) α = 1 − 0.90 = 0.10
(3) α/2 = 0.10/2 = 0.05
(4) z_{α/2} = z_{0.05} = 1.645
(5) −z_{α/2} = −z_{0.05} = −1.645

Thus, the z-values for a 90% confidence interval are ±1.645.
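The steps above are easy to automate. Here is a sketch using the standard library's inverse normal CDF, which plays the role of reading the z-table backwards:

```python
# Find z_{alpha/2} for a two-sided confidence level, following steps (1)-(5).
from statistics import NormalDist

def z_for_confidence(conf):
    """Return z_{alpha/2}; e.g., conf = 0.90 gives about 1.645."""
    alpha = 1.0 - conf                            # step (2)
    return NormalDist().inv_cdf(1.0 - alpha / 2)  # steps (3)-(4): P(Z <= z) = 1 - alpha/2

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: z = {z_for_confidence(conf):.3f}")
```

This reproduces the familiar trio 1.645, 1.96, and (to three decimals) 2.576, which the notes round to 2.575.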

Let's consider a heavy metal different from lead. Suppose that we have 15 chromium samples. Their
measurements (in mg/kg) are:

    3.5199  6.5252  8.4996  6.4097  4.7424
    5.5125  5.9328  2.6428  6.7628  5.3015
    4.1472  3.4474  5.9564  5.4219  9.4118

On the basis of the 15 samples (n=15), X̄ = 5.616 mg/kg and s = 1.827 mg/kg. For now, assume that
σ = s = 1.827 mg/kg. Given this information, construct a 90% confidence interval for the unknown
population mean and provide an interpretation for this confidence interval.

The interval itself:  X̄ − z_{α/2} · (σ/√n) ≤ µ ≤ X̄ + z_{α/2} · (σ/√n)

Substitutions into our 90% confidence interval (n=15):

    5.616 − 1.645 · (1.827/√15) ≤ µ ≤ 5.616 + 1.645 · (1.827/√15)
    5.616 − 0.776 ≤ µ ≤ 5.616 + 0.776
    4.840 ≤ µ ≤ 6.392

Interpretation: We can be 90% confident that the true mean is somewhere between
4.840 mg/kg and 6.392 mg/kg.
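The arithmetic above can be checked in a few lines (a sketch using only Python's standard library; as in the text, s is treated as if it were σ):

```python
# Reproduce the 90% confidence interval for the chromium data.
from math import sqrt
from statistics import mean, stdev

chromium = [3.5199, 6.5252, 8.4996, 6.4097, 4.7424,
            5.5125, 5.9328, 2.6428, 6.7628, 5.3015,
            4.1472, 3.4474, 5.9564, 5.4219, 9.4118]

n = len(chromium)
xbar = mean(chromium)            # about 5.616 mg/kg
s = stdev(chromium)              # about 1.827 mg/kg, assumed here to equal sigma
z = 1.645                        # z_{alpha/2} for 90% confidence

margin = z * s / sqrt(n)         # margin of error: about 0.776 mg/kg
print(f"{xbar - margin:.3f} <= mu <= {xbar + margin:.3f}")
```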

Alternative        Suppose 15 samples are taken, X̄ is calculated, and a 90% confidence interval
Interpretation:    is constructed. Suppose that this “experiment” is repeated nine more times, so
                   that we have constructed a total of ten 90%-confidence intervals (each one
                   relying on a new X̄, and each X̄ calculated from 15 new samples). Here is the
                   alternative interpretation: of these ten similarly constructed intervals, nine
                   (or 90%) can be expected to contain the true mean µ.

The pictorial representation of this interpretation is as follows:

Notion of a 90% Confidence Interval

[Figure: the top of the diagram shows the sampling distribution of X̄, centered at the true unknown µ,
with interior area 0.90 between −z_{.05}·σ_X̄ and +z_{.05}·σ_X̄ and area 0.05 in each tail. (Note: the
standard error is defined as σ_X̄ = σ/√n.) Below it, ten point estimates X̄_1 through X̄_10 are plotted,
each with a band of width ±z_{.05}·σ_X̄ around it.]

• In the diagram, a value of z_{.05} = 1.645 implies a 45% area to either side of the mean;
  2 × 45% = 90% symmetrical area in total.

• Interpretation of the diagram: conduct an experiment and construct 10 intervals. While each of the
  point estimates X̄_i does not equal µ, the intervals constructed around 9 of the 10 X̄_i do contain µ.
  Nine of 10 (or 90%) can be expected to contain the true mean when we use z_{.05} = 1.645. This is
  the meaning of a 90% confidence interval.

• A 100% confidence interval could be obtained by changing the z-value to 3.9 (z = 3.9 implies 50%
  of the area to either side of the mean). This stretches the band around each X̄_i. Now, the band
  around X̄_7 will include µ.

One final issue regarding confidence intervals:

Q: Does there have to be a trade-off between confidence and precision? For example, does an
increase in precision ( i.e., a narrower interval) have to come at the cost of a decrease in
confidence?

A: The answer is no. The reason goes something like this:

Q: If you take more samples, would you expect your calculated sample mean to be a better
   representation of µ? Alternatively, would you expect your X̄ to be closer to µ?

A: Yes! This is a consequence of the Central Limit Theorem.

Q: As you increase the number of samples, what should happen to the width of your confidence
interval for a given level of confidence?

A: It will decrease in width. Alternatively, it becomes more precise.

Q: What is responsible for this increase in precision, given the confidence interval formula?
A: n in the denominator of σ_X̄. (Recall that σ_X̄ = σ/√n.)
Example: Take the previous 90% confidence interval that we calculated. To see what happens, let X̄
and s (assumed to equal σ) remain unchanged. However, increase n from 15 to 25, and calculate
the interval.

90% confidence interval with increased n (n=25):

    X̄ − z_{α/2} · (σ/√n) ≤ µ ≤ X̄ + z_{α/2} · (σ/√n)
    5.616 − 1.645 · (1.827/√25) ≤ µ ≤ 5.616 + 1.645 · (1.827/√25)
    5.616 − 0.6 ≤ µ ≤ 5.616 + 0.6
    5.016 ≤ µ ≤ 6.216

Interpretation: We can be 90% confident that the true mean is somewhere between 5.016 mg/kg and
6.216 mg/kg.

Notice that this interval is slightly narrower than the previous interval. To narrow an interval in this
fashion is to increase its precision. Also notice that we have maintained the same level of confidence (at
90%) and have increased precision entirely by increasing sample size.

On the surface, it appears that the almost unnoticeable increase in precision came at the high cost of
generating 10 additional samples. Unrealistically, we let X̄ and s be the same for the two intervals. They
will obviously be different for different sample sizes. This was done strictly to direct attention to the
relationship between sample size and precision for a given level of confidence.
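That relationship is easy to see by tabulating interval width for several sample sizes, holding X̄ and s fixed as the text does (an admittedly artificial assumption):

```python
# Width of the 90% interval, 2 * z * sigma / sqrt(n), as n grows,
# with s = 1.827 standing in for sigma as in the text.
from math import sqrt

s, z = 1.827, 1.645
for n in (15, 25, 100):
    width = 2 * z * s / sqrt(n)
    print(f"n = {n:3d}: interval width = {width:.3f} mg/kg")
```

Because width shrinks like 1/√n, quadrupling the sample size only halves the width; extra precision gets progressively more expensive.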

The t-distribution

Thus far, z has been the driving force for making probability assessments about an unknown population
mean by way of a confidence interval.

Recall the strict definition of z. It is:

    z = (X̄ − µ) / (σ/√n).

The characteristics of z are:

⋅ z has one shape, the standard normal;
⋅ z is centered at zero, and it has standard deviation equal to one;
⋅ As long as the true, population standard deviation (σ) is known or given, an X̄ can be
  converted to z.

In practical situations, we simply do not have σ. We do, however, have its estimator, which is s. Now,
return to the right-hand side of the formula for z. If we substitute the item that we have (s) for the item
that we do not have (σ) in the formula itself, the result is:

    (X̄ − µ) / (s/√n).
Appropriate questions now surface:

Q: Is this z anymore?

A: No, because it violates the strict definition of z.

Q: But I have been using z right along to make probability assessments. Do you mean to tell me that
I should not have been using z to make these assessments?

A: I am telling you that, when you are in an applied situation and you must use a calculated standard
deviation in place of the true, unknown standard deviation, the variable with which you are
dealing is no longer distributed as z.

Q: Well, if it isn’t z, what then is it, and how do I use it?

A: It is a well-defined probability distribution. It is called the t-distribution, and it is used just like z.
Its formula is:

t = (X − µ) / (s/√n).

Q: Is there a relationship between z and t?

A: Yes, there is. It can be appreciated in terms of the following argument. First, pay attention to the
item in each formula that makes z and t different. We expect a sample statistic (like s) to look
more like the population parameter it is designed to estimate ( σ ) the more we sample (as n
increases). Comparing the formulas directly, t starts to look more and more like z as n increases.
This happens because s becomes a better estimate of σ . Ultimately, as n increases s becomes σ ,
and t becomes z.

Appearance Of The t-distribution

The t-distribution has a number of shapes. It has a different shape for each sample size (n). When using t,
sample size is converted to something called degrees of freedom (df). The relationship between n and df
is df=n-1. This is read “Degrees of freedom is equal to sample size minus 1.” Thus, t likewise has a
different shape for each df.

Pictures for three different t-distributions, representing samples of size 2, 20, and "large", are presented
below. The t-distribution with n=2 has df=1; the t-distribution with n=20 has df=19; and the t-distribution
with "large" n has "large" df. As sample size increases beyond 30, the t-distribution approximates the
normal distribution better and better.

(Figure: three t-curves on a common t-axis, centered at 0: t with 1 df, which is the flattest; t with 19 df;
and the standard normal with "large" df.)
14
From these pictures, we see that each t-curve is centered at zero. It is also symmetric about zero. Also, t
is considerably flatter and more spread out than the z curve for small values of n; t effectively becomes z
as n gets large.

Recall when we used z in the confidence interval formula that we attached a subscript on z to match the
tail area associated with z. Thus, if the tail area was of magnitude α/2, the z-value that corresponded
with this tail area was denoted as zα/2. We now do the same with t. We will focus on the upper tail area
of t and refer to this tail area as α. The t-value corresponding to the tail area α will be referred to as tα.

The relationship between a tail area (α) and its corresponding t-value (tα) is presented in the picture
below for our three t-curves. Notice the t-axis in the picture. With zero as the reference point on the t-
axis, each tα presented on that axis ought to become larger as we move in the positive direction, i.e., as df
decreases. The place to verify this is a statistical table constructed specifically for the t-distribution. Such
a table is the t-table presented on page 17. In addition, notice that the shaded areas in the left portion of
the t-curves below would have t-values that are simply the negatives of the tα-values in the right portion
of the curve.

(Figure: the three t-curves, each with upper-tail area α shaded; the corresponding tα values on the t-axis
move farther from zero as df decreases.)

Now, let’s examine how to use the t-table presented on page 17. First, notice that df is presented as the
left-most column. It is reproduced as the right-most column to facilitate reading values from the table.
Notice the top row of the table. The subscripts on the t-values presented represent the magnitudes of the
tail areas. From left to right, these are 0.10, 0.05, 0.025, 0.01, and 0.005. The entries in each interior
column of the table are the t-values for a given tail area (read from the top of the column where the entry
appears) and for a given df (read from the row in which the entry appears). Finding a t-value for a given
α and df simply involves finding the intersection of these row and column entries in the body of the
table.

Example:

If n=2 and α=0.05, find df and tα.

Solution:

Since n=2, df=n-1=2-1=1. Since α=0.05, we are looking for t0.05 corresponding with df=1. To solve this
problem, go down the df column and locate "1". Go across the top row and locate t0.05. The solution is
the intersection of this row and this column in the body of the table; i.e., t0.05=6.314.

On your own, verify that, for α=0.05 and df=19, t0.05=1.729. For α=0.05 and df=1000, verify that
t0.05=1.646.
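Two of these lookups can be checked with nothing but the Python standard library: for df=1 the t-distribution is the standard Cauchy, whose upper 5% point has the closed form tan(0.45π), and the bottom (z) row of the table is just the standard normal quantile (statistics.NormalDist requires Python 3.8 or later):

```python
import math
from statistics import NormalDist

# df = 1: t is the standard Cauchy, so its 95th percentile is tan(pi*(0.95 - 0.5))
t_05_df1 = math.tan(math.pi * (0.95 - 0.5))
print(round(t_05_df1, 3))        # 6.314, matching the df = 1 row of the table

# df -> infinity: t becomes the standard normal z
z_05 = NormalDist().inv_cdf(0.95)
print(round(z_05, 3))            # 1.645, matching the z row at the bottom
```

Intermediate rows (e.g., t0.05 = 1.729 for df = 19) have no simple closed form; they are exactly what the printed table, or a statistics package, is for.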
15
Situation Where We Use t In Place Of z: Confidence Intervals

Let’s look at the first situation where t exhibits its practicality. Return to the 15 chromium samples for
which we calculated a confidence interval for µ . Recall that, on the basis of 15 samples (n=15),
X =5.616 mg/kg and s=1.827 mg/kg. At the same time, we assumed that σ =s=1.827 mg/kg, and we
constructed a 90% confidence interval for the unknown population mean.

Recall the formula for a (1 − α ) confidence interval when using z:

X − zα/2 · (σ/√n) ≤ µ ≤ X + zα/2 · (σ/√n)

Since we do not have σ , we should replace σ and z with s and t, respectively, in the confidence interval
formula. This yields:
X − tα/2 · (s/√n) ≤ µ ≤ X + tα/2 · (s/√n)

Calculating the 90% confidence interval is straightforward once we find the proper values for ±tα/2. To
do so, revisit the suggestions beginning at the bottom of page 10. Be aware that the (1 − α) confidence
interval formula requires us to find ±tα/2. Also notice that df=n-1=15-1=14 for this problem.
(1) 1 − α = 0.90
(2) α = 1 − 0.90 = 0.10
(3) α/2 = 0.10/2 = 0.05
(4) tα/2 = t0.05 = 1.761 for df=14 (see the t-table).
(5) −tα/2 = −t0.05 = −1.761 for df=14.

Making the substitutions, we get:


X − tα/2 · (s/√n) ≤ µ ≤ X + tα/2 · (s/√n)

5.616 − 1.761 · (1.827/√15) ≤ µ ≤ 5.616 + 1.761 · (1.827/√15)

5.616 − 0.831 ≤ µ ≤ 5.616 + 0.831

4.785 ≤ µ ≤ 6.447

16
t-table

df t0.10 t0.05 t0.025 t0.01 t0.005 df


1 3.078 6.314 12.706 31.821 63.657 1
2 1.886 2.920 4.303 6.965 9.925 2
3 1.638 2.353 3.182 4.541 5.841 3
4 1.533 2.132 2.776 3.747 4.604 4

5 1.476 2.015 2.571 3.365 4.032 5


6 1.440 1.943 2.447 3.143 3.707 6
7 1.415 1.895 2.365 2.998 3.499 7
8 1.397 1.860 2.306 2.896 3.355 8
9 1.383 1.833 2.262 2.821 3.250 9

10 1.372 1.812 2.228 2.764 3.169 10


11 1.363 1.796 2.201 2.718 3.106 11
12 1.356 1.782 2.179 2.681 3.055 12
13 1.350 1.771 2.160 2.650 3.012 13
14 1.345 1.761 2.145 2.624 2.977 14

15 1.341 1.753 2.131 2.602 2.947 15


16 1.337 1.746 2.120 2.583 2.921 16
17 1.333 1.740 2.110 2.567 2.898 17
18 1.330 1.734 2.101 2.552 2.878 18
19 1.328 1.729 2.093 2.539 2.861 19

20 1.325 1.725 2.086 2.528 2.845 20


21 1.323 1.721 2.080 2.518 2.831 21
22 1.321 1.717 2.074 2.508 2.819 22
23 1.319 1.714 2.069 2.500 2.807 23
24 1.318 1.711 2.064 2.492 2.797 24

25 1.316 1.708 2.060 2.485 2.787 25


26 1.315 1.706 2.056 2.479 2.779 26
27 1.314 1.703 2.052 2.473 2.771 27
28 1.313 1.701 2.048 2.467 2.763 28
29 1.311 1.699 2.045 2.462 2.756 29

30 1.310 1.697 2.042 2.457 2.750 30


35 1.306 1.690 2.030 2.438 2.724 35
40 1.303 1.684 2.021 2.423 2.704 40
50 1.299 1.676 2.009 2.403 2.678 50
60 1.296 1.671 2.000 2.390 2.660 60

70 1.294 1.667 1.994 2.381 2.648 70


80 1.292 1.664 1.990 2.374 2.639 80
90 1.291 1.662 1.987 2.369 2.632 90
100 1.290 1.660 1.984 2.364 2.626 100
1000 1.282 1.646 1.962 2.330 2.581 1000

∞ 1.282 1.645 1.960 2.326 2.576 ∞


z0.10 z0.05 z0.025 z0.01 z0.005

17
A comparison of the intervals using z and t is as follows:

90% confidence interval using z: 4.840 ≤ µ ≤ 6.392


90% confidence interval using t: 4.785 ≤ µ ≤ 6.447

The interval using t is slightly wider than when using z for the same level of confidence. The item
responsible for the increase in width is the t-value. (Compare t=1.761 to z=1.645).

Recall that a wider interval for the same level of confidence means a loss in precision. When comparing
the limits of the two intervals above, the loss in precision when using t is negligible. This is due to the
small value of s in the first place. When n itself is small or s itself is large or when we have a
combination of the two, the differences in widths between the two intervals can be dramatic. This loss in
precision is the penalty for not having the true standard deviation, only its estimate, in the interval. In
fact, this penalty is more severe the smaller n is and less severe the larger n is. To see this, visit the t0.05
column in the t-table. Start with our entry (t0.05=1.761) and notice how its value increases as n decreases
and how its value decreases as n increases. The former set of circumstances results in ever-widening
intervals, the latter in ever-narrowing intervals. When n gets really large, t and z give virtually identical
results.
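This comparison is easy to reproduce. The sketch below plugs the chromium summary statistics into both interval formulas, using t0.05 = 1.761 (df = 14) and z0.05 = 1.645 straight from the table:

```python
import math

xbar, s, n = 5.616, 1.827, 15        # chromium summary statistics, mg/kg
se = s / math.sqrt(n)                # standard error of the mean

m_z = 1.645 * se                     # 90% margin using z
m_t = 1.761 * se                     # 90% margin using t (df = 14)

print(f"z: {xbar - m_z:.3f} to {xbar + m_z:.3f}")   # z: 4.840 to 6.392
print(f"t: {xbar - m_t:.3f} to {xbar + m_t:.3f}")   # t: 4.785 to 6.447
```

The t-interval is wider because 1.761 > 1.645; rerunning the sketch with, say, n = 5 (df = 4, t0.05 = 2.132) would make the gap between the two intervals much more dramatic.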

An Upper One-Sided (1-α) Confidence Interval For µ

We have just seen that a two-sided (1-α) confidence interval for µ is:

X − tα/2 · (s/√n) ≤ µ ≤ X + tα/2 · (s/√n).

Frequently, environmental data are skewed to the right, and decisions need to be made about observations
in this end of the distribution. Observations in the left end of the distribution are not the focal point of the
test. Thus, it becomes relevant to construct an upper one-sided (1-α) confidence interval for µ. This is
nothing more than a two-sided confidence interval specified only in terms of its upper limit (UL) and with
the entire α area placed in the right tail (as opposed to α/2 in the two-sided situation). Thus, the upper
one-sided (1-α) confidence interval for µ becomes:

0 ≤ µ ≤ X + tα · (s/√n).

It is apparent that the upper limit (UL) is:

UL = X + tα · (s/√n).

The only difference between the upper limits for one-sided and two-sided confidence intervals is the value
of t: tα/2 is used in the upper limit for a two-sided confidence interval, and tα is used in the upper limit for
an upper one-sided confidence interval.

Another Confidence Interval Example

State of Connecticut Regulation of Department of Environmental Protection


(Page 16 of 66: (e) Applying the Direct Exposure and Pollutant Mobility Criteria)

18
“Unless an alternative method for determining compliance with a direct exposure criterion has been
approved by the Commissioner in writing, compliance with a direct exposure criterion is achieved when
(A) the ninety-five percent upper confidence level of the arithmetic mean of all sample results of
laboratory analyses of soil from the subject release area is equal to or less than such criterion, provided
that the results of no single sample exceeds two times the applicable direct exposure or (B) the results of
all laboratory analyses from the subject release area are equal to or less than the applicable direct
exposure criterion.”

Taking Apart The Regulation

Important: Although the regulation is not specified in terms of constructing a two-sided confidence
interval or an upper one-sided confidence interval, let's perform the calculations in terms of the latter.

Suppose we are focusing on arsenic levels, in mg/kg, for a particular site. Suppose that we take 20
samples (i.e., n=20). The observations are as follows:

10.20 4.17 1.92 17.80 6.34 1.55 15.60 3.10 7.81 6.90
2.06 4.72 5.73 14.10 9.18 7.78 4.66 7.63 4.28 10.40

Next, we calculate the sample mean (X = 7.30), the sample standard deviation (s = 4.53), and the standard
error of the sample mean (sX = s/√n = 1.01). We use the t-table to get the appropriate t-value for the
required upper one-sided 95% confidence interval. Consulting the t-table, we need t0.05 for df=n-1=20-1=19.
This t-value is 1.729.

Suppose, further, that the direct exposure criterion for arsenic is set at 10 mg/kg. Reading (A) and (B) of
the regulation, we have the following possible situations:
(1) From (B), if all samples are ≤ 10 mg/kg, compliance results.
(2) From (A), if just one sample exceeds 20 mg/kg (two times the criterion), compliance fails.
(3) From (A), if some samples are between 10 and 20 mg/kg, with some below 10 mg/kg,
compliance may result.

Make a decision about compliance based upon comparing the one-sided 95% upper confidence limit of
the arithmetic mean of all the samples with the direct exposure criterion.
⋅ If this upper confidence limit is ≤ 10 mg/kg, compliance results.
⋅ If this upper confidence limit is >10 mg/kg, compliance fails.

Q: Which of the three situations do we have?


A: Five samples are above 10 mg/kg, so we do not have Situation (1). No sample is above 20 mg/kg,
so we do not have Situation (2). We have Situation (3) because five samples are between 10 and 20
mg/kg and 15 are below 10 mg/kg.

Since we have a situation of possible compliance, we proceed with calculating the upper limit (UL) for
the upper one-sided 95% confidence interval:

UL = X + tα · (s/√n) = 7.30 + 1.729(1.01) = 9.046 mg/kg.

Since UL = 9.046 mg/kg and this value is less than the direct exposure criterion of 10 mg/kg, compliance
has been achieved.
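As a cross-check, the full calculation can be scripted. This is an illustrative sketch using the 20 arsenic observations above, with t0.05 = 1.729 (df = 19) read from the t-table; small rounding differences from the hand calculation are expected because the script does not round the standard error to 1.01:

```python
import math
import statistics

arsenic = [10.20, 4.17, 1.92, 17.80, 6.34, 1.55, 15.60, 3.10, 7.81, 6.90,
           2.06, 4.72, 5.73, 14.10, 9.18, 7.78, 4.66, 7.63, 4.28, 10.40]

xbar = statistics.mean(arsenic)          # about 7.30 mg/kg
s = statistics.stdev(arsenic)            # about 4.53 mg/kg
se = s / math.sqrt(len(arsenic))         # about 1.01 mg/kg

UL = xbar + 1.729 * se                   # upper one-sided 95% limit
print(round(UL, 2), UL <= 10)            # prints 9.05 True
```

Since the upper confidence limit stays below the 10 mg/kg direct exposure criterion, the script reaches the same compliance decision as the hand calculation.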

19
Summary And Words Of Caution When Using t Or z

All of the testing that we have done so far has focused on a particular sample statistic, the mean. We
could likewise concentrate on other statistics like a median, proportion, or a percentile. The confidence
intervals that we have constructed so far are based on normality of the sample mean. The Central Limit
Theorem is the vehicle that guarantees the normality of the sample mean irrespective of the distribution of
the observations upon which the sample mean is calculated. If the parent population itself is not highly
skewed, normality of the mean kicks in at small sample sizes. If the parent is extremely skewed,
normality kicks in around a sample size of 30. We have gone through a computer simulation to
demonstrate the behavior of the sample mean based upon a population that was extremely skewed.

We have also made a distinction between when to use the z-distribution and when to use the t-
distribution. We concluded that the more realistic of the two is the t-distribution because we never know
the true population standard deviation. The best we can do is to estimate the population standard
deviation. The t-distribution accommodates this estimate.

Procedures using t are based on the assumption that the observations come from a normal population.
They work reasonably well even when the observations are not exactly normally distributed and the
sample size is small (e.g., less than 15) or moderate (between 15 and 30), provided that the observations
under consideration are not too far from being normally distributed. We say that these procedures are
robust to violation of the normality assumption.

Treatment Of Outliers And Testing Suggestions

We must always be cautious of outliers, defined as observations that fall well outside the overall pattern
of the data. An outlier may be the result of error in recording or measurement; it may also be an unusual
and extreme observation. Obviously, outliers call into question the normality assumption. Even when the
sample size is large, outliers may affect our test procedures because neither the sample mean nor the
sample standard deviation is resistant to outliers, although the effect is less severe than when the sample
size is small. For small sample sizes in particular, these procedures are not robust to outliers. It is important to examine
our data before applying test procedures. This will ensure that the test procedures will be appropriate.

If an outlier is present and it is not the result of recording or measurement error, several things can be
done. As a preliminary, apply the procedure to the data set with and without the outlier(s). If the
difference is substantial, take an alternative course of action. Two possibilities are suggested: (1) If the
data can be shown to abide by the characteristics of a different probability distribution, e.g., log-normal,
transform the data to this distribution and conduct the tests according to the parametric assumptions of
this distribution; (2) disregard searching for the correct parametric distribution and implement the
appropriate non-parametric procedure.

A Simple Approach For Assessing Data Distribution And The Possibility Of Outliers

A Data Set’s Five-Number Summary And Box-And-Whisker Diagram (Or Boxplot)

A straightforward method for seeing how sample data are distributed begins with computing their
quartiles. Furthermore, performing an additional calculation between the first and third quartiles (to
obtain the interquartile range) assists with determining whether an observation might be an outlier.

20
Quartiles are numbers that divide the data set into four equal parts, i.e., quarters. In order to calculate
quartiles, the observations must first be ordered by size, from smallest to largest. A data set has three
quartiles, denoted as Q1, Q2, and Q3. The first quartile Q1 is the number that divides the bottom 25% of
the data from the top 75%. The second quartile Q2 is the number that divides the bottom 50% of the data
from the top 50%. This is also the median. Notice that the median is a measure of the middle of the
ordered data set. The third quartile Q3 is the number that divides the bottom 75% of the data from the top
25%. By definition, quartiles depend strictly on the order of the observations; they are not sensitive to
the magnitudes of extreme observations. The interquartile range (or IQR) is defined as the difference between the third and first quartiles;
i.e., IQR = Q3 – Q1. It gives the range of the middle 50% of the data and is the preferred measure of
spread when the median is used as the measure of center.

Obtaining quartiles is quite easy. Most importantly, begin by ordering the data. Then, obtain the median.
At this point, the data are divided into two parts: a lower 50% and an upper 50%. Next, find the median
of the lower 50%; this will be Q1. Finally, find the median of the upper 50%; this will be Q3. The
resulting three numbers divide the data set into four parts, each containing 25% of the data.
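The procedure just described translates directly into code. The sketch below implements the median-of-halves convention from the text (for odd n the median is excluded from both halves); note that statistical software such as Minitab interpolates quartile positions instead, so its quartiles, like some of those reported later for these data, can differ slightly:

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, Q2 (median), Q3, maximum via the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]            # bottom half (median excluded when n is odd)
    upper = xs[(n + 1) // 2:]       # top half (median excluded when n is odd)
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

arsenic = [10.20, 4.17, 1.92, 17.80, 6.34, 1.55, 15.60, 3.10, 7.81, 6.90,
           2.06, 4.72, 5.73, 14.10, 9.18, 7.78, 4.66, 7.63, 4.28, 10.40]
print(five_number_summary(arsenic))
```

For the arsenic data this gives Q1 = 4.225, Q2 = 6.62, and Q3 = 9.69: close to, but not identical to, the interpolated values (4.20, 6.62, 9.95) reported later in the text.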

Pictures of four common continuous distributions are presented in the top panels of the figure below.
Quartiles are plotted on the horizontal axis of each parent distribution. The upper-left panel shows
that the distances between successive quartiles (and also between the minimum value and Q1 and between
Q3 and the maximum value) are the same. This indicates that the distribution is uniform. Visual
inspection of the remaining top panels suggests a different distribution for each, as these horizontal
segments change in size. In particular, notice that the shorter the horizontal segment, the taller the
respective portion of the continuous distribution.

Four Parent Populations And Accompanying Box-And-Whisker Diagrams (Or Boxplots)

Famous Princeton University statistics professor John Tukey took this concept and applied it to sample
data. His creation was the box-and-whisker diagram, or boxplot. It makes use of a data set’s minimum,
Q1, Q2, Q3, and maximum (also called a data set’s five-number summary) to provide a graphical display of
21
the center (i.e., the median) and variation in a data set. Boxplots corresponding to the four different
parent distributions in the pictures above are presented directly beneath each continuous distribution.

In each boxplot, the vertical line that divides the overall rectangle into two boxes represents the center of
the ordered data set. This vertical line is the median or Q2. Notice that the quartiles establish the
horizontal length of the boxes. In addition, a horizontal line connecting Q1 with the minimum value and a
horizontal line connecting Q3 with the maximum value result in the whiskers. Horizontal distances of the
whiskers and the boxes provide a preliminary idea of the distribution of the data. In particular, for a given
boxplot, notice the horizontal lengths of the segments. The shorter the horizontal segment in the boxplot,
the taller is the respective portion of the continuous distribution.

Upon constructing a boxplot for a given body of data, it is common to use the results to get a preliminary
idea about the data’s distribution. The figure's boxplots provide insight for the possible patterns that a
data set may have.

Interquartile Range (IQR) And Outliers

Quartiles and IQR can be used together to develop a general rule that is useful for identifying potential
outliers. First define the following lower limit and upper limit:

Lower limit = Q1 − 1.5 ⋅ IQR


Upper limit = Q3 + 1.5 ⋅ IQR.

The benchmark rule is as follows: an observation that lies more than 1.5 IQRs below the first quartile or
more than 1.5 IQRs above the third quartile (i.e., below the lower limit or above the upper limit) is a
potential outlier. If this happens, further data analysis should be done to determine the reason, if possible.

Examples

So far, we have three bodies of data: 15 lead concentration levels; 15 chromium concentration levels; and
20 arsenic concentration levels. All are expressed in mg/kg. The 15 lead concentration levels were
presented on page 1; the 15 chromium concentration levels were presented on page 11; and 20 arsenic
levels were presented on page 19. Each ordered data set, a comment about its size, and its five-number
summary follow.

Lead (rounded to one decimal):


17.1 21.4 23.3 23.4 24.7 25.6 26.9 26.9 27.3 29.2 33.1 37.3 44.9 49.8 58.8
n = 15 (Sample size is considered moderate).
minimum = 17.15 Q1 = 23.41 Q2 = 26.95 Q3 = 37.28 maximum = 58.77

Chromium (rounded to one decimal):


2.6 3.4 3.5 4.1 4.7 5.3 5.4 5.5 5.9 6.0 6.4 6.5 6.8 8.5 9.4
n = 15 (Sample size is considered moderate).
minimum = 2.643 Q1 = 4.147 Q2 = 5.513 Q3 = 6.525 maximum = 9.412

Arsenic (rounded to two decimals):


1.55 1.92 2.06 3.10 4.17 4.28 4.66 4.72 5.73 6.34
6.90 7.63 7.78 7.81 9.18 10.20 10.40 14.10 15.60 17.80
n = 20 (Sample size is considered moderate).
minimum = 1.55 Q1 = 4.20 Q2 = 6.62 Q3 = 9.95 maximum = 17.80
22
We now calculate the Upper Limit for each data set. We do not bother with the Lower Limit because
anything below this limit is not a problem.

Heavy Metal Upper Limit = Q3 + 1.5 ⋅ IQR


Lead 37.28 + 1.5(37.28-23.41) = 58.09
Chromium 6.525 + 1.5(6.525-4.147) = 10.09
Arsenic 9.95 + 1.5(9.95 - 4.20) = 18.58

For chromium and arsenic, no observation exceeds its Upper Limit, so, strictly, outliers are not present in
those data sets. For lead, however, the maximum (58.77) slightly exceeds its Upper Limit of 58.09 and is
therefore a potential outlier.

The computer program Minitab was applied to all three data sets to glean information about each data
set’s distribution and about the possibility of outliers. The boxplot for lead revealed a pattern consistent
with a right-skewed distribution. Boxplots for chromium and arsenic favored a normal appearance.
Minitab likewise flagged the last lead observation as an outlier with no outliers detected for the other two
data sets. For all three data sets, each maximum was close to its Upper Limit.
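The lead result can be reproduced with a short sketch (median-of-halves quartiles on the rounded values, so the fence differs slightly from the 58.09 computed from the unrounded data):

```python
import statistics

lead = [17.1, 21.4, 23.3, 23.4, 24.7, 25.6, 26.9, 26.9,
        27.3, 29.2, 33.1, 37.3, 44.9, 49.8, 58.8]   # already ordered

n = len(lead)
q1 = statistics.median(lead[: n // 2])          # 23.4 (median excluded, n odd)
q3 = statistics.median(lead[(n + 1) // 2:])     # 37.3
upper_fence = q3 + 1.5 * (q3 - q1)              # about 58.15

outliers = [x for x in lead if x > upper_fence]
print(round(upper_fence, 2), outliers)          # prints 58.15 [58.8]
```

The maximum lead value lands just above the fence, which agrees with Minitab flagging the last lead observation as an outlier.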

Note that confidence intervals were previously constructed using the chromium and arsenic data. The
boxplots for these data sets were constructed prior to that analysis. An important parametric (i.e.,
distributional) assumption when using z or t is that the random variable under consideration (e.g., X in
this case) be normally distributed. The boxplots have a normal appearance, suggesting validity to the
approach taken. A formal test for normality is the Shapiro-Wilk test. Since it involves hypothesis testing,
it is not presented until hypothesis testing has been explained.

Hypothesis Testing: The Classical Approach (Test Of One Mean)

Hypothesis testing is important because of its contribution to the decision making process. It is
complementary to confidence interval construction. We explain the classical approach to hypothesis
testing in terms of five steps. Each step will be explained in full with an example before proceeding to the
next step. After the classical approach, we explain the p-value approach to hypothesis testing.

Step 1: State the null and alternative hypotheses.

A hypothesis is simply a statement that something is true. Effectively, hypothesis testing involves making
a choice between two competing states of nature. The hypotheses are designed to be mathematical
opposites. Each hypothesis purports to represent the way things really are, i.e., the true state of nature.

Unfortunately, we never know which state of nature is the true state of nature. All that we can do is to
offer evidence supporting one state of nature or the other. Our evidence comes in the form of a statistical
calculation performed on a body of data. Once we have statistical evidence, we are in a position to
support one hypothesis or the other.

The situation that best promotes an understanding of hypothesis testing without a lot of mathematics is a
jury trial. The competing states of nature are the prosecution’s allegation that the defendant is guilty and
the defense’s allegation that the defendant is innocent. (Notice that these are opposite states of nature).
The jury listens to the (statistical) evidence and decides, i.e., infers, which state of nature is appropriate
based upon the evidence. Upon digesting the evidence, the chairperson of the jury reports that the
defendant is either guilty or not guilty.
23
In hypothesis testing, the two competing states of nature are represented by a null hypothesis and an
alternative hypothesis.

Null hypothesis: H0
⋅ The proposition being challenged
⋅ Expressed in terms of a specific value of a population parameter: µ = µ0.

Alternative Hypothesis: Ha
⋅ The opposite of (or alternative to) the null
⋅ Expressed in terms of one or several values of the population parameter, different from the
value given to µ0: for example, µa < µ0, µa > µ0, or µa ≠ µ0.

Example: You are at a wetland site. You have been instructed to test for chromium contamination.
Before doing any testing, you realize that the two competing states of nature are:

State of nature 1: soil is not contaminated


State of nature 2: soil is contaminated.

You desire to come up with values for µ that characterize each state of nature. Suppose that a
regulatory agency has determined that a population mean chromium concentration equal to 7
mg/kg characterizes the no-contamination state. Likewise, a population mean chromium
concentration significantly greater than 7 mg/kg characterizes the contamination state. Thus,
in terms of values for µ, the hypotheses are:

H0: µ0 = 7 mg/kg
Ha: µa > 7 mg/kg

Pictures that characterize the sampling distribution of sample means for each state of nature are
presented below. Notice that the picture of the sampling distribution representing the alternative
hypothesis is positioned to the right of the picture representing the null hypothesis. Simply
comparing the centers of the two provides a reason for this occurrence.

(Figure: two pictures. Top: the sampling distribution of X under the assumption that H0 is true, centered
at µ0 = 7. Bottom: the sampling distribution of X under the assumption that Ha is true, centered at some
µa > 7, to the right of the first.)

24
Step 2: Before gathering evidence, decide upon a tail probability (α) associated with H0 true.
This tail probability is your admission that an eventual X in this tail is so remote from
µ0 that X does not support this hypothesis as true.

(a) α is referred to as the “level of significance”.

(b) It is a tail-area probability, typically one of five common magnitudes: 0.001, 0.01, 0.025, 0.05, or
0.10. (In this example, let α = 0.05.)

(c) α is placed in that tail of H0 which would intersect a corresponding tail of Ha if the pictures of the
two sampling distributions were superimposed on each other.

(d) α reflects the notion that a value of X (i.e., your eventual sample result) in this region is so
distant from µ 0 that the “home” for your X must be the other state of nature.

(e) α always goes with the picture corresponding with “H0 true.”

Example: α = 0.05 is placed in the upper tail of the “H0 true” sampling distribution.

(Figure: the sampling distribution of X under the assumption that H0 is true, centered at µ0 = 7, with the
area α = 0.05 shaded in its upper tail.)

Step 3: Establish a decision rule to assist in choosing between hypotheses.

(a) α is used to derive a tα. (Since α = 0.05, tα = t0.05. We will get a specific value for tα from the t-
table once we know the degrees of freedom).

(b) tα is a point of demarcation used for making a decision between the competing hypotheses. The
two possible decisions are stated in terms of H0. The proper words for these two possible
decisions are: "Fail to Reject H0" and "Reject H0".

(i) To one side of tα, we will go with H0. In this example, fail to reject H0 if our eventual X
converted to t falls to the left of t0.05.

(ii) To the other side of tα, we will not go with H0; i.e., we will reject H0. In this example,
reject H0 if our eventual X converted to t falls to the right of t0.05.

(iii) tα is referred to as the "critical value."

25
(Figure: the H0-true sampling distribution, centered at µ0 = 7, with α = 0.05 in its upper tail, drawn above
the decision-rule line. The critical value t0.05 marks the boundary on the t-axis: "Fail to Reject H0" to its
left and "Reject H0" to its right.)

Example: We know that a sample result to the left of tα will lend support to the null hypothesis. A
sample result to the right of tα lends support to the alternative hypothesis. Notice that we
have set up this decision rule before peeking at any data!

Step 4: Generate your samples. Using these data, calculate X and s. Convert X to t. Place t on
the decision rule line.

(i) Site data were generated within budget. Suppose that nine samples were taken. (These
chromium measurements were taken from a different location than the 15 previously used.)
Results, in mg/kg, are:

10.1548 17.8599 10.2117


13.0761 17.0871 10.9996
13.5646 14.0418 13.1423

(ii) From the data, we calculate X =13.349 mg/kg and s=2.751 mg/kg. The question is: Is our
X sufficiently far from µ 0 =7 mg/kg to warrant rejection of H0?

(iii) To answer the question in (ii), convert X to t.

t = (X − µ0) / (s/√n) = (13.349 − 7.00) / (2.751/√9) = 6.923

This sample result (X) is approximately seven standard errors (i.e., standard deviations of the sampling
distribution) to the right of the hypothesized mean of 7 mg/kg. (Wow, that's really far!)

(iv) In order to place t on the decision rule line, we must first get tα. Having generated our
samples, we know that n=9. Thus, df=n-1=9-1=8. Consulting the t-table on page 17, we
see that, for df=8 and α=0.05, t0.05=1.860. Our sample result (X), converted to t=6.923,
can now be placed on the decision-rule line.

Example: The results are presented as follows:

26
(Figure: the H0-true sampling distribution with α = 0.05 in its upper tail, above the decision-rule line.
The critical value t0.05 = 1.860 separates "Fail to Reject H0" from "Reject H0", and the sample result
t = 6.923 falls far to the right, inside the rejection region.)

Step 5: Apply the decision rule by comparing t and t α . Make a decision. State your conclusion
in words.

(i) Apply the decision rule by comparing t and tα. Notice that t=6.923 lies far to the right
of t0.05=1.860. That is, t > tα.

(ii) Make a decision. Since t > tα, reject H0. (Rejecting H0 means that we cannot support
the mean chromium concentration level to be 7 mg/kg).

(iii) State the conclusion in words. Evidence is sufficient to show that the mean chromium
concentration level is not 7 mg/kg. Evidence suggests that it is some level significantly
greater than 7 mg/kg.
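Steps 4 and 5 can be verified with the standard library. This sketch uses the nine chromium observations; the critical value 1.860 is t0.05 for df = 8 from the t-table:

```python
import math
import statistics

chromium = [10.1548, 17.8599, 10.2117,
            13.0761, 17.0871, 10.9996,
            13.5646, 14.0418, 13.1423]   # mg/kg, n = 9

mu0 = 7.0                                # hypothesized mean under H0
xbar = statistics.mean(chromium)         # about 13.349
s = statistics.stdev(chromium)           # about 2.751

t = (xbar - mu0) / (s / math.sqrt(len(chromium)))
t_crit = 1.860                           # t_{0.05}, df = 8, from the t-table

print(round(t, 2), t > t_crit)           # prints 6.92 True: reject H0
```

Because t far exceeds the critical value, the script reproduces the Step 5 decision to reject H0.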

The P-Value Approach To Hypothesis Testing

Revisit the diagram in Step 4 of the classical approach to hypothesis testing. Notice the one-to-one
correspondence between the level of significance (α = 0.05) in the picture of the sampling distribution
under H0 true and the critical value (t0.05 = 1.860) depicted on the decision-rule line. Notice, further, that
each of these has a vertical bar; also, the vertical bar of one lines up precisely with the vertical bar of the
other. When it comes to α and tα, one implies the other, so they ought to line up pictorially in this fashion.
Specifically, given α, I will be able to find tα. Likewise, given tα, I will be able to find α.

Now, consider the other t-value presented on the decision-rule line. This represents an X value from the
sampling distribution picture converted to t by way of the t formula. Given what we said above, this
suggests that this calculated t-value must have a corresponding probability value from the sampling
distribution picture. The probability value associated with this calculated t-value (or test statistic) goes by
the name p-value.

Locating the calculated t-value on the decision rule line and moving upward to the sampling distribution
picture, it appears that there is no p-value that corresponds with the calculated t-value. Take note,
however, that the tails of the sampling distribution never touch the horizontal axis; rather, they are
asymptotic to the horizontal axis. There is a p-value in this case, but it is extremely small; for these data
it is less than 0.0001 and would typically be written p < 0.0001.

The attractiveness of the p-value is in its interpretation. It is the probability of observing a value of the
test statistic at least as extreme as the one actually obtained if, in fact, the null hypothesis were true.
Small p-values provide

27
evidence against the null hypothesis; larger p-values do not provide evidence against the null hypothesis.
The closer that the p-value is to zero, the stronger is the evidence against the null hypothesis. What is a
small versus a large p-value is frequently put into perspective by comparing the p-value to a chosen level
of significance. Once this is done, decisions regarding a hypothesis test are made just like they were
when using the classical approach. That is, reject the null hypothesis when the p-value is smaller than the
level of significance. Do not reject the null hypothesis when the p-value is larger than the level of
significance.

For our hypothesis test example using the nine chromium observations, we stated that the p-value
corresponding to the test statistic (t = 6.923) is less than 0.0001. Since this value is so small, we have
strong evidence against the null hypothesis. Said another way, if the null hypothesis of no contamination
were, in fact, the true state of nature for the situation that we are testing, the result that we actually
observe, X = 13.349 mg/kg (equivalently, t = 6.923), is extremely unlikely. This result definitely does not
support the null hypothesis.
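As an illustration (the text's own analysis was done by hand and with Minitab), both the test statistic and its one-sided p-value can be reproduced from the summary statistics. The tail-probability series below is a standard closed-form identity for the t-distribution with even degrees of freedom, used here so that no statistics package is required:

```python
import math

# One-sample t-test for the chromium example (summary statistics from the text)
xbar, s, n, mu0 = 13.349, 2.751, 9, 7.0
df = n - 1

t = (xbar - mu0) / (s / math.sqrt(n))   # test statistic

# One-sided p-value P(T > t): closed-form series, valid for even df
theta = math.atan(t / math.sqrt(df))
c2 = math.cos(theta) ** 2
series, term = 1.0, 1.0
for k in range(1, df // 2):
    term *= (2 * k - 1) / (2 * k) * c2
    series += term
p = 0.5 - 0.5 * math.sin(theta) * series

print(round(t, 2))   # 6.92
print(p < 0.0001)    # True: strong evidence against H0
```

The same p-value is what a package call such as scipy's `t.sf(6.923, 8)` would report.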

Complementarity Between Hypothesis Testing And Confidence Interval Construction

Suppose that the direct exposure criterion for chromium is set at 14 mg/kg. (This level is chosen for
illustrative purposes). To get an idea of how the regulation makes use of an upper one-sided 95%
confidence interval (similar to the one constructed for the arsenic example) to reach a conclusion about
compliance, we begin with the chromium results: X =13.349, s=2.751, n=9, df=8, and t.05 = 1.860. Next,
we use the formula for the upper limit (UL) of an upper one-sided confidence interval:

UL = X + tα (s/√n)

Making the appropriate substitutions, we have:

UL = 13.349 + 1.860 (2.751/√9) = 15.05 .

Since UL = 15.05 mg/kg and this value exceeds the direct exposure criterion of 14 mg/kg, compliance has
not been achieved.

Notice that the conclusion using the confidence interval (i.e., compliance has not been achieved) is
consistent with the results of the hypothesis test (i.e., evidence supports the contamination state of
nature).
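The arithmetic of the upper limit is quick to verify; a sketch using the summary statistics and t-table value from the text (Python is used here purely for illustration):

```python
import math

# Upper limit of an upper one-sided 95% confidence interval
# (chromium summary statistics and t_0.05 = 1.860 for df = 8, from the text)
xbar, s, n = 13.349, 2.751, 9
t_05 = 1.860
criterion = 14.0     # the illustrative direct exposure criterion, mg/kg

ul = xbar + t_05 * (s / math.sqrt(n))
print(round(ul, 2))      # 15.05
print(ul > criterion)    # True -> compliance has not been achieved
```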

Testing For Normality: The Shapiro-Wilk Test

One of the most powerful statistical tests for normality is the W-test developed by Shapiro and Wilk. The
mathematics behind the test statistic is quite involved. Basically, the test works as
follows. First, the observations are arranged in order, from smallest to largest value. Next, a probability
is calculated for each observation under the assumption that it comes from a normal distribution. This is
done by first computing the mean and standard deviation for the data set. Using these two measures and
each data value, a z-score is then calculated. These z-scores are identified with probabilities under the
normal curve. The probabilities are then plotted against the data values. Since there is curvature in the
normal curve, the probabilities are converted to a log scale prior to plotting.

The null hypothesis for the test is that the data have a normal distribution. The alternative hypothesis is
that the data do not have a normal distribution. If the data follow a normal distribution, we would expect
that the data values match the probabilities assumed under normality. This suggests that there should be a
28
high correlation (i.e., a high degree of linear association) between the scaled probabilities and the original
observations.

The W-test for normality involves the calculation of a correlation coefficient (R). Given the null and
alternative hypotheses, the correlation coefficient is transformed to the W-statistic. A test of significance
is performed. Computer packages like Minitab typically report the correlation coefficient and the p-value
for the test.
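The correlation idea behind the test can be sketched as follows. This is an illustration only: it uses ordinary Blom plotting positions for the normal scores rather than the specially derived Shapiro-Wilk coefficients, so it mimics the closely related probability-plot correlation approach (Minitab's Ryan-Joiner test) rather than reproducing the exact W output:

```python
import statistics

def pearson(u, v):
    """Plain Pearson correlation coefficient."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

# Chromium (n = 9) observations from the text, sorted smallest to largest
data = sorted([10.1548, 17.8599, 10.2117, 13.0761, 17.0871,
               10.9996, 13.5646, 14.0418, 13.1423])
n = len(data)

# Normal scores at Blom plotting positions -- a common, but not the
# Shapiro-Wilk, choice of expected normal order statistics
nd = statistics.NormalDist()
scores = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

r = pearson(data, scores)
print(round(r, 3))   # close to 1 is consistent with normality
```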

Normality tests were performed on the four heavy metals. Normality was rejected for the lead (n = 15)
data but not for the chromium (n = 15) data, not for the chromium (n = 9) data, and not for the arsenic
(n = 20) data. P-values for the specific data sets are reported below. Recall that small p-values lead to
rejection of the null hypothesis of normally distributed data.

Heavy Metal           P-Value     Test Result

Lead (n = 15)         0.0339      Reject Normality
Chromium (n = 15)     >0.1000     Do Not Reject Normality
Chromium (n = 9)      >0.1000     Do Not Reject Normality
Arsenic (n = 20)      >0.1000     Do Not Reject Normality

The following four figures are Minitab's way of doing the W-test. Notice that p-values presented in the
lower right-hand corner of each figure are summarized in the table above.

[Pages 29-30: four Minitab probability-plot figures, one per data set; the p-value in the lower right-hand
corner of each is summarized in the table above.]
Hypothesis Testing: Comparison Between Two Means

Beyond testing one population mean, as in the previous example, the five-step hypothesis-testing
procedure can be adapted to a variety of situations. It is particularly useful for making comparisons.

An example is the comparison between site and background data. Here we make use of the previous site
data (n=9). It is to be compared against background data (n=6). The data are presented as follows (in
mg/kg):
Site Background
10.1548 6.3252
17.8599 7.6762
10.2117 8.0639
13.0761 3.6461
17.0871 6.8746
10.9996 9.7481
13.5646
14.0418
13.1423

The appropriate issue is whether or not site concentrations of chromium are significantly higher than
background concentrations. From an inferential statistics perspective, the issue is whether the data sets
come from the same population or from different populations. Alternatively, this is equivalent to asking
if there is a difference between the population means of the two bodies of data.

Expanding these statements with notation, if the site concentrations (S) and the background
concentrations (B) are the same, then:

(1) S and B come from the same population, or

(2) µS = µ B (the means of the two populations are the same) or

(3) subtracting µ B from each side, µS − µ B =0; i.e., the difference in means is zero.

If the site concentrations (S) are greater than the background concentrations (B), then

(4) S and B come from different populations, or

(5) µS > µ B (the mean of S exceeds the mean of B) or

(6) subtracting µB from each side, µS − µB > 0; i.e., the difference in means is positive.

Notice that any one of (1)-(3) represents one state of nature and any of (4)-(6) represents the competing
state. When making inferences between these two states of nature, we set up the hypotheses in terms of
the difference between two means. Hence, we formulate (3) as the null hypothesis and (6) as the
alternative hypothesis. We make a judgment about which state of nature to support by looking at the
behavior of the sampling distribution of the difference between the two sample means; i.e., we look at the
behavior of XS − X B .

31
As indicated previously, the five steps to hypothesis testing are the same when testing one mean or two
means. The only distinction relates to the form of the test statistic, i.e., t, in each situation. Recall, when
testing one mean, we needed to convert X to t in order to apply the decision rule. Also recall the
relationship between X and t:

t = (X − µ0) / (s/√n) .

When testing for the difference between two means, we must consider not one X , but the hypothesized
relationship between two Xs . We likewise must convert this difference XS − X B to t in order to apply the
decision rule. The relationship between XS − X B and t is

t = [(XS − XB) − (µS − µB)] / √(s²S/nS + s²B/nB) .

This form of t may seem bizarre, but it abides by the same logic as t for the one sample case. Specifically,
the numerator addresses the degree to which the difference between sample means (i.e., XS − X B )
deviates from the hypothesized difference between population means (i.e., µS-µB) when H0 is true. That
is, we are looking at the degree to which XS − X B deviates from zero (where zero is the value of
µS − µ B when H0 is true). This numerator is judged large or small relative to a measure of the standard
error of the two data sets combined. This combined standard error is the item appearing in the
denominator of the t statistic. There is a very complicated formula for df when implementing this test.
That complicated formula is not presented here. The following one works just about as well:

df = (nS-1) + (nB-1) = nS + nB - 2 .

We now carry through the example to test for the difference in two means.

Example: You are at a wetland site. You have been instructed to test for chromium contamination. To
conduct a proper test, you gather site and background samples as presented on page 31.

Summary statistics generated for each data set are as follows:

Site Background

XS = 13.349 X B = 7.056
SS = 2.751 SB = 2.042
nS = 9 nB = 6

Determine, at the 0.025 level of significance, whether the site observations have a significantly
higher mean level of chromium concentration than the background observations. (Note:
Previously, our normality tests provided no evidence that chromium concentration levels are not
normally distributed).

32
This is a situation where we are testing for the difference in two means. The steps are as follows:

Step 1: State the null and alternative hypotheses.


H0: µS − µ B = 0 (There is no difference in means.)
HA: µS − µ B > 0 (The difference in means is positive.)
Pictures of the sampling distribution are as follows:

[Figure: two sampling distributions of XS − XB. Under H0 true, the distribution is centered at
µS − µB = 0; under Ha true, it is centered at some value µS − µB > 0.]

Step 2: Decide upon a level of significance: α .

[Figure: the sampling distribution of XS − XB under H0 true, centered at µS − µB = 0, with the right-tail
area α = 0.025 shaded.]

Recall that α reflects the notion that a value of XS − XB (i.e., your eventual sample result) in this region
is so distant from µS − µB = 0 that the “home” for your XS − XB must be the other state of nature. The
problem states that α = 0.025.

Step 3: Establish a decision rule to assist in choosing between hypotheses.

[Figure: decision-rule line for t, with “Fail to Reject H0” to the left of t0.025 = 2.160 and “Reject H0” to
the right.]

33
Since α = 0.025, tα = t0.025. We can find t0.025 once we calculate df for this problem. So, df = (nS − 1) +
(nB − 1) = nS + nB − 2 = 9 + 6 − 2 = 13. Consulting the t-table, t0.025 for df = 13 is 2.160.

Our sample result XS − X B will eventually be converted to t. If this t ≥ t0.025, we reject H0. If t < t0.025, we
fail to reject H0.

Step 4: Convert sample statistics to t. Place t on the decision rule line.

t = [(XS − XB) − (µS − µB)] / √(s²S/nS + s²B/nB) = [(13.349 − 7.056) − 0] / √((2.751)²/9 + (2.042)²/6)
= 6.293/√1.5359 = 6.293/1.239 = 5.08 .

[Figure: decision-rule line with t0.025 = 2.160 separating “Fail to Reject H0” from “Reject H0”; the
calculated t = 5.08 falls well inside the rejection region.]

Step 5: Apply the decision rule by comparing t and t α . Make a decision. State your conclusion
in words.

Since t > t0.025, we reject H0. Evidence suggests that the difference in means is positive; that is, the mean
concentration of chromium at the site locations is significantly greater than at background.

Incorrect Decisions In Hypothesis Testing

Hypothesis testing is the state of the art when it comes to utilizing statistical information for decision
making purposes. The results of a hypothesis test are, however, never infallible. Keep the following in
mind when conducting a hypothesis test. We never know which state of nature is true. On top of this, we
use imperfect and incomplete information to make an inference about the unknown state of nature. Better
information should promote a correct decision. But even with the best of information, we run the risk of
making an incorrect decision.

First, let’s return to our nonmathematical jury example to illustrate the decisions that can be made. These
go beyond simply finding the defendant guilty or not guilty. There are, in fact, four decisions that can be
made in a jury trial. Two of them are correct, and two of them are incorrect.

The jury can find the defendant:


⋅ guilty, when in fact the defendant is guilty
⋅ not guilty, when in fact the defendant is guilty
⋅ guilty, when in fact the defendant is not guilty
⋅ not guilty, when in fact the defendant is not guilty.

First, review these in your mind, and you will determine that these really are four different decisions.
Furthermore, two of these are correct decisions, and two are incorrect. The first and last decisions are
correct; the second and third decisions are incorrect. Now, think about the consequences of the two
34
incorrect decisions. Each one represents large personal and societal costs. Yet, these incorrect decisions
are inescapable. All that we can try to do is to minimize their probability of occurrence.

Relate the above example to the test of the difference in two means that was previously presented.
Another way of stating the null hypothesis is that the soil is not contaminated. Another way of stating the
alternative hypothesis is that the soil is contaminated. The four possible decisions that can be made are as
follows:

The Environmental Professional can find the soil to be:


⋅ contaminated, when in fact the soil is contaminated
⋅ not contaminated, when in fact the soil is contaminated
⋅ contaminated, when in fact the soil is not contaminated
⋅ not contaminated, when in fact the soil is not contaminated.

As in the previous example, the second and third decisions are incorrect. Each one has a hefty cost
associated with it. Decision two results in a heavy societal cost and an environmental cost that is
perpetuated as a result of the area not being cleaned up when it should have been. In addition, word
eventually gets out that you made a mistake, and the firm you represent gets a bad reputation. Decision
three results in unnecessary clean-up costs and a fear contagion that something is wrong with the area
when in fact nothing is wrong.

These incorrect decisions are exacerbated by trying to discover the limits of contamination before this
study begins. For example, placing a sample into a “hot spot” camp when it really doesn’t belong there
can really change your analysis and the conclusion of your hypothesis test.

Again, there is no way to eliminate completely the possibility of these incorrect decisions. All that we can
do is to try to minimize the probability of their occurrence. If we are in the unfortunate situation of being
forced to choose between incorrect decisions, we should choose that incorrect decision which has the least
drastic consequences.

All of these issues can be illustrated with the pictures that we developed for our hypothesis test problems.

We begin by generalizing the decisions that the jury and the Environmental Professional can make. For
any hypothesis test, we can:
⋅ fail to reject the null hypothesis, when in fact the null hypothesis is true
⋅ reject the null hypothesis, when in fact the null hypothesis is true
⋅ fail to reject the null hypothesis, when in fact the alternative is true
⋅ reject the null hypothesis, when in fact the alternative is true.

Notice that the first and last decisions are correct; the second and third decisions are incorrect.

You may be wondering why we choose these specific phrases (before and after the word “when”) in
stating the four possible decisions above. The reason is that these are the identical phrases presented in
the pictorial representation of both the competing states of nature and the decision rule in our previous
hypothesis test situations.

35
For example, previous pictures had this appearance:
[Figure: the H0-true sampling distribution (right-tail area α) above a decision-rule line labelled “Fail to
Reject H0” to the left and “Reject H0” to the right, with the Ha-true sampling distribution below.]
Clearly, the following phrases are presented in the picture above: fail to reject H0; reject H0; H0 true, and
Ha true.

We now can relate these phrases by focusing on α in the picture. First, notice that α is associated with
both a particular state of nature (i.e., the state of nature that H0 is true) and a particular course of action
(i.e., α instructs us to reject H0).

Second, notice that α is an area or a probability. In fact, it is the probability of rejecting H0 when H0 is
true. Notice what we have just done. We have assigned a probability to one of the four specific decisions
that we can make. Truly, this is the spirit of risk assessment!

If we have identified the probability associated with one of the four decisions, surely we must be able to
do the same for the remaining three decisions. This becomes possible by providing symbols to the three
remaining areas left unlabelled in the previous picture. It is reproduced below. Unlabelled areas are now
labelled as 1-α, β, and 1-β.

[Figure: the same two sampling distributions with all four areas labelled. Under H0 true, the area to the
left of the decision rule is 1-α and the right tail is α; under Ha true, the area to the left is β and the area
to the right is 1-β.]
36
Regarding the top normal curve, if α is a tail area, the remaining area must be 1-α (because the total area
under the curve must be 1). Associating 1-α with a specific state of nature and a course of action, 1-α is
the probability of failing to reject H0 when H0 is true. Thus, we have assigned a probability to the first
possible decision. (Notice that this is a probability attached to a correct decision).

The next probability is labeled β . Through similar reasoning, β is the probability of failing to reject H0
when Ha is true. (This is a probability attached to the other wrong decision that can be made). Finally, if
β is a tail area associated with the bottom normal curve, the remaining area of that curve must be 1-β.

Associating 1-β with a specific state of nature and a course of action, 1-β is the probability of rejecting H0
when Ha is true. (This is a probability attached to the second correct decision). We summarize the
decisions, their correctness, and their probabilities as follows:

Decision                               Was this the correct thing to do?    Probability
Fail to reject H0 when H0 is true      yes                                  1-α
Reject H0 when H0 is true              no                                   α
Fail to reject H0 when Ha is true      no                                   β
Reject H0 when Ha is true              yes                                  1-β

Several of the items in the table above have common statistical names. They are:

Item Name
Reject H0 when H0 is true Type I error
Fail to reject H0 when Ha is true Type II error
α Probability of a Type I error
β Probability of a Type II error
1-β Power of a test; the probability of not
making a Type II error

It is important for a risk assessor to evaluate the magnitude of the probabilities presented above. The best
of all worlds is to have 1-α and 1-β be as large as possible and to make α and β as small as possible.
Unfortunately, manipulating one of these items affects the remaining three. Consequently, before we
proceed with calculating either the probability of a Type II error or the power of a test, we will
demonstrate the tradeoffs resulting from these manipulations. A firm understanding of the consequences
of these manipulations sets the stage for appreciating any calculation we eventually make.

We begin with that probability over which the decision maker has control. This item is α. What does it
mean to say that a decision maker has control over α?

The easiest way to understand this is to consider that a regulatory agency can set a regulatory threshold
(RT) that effectively is a point of demarcation between concluding that a site is or is not contaminated.
Suppose this point is in terms of a mean, and it is denoted as X RT (in the one-sample case) or as X RT,DIF
(in the two-sample case). We will proceed with using the X RT notation. Similar logic applies to X RT,DIF .

37
Now, let X RT be introduced to the following customary picture:

[Figure: the H0-true sampling distribution with right-tail area α. The decision-rule line is labelled two
ways: in terms of t, “No Contamination” to the left of tα and “Contamination” to the right; and in terms
of X, “No Contamination” to the left of X RT and “Contamination” to the right.]

Notice that the regulatory agency’s decision rule line is comparable to the decision rule line that we
formulated as Step 3 in hypothesis testing. Now, however, the decision rule is specified in terms of
X rather than t. For this situation, we notice that X RT and tα line up vertically. This means that tα is
nothing more than a transformation of X RT . Specifically, since

t = (X − µ) / (s/√n) ,

it follows that

tα = (X RT − µ0) / (s/√n) .

Importantly, we can calculate tα once we have: (1) X RT; (2) a value of µ under the null hypothesis true,
i.e., µ0; (3) our calculated standard deviation s; and (4) n.

Suppose that we follow this procedure using our nine samples and calculate t α to be 1.862. (We use a
hypothetical value X RT =8.707 to make things "work.") What do we do with t α ?

We know that tα is associated with a tail area α. Once we know tα, we can find α if we are given the
degrees of freedom. Next, we use the t-table in reverse. We locate df = n - 1 = 9 - 1 = 8 under the df
column of the t-table. Locating “8” we go across until we find a number that is close to 1.862 (our tα
value). We see that the number matching this is 1.860 in the body of the table. We go up the column
identified with the 1.860 entry and see that the title is t0.05. Thus, tα = 1.860 = t0.05. This means that
α = 0.05. Thus, the regulatory threshold is identified with an α of 0.05. This is the meaning of the
decision maker having control over α.
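The transformation from X RT to tα is simple arithmetic; a sketch using the text's hypothetical X RT = 8.707 and the chromium summary statistics:

```python
import math

# Transforming a regulatory threshold into a critical t-value
# (hypothetical X_RT = 8.707 from the text's example)
x_rt, mu0, s, n = 8.707, 7.0, 2.751, 9

t_alpha = (x_rt - mu0) / (s / math.sqrt(n))
print(round(t_alpha, 3))   # about 1.862, matching t_0.05 = 1.860 for df = 8
```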

The next question ought to be:

Q: If α is the probability of making an incorrect decision, and if I have control over α, doesn’t it pay
for me to make its value even smaller? By doing so, I not only lower α, but I also increase 1-α,
which is the probability of a correct decision!

38
A: If the regulatory agency is iron-fisted and sets X RT itself, which translates into α being fixed, you
have no leeway for manipulating α. But, realistically, each situation is unique, and you can
exercise control over X RT , which means you can adjust α.

Now if you do make α smaller, you get yourself into the following dilemma. Notice the picture
below. It is our familiar set-up for the two states of nature. The decision-rule line is purposely
omitted.

[Figure: the H0-true curve with right-tail area α above the Ha-true curve with left-tail area β. A solid
vertical line marks the initial cutoff; a broken vertical line to its right shows the cutoff after α is
decreased. The decision-rule line is purposely omitted.]

α is initially set at a level consistent with the solid vertical line. Notice that setting α at this level sets β
according to this same vertical constraint. Now, decrease α as suggested above. This translates into
shifting the solid vertical line rightward. This shift is represented by the broken vertical line. Notice the
new magnitude of β. It becomes larger as a result of this move.

A new question arises.

Q: Are there negative consequences for making α smaller?

A: Yes, by making α smaller, we make β larger. That is, by making the probability of a Type I error
smaller, we make the probability of a Type II error larger. Not only this, but we also reduce 1-β,
which itself is the probability of another correct decision and is referred to as the power of the test.
Thus, the power of the test is reduced.

Hence, there is no free lunch for manipulating α in this fashion! There are always negative
consequences.

Q: Isn’t there anything I can do to lessen α and β?

A: Yes, there is. But it will cost you. Specifically, recall the workings of the Central Limit Theorem.
Suppose you change the number of samples that you take from “small” to “large”. What will
happen to the pictures of the sampling distributions on the previous page? They will become more
peaked around their means and less variable (i.e., have smaller tails) as indicated below. Smaller
tails mean α and β are lessened. This is reasonable, because the more information we have, the
more informed our decisions will be. With more information, we begin to avoid making incorrect
decisions. The power of our test increases.

39
[Figure: more peaked sampling distributions under H0 true and Ha true for the larger sample. For the
small sample, α = 0.0475 and β = 0.1056; for the large sample, α = 0.0087 and β = 0.0367.]
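The α and β values in the figure come from a setup that is not fully specified in the text, so the sketch below uses its own hypothetical numbers (µ0 = 0, µa = 1, σ = 1, and a fixed cutoff of 0.5 on the sample-mean scale) to show the same effect of a larger n:

```python
from statistics import NormalDist

# Hypothetical two-states-of-nature setup -- not the figure's unstated one
mu0, mu_a, sigma, cutoff = 0.0, 1.0, 1.0, 0.5

results = {}
for n in (9, 25):
    se = sigma / n ** 0.5                          # standard error shrinks with n
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)    # P(reject H0 | H0 true)
    beta = NormalDist(mu_a, se).cdf(cutoff)        # P(fail to reject H0 | Ha true)
    results[n] = (round(alpha, 4), round(beta, 4))

print(results)   # both alpha and beta shrink as n grows
```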

Final Question:

Q: Due to budgetary reasons, suppose I cannot increase the number of samples I take. Also, suppose
I exercise control over α. What should I do?

A: Recall that there is an inverse relationship between α and β. Both α and β are bad things, but you
are able to judge which one is worse than the other. In this situation, there is no way to avoid α
and β. Thus, you should minimize the level of that error which is the more detrimental of the two.

A Calculation For β and 1-β

We now calculate β, the probability of a Type II error, or the probability of failing to reject H0 when Ha is
true. This translates into concluding that a site is not contaminated when in fact it is.

We will perform the calculation in the context of our two-sample test. The first thing we will do is
determine the items needed to perform the calculations. This is accomplished by observing where β is
located in the pictorial framework of a hypothesis test problem.

On page 36 we see that β is identified with the sampling distribution of the test statistic under the
assumption that the alternative hypothesis is true. That situation is presented again in the picture on page
41.

40
[Figure: under H0 true, the XS − XB distribution centered at µS − µB = 0, with area 1-α to the left and
α = 0.025 to the right of the decision rule at t0.025 = 2.160; under Ha true, the distribution centered at
some µS − µB > 0, with area β to the left of the decision rule and 1-β to the right.]

We illustrate the mechanics of calculating β with the following steps:

(1) We find that value of XS − XB that matches up with α = 0.025. Notice that the value of tα
associated with α = 0.025 is 2.160. Also, the value of µS − µB under H0 true is zero. The pooled
standard error is √1.5359 = 1.239 (from page 34). Inserting these values into the formula for t,
we get

2.160 = (XS − XB − 0) / 1.239

which, after rearranging, yields XS − XB = (2.160)(1.239) = 2.676.

(2) This value of XS − X B is the same one represented by the vertical bar separating β and 1-β in the
bottom part of the picture that is presented above. To calculate β, all we need is a value of
µS − µ B postulated by the alternative hypothesis. So, the choice is ours.

(3) Select µS − µB = 7. The t-value corresponding with XS − XB = 2.676, µS − µB = 7, and a pooled
standard error of 1.239 is

t = (2.676 − 7) / 1.239 = −3.49 .

(4) We can find a probability that gets paired with this t-value by consulting the t-table. Locate 13
degrees of freedom; then go across the row that corresponds with this entry. Find that value of t
that is closest to the absolute value of -3.49. The closest number to 3.49 is 3.012, and this is
associated with a probability of 0.005. Our result, then, is associated with a probability less than
0.005.

(5) Thus, the probability of a Type II error, or β, is less than 0.005 for a µS − µ B of 7. The power of
the test, or 1-β, is greater than 0.995.

41
(6) A power curve can be developed using different values of µS − µ B under HA true. This entire
process can be reworked using different levels of α.

(7) The items in (6) provide information about the consequences of a hypothesis test over wide ranges
of values for the alternative hypothesis parameter.

Sample Size Issues

Suppose that you can specify levels of both α and β with which you feel comfortable. If you can do this,
you will be able to find the sample size n that results in these specific values of α and β . We will address
this when we deal with strategies to determine the proper sample size.
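As a preview under standard normal-theory assumptions (the σ, effect size, and error targets below are hypothetical, and the text's own treatment comes later), the required n for a one-sided test can be sketched as:

```python
import math
from statistics import NormalDist

# Hypothetical design targets and inputs -- not from the text
alpha, beta = 0.05, 0.10     # desired Type I and Type II error probabilities
sigma = 2.75                 # assumed population standard deviation
delta = 2.0                  # smallest difference worth detecting

nd = NormalDist()
z_a = nd.inv_cdf(1 - alpha)
z_b = nd.inv_cdf(1 - beta)

n = ((z_a + z_b) * sigma / delta) ** 2
print(math.ceil(n))          # 17 samples for this one-sided normal-theory design
```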

Behavior Of Observations Having A Lognormal Distribution

The standard assumption when working with data is that they are normally distributed (Figure A).
Environmental data are such that they frequently abide by a distribution that is anything but normal.
More specifically, the distribution may be a member of the skewed distribution family (Figure B). One
such distribution is the lognormal.

[Figure A: a normal distribution with an extreme value XE in its right tail. Figure B: a right-skewed
distribution with the same XE.]

Knowledge about the underlying parent population from which data are generated is important because
this assists in assessing the probability of an apparently extreme observation (XE) and thus whether the
observation is indeed probable or should be treated as a possible outlier.

For example, the extreme XE has a low probability of occurrence in Figure A and thus appears to be an
aberration if it is assumed that the data are normally distributed. If the parent population is truly
lognormal (Figure B), we see that the same XE has a much higher probability of occurrence and should
not be treated as an aberration.

To get an idea of the relation between tail areas and corresponding values of X for each type of
distribution above, I used Minitab to store 89 arsenic observations in C1. Next, I took the natural
logarithm of each value in C1 and stored the entire set in C2. Thus, C2 contains the logs of the original
89 observations. Finally, I calculated the descriptive statistics for C1 and C2. These are presented as
follows:

Variable     N      Mean      Median    TrMean    StDev     SE Mean
C1           89     7.855     6.430     7.294     6.032     0.639
C2           89     1.7857    1.8610    1.7982    0.7807    0.0828

42
Recall the definition of standard normal z:

z = (X − µ) / σ

Suppose we desire that value of z identified with 0.025 in the right tail of the z-curve. This is z0.025 = 1.96.
Assume that C1 is normally distributed and that the mean value 7.855 reported for C1 and the standard
deviation 6.032 are fair approximations for µ and σ , respectively. Substituting these items into the
equation directly above, we have

1.96 = (X − 7.855) / 6.032

In order to find that value of X identified with 0.025 in the right tail of the normal curve, we simply solve
for X in the equation directly above. Thus

X = µ + z σ = 7.855 + 1.96(6.032) = 19.677

which means that the X value equal to 19.68 has 0.025 of the area under the normal curve to its right, if,
in fact, X is normally distributed.

Now, suppose that the variable under discussion is not normally distributed but lognormally distributed.
This translates into ln X, not X, being normally distributed. We now desire to find that value of ln X with
0.025 of the area under the normal curve to its right. Given the descriptive statistics for C2, we make the
substitutions:

1.96 = (ln X − 1.7857) / 0.7807

Solving for ln X, we have

ln X = µ + zσ = 1.7857 + 1.96(0.7807) = 3.3158

or

X = exp(3.3158) = 27.544

which means that the X value equal to 27.544 has 0.025 of the area under the lognormal curve to its right.
This also means that a value, say, 19.677 (chosen to match the result of the first example) has an area
greater than 0.025 under the lognormal curve to its right. In fact, the area to the right of 19.677 under the
assumption of lognormality is calculated to be 0.0630. Thus, a value of 19.677 is more of an extreme
occurrence under the assumption of normality compared to lognormality. Information about the
underlying distribution of the observations definitely has an impact on one’s assessment of “extreme”
values.
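Both tail computations can be reproduced directly from the C1/C2 summary statistics; a sketch:

```python
import math
from statistics import NormalDist

# Summary statistics from the text: C1 = raw arsenic data, C2 = ln(C1)
mu_x, sd_x = 7.855, 6.032
mu_y, sd_y = 1.7857, 0.7807
z = 1.96   # z_0.025

x_normal = mu_x + z * sd_x               # upper 0.025 point if X is normal
x_lognormal = math.exp(mu_y + z * sd_y)  # upper 0.025 point if ln X is normal
print(round(x_normal, 2))     # 19.68
print(round(x_lognormal, 1))  # 27.5

# Area to the right of 19.68 under the lognormal assumption
p = 1 - NormalDist(mu_y, sd_y).cdf(math.log(x_normal))
print(round(p, 3))            # 0.063
```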

Incidentally, a boxplot of the original data corroborated a distribution skewed to the right. A boxplot of
the log-transformed data corroborated a normal shape. The more formal W-test supported these
findings.

43
Small Sample Sizes And Parent Distribution Departing From Normality

The confidence intervals that we constructed in previous sections relied on inserting the proper tα/2 or
zα/2 values (for a two-sided confidence interval) or the proper tα or zα values (for either an upper- or
lower-tailed, one-sided confidence interval) once the (1 − α) level of confidence was specified. In our
computer experiment that explored the behavior of the sample mean, we saw that normality of the sample
mean kicked in, no matter what was the underlying distribution of the parent population, once the sample
size reached 30; i.e., when n=30. Normality is also robust to smaller sample sizes provided that the
departure of the underlying parent population from normality is not too extreme.

Regarding environmental contaminant data, the underlying parent populations frequently are non-normal.
In addition, the practitioner’s data set usually consists of a small number of samples; i.e., n < 10.
Moderately-sized data sets (from 10 to 15 samples) may be considered a luxury.

We have presented two aids for gleaning information about the underlying distribution of the data. A
boxplot provided a visual pertaining to the data’s distribution, and the W-test was a formal test of
normality of the data.

The lognormal distribution is perhaps the most popular of the skewed distributions used by environmental
practitioners to represent their data. Other distributions like the Weibull, Gamma, or Beta have a similar
appearance to the lognormal and may even be more appropriate for a given situation. The conventional
wisdom when it comes to testing a data set’s distribution is to test the null hypothesis that the original
observations are normally distributed. If the null is rejected, transform the data by taking the natural
logarithm. Test to see if the transformed data have a normal distribution. If the null is not rejected,
continue the analysis using log transformed data. Unfortunately, not rejecting the null does not
necessarily mean that the data are lognormally distributed. It simply means that the null cannot be
rejected. Other distributions might be more appropriate, and a further test could be constructed between
pairs of competing distributions.
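
This testing sequence is easy to sketch with SciPy's Shapiro-Wilk routine, one common implementation of the W-test. The data below are simulated lognormal values; the parameters, sample size, and seed are arbitrary choices for illustration, not values from the text:

```python
import numpy as np
from scipy.stats import shapiro

# Simulate a skewed (lognormal) parent population.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=1.263, sigma=1.048, size=50)

w_raw, p_raw = shapiro(x)          # W-test on the original observations
w_log, p_log = shapiro(np.log(x))  # W-test on the log-transformed data

# Typical outcome: normality is rejected for x but not for ln(x), which is
# consistent with -- though it does not prove -- lognormality.
```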

In what follows, we emphasize the lognormal framework. We do not get into the issue of estimating the
parameters of competing parent distributions and testing which of these distributions is best for our
situation.

An Experimental Design: Set-Up For Generating Lognormal Parameter Estimates

Obtaining the parameter estimates that eventually get inserted into the limits for the confidence interval is
quite involved. An extensive literature has developed relating to the pros and cons of the different
methods. It is easy to get confused regarding the precise items that make up the desired confidence
interval once the issue of parameter estimation is addressed. In addition, the literature cautions about the
potential poor performance of the confidence interval itself, even after taking the purest approach.

To illustrate the level of involvement, we use an example borrowed from Gilbert (1987, p.166). He
generated values for a lognormally distributed random variable X, with population mean µ X = 6.126 and
population standard deviation σ X = 8.667. He then did a logarithmic transformation on X to get Y =
ln(X). It has population mean µ Y = 1.263 and population standard deviation σ Y = 1.048. So, up to this
point, the shape of X is a smooth lognormal curve and skewed to the right. The shape of Y is normal.

Next, he took a random sample of 10 observations from X. Each value of X has an accompanying value
of Y. These are presented below. They appear in his Table 13.1.

X: 3.161 4.151 3.756 2.202 1.535 20.76 8.42 7.81 2.72 4.43
Y: 1.1509 1.4233 1.3234 0.7894 0.4285 3.033 2.1306 2.0554 1.0006 1.4884

Summary statistics are:

X = 5.89 ,  s_X = 5.69 ,  Y = 1.48235 ,  s_Y = 0.75385 .

This sample information – both sample observations and sample statistics – mimics what we as
practitioners have at our disposal. In this artificial setting, we have the additional luxury of knowing how
the data were generated. We have the true probability distribution, the true mean, and the true standard
deviation. We know what should be the form of our confidence interval. (This added information is most
helpful for judging our eventual estimation results).

Parameter Estimators For A Lognormal Distribution

In the absence of population information about the variable of interest and prior to constructing a
confidence interval, we would first explore the behavior of the data. Following advice contained in
previous sections, we’d look at the boxplot and perform the W-test. Results for the present example
would reveal that X is not normally distributed but that Y is. (We know that these results must hold
because X was generated to be lognormal).

Our task is to construct an upper one-sided 95% confidence interval for the mean of original variable X.
We refer to the population mean of variable X specifically as µ X . Again, the formula for an upper one-
sided (1-α) confidence interval for µ X is:

0 ≤ µ_X ≤ X + t_α · ( s_X / √n )

We might think about applying the formula above to X , but because of the small sample size, the Central
Limit Theorem cannot assure a sample mean that is normal, and the confidence interval will probably not
work well. An alternative and perhaps more immediate inclination is first to construct the interval around
µ Y because we know that Y is normally distributed. It would appear as follows:

0 ≤ µ_Y ≤ Y + t_α · ( s_Y / √n ) .

Next, it seems reasonable to exponentiate the confidence limit to produce a confidence interval for the
mean in terms of the original scale. However, this method actually produces a confidence interval for the
median of the distribution, not the mean.
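
To see the distinction numerically, use the population parameters of Gilbert's example introduced above (µ_Y = 1.263, σ_Y = 1.048). Exponentiating µ_Y recovers the lognormal median, which sits well below the mean:

```python
import math

mu_y, sigma_y = 1.263, 1.048   # population parameters of Y = ln(X)

median_x = math.exp(mu_y)                 # what exponentiating mu_Y recovers
mean_x = math.exp(mu_y + sigma_y**2 / 2)  # the true lognormal mean

# median_x is about 3.54 while mean_x is about 6.12 (Gilbert's mu_X = 6.126),
# so exponentiating a confidence limit for mu_Y targets the median, not the mean.
```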

There is a solution, but it is linked to making the proper transformations on Y and s Y that will make them
unbiased and minimum variance estimators of µ X and σX , respectively.

Gilbert (1987) summarizes several methods for obtaining estimators. He also proposes a simple method
of estimating µ X and σX , resulting in:

X = exp( Y + s_Y²/2 )  and  s_X = X · [ exp( s_Y² ) − 1 ]^(1/2) , respectively.

By way of these formulas, notice, for example, that obtaining the estimator X requires more than simply
exponentiating Y. Similar reasoning applies to s X .

To see how these equations performed, we tried them on the original data. Notice that calculating X and
s X requires only the substitution of Y = 1.48235 and s Y = 0.75385 into the two equations directly above.
Our calculations were:

X = exp( 1.48235 + 0.75385²/2 ) = 5.85  and  s_X = 5.85 · [ exp( 0.75385² ) − 1 ]^(1/2) = 5.117 .

Notice that these values are different from the sample statistics that were presented with the data. Also
note that while X = 5.85 compares well with µ_X = 6.126 , s_X = 5.117 is quite a bit smaller than
σ_X = 8.667 . Gilbert states (1987, p. 167) that this estimator is simple to calculate, but it is not efficient
(i.e., it does not have the smallest possible variance among competing estimators).
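
As a quick check, Gilbert's simple estimator can be reproduced in a few lines from the sample statistics of the log-scale data:

```python
import math

# Sample statistics of Y = ln(X) from the example.
y_bar, s_y = 1.48235, 0.75385

# Gilbert's simple estimators of mu_X and sigma_X.
x_bar = math.exp(y_bar + s_y**2 / 2)           # about 5.85
s_x = x_bar * math.sqrt(math.exp(s_y**2) - 1)  # about 5.117
```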

Getting Parameter Estimates: Probability Plotting

A second approach for obtaining parameter estimates begins by looking at the possible relationship
between the ordered log values of the variable of interest and the cumulative probability for each of these
values. On the basis of a plot of these two variables, the mean and variance of the variable in question can
be estimated. It should be noted that, although presentation of this approach might seem redundant given
that the procedure above already demonstrates a methodology for obtaining parameter estimates, this
approach is useful for dealing with censored data, which will be addressed shortly. Thus, laying out the
mechanics here will be useful when analyzing censored data later.

As we proceed, the terms quantile and percentile are used interchangeably. Recall that the pth quantile of
a population is the number such that a fraction p of the population is less than or equal to this number.
The pth quantile is the same as the 100pth percentile; for example, the 0.5 quantile is the same as the 50th
percentile.

Again, the variable X is lognormally distributed. So, the variable Y = ln(X) is normally distributed. It is
the variable Y with which we begin the analysis. Also, our sample size is small. (If the sample size were
large, we could appeal to the Central Limit Theorem).

This procedure is presented in Gilbert (1987, p. 168). Its steps are as follows.

(1) Order the n untransformed X observations from smallest to largest. The result is n order
statistics: x [1] ≤ x [2] ≤ ⋅⋅⋅ ≤ x [ n ] . (At the same time, bring along the ordered set of logarithms
y[1] ≤ y[2] ≤⋅⋅⋅ ≤ y[n] ).

(2) For each value or order statistic x [i] , calculate ( i-0.5 ) ⋅100 / n. This item represents the percentile
associated with the respective order statistic.

(3) Gilbert recommends plotting the order statistics against their percentiles on log-probability
paper and then fitting a straight line by eye. He says this for two reasons. First, if a straight line
can be fit among the plotted points, this corroborates normality. Second, once the line is fit, for
any percentage value that is specified (e.g., 16, 50, and 84), the corresponding percentile (e.g.,
x16 , x 50 , and x 84 ) can be read from the plot after taking the appropriate antilog. It turns out that
these three percentiles in particular are used to get estimates for µ Y and σ Y2 . This is addressed
in Step (5).

(4) Rather than using log-probability paper, fitting a line by eye (as suggested by Gilbert) and, in
turn, obtaining estimates for x16 , x 50 , and x 84 by eye, a linear function can be estimated and fitted
directly by regressing the ordered values of Y=ln(X) on the calculated percentiles. Notice that
values for these two variables are obtained in Step (2). The result will be a fitted linear
regression equation of the form ŷi = a+b ( percent i ) , where a is the estimated intercept from the
regression, b is the estimated slope, percenti = ( i-0.5 ) ⋅100 / n from Step (2), and ŷi is the
calculated or fitted value of yi obtained once all three items on the right-hand side are substituted
into the equation. Also, the fit of the regression equation can be judged by its coefficient of
determination R2. The closer that R2 is to 1, the better the fit.

(5) As Gilbert (1987, p.168) shows, estimates for µ Y and σ Y2 can be obtained as follows,
respectively:
Y = ln( x_50 )  and  s_Y² = { ln[ (1/2) · ( x_50 / x_16 + x_84 / x_50 ) ] }² .

The former equation suggests that the estimated mean ought to be located at the 50th
percentile. This is expected for a normally distributed variable. The latter equation
provides a measure of spread based on data between the 16th and the 84th percentiles. It has a
spread interpretation similar to interquartile range.

(6) The mean and standard deviation of the original distribution are then estimated by taking the
calculations above and entering them in the same equations that appeared at the end of the
previous section. Again, they are:

X = exp( Y + s_Y²/2 )  and  s_X = X · [ exp( s_Y² ) − 1 ]^(1/2) , respectively.

To get an idea of the performance of this approach, we applied it to the 10 observations that were
introduced earlier for this example.

(1) All of the X and Y observations are ordered as follows:


X: 1.535 2.202 2.72 3.161 3.756 4.151 4.43 7.81 8.42 20.76
Y: 0.4285 0.7894 1.0006 1.1509 1.3234 1.4233 1.4884 2.0554 2.1306 3.033

(2) All of the ( i-0.5 ) ⋅100 / n percent calculations are as follows:
Percent: 5 15 25 35 45 55 65 75 85 95

(3) Implementing Minitab, we can plot Y against Percent and notice a linear relationship. (This is not
presented here).

(4) The fitted regression equation is ŷi = 0.294+0.023767 ( percent i ) . Also, R2=0.911. When we
successively let percenti = 16, 50, and 84 and substitute each of these into the regression equation,
we get ŷi = 0.6743, 1.4824, and 2.2904, respectively. Notice that each ŷi is shorthand notation for
ln(xi); i.e., each ŷi is already a logarithm. To get the estimated percentile x̂ i , we must take the
antilog of ŷi . These are 1.9626, 4.4035, and 9.8788, respectively.

(5) Estimates for µ Y and σ Y2 are:

Y = 1.4824  and  s_Y² = { ln[ (1/2) · ( 4.4035/1.9626 + 9.8788/4.4035 ) ] }² = 0.65296 .

(6) Estimates for the mean and standard deviation of the original distribution are:

 0.65296  1
X = exp  1.4824 +  = 6.1036 and s X = 6.1036 
 exp ( 0.65296 ) − 1
 = 5.858 .
2
 2 
X=6.1036 compares well with µ X = 6.126 ; s X = 5.858 is quite a bit smaller than σ X = 8.667. Both
estimates are better than with the previous technique.
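
The six steps can be sketched in a few lines, substituting numpy's least-squares `polyfit` for the Minitab regression (a sketch, assuming only the 10 example observations):

```python
import numpy as np

# Step (1): the ordered X observations; their logs are the ordered Y values.
x = np.array([1.535, 2.202, 2.72, 3.161, 3.756, 4.151, 4.43, 7.81, 8.42, 20.76])
y = np.log(x)
n = len(x)

# Step (2): percentiles (i - 0.5) * 100 / n for i = 1, ..., n.
percent = (np.arange(1, n + 1) - 0.5) * 100 / n

# Step (4): regress y on percent; polyfit returns (slope, intercept) for degree 1.
b, a = np.polyfit(percent, y, 1)
x16, x50, x84 = np.exp(a + b * np.array([16, 50, 84]))  # estimated percentiles

# Step (5): percentile-based estimates of mu_Y and sigma_Y^2.
y_bar = np.log(x50)
s_y2 = np.log(0.5 * (x50 / x16 + x84 / x50)) ** 2

# Step (6): back-transform to the original (lognormal) scale.
x_bar = np.exp(y_bar + s_y2 / 2)         # about 6.10
s_x = x_bar * np.sqrt(np.exp(s_y2) - 1)  # about 5.86
```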

Land’s Approach To Get A Confidence Interval

Once the estimates have been obtained, the next step is to construct the upper limit (UL) for insertion into
the upper one-sided (1 − α) confidence interval. Land (1971, 1975) showed that this upper limit is:

UL = exp( Y + 0.5·s_Y² + s_Y·H_α / √(n−1) ) ,
where:

Y = (1/n) · Σᵢ Yᵢ ,   s_Y² = (1/(n−1)) · Σᵢ ( Yᵢ − Y )² ,  and  s_Y = √( s_Y² ) , with sums running from i = 1 to n.

Notice that this upper limit has components similar to those of the conventional confidence interval when
using z or t. Specifically, it has one component to capture the estimate of the mean, one to capture the
estimate of the standard error, and one to capture the potential distance between the sample mean and the
population mean. This last component is H α , and it has an interpretation that parallels t α or z α . Values
for H α can be obtained from tables provided by Land (1975). A subset of these tables is presented as
Tables A10-A13 in Gilbert (1987).

Gilbert does not follow through with the example, but we do; i.e., we construct UL for a one-sided upper
confidence interval. Items for insertion into UL are Y = 1.48235 , s Y = 0.75385 , n = 10, and H α = 2.621.
This results in:


UL = exp[ 1.48235 + 0.5·(0.75385)² + ( 0.75385 · 2.621 ) / √9 ] = 11.303 .

He points out that Land’s method works provided that one is confident that the underlying distribution is
lognormal. Millard and Neerchal (2001, p. 244) point out that, while Land’s method is exact and has
optimal properties, it is extremely sensitive to the assumption that the data come from a lognormal
distribution. Ginevan and Splitstone (2004, p. 45) point out that lack of fit to a lognormal distribution
may be nearly impossible to detect. They take the strong position that Land’s procedure should never be
used with environmental contamination data. In their view, bootstrapping is the best approach for
constructing confidence intervals for means when the data are not normally distributed.

Dealing With Censored Data Sets

There are situations where the true concentration of the sample being measured may be close to zero.
Under these circumstances, the actual measured value may be less than the measurement limit of
detection (LOD). When this happens, laboratories may report these values as not detected (ND), as less-
than (LT), or as zeroes. When data values below the LOD are unavailable, we say that the data are
“censored to the left.”

The dilemma created by NDs, LTs, or artificial zeroes is that they taint the data set. After all, we are
trying to characterize correctly the distribution that drives our confidence intervals. If observations are
manipulated in this fashion, we end up biasing our estimates of X and s x . It would be ideal if the
laboratory had reported the actual value below LOD, if this were possible. In the absence of this course
of action, we must use a strategy that avoids bias or at least holds it to a minimum.

The table below lists four simple approaches that could be taken but that lead to biased estimates of X and
s x . Regarding the reporting of data, we assume that only LT values are reported when a measurement is
below the LOD.

Censoring Approaches That Result in Biased Estimates of X and s X


1. Use all measurements, including LT values.
2. Use only “detect” measurements. Ignore LT values.
3. Replace LT values with zeroes. Proceed with computations.
4. Replace LT values with some number between zero and LOD. Proceed
with computations.

Three approaches are suggested as possible alternatives when the data are censored. These are
summarized in the table below.

Preferred Censoring Approaches When Trying to Find the Middle


1. Compute the sample median.
2. Compute the trimmed mean.
3. Winsorize the data. Compute the Winsorized mean and Winsorized
standard deviation.

The first approach in the table above is appropriate when the distribution is symmetric because the mean
and median will be the same for symmetric distributions. If the distribution is asymmetric and skewed to
the right, then the sample median will tend to be smaller than the true mean. If the distribution is skewed
to the left, then the sample median will tend to be larger than the true mean.

Regarding the second approach, let n be the sample size and p be a percentage of observations in the data
set to be eliminated or trimmed from the data set. Define the limits of p as 0<p<0.50. The 100p%
trimmed mean is defined as the arithmetic mean computed on the n(1-2p) data values remaining after the
largest np data values and np smallest data values have been trimmed from the data set. The number of
data values trimmed at each end is the integer part of the product pn. The most extreme case is when all
but the middle two values are trimmed. For asymmetric distributions, the literature reports that a 15%
trimmed mean is safe.
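
SciPy's `trim_mean` applies this definition directly; a sketch using the example's 10 X observations (purely for illustration, since trimming would normally be applied to a censored data set):

```python
from scipy.stats import trim_mean

x = [3.161, 4.151, 3.756, 2.202, 1.535, 20.76, 8.42, 7.81, 2.72, 4.43]

# With p = 0.15 and n = 10, the integer part of np is 1, so one value is
# trimmed from each end and the mean is taken over the remaining 8 values.
tm = trim_mean(x, 0.15)
```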

Finally, Winsorizing is a technique used with symmetric distributions. It involves removing the trouble
values, e.g., the NDs at the lower end, and replacing each with the next largest and available value. At the
other end, remove the same number of largest values and replace each with the next smallest and available
data value. This revised set of observations is the Winsorized data set.

Suppose that there are n observations in total. Compute the sample mean based on the n Winsorized
observations. This is the Winsorized mean. Call it X W . Compute the sample standard deviation based
on the same n Winsorized observations. Call it s. (Note: This is not the Winsorized standard deviation).
Let v be the number of observations not replaced during the Winsorization process. The Winsorized
standard deviation ( s W ) is defined as:

s_W = s · ( n−1 ) / ( v−1 ) .

When constructing a confidence interval using t, the quantity (v−1) will be the number of degrees of
freedom, and s_W is the proper standard deviation to use for insertion into the confidence interval formula.
It turns out that X_W is an unbiased estimator of µ , and s_W is an approximately unbiased estimator of σ .
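
A minimal sketch of the mechanics, applied (hypothetically) to the example's 10 log-scale values as if the two smallest were non-detects:

```python
import statistics as st

y = sorted([1.1509, 1.4233, 1.3234, 0.7894, 0.4285, 3.033,
            2.1306, 2.0554, 1.0006, 1.4884])
k = 2                                        # number replaced at each end

# Replace the k smallest values with the next largest available value, and the
# k largest with the next smallest available value.
yw = [y[k]] * k + y[k:-k] + [y[-k - 1]] * k  # Winsorized data set

n = len(yw)
v = n - 2 * k                  # observations not replaced
y_w = st.mean(yw)              # Winsorized mean
s = st.stdev(yw)               # ordinary sd of the Winsorized observations
s_w = s * (n - 1) / (v - 1)    # Winsorized standard deviation
```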

Suppose the data happen to be skewed to the right. If a logarithmic transformation shows that the
transformed data are from a normal distribution, suggesting that the original data might be lognormal,
Winsorization can be used on the transformed data to get their mean and standard deviation. The
respective Winsorized estimators YW and s 2YW can then be inserted into the following formulas to get
estimates of the population mean and standard deviation for the original lognormal distribution,

 s2  1

X = exp  YW + Y W  and s X = X  exp ( s YW


2
) − 1 2 , respectively.
 2 

Getting Parameter Estimates: Censored Data and Probability Plotting

We return to the artificial data first introduced for this example. Suppose that, of the 10 observations, the
first two were missing. We will implement the six-step probability-plotting procedure that we used
earlier on the full set of data, with the exception that now the first two observations are missing.

(1) Ordered observations 3-10 for X and Y are:


X: 2.72 3.161 3.756 4.151 4.43 7.81 8.42 20.76
Y: 1.0006 1.1509 1.3234 1.4233 1.4884 2.0554 2.1306 3.033

(2) For ordered observations 3-10, the ( i-0.5 ) ⋅100 / n percent calculations are:
Percent: 25 35 45 55 65 75 85 95

(3) Implementing Minitab, we can plot Y against Percent and notice a linear
relationship. (This is not presented here).

(4) The fitted regression equation is ŷi = 0.1731+0.02546 ( percent i ) . Also, R2=0.866.
When we successively let percenti = 16, 50, and 84 and substitute each of these into
the regression equation, we get ŷi = 0.5805, 1.4461, and 2.3118, respectively.
Notice that each ŷi is shorthand notation for ln(xi); i.e., each ŷi is already a
logarithm. To get the estimated percentile xˆ i , we must take the antilog of ŷi .
These are 1.7869, 4.2465, and 10.0925, respectively.

(5) Estimates for µ Y and σ Y2 are:

Y = 1.4461  and  s_Y² = { ln[ (1/2) · ( 4.2465/1.7869 + 10.0925/4.2465 ) ] }² = 0.74935 .

(6) Estimates for the mean and standard deviation of the original distribution are:

 0.74935  1
X = exp  1.4461 +  = 6.1766 and s X = 6.1766 
 exp ( 0.74935 ) − 1
 = 6.524 .
2
 2 
X=6.1766 compares well with µ X = 6.126 ; s X = 6.524 is smaller than
σ X = 8.667 , but it is surprisingly closer to the population standard deviation than is
the one based on the uncensored information.

Strategies To Determine The Proper Number Of Samples

Underlying everything that we have done so far is the tacit concern that the estimates that we generate are
good enough to make reliable inferences. We require that each be an accurate representative of some
unknown population parameter. To be an accurate representative, it must be based on a sufficient number
of observations in order to provide useful information. So the question becomes: What is considered a
sufficient number of observations? Fortunately, we have developed a full set of tools (represented by key
formulas) that are able to guide us in choosing the sample size n so that our sample statistics achieve a
prescribed level of accuracy. We present three methods for determining the proper sample size. Each
method is based on a different rule. Successive rules make use of the rule that precede it.
Sample Size Based on Variance of the Sample Mean

We begin with the formula for the variance of the sample mean. It is defined as:

var X = σ² / n .

Suppose it is mandated that var X must be no larger than some prespecified level L. Substituting L for
var X into the equation above results in:

L = σ² / n .

If we solve the equation above for n, we obtain the sample size that assures us that var X is no larger than
L.

n = σ² / L .

Example: Recall the hypothesis test for one mean making use of the nine chromium observations.
Descriptive statistics were: n = 9, X=13.349 mg/kg , and s=2.751 mg/kg. Given s, we calculate
s 2 = 7.568 . Suppose that a new study is to be conducted in the same area as the one from which these
descriptive statistics were generated. One key objective of the new study is that enough samples be taken
so that the variance of the sample mean, i.e., var X , is no larger than 0.5 (mg/kg)².

In calculating the required sample size, the formula directly above requires both L, which is specified as
0.5, and σ 2 , which we do not have. However, from the previous chromium study, we have the estimate
s 2 = 7.568 which we can use in place of σ 2 . Making the substitutions, we get:

n = σ² / L = 7.568 / 0.5 = 15.136 , which we round up to 16.

Thus, we need 16 samples in total to assure us that var X will be no larger than 0.50. Since we already
have generated n = 9 from the previous study, we simply need 16 − 9 = 7 new samples.
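
A one-line check of this rule (rounding up rather than to the nearest integer, so that the variance bound is actually guaranteed):

```python
import math

s2 = 2.751 ** 2        # variance estimate from the earlier chromium study
L = 0.5                # required ceiling on the variance of the sample mean

n = math.ceil(s2 / L)  # 7.568 / 0.5 = 15.136, rounded up
```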

Sample Size Based on Margin of Error of the Sample Mean

We begin with the formula for the (1-α ) two-sided confidence interval using z:

X − z_α/2 · ( σ/√n )  ≤  µ  ≤  X + z_α/2 · ( σ/√n ) .

In the formula, notice the quantity that brings X into equality with µ . This quantity is z_α/2 · ( σ/√n ), and we
will call it E. In our previous treatment of confidence intervals, we referred to E as the margin of error or
the maximum error of the estimate. (Also, recall the shorthand notation: σ_X = σ/√n ). Thus:

E = z_α/2 · ( σ/√n ) = z_α/2 · σ_X .

Because E is developed in the framework of a confidence interval, it can be given a probability
interpretation. This is the result of z_α/2 being included in the formula for E. Specifically, E is an absolute
margin of error. Associated with it is an acceptably small probability α of that error being exceeded.
Thus, we are interested in choosing n so that

Probability[ | X − µ | ≥ E ] ≤ α .

In this probability formula, both E and α are specified beforehand. Building on the chromium example,
let E = 1 mg/kg and α =0.05. The interpretation of the probability statement is as follows. We desire to
find the sample size n so that there is only a 100 α = 5% chance that the positive or negative difference
between the X obtained from the n samples to be collected and the true mean µ is greater than or equal to
1 mg/kg.

Finding the value of n simply requires rearranging the formula for E above and solving for n:
n = ( z_α/2 · σ / E )² .

Notice that actually implementing the formula requires having a value for σ. Since we rarely would have
this value, we must use the sample standard deviation s in its place. In addition, since we are now using s
rather than σ , we must use t α 2 in place of z α 2 . Thus, the formula for n becomes:

n = ( t_α/2 · s / E )² .

We now face a dilemma. Recall that specifying any t α 2 value requires knowledge of the degrees of
freedom (df), which translates into the need to know n. But n is the item for which we are solving in the
first place. A plausible solution, after specifying a value for α , is to use the corresponding value for z α 2 .
After inserting values for E and s into the formula, obtain a first round estimate for n; call it n1 . Now,
having a value for sample size, calculate the corresponding degrees of freedom. Find the corresponding
value for t α 2 and insert it and the values for E and s into the formula to get a second round estimate; call it
n 2 . Compare n 2 to n1 . If they are the same, no further iteration is needed. If not, do another iteration.
After a few rounds, the value of n will stabilize.

Example: Again, the nine chromium observations yielded the following descriptive statistics: n = 9,
X=13.349 mg/kg , and s=2.751 mg/kg. Suppose that a new study is to be conducted in the same area as
the one from which these descriptive statistics were generated. One key objective of the new study is to
estimate the mean concentration of chromium. We are willing to accept a 10% chance (i.e., α = 0.10 ) of

getting a data set for which | X − µ | ≥ E = 1 mg/kg. We begin by using z_α/2 = 1.645, s = 2.751, and E = 1. So,

n₁ = ( z_α/2 · s / E )² = ( 1.645 · 2.751 / 1 )² = 20.48 ≅ 21 .

Now, we can proceed with using t. Since n1 =21, df = 20. For α = 0.10 and df = 20, t α 2 = 1.725. Our
second round estimate for n is n 2 , calculated as:

n₂ = ( t_α/2 · s / E )² = ( 1.725 · 2.751 / 1 )² = 22.51 ≅ 23 .

Repeating this procedure for n 3 results in t α 2 = 1.714 with n 3 = 22.23 ≅ 23. Since n 2 = n 3 , we are through.
In conclusion, n = 23 samples are needed in total. Since we already have 9, we must generate 23-9 = 14
additional samples.
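
The iteration is easy to automate with SciPy's z and t quantile functions; with the chromium inputs it stabilizes at the same answer, n = 23:

```python
import math
from scipy.stats import norm, t

s, E, alpha = 2.751, 1.0, 0.10

# First round: no degrees of freedom are known yet, so start with z.
n = math.ceil((norm.ppf(1 - alpha / 2) * s / E) ** 2)

# Subsequent rounds: use t with df = n - 1 until the sample size stabilizes.
while True:
    n_next = math.ceil((t.ppf(1 - alpha / 2, df=n - 1) * s / E) ** 2)
    if n_next == n:
        break
    n = n_next
```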

Sample Size Based on Relative Error of the Sample Mean

It may be that a reliable estimate for σ is not available, but the practitioner might have an idea about a
desirable size of σ relative to the population mean µ . Notice that σ/µ, which is a relative
standard deviation, also goes by the name coefficient of variation, and its symbol is η . This measure
is appealing because it is less variable than σ .

Interest in relative standard deviation suggests that we should likewise focus on the relative error (RE) of
our estimator rather than on E. To get RE, we simply divide E by µ . That is,

RE = | X − µ | / µ = E / µ .

We now substitute η and RE for σ and E, respectively, into the previous formula for n. Thus, the new
formula for obtaining the desired sample size is:

n = ( z_α/2 · η / RE )² .

Notice how the “relative” formula is related to the “unit-driven” formula:


n = ( z_α/2 · η / RE )² = ( z_α/2 · (σ/µ) / (E/µ) )² = ( z_α/2 · σ / E )² .
The middle-bracketed portion of the equation above has a common µ in both its numerator and
denominator. Its canceling effect results in the “unit-driven” formula.

Example: Again, the nine chromium observations yielded the following descriptive statistics: n = 9,
X=13.349 mg/kg , and s=2.751 mg/kg. Suppose that a new study is to be conducted in the same area as
the one from which these descriptive statistics were generated. One key objective of the new study is to
estimate the mean concentration of chromium. We are willing to accept a 10% chance (i.e., α = 0.10 ) of
getting a data set for which the relative error exceeds 20%. (So, we would like to determine the sample
size for which Probability[ RE ≥ 0.20 ] ≤ 0.10 ). Suppose also that the practitioner views η = 0.50 as
reasonable.

We begin by using z_α/2 = 1.645, RE = 0.20, and η = 0.50 . We solve for n₁ as follows:

n₁ = ( z_α/2 · η / RE )² = ( 1.645 · 0.50 / 0.20 )² = 16.912 ≅ 17 .

This provides our first round estimate for n. We could proceed in the fashion presented in the previous
section to find the final value for n.

Nonparametric Statistical Tests

Whenever we use z or t, in hypothesis testing or confidence interval construction, for example, we are
making a parametric assumption that our observations come from a normal distribution. The word
parametric implies that a distribution is assumed for the population under consideration. It is entirely
possible that this type of assumption is too strong and therefore inappropriate for the situation. If it is, the
consequence is that our test results may be misleading. We risk making an incorrect decision using this
particular test.

More specifically, with environmental contaminant data, test results might reveal that both the variable
under consideration and its logarithmic transformation are not normally distributed. Or, it may be that the
number of samples available for analysis is so small that it would be a disaster to assume a particular
parent distribution and to conduct statistical tests under this assumption.

Each of these dilemmas has led to the development of a branch of statistics known as nonparametric
statistics. The word nonparametric implies that no distributional assumption is made for the population
under consideration. Nonparametric tests tend to provide results that are robust when the usual parametric
assumptions are violated.

Previously, we conducted a test for the difference between two means using the (parametric) two-sample
t-test on Chromium Site and Background information (measured in mg/kg). We were justified in using
the test because we had a sufficient number of observations to test for normality.

Suppose that only the first four Site observations and the first three Background observations were
available. The amended data set is presented as follows.
Site Background
10.1548 6.3252
17.8599 7.6762
10.2117 8.0639
13.0761

Under these circumstances, objections certainly could be raised about conducting the two-sample t-test
because of the extremely small number of samples.
The Mann-Whitney Test

The Mann-Whitney test is the counterpart to the two-sample t-test. The gist of this nonparametric
procedure is to compare the central location of the two data sets; i.e., it seeks to determine if the centers
differ from each other. The test proceeds to make a statement about their medians.

The set-up of the Mann-Whitney test is as follows. Let ηS (eta for S) and η B (eta for B) denote the
median Chromium levels for the site samples and the background samples, respectively. Then, the null
( H 0 ) and alternative ( H a ) hypotheses are:

H 0 : ηS = ηB (Median level for S and median level for B are the same).
H a : ηS > ηB (Median level for S is greater than median level for B).

Because the hypotheses are based on each data set’s median, we will focus on ordered observations. So,
to apply the test, we first rank the data from both data sets combined, from smallest to largest. We then
keep track of where the observation originated; i.e., whether it came from S or B. This information is
summarized as follows:

Ordered Data: 6.32 7.67 8.06 10.15 10.21 13.07 17.85


Rank: 1 2 3 4 5 6 7
Source (S or B): B B B S S S S .

Now, pay attention to the sum of the rankings for each group. For B, the sum is 6. For S, the sum is 22.
The idea behind the test is simple. If the sum of the ranks for the site observations is “large” relative to
the sum of the ranks for the background observations, the alternative hypothesis is supported.

A test statistic called W is calculated for the sum of the ranks for the first group referred to in the null
hypothesis. (In this example, that group is S, and W = 4+5+6+7 = 22). Probability values are calculated
for the particular value of W, sample size of the first group, and sample size of the second group. These
results are reported routinely with computer statistical packages. For example, Minitab reported W = 22,
and the p-value was 0.0259. Thus, we have strong evidence to suggest that the median level for S is
greater than the median level for B.
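
A sketch with SciPy, which reports the Mann-Whitney U statistic rather than the rank sum; the rank sum W equals U plus n_S(n_S+1)/2, and SciPy's exact p-value (about 0.029) differs slightly from the Minitab value quoted above:

```python
from scipy.stats import mannwhitneyu

site = [10.1548, 17.8599, 10.2117, 13.0761]
background = [6.3252, 7.6762, 8.0639]

# One-sided test: is the site median greater than the background median?
u, p = mannwhitneyu(site, background, alternative="greater")
w = u + len(site) * (len(site) + 1) / 2   # convert U to the rank sum W
```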

In this particular example, the distinction in the rankings is obvious: every site observation outranks
every background observation. Even prior to conducting the test, it is reasonable to expect the
alternative hypothesis to be supported simply because of the rankings themselves. In most situations,
the delineation is not as crisp. Yet, the Mann-Whitney test is quite powerful even under these
circumstances.

Summary

In the previous introductory statistics course, we covered the basics of statistical analysis. In this
advanced course, we have built on the past and covered several topics that permit the environmental
professional to examine data in a more complete fashion and to make decisions using a relatively
sophisticated set of statistical tools.

In one sitting, no one can become an expert in the use of these tools. And, in fact, what we have
addressed in this course is merely a sampler of what is available when it comes to conducting statistical
analyses.

Nevertheless, the true purpose of this course is to impress upon the user that data do behave! Our charge
as analysts is to be familiar with patterns of behavior under a variety of conditions. After discovering and
characterizing this behavior, we become much better equipped to prescribe a procedure for analyzing the
data.

More specifically, beyond simply generating sample statistics and naively using them, we have
emphasized the importance of knowing something about the behavior patterns of sample statistics. The
reason is that we will be using them to make decisions, so it is imperative that we know how they behave.
We have seen that the Central Limit Theorem provides an explanation about the behavior of the sample
mean and that this behavior is orderly and depends on the number of observations that go into its
calculation.
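That orderliness is easy to see in a small simulation. The sketch below draws samples from a
deliberately non-normal population (exponential, with mean 1 and standard deviation 1; the population
choice and simulation settings are arbitrary illustrations, not part of the course material) and shows
the standard deviation of the sample mean shrinking in proportion to 1/√n:

```python
# Simulation sketch: the standard deviation of the sample mean shrinks
# like sigma / sqrt(n) as the sample size n grows, even when the
# population itself (exponential, mean 1, sd 1) is far from normal.
import random
import statistics

random.seed(1)

def sd_of_sample_means(n, reps=2000):
    """Standard deviation of `reps` sample means, each based on n draws
    from an exponential population with mean 1 and sd 1."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Theory predicts sigma / sqrt(n): roughly 0.50, 0.25, 0.125.
for n in (4, 16, 64):
    print(n, round(sd_of_sample_means(n), 3))
```

Quadrupling the sample size halves the standard error, which is exactly the dependence on n that the
Central Limit Theorem describes.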

Ultimately, we use these statistics to make inferences about their population counterparts. We have seen
that these inferences (made either through the formal hypothesis-test approach or through the calculation
of a confidence interval) depend heavily upon parametric assumptions about how the data were generated;
e.g., that they come from a normal distribution. Under these circumstances, we use the z-distribution or
t-distribution when we proceed with hypothesis testing or confidence interval construction.

We have emphasized the importance of testing for normality of the data as a prerequisite for actually
using z or t. If the data fail a normality test, we have proposed a logarithmic transformation. We then
test these transformed observations for normality. If normality of the transformed observations holds,
we can proceed with devising appropriate sample statistics and, in turn, conduct hypothesis tests or
construct confidence intervals using z or t. If the transformed observations also fail the normality
test, or if our data set is so small in the first place that the parametric assumptions cannot be tested,
we can resort to nonparametric procedures.
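That decision sequence can be sketched in a few lines. The concentration values below are hypothetical,
and the Shapiro-Wilk test from SciPy is used as one common choice of normality test, not the only one:

```python
# Workflow sketch: test the raw data for normality; if they fail, test
# the log-transformed data; otherwise fall back to a nonparametric test.
import math
from scipy.stats import shapiro   # Shapiro-Wilk normality test

data = [0.8, 1.1, 1.3, 1.9, 2.4, 3.7, 5.2, 9.8, 14.1, 22.5]  # hypothetical

_, p_raw = shapiro(data)
_, p_log = shapiro([math.log(x) for x in data])

alpha = 0.05
if p_raw > alpha:
    print("proceed with z or t on the raw data")
elif p_log > alpha:
    print("proceed with z or t on the log scale")
else:
    print("resort to a nonparametric procedure (e.g., Mann-Whitney)")
```

Whatever normality test is used, the branching logic is the same: parametric methods on the original
scale if possible, on the log scale if necessary, and nonparametric methods otherwise.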

Because generating data is so costly, we have presented procedures for calculating the sample size needed
to achieve desired levels of precision. We have also shown how to calculate and to minimize the
probabilities of making incorrect decisions in a hypothesis-testing context.

References

Gilbert, Richard O., 1987. Statistical Methods for Environmental Pollution Monitoring.
Van Nostrand Reinhold, New York.

Ginevan, Michael E. and Douglas E. Splitstone, 2004. Statistical Tools for Environmental
Quality Measurement. Chapman & Hall, New York.

Land, C.E., 1971. Confidence intervals for linear functions of the normal mean and
variance, Annals of Mathematical Statistics 42:1187-1205.

Land, C.E., 1975. Tables of confidence limits for linear functions of the normal mean
and variance, in Selected Tables in Mathematical Statistics, vol. III. American
Mathematical Society, Providence, R.I., pp. 385-419.

Millard, Steven P. and Nagaraj K. Neerchal, 2001. Environmental Statistics with S-Plus.
CRC Press, Boca Raton, Florida.

