
MATH 403- ENGINEERING DATA ANALYSIS

Chapter 6
Sampling Distributions and Point Estimation of Parameters

Introduction

Statistical methods are used to make decisions and draw conclusions about populations. This aspect of statistics is generally called statistical inference. These techniques use the information in a sample to draw conclusions. This chapter covers the statistical methods used in decision making.

One major area of statistical inference is parameter estimation. In practice, the engineer will use sample data to compute a number that is in some sense a reasonable value (a good guess) of the true population mean. This number is called a point estimate. In this chapter, we will see that procedures are available for developing point estimates of parameters that have good statistical properties.

Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:

1. Explain and understand the general concepts of estimating the parameters of a population or a probability distribution.
2. Calculate and explain the important role of the normal distribution as a sampling distribution and the central limit theorem.
3. Solve and explain important properties of point estimators, including bias, variance, standard error, and mean squared error.



6.1 Point Estimation

Statistical inference always focuses on drawing conclusions about one or more parameters of a population. An important part of this process is obtaining estimates of the parameters. Suppose that we want to obtain a point estimate (a reasonable value) of a population parameter. We know that before the data are collected, the observations are considered to be random variables, say, X1, X2, …, Xn. Therefore, any function of the observations, or any statistic, is also a random variable.

For example, the sample mean X̄ and the sample variance S² are statistics and random variables. A simple way to visualize this is as follows. Suppose we take a sample of n = 10 observations from a population and compute the sample average, getting the result x̄ = 10.2. Now we repeat this process, taking a second sample of n = 10 observations from the same population, and the resulting sample average is 10.4. The sample average depends on the observations in the sample, which differ from sample to sample because they are random variables. Consequently, the sample average (or any other function of the sample data) is a random variable.

Because a statistic is a random variable, it has a probability distribution. We call the probability distribution of a statistic a sampling distribution. The notion of a sampling distribution is very important and is discussed and illustrated later in the chapter.

When discussing inference problems, it is convenient to have a general symbol to represent the parameter of interest. We use the Greek symbol θ (theta) to represent the parameter. The symbol θ can represent the mean μ, the variance σ², or any parameter of interest to us. The objective of point estimation is to select a single number, based on sample data, that is the most plausible value for θ. The numerical value of a sample statistic is used as the point estimate. In general, if X is a random variable with probability distribution f(x), characterized by the unknown parameter θ, and if X1, X2, …, Xn is a random sample of size n from X, the statistic Θ̂ = h(X1, X2, …, Xn) is called a point estimator of θ. Note that Θ̂ is a random variable because it is a function of random variables. After the sample has been selected, Θ̂ takes on a particular numerical value θ̂ called the point estimate of θ.

A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂. The statistic Θ̂ is called the point estimator.

Point estimation is the process of using the data available to estimate the unknown value of a parameter, when some representative statistical model has been proposed for the variation observed in some chance phenomenon.

As an example, suppose that the random variable X is normally distributed with an unknown mean μ. The sample mean is a point estimator of the unknown population mean μ; that is, μ̂ = X̄. After the sample has been selected, the numerical value x̄ is the point estimate of μ. Thus, if x1 = 25, x2 = 30, x3 = 29, and x4 = 31, the point estimate of μ is

$$\bar{x} = \frac{25 + 30 + 29 + 31}{4} = 28.75$$

Similarly, if the population variance σ² is also unknown, a point estimator for σ² is the sample variance S², and the numerical value s² = 6.9 calculated from the same four observations is called the point estimate of σ².
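Both point estimates can be reproduced with Python's standard `statistics` module. This is a minimal sketch using the four observations from the text; note that `statistics.variance` divides by n − 1, which matches the sample variance S².

```python
import statistics

# The four observations used in the text
x = [25, 30, 29, 31]

x_bar = statistics.fmean(x)   # point estimate of mu: (25 + 30 + 29 + 31) / 4
s2 = statistics.variance(x)   # point estimate of sigma^2: sample variance, divisor n - 1

print(f"x_bar = {x_bar:.2f}")  # x_bar = 28.75
print(f"s^2   = {s2:.1f}")     # s^2   = 6.9
```

The exact sample variance is 20.75/3 ≈ 6.92, which the text rounds to 6.9.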



Estimation problems occur frequently in engineering. We often need to estimate

• The mean μ of a single population
• The variance σ² (or standard deviation σ) of a single population
• The proportion p of items in a population that belong to a class of interest
• The difference in means of two populations, μ1 − μ2
• The difference in two population proportions, p1 − p2

Practice Problem:

1. Let X be the height of a randomly chosen individual from a population. In order to estimate the mean and variance of X, we observe a random sample X1, X2, …, X7. We obtain the following values (in centimeters):

166.8, 171.4, 169.1, 178.5, 168.0, 157.9, 170.1

Find the values of the sample mean, the sample variance, and the sample standard deviation for the observed sample.

6.2 Sampling Distributions and the Central Limit Theorem

The field of statistical inference is basically concerned with generalizations and predictions. For example, we might claim, based on the opinions of several people interviewed on the street, that in a forthcoming election 60% of the eligible voters in the city of Detroit favor a certain candidate. In this case, we are dealing with a random sample of opinions from a very large finite population. As a second illustration, we might state that the average cost to build a residence in Charleston, South Carolina, is between $330,000 and $335,000, based on the estimates of 3 contractors selected at random from the 30 now building in this city. The population being sampled here is again finite but very small. Finally, let us consider a soft-drink machine designed to dispense, on average, 240 millilitres per drink. A company official who computes the mean of 40 drinks obtains x̄ = 236 millilitres and, on the basis of this value, decides that the machine is still dispensing drinks with an average content of μ = 240 millilitres. The 40 drinks represent a sample from the infinite population of possible drinks that will be dispensed by this machine.

Random Sample

The observations in a sample are usually assumed to be independent and identically distributed random variables; such a collection is known as a random sample. Formally, the random variables X1, X2, …, Xn are a random sample of size n if (a) the Xi's are independent random variables and (b) every Xi has the same probability distribution.

Statistic

A statistic is any function of the observations in a random sample. We have encountered statistics before. For example, if X1, X2, …, Xn is a random sample of size n, the sample mean X̄, the sample variance S², and the sample standard deviation S are statistics. Because a statistic is a random variable, it has a probability distribution.

Sampling Distribution

The probability distribution of a statistic is called a sampling distribution. The sampling distribution of a statistic depends on the distribution of the population, the size of the samples, and the method of choosing the samples. The probability distribution of X̄ is called the sampling distribution of the mean.



Consider determining the sampling distribution of the sample mean X̄. Suppose that a random sample of size n is taken from a normal population with mean μ and variance σ². Now each observation in this sample, say, X1, X2, …, Xn, is a normally and independently distributed random variable with mean μ and variance σ². Then, because linear functions of independent, normally distributed random variables are also normally distributed (as discussed in the previous chapters), we conclude that the sample mean

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

has a normal distribution with mean

$$\mu_{\bar{X}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$

and variance

$$\sigma^2_{\bar{X}} = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$
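This result can be checked by simulation. The sketch below assumes a normal population with μ = 10 and σ = 2 and samples of size n = 10 (illustrative values, not from the text); it draws 20,000 samples, computes X̄ for each, and compares the mean and variance of those X̄ values with μ and σ²/n = 0.4.

```python
import random
import statistics

# Illustrative normal population and sample size (assumed values, not from the text)
mu, sigma, n = 10.0, 2.0, 10
reps = 20_000                    # number of simulated samples

rng = random.Random(403)         # fixed seed for a reproducible run

# Draw many samples of size n from N(mu, sigma^2) and keep each sample mean
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)
print(f"mean of X-bar:     {mean_of_means:.3f}  (theory: {mu})")
print(f"variance of X-bar: {var_of_means:.3f}  (theory: {sigma**2 / n})")
```

The simulated values will differ slightly from μ and σ²/n because of sampling noise; increasing `reps` shrinks that noise.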

Central Limit Theorem

If we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately normal with mean μ and variance σ²/n if the sample size n is large. This is one of the most useful theorems in statistics, called the central limit theorem.



If X1, X2, …, Xn is a random sample of size n taken from a population (either finite or infinite) with mean μ and finite variance σ², and if X̄ is the sample mean, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

as n → ∞ is the standard normal distribution.

Figure 1. Illustration of the central limit theorem (distribution of X̄ for n = 1, moderate n, and large n)

Figure 1 illustrates how the theorem works. It shows how the distribution of X̄ becomes closer to normal as n grows larger, beginning with the clearly nonsymmetric distribution of an individual observation (n = 1). It also illustrates that the mean of X̄ remains μ for any sample size and that the variance of X̄ gets smaller as n increases.
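A small simulation can make the theorem concrete. The sketch below assumes a strongly right-skewed exponential population with mean 1 (so μ = σ = 1) and, for n = 1, 5, 30, and 100, measures the largest gap on a grid of z values between the simulated CDF of Z and the standard normal CDF. The population, the grid, and the helper name `max_cdf_gap` are illustrative choices, not part of the text.

```python
import random
import statistics
from math import sqrt

# Assumed skewed population: exponential with mean 1, so mu = 1 and sigma = 1
mu, sigma = 1.0, 1.0
reps = 10_000                          # simulated samples per sample size
rng = random.Random(7)                 # fixed seed for reproducibility
std_normal = statistics.NormalDist()   # N(0, 1)
grid = [k / 2 for k in range(-6, 7)]   # z = -3.0, -2.5, ..., 3.0

def max_cdf_gap(n):
    """Largest gap, over the grid, between the simulated CDF of
    Z = (X-bar - mu) / (sigma / sqrt(n)) and the standard normal CDF."""
    zs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append((x_bar - mu) / (sigma / sqrt(n)))
    return max(abs(sum(z <= g for z in zs) / reps - std_normal.cdf(g)) for g in grid)

gaps = {n: max_cdf_gap(n) for n in (1, 5, 30, 100)}
for n, gap in gaps.items():
    print(f"n = {n:3d}: max |F(z) - Phi(z)| on the grid = {gap:.3f}")
```

The gap should shrink as n grows, mirroring Figure 1. For n = 1 it is large because a single exponential observation is far from normal (for example, P(Z ≤ −1) = 0 while Φ(−1) ≈ 0.159).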

Example 1. An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Solution:

The sampling distribution of X̄ will be approximately normal, with $\mu_{\bar{X}} = 800$ and $\sigma_{\bar{X}} = 40/\sqrt{16} = 10$. The desired probability is given by the area of the shaded region shown in Figure 2.

Figure 2. Area for Example 1

Corresponding to x̄ = 775, we find that

$$z = \frac{775 - 800}{10} = -2.5$$

and therefore

$$P(\bar{X} < 775) = P(Z < -2.5) = 0.0062$$
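The same probability can be obtained without a normal table. This minimal sketch uses Python's `statistics.NormalDist`, modelling X̄ directly as normal with mean 800 and standard deviation 10:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16            # population mean and sd (hours), sample size
se = sigma / sqrt(n)                  # standard deviation of X-bar: 40 / 4 = 10
z = (775 - mu) / se                   # standardized value: -2.5
prob = NormalDist(mu, se).cdf(775)    # P(X-bar < 775) = P(Z < -2.5)

print(f"z = {z:.2f}, P(X-bar < 775) = {prob:.4f}")  # z = -2.50, P(X-bar < 775) = 0.0062
```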

Practice Problem:

1. An electronics company manufactures resistors that have a mean resistance of 100 ohms and a standard deviation of 10 ohms. The distribution of resistance is normal. Find the probability that a random sample of n = 25 resistors will have an average resistance of less than 95 ohms.

Approximate Sampling Distribution of a Difference in Sample Means

If we have two independent populations with means μ1 and μ2 and variances σ1² and σ2², and if X̄1 and X̄2 are the sample means of two independent random samples of sizes n1 and n2 from these populations, then the sampling distribution of

$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

is approximately standard normal if the conditions of the central limit theorem apply. If the two populations are normal, the sampling distribution of Z is exactly standard normal.

Example 1. Two independent experiments are run in which two different types of paint are compared. Eighteen specimens are painted using type A, and the drying time, in hours, is recorded for each. The same is done with type B. The population standard deviations are both known to be 1.0.

Assuming that the mean drying time is equal for the two types of paint, find P(X̄A − X̄B > 1.0), where X̄A and X̄B are average drying times for samples of size nA = nB = 18.

Solution:

From the sampling distribution of X̄A − X̄B, we know that the distribution is approximately normal with mean

$$\mu_{\bar{X}_A - \bar{X}_B} = \mu_A - \mu_B = 0$$

and variance

$$\sigma^2_{\bar{X}_A - \bar{X}_B} = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{1}{18} + \frac{1}{18} = \frac{1}{9}$$

Figure 3. Area for Example 2

The desired probability is given by the shaded region in Figure 3. Corresponding to the value x̄A − x̄B = 1.0, we have

$$z = \frac{(\bar{x}_A - \bar{x}_B) - (\mu_A - \mu_B)}{\sqrt{\sigma_A^2/n_A + \sigma_B^2/n_B}} = \frac{1 - 0}{\sqrt{1/9}} = 3.0$$

Therefore, P(Z > 3.0) = 1 − P(Z < 3.0) = 1 − 0.9987 = 0.0013.
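A minimal check of the paint example with `statistics.NormalDist`, using the standardized statistic for a difference in sample means:

```python
from math import sqrt
from statistics import NormalDist

sigma_a = sigma_b = 1.0             # known population standard deviations
n_a = n_b = 18                      # sample sizes
mean_diff = 0.0                     # mu_A - mu_B under the stated assumption

sd_diff = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)   # sqrt(1/9) = 1/3
z = (1.0 - mean_diff) / sd_diff                       # standardized value of 1.0
prob = 1 - NormalDist().cdf(z)                        # P(Z > 3)

print(f"z = {z:.1f}, P(X_A-bar - X_B-bar > 1.0) = {prob:.4f}")  # z = 3.0, ... = 0.0013
```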

Practice Problem:

1. The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B? Given the following information.

6.3 General Concepts of Point Estimation

A point estimate of some population parameter θ is a single value θ̂ of a statistic Θ̂. For example, the value x̄ of the statistic X̄, computed from a sample of size n, is a point estimate of the population parameter μ. Similarly, p̂ = x/n is a point estimate of the true proportion p for a binomial experiment.

6.3.1 Unbiased Estimator

An estimator should be "close" in some sense to the true value of the unknown parameter. Formally, we say that Θ̂ is an unbiased estimator of θ if the expected value of Θ̂ is equal to θ. This is equivalent to saying that the mean of the probability distribution of Θ̂ (or the mean of the sampling distribution of Θ̂) is equal to θ.

Bias of an Estimator

The point estimator Θ̂ is an unbiased estimator for the parameter θ if

$$E(\hat{\Theta}) = \theta$$

If the estimator is not unbiased, then the difference

$$E(\hat{\Theta}) - \theta$$

is called the bias of the estimator Θ̂.

When an estimator is unbiased, the bias is zero; that is, E(Θ̂) − θ = 0.

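Example 1. Let X1, X2, X3, …, Xn be a random sample with common mean θ = E(Xi). Show that the sample mean

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

is an unbiased estimator of θ.

Solution:

$$B(\bar{X}) = E(\bar{X}) - \theta = \frac{1}{n}\sum_{i=1}^{n} E(X_i) - \theta = \theta - \theta = 0$$

Note that if an estimator is unbiased, it is not necessarily a good estimator. In the above example, the single observation Θ̂1 = X1 is also an unbiased estimator of θ:

$$B(\hat{\Theta}_1) = E(\hat{\Theta}_1) - \theta = E(X_1) - \theta = 0$$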

Practice Problem:

1. Suppose that X is a random variable with mean μ and variance σ². Let X1, X2, …, Xn be a random sample of size n from the population represented by X. Show that the sample mean X̄ and the sample variance S² are unbiased estimators of μ and σ², respectively.

6.3.2 Variance of Point Estimator

Suppose that Θ̂1 and Θ̂2 are unbiased estimators of θ. This indicates that the distribution of each estimator is centered at the true value of θ. However, the variances of these distributions may be different. Figure 4 illustrates the situation. Because Θ̂1 has a smaller variance than Θ̂2, the estimator Θ̂1 is more likely to produce an estimate close to the true value of θ. A logical principle of estimation when selecting among several unbiased estimators is to choose the estimator that has minimum variance.

Figure 4. Sampling distributions of two unbiased estimators Θ̂1 and Θ̂2

Minimum Variance Unbiased Estimator

If we consider all unbiased estimators of θ, the one with the smallest variance is called the minimum variance unbiased estimator (MVUE).

If X1, X2, …, Xn is a random sample of size n from a normal distribution with mean μ and variance σ², the sample mean X̄ is the MVUE for μ.

When we do not know whether an MVUE exists, we could still use a minimum variance principle to choose among competing estimators. Suppose, for example, we wish to estimate the mean of a population (not necessarily a normal population). We have a random sample of n observations X1, X2, …, Xn, and we wish to compare two possible estimators for μ: the sample mean X̄ and a single observation from the sample, say, Xi. Note that both X̄ and Xi are unbiased estimators of μ; for the sample mean, we have V(X̄) = σ²/n from previous chapters, and the variance of any observation is V(Xi) = σ². Because V(X̄) < V(Xi) for sample sizes n ≥ 2, we would conclude that the sample mean is a better estimator of μ than a single observation Xi.
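The comparison of X̄ and a single observation can be illustrated by simulation. This sketch assumes a uniform population on [0, 10] (so μ = 5 and σ² = 100/12), chosen only because it is not normal. Both estimators should average close to μ, but with n = 10 the variance of X̄ should be roughly one-tenth of the variance of X1.

```python
import random
import statistics

# Assumed non-normal population: uniform on [0, 10], so mu = 5 and sigma^2 = 100 / 12
low, high = 0.0, 10.0
mu, sigma2 = (low + high) / 2, (high - low) ** 2 / 12
n, reps = 10, 20_000
rng = random.Random(2024)            # fixed seed for reproducibility

xbar_vals, x1_vals = [], []
for _ in range(reps):
    sample = [rng.uniform(low, high) for _ in range(n)]
    xbar_vals.append(statistics.fmean(sample))   # estimator 1: sample mean X-bar
    x1_vals.append(sample[0])                     # estimator 2: single observation X1

for name, vals, theory in (("X-bar", xbar_vals, sigma2 / n), ("X1", x1_vals, sigma2)):
    print(f"{name:5s} mean = {statistics.fmean(vals):.3f} (mu = {mu}), "
          f"variance = {statistics.variance(vals):.3f} (theory {theory:.3f})")
```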



6.3.3 Standard Error

When the numerical value or point estimate of a parameter is reported, it is usually desirable to give some idea of the precision of estimation. The measure of precision usually employed is the standard error of the estimator that has been used.

Standard Error of an Estimator

The standard error of an estimator Θ̂ is its standard deviation, given by $\sigma_{\hat{\Theta}} = \sqrt{V(\hat{\Theta})}$. If the standard error involves unknown parameters that can be estimated, substitution of those values into $\sigma_{\hat{\Theta}}$ produces an estimated standard error, denoted by $\hat{\sigma}_{\hat{\Theta}}$. Sometimes the estimated standard error is denoted by $S_{\hat{\Theta}}$ or SE(Θ̂).

Suppose that we are sampling from a normal distribution with mean μ and variance σ². Now the distribution of X̄ is normal with mean μ and variance σ²/n, so the standard error of X̄ is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

If we did not know σ but substituted the sample standard deviation S into the preceding equation, the estimated standard error of X̄ would be

$$SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}}$$

Table 1 presents the standard error formulas for some common sample statistics. Sampling distributions for these statistics, or at least their means and standard deviations (standard errors), can often be found.

Table 1. Standard Errors for Some Sample Statistics

Example 1. An article in the Journal of Heat Transfer (Trans. ASME, Sec. C, 96, 1974, p. 59) described a new method of measuring the thermal conductivity of Armco iron. Using a temperature of 100°F and a power input of 550 watts, the following 10 measurements of thermal conductivity (in Btu/hr-ft-°F) were obtained:

41.60, 41.48, 42.34, 41.95, 41.86,
42.18, 41.72, 42.26, 41.81, 42.04

A point estimate of the mean thermal conductivity at 100°F and 550 watts is the sample mean, or

$$\bar{x} = 41.924 \text{ Btu/hr-ft-}^{\circ}\text{F}$$

The standard error of the sample mean is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, and because σ is unknown, we may replace it by the sample standard deviation s = 0.284 to obtain the estimated standard error of X̄ as

$$SE(\bar{X}) = \hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{0.284}{\sqrt{10}} = 0.0898$$
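The point estimate and the estimated standard error can be reproduced from the ten measurements. A minimal sketch with the standard library; `statistics.stdev` uses the n − 1 divisor, matching s.

```python
import statistics
from math import sqrt

# Thermal conductivity measurements (Btu/hr-ft-°F) from the example
k = [41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04]

x_bar = statistics.fmean(k)   # point estimate of the mean conductivity
s = statistics.stdev(k)       # sample standard deviation (divisor n - 1)
se = s / sqrt(len(k))         # estimated standard error of X-bar

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, SE = {se:.4f}")  # x_bar = 41.924, s = 0.284, SE = 0.0898
```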

6.3.4 Mean Square Error of an Estimator

Sometimes it is necessary to use a biased estimator. In such cases, the mean squared error of the estimator can be important. The mean squared error of an estimator Θ̂ is the expected squared difference between Θ̂ and θ.

Figure 5. A biased estimator Θ̂1 that has a smaller variance than the unbiased estimator Θ̂2

Mean Squared Error of an Estimator

The mean squared error of an estimator Θ̂ of the parameter θ is defined as

$$MSE(\hat{\Theta}) = E(\hat{\Theta} - \theta)^2$$

Writing Θ̂ − θ = [Θ̂ − E(Θ̂)] + [E(Θ̂) − θ] and noting that the cross-product term has expectation zero (because E[Θ̂ − E(Θ̂)] = 0), the mean squared error can be rewritten as follows:

$$MSE(\hat{\Theta}) = E[\hat{\Theta} - E(\hat{\Theta})]^2 + [\theta - E(\hat{\Theta})]^2 = V(\hat{\Theta}) + (\text{bias})^2$$

That is, the mean squared error of Θ̂ is equal to the variance of the estimator plus the squared bias. If Θ̂ is an unbiased estimator of θ, the mean squared error of Θ̂ is equal to the variance of Θ̂.

The mean squared error is an important criterion for comparing two estimators. Let Θ̂1 and Θ̂2 be two estimators of the parameter θ, and let MSE(Θ̂1) and MSE(Θ̂2) be the mean squared errors of Θ̂1 and Θ̂2. Then the relative efficiency of Θ̂2 to Θ̂1 is defined as

$$\frac{MSE(\hat{\Theta}_1)}{MSE(\hat{\Theta}_2)}$$

If this relative efficiency is less than 1, we would conclude that Θ̂1 is a more efficient estimator of θ than Θ̂2.
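As an added illustration (not an example from the text), the sketch below compares two estimators of a normal variance under assumed settings σ² = 4 and n = 10: Θ̂1 = S², which divides by n − 1 and is unbiased, and Θ̂2, which divides by n and is biased downward by σ²/n. For normal samples the theoretical values are MSE(Θ̂1) = 2σ⁴/(n − 1) ≈ 3.56 and MSE(Θ̂2) = (2n − 1)σ⁴/n² = 3.04, so the relative efficiency MSE(Θ̂1)/MSE(Θ̂2) ≈ 1.17 exceeds 1 and the biased estimator is the more efficient one in the MSE sense. The simulation also checks the decomposition MSE = variance + bias².

```python
import random
import statistics

# Assumed setup: normal population with variance sigma2 = 4, samples of size n = 10
mu, sigma2, n, reps = 0.0, 4.0, 10, 40_000
sigma = sigma2 ** 0.5
rng = random.Random(11)              # fixed seed for reproducibility

def bias_var_mse(estimates, theta):
    """Simulated bias, variance and mean squared error of an estimator of theta."""
    m = statistics.fmean(estimates)
    bias = m - theta
    var = statistics.fmean((e - m) ** 2 for e in estimates)
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return bias, var, mse

est1, est2 = [], []
for _ in range(reps):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(x) / n
    ss = sum((xi - x_bar) ** 2 for xi in x)
    est1.append(ss / (n - 1))        # Theta-hat_1 = S^2, divisor n - 1 (unbiased)
    est2.append(ss / n)              # Theta-hat_2, divisor n (biased low by sigma2 / n)

b1, v1, mse1 = bias_var_mse(est1, sigma2)
b2, v2, mse2 = bias_var_mse(est2, sigma2)
for name, (b, v, m) in (("S^2 (n-1)", (b1, v1, mse1)), ("divisor n", (b2, v2, mse2))):
    print(f"{name:10s} bias={b:+.3f}  var={v:.3f}  var+bias^2={v + b * b:.3f}  MSE={m:.3f}")
print(f"relative efficiency MSE(1)/MSE(2) = {mse1 / mse2:.2f}")
```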


CHAPTER TEST

Solve the following problems completely.

1. A population consists of the four numbers 3, 7, 11, 15. Consider all possible samples of size two that can be drawn with replacement from this population. Find (a) the population mean, (b) the population standard deviation, (c) the mean of the sampling distribution of means, and (d) the standard deviation of the sampling distribution of means. Verify (c) and (d) directly from (a) and (b) by use of suitable formulas.

2. The mean score of students on an aptitude test is 72 points with a standard deviation of 8 points. What is the probability that two groups of students, consisting of 28 and 36 students, respectively, will differ in their mean scores by (a) 3 or more points, (b) 6 or more points, (c) between 2 and 5 points?

3. A normal population has a variance of 15. If samples of size 5 are drawn from this population, what percentage can be expected to have variances (a) less than 10, (b) more than 20, (c) between 5 and 10?

4. Measurements of a sample of weights were determined as 8.3, 10.6, 9.7, 8.8, 10.2, and 9.4 lb, respectively. Determine unbiased estimates of (a) the population mean and (b) the population variance.

5. X has a continuous distribution:

$$f(x, y) = \begin{cases} \frac{1}{2}(2x + 3y), & 4 \le x \le 6 \\ 0, & \text{otherwise} \end{cases}$$

Find the distribution of the sample mean X̄ of a random sample of size n = 40.

Figure 5. The distributions of X and X̄ in problem 5.

Chapter 7
STATISTICAL INTERVALS

Introduction

Engineers are often involved in estimating parameters. Statistical intervals represent the uncertainty that exists in the data because we work with samples obtained from a larger population or process. Statistical intervals are staples of the quality and validation practitioner's statistical toolbox. They can appear as plus-or-minus limits on test data, represent a margin of error in a scientific poll, or indicate the level of confidence associated with a predicted value. This chapter discusses the three most common intervals, namely the confidence interval, the prediction interval, and the tolerance interval. In this part, confidence intervals are discussed.

Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:

1. Construct confidence intervals using single samples and multiple samples
2. Construct a prediction interval for a future observation
3. Construct a tolerance interval for a normal distribution
4. Explain the three types of interval estimates: confidence intervals, prediction intervals, and tolerance intervals

7.1 Single Sample: Estimating the Mean

A point estimate alone gives no indication of how close it is likely to be to the true parameter value. A way to address this is to report the estimate in terms of a range of plausible values called a confidence interval, that is, an interval estimate for the population parameter. A confidence interval always specifies a confidence level, usually 90%, 95%, or 99%, which is a measure of the reliability of the procedure. Information about the precision of estimation is conveyed by the length of the interval: a short interval implies precise estimation. We cannot be certain that the interval contains the true, unknown population parameter, because we use only a sample from the full population to compute the point estimate and the interval. However, the confidence interval is constructed so that we have high confidence that it does contain the unknown population parameter. Confidence intervals are widely used in engineering and the sciences.

The basic ideas of a confidence interval (CI) are most easily understood by initially considering a simple situation. Suppose that we have a normal population with unknown mean μ and known variance σ². This is a somewhat unrealistic scenario because typically both the mean and variance are unknown. However, in subsequent sections, we present confidence intervals for more general situations.

Confidence Interval on the Mean of a Normal Distribution, Variance Known

If x̄ is the sample mean of a random sample of size n from a normal population with known variance σ², a 100(1 − α)% confidence interval on μ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the upper 100α/2 percentage point of the standard normal distribution.

For small samples selected from non-normal populations, we cannot expect our degree of confidence to be accurate. However, for samples of size n ≥ 30, with the shape of the distributions not too skewed, sampling theory generally gives good results.

Example 1. ASTM Standard E23 defines standard test methods for notched bar impact testing of metallic materials. The Charpy V-notch (CVN) technique measures impact energy and is often used to determine whether or not a material experiences a ductile-to-brittle transition with decreasing temperature. Ten measurements of impact energy (J) on specimens of A238 steel cut at 60°C are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with σ = 1 J. We want to find a 95% CI for μ, the mean impact energy. The required quantities are $z_{\alpha/2} = z_{0.025} = 1.96$, n = 10, σ = 1, and x̄ = 64.46.

Solution:

Using the equation above, the resulting 95% CI is

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$64.46 - 1.96\frac{1}{\sqrt{10}} \le \mu \le 64.46 + 1.96\frac{1}{\sqrt{10}}$$

$$63.84 \le \mu \le 65.08$$

Based on the sample data, a range of highly plausible values for mean impact energy for A238 steel at 60°C is 63.84 J ≤ μ ≤ 65.08 J.
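The interval can be reproduced in a few lines. This sketch obtains $z_{\alpha/2}$ from `statistics.NormalDist().inv_cdf` instead of a table, so it uses about 1.95996 rather than the rounded 1.96; the result agrees to two decimals.

```python
from math import sqrt
from statistics import NormalDist, fmean

# CVN impact energy data (J) from the example; sigma = 1 J is assumed known
energy = [64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6, 64.8, 64.2, 64.3]
sigma, conf = 1.0, 0.95

alpha = 1 - conf
z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{alpha/2}, about 1.96 for 95%
x_bar = fmean(energy)                          # 64.46
half_width = z * sigma / sqrt(len(energy))     # z_{alpha/2} * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"{lower:.2f} <= mu <= {upper:.2f}")     # 63.84 <= mu <= 65.08
```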


Practice Problem:

1. The average zinc concentration recovered from a sample of measurements taken in 36 different locations in a river is found to be 2.6 grams per milliliter. Find the 95% and 99% confidence intervals for the mean zinc concentration in the river. Assume that the population standard deviation is 0.3 gram per milliliter. Ans. 95%: 2.50 < μ < 2.70; 99%: 2.47 < μ < 2.73.

Choice of Sample Size

The length of the confidence interval above is $2z_{\alpha/2}\,\sigma/\sqrt{n}$, and this length measures the precision of estimation. In using x̄ to estimate μ, the error E = |x̄ − μ| is less than or equal to $z_{\alpha/2}\,\sigma/\sqrt{n}$ with confidence 100(1 − α)%. This is shown graphically in Figure 1.

Figure 1. Error in estimating μ with x̄

In situations where the sample size can be controlled, we can choose n so that we are 100(1 − α)% confident that the error in estimating μ is less than a specified bound E. The appropriate sample size is found by choosing n such that $z_{\alpha/2}\,\sigma/\sqrt{n} = E$ and solving for n.

If x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error |x̄ − μ| will not exceed a specified amount E when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Example 1. Consider the CVN test described in Example 1, and suppose that we want to determine how many specimens must be tested to ensure that the 95% CI on μ for A238 steel cut at 60°C has a length of at most 1.0 J. Because the bound on error in estimation E is one-half of the length of the CI, E = 0.5 J.

Solution:

With E = 0.5, σ = 1, and $z_{\alpha/2} = 1.96$, the required sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(1)}{0.5}\right)^2 = 15.37$$

and because n must be an integer, we round up, so the required sample size is n = 16.
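The sample-size formula translates directly into code. A minimal sketch; the function name `required_sample_size` is hypothetical, and n is always rounded up with `math.ceil` so that the error bound is met.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, E, conf=0.95):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= E (variance known)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    return ceil((z * sigma / E) ** 2)               # round up: n must be an integer

n = required_sample_size(sigma=1.0, E=0.5)          # E = half of the 1.0 J target length
print(n)                                            # 16
```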

One-Sided Confidence Bounds on the Mean of a Normal Distribution, Variance Known

The confidence interval on the mean given above (variance known) provides both a lower confidence bound and an upper confidence bound for μ; thus, it is a two-sided CI. It is also possible to obtain one-sided confidence bounds for μ by setting either the lower bound l = −∞ or the upper bound u = ∞ and replacing $z_{\alpha/2}$ by $z_{\alpha}$.

A 100(1 − α)% upper-confidence bound for μ is

$$\mu \le \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

and a 100(1 − α)% lower-confidence bound for μ is

$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu$$

Example 1. The same data for impact testing from Example 1 are used to construct a lower, one-sided 95% confidence interval for the mean impact energy. Recall that x̄ = 64.46, σ = 1 J, and n = 10. What is the interval?

Solution:

$$\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu$$

$$64.46 - 1.645\frac{1}{\sqrt{10}} \le \mu$$

$$63.94 \le \mu$$

The lower limit for the two-sided interval in Example 1 was 63.84. Because $z_{\alpha} < z_{\alpha/2}$, the lower limit of a one-sided interval is always greater than the lower limit of a two-sided interval of equal confidence. The one-sided interval does not bound μ from above, so it still achieves 95% confidence with a slightly larger lower limit. If our interest is only in the lower limit for μ, then the one-sided interval is preferred because it provides equal confidence with a greater lower limit. Similarly, a one-sided upper limit is always less than a two-sided upper limit of equal confidence.
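A quick numerical check of this comparison. The sketch computes the one-sided lower bound with $z_{0.05}$ and the lower limit of the two-sided interval with $z_{0.025}$, both from `statistics.NormalDist`, for the impact-energy summary values.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, conf = 64.46, 1.0, 10, 0.95
alpha = 1 - conf

z_alpha = NormalDist().inv_cdf(1 - alpha)        # z_alpha, about 1.645
z_half = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}, about 1.96

lower_one_sided = x_bar - z_alpha * sigma / sqrt(n)   # 100(1 - alpha)% lower bound
lower_two_sided = x_bar - z_half * sigma / sqrt(n)    # lower end of the two-sided CI

print(f"one-sided lower bound: {lower_one_sided:.2f} <= mu")  # 63.94 <= mu
print(f"two-sided lower limit: {lower_two_sided:.2f}")        # 63.84
```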


Practice Problem:

1. In a psychological testing experiment, 25 subjects are selected randomly and their reaction time, in seconds, to a particular stimulus is measured. Past experience suggests that the variance in reaction times to these types of stimuli is 4 sec² and that the distribution of reaction times is approximately normal. The average time for the subjects is 6.2 seconds. Give an upper 95% bound for the mean reaction time. Ans: 6.858 seconds.

Large-Sample Confidence Interval on the Mean

When n is large, the quantity

$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has an approximate standard normal distribution. Consequently,

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$

is a large-sample confidence interval for μ, with confidence level of approximately 100(1 − α)%.

Example 1. An article in the 1993 volume of the Transactions of the American Fisheries Society reports the results of a study to investigate the mercury contamination in largemouth bass. A sample of fish was selected from 53 Florida lakes, and mercury concentration in the muscle tissue was measured (ppm). Find an approximate 95% CI on μ.

The mercury concentration values were

Solution:

The summary statistics for these data are as follows:

The required quantities are n = 53, x̄ = 0.5250, s = 0.3486, and $z_{0.025} = 1.96$. The approximate 95% CI on μ is

$$\bar{x} - z_{0.025}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{0.025}\frac{s}{\sqrt{n}}$$

$$0.5250 - 1.96\frac{0.3486}{\sqrt{53}} \le \mu \le 0.5250 + 1.96\frac{0.3486}{\sqrt{53}}$$

$$0.4311 \le \mu \le 0.6189$$

This interval is fairly wide because there is substantial variability in the mercury concentration measurements. A larger sample size would have produced a shorter interval.
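Because only the summary statistics are needed, the large-sample interval can be checked directly. A sketch using the values n = 53, x̄ = 0.5250, and s = 0.3486 quoted in the solution:

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics reported for the mercury data (ppm)
n, x_bar, s = 53, 0.5250, 0.3486
z = NormalDist().inv_cdf(0.975)           # z_{0.025}, about 1.96

half_width = z * s / sqrt(n)              # s replaces sigma: large-sample approximation
lower, upper = x_bar - half_width, x_bar + half_width

print(f"{lower:.4f} <= mu <= {upper:.4f}")   # 0.4311 <= mu <= 0.6189
```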

7.2 Confidence Interval on the Mean of a Normal Distribution, Variance Unknown

If x̄ and s are the mean and standard deviation of a random sample from a normal distribution with unknown variance σ², a 100(1 − α)% confidence interval on μ is given by

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of the t distribution with n − 1 degrees of freedom.

Example 1. An article in the Journal of Materials Engineering ["Instrumented Tensile Adhesion Tests on Plasma Sprayed Thermal Barrier Coatings" (1989, Vol. 11(4), pp. 275–282)] describes the results of tensile adhesion tests on 22 U-700 alloy specimens. Find a 95% confidence interval (CI) on the mean load at failure.

The load at specimen failure is as follows (in megapascals):

Solution:

The sample mean is x̄ = 13.71, and the sample standard deviation is s = 3.55. Figures 8.6 and 8.7 show a box plot and a normal probability plot of the tensile adhesion test data, respectively. These displays provide good support for the assumption that the population is normally distributed. We want to find a 95% CI on μ. Since n = 22, we have n − 1 = 21 degrees of freedom for t, so $t_{0.025,21} = 2.080$. The resulting CI is

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

$$13.71 - 2.080\frac{3.55}{\sqrt{22}} \le \mu \le 13.71 + 2.080\frac{3.55}{\sqrt{22}}$$

$$13.71 - 1.57 \le \mu \le 13.71 + 1.57$$

$$12.14 \le \mu \le 15.28$$

The CI is fairly wide because there is a lot of variability in the tensile adhesion test measurements. A larger sample size would have led to a shorter interval.
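A sketch of the same computation. Python's standard library has no t-distribution quantile function, so the critical value $t_{0.025,21} = 2.080$ is hard-coded from the table, as in the text (with SciPy installed, `scipy.stats.t.ppf(0.975, 21)` would supply it instead). The last lines compare the half-width with the one obtained from z = 1.96 to show that the t interval is wider.

```python
from math import sqrt

# Summary statistics from the tensile adhesion example (MPa)
n, x_bar, s = 22, 13.71, 3.55
t_crit = 2.080                  # t_{0.025, 21}, taken from a t table as in the text

half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"half-width = {half_width:.2f}")        # half-width = 1.57
print(f"{lower:.2f} <= mu <= {upper:.2f}")     # 12.14 <= mu <= 15.28

# For comparison only: the (inappropriate) z-based half-width is narrower
z_half_width = 1.96 * s / sqrt(n)
print(f"z-based half-width = {z_half_width:.2f}")
```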

• t Distribution

Let X1, X2, …, Xn be a random sample from a normal distribution with unknown mean μ and unknown variance σ². The random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with n − 1 degrees of freedom.
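A small Monte Carlo sketch shows why the t percentage point, not $z_{\alpha/2}$, belongs in the interval when σ is estimated by S. It assumes a standard normal population (μ = 0, σ = 1, an arbitrary choice since T does not depend on them), a sample size of n = 5, the table value $t_{0.025,4}$ = 2.776, and a fixed random seed; the function name `simulate_t` is hypothetical.

```python
import random
from math import sqrt


def simulate_t(n, trials=50_000, seed=1):
    """Draw many samples of size n from N(0, 1) and return T = Xbar / (S / sqrt(n))."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample std. deviation
        values.append(xbar / (s / sqrt(n)))  # mu = 0 in this simulation
    return values


t_values = simulate_t(5)
frac_t = sum(abs(t) > 2.776 for t in t_values) / len(t_values)  # t_{0.025,4}
frac_z = sum(abs(t) > 1.96 for t in t_values) / len(t_values)   # z_{0.025}
print(f"P(|T| > 2.776) ~ {frac_t:.3f}, P(|T| > 1.96) ~ {frac_z:.3f}")
```

With only 4 degrees of freedom, roughly 5% of the simulated T values fall beyond ±2.776, while about 12% fall beyond ±1.96, so a "95%" interval built with $z_{\alpha/2}$ would be too short.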



7.3 Confidence Interval on the Variance and Standard Deviation of a Normal Distribution

Sometimes confidence intervals on the population variance or standard deviation are needed. When the population is modelled by a normal distribution, the intervals described in this section are applicable. The following result provides the basis for constructing these confidence intervals.

• χ² Distribution

Let X1, X2, …, Xn be a random sample from a normal distribution with mean μ and variance σ², and let S² be the sample variance. Then the random variable

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2}$$

has a chi-square (χ²) distribution with n − 1 degrees of freedom.
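The result can be checked numerically. A chi-square random variable with k degrees of freedom has mean k and variance 2k, so the simulated values of (n − 1)S²/σ² should have mean close to n − 1 and variance close to 2(n − 1). The sketch below assumes a normal population with μ = 10 and σ = 2 (arbitrary illustrative values), n = 10, and a fixed seed; `chi2_statistics` is a hypothetical name.

```python
import random


def chi2_statistics(n, mu=10.0, sigma=2.0, trials=20_000, seed=7):
    """Simulate (n - 1) * S^2 / sigma^2 for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance S^2
        stats.append((n - 1) * s2 / sigma**2)
    return stats


n = 10
values = chi2_statistics(n)
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
print(f"mean {mean:.2f} (theory {n - 1}), variance {var:.2f} (theory {2 * (n - 1)})")
```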

• Confidence Interval on the Variance

If s² is the sample variance from a random sample of n observations from a normal distribution with unknown variance σ², then a 100(1 − α)% confidence interval on σ² is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$

where $\chi^2_{\alpha/2,n-1}$ and $\chi^2_{1-\alpha/2,n-1}$ are the upper and lower 100α/2 percentage points of the chi-square distribution with n − 1 degrees of freedom, respectively. A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in the above equation.
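The two-sided interval translates directly into code. In this sketch the helper name `variance_ci` is hypothetical, and the χ² percentage points are passed in from a chi-square table because the standard library has no chi-square quantile function. For illustration only, it applies the formula to the filling-machine data used in the next example (n = 20, s² = 0.0153) with the table values $\chi^2_{0.025,19}$ = 32.852 and $\chi^2_{0.975,19}$ = 8.907; this two-sided interval is not part of that example.

```python
from math import sqrt


def variance_ci(s2, n, chi2_upper, chi2_lower):
    """Two-sided CI on sigma^2 for a normal population.

    chi2_upper: upper alpha/2 point chi^2_{alpha/2, n-1} (the larger table value)
    chi2_lower: lower alpha/2 point chi^2_{1-alpha/2, n-1} (the smaller table value)
    """
    ss = (n - 1) * s2
    return ss / chi2_upper, ss / chi2_lower


var_lo, var_hi = variance_ci(0.0153, 20, chi2_upper=32.852, chi2_lower=8.907)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)  # CI on sigma: square roots of the limits
print(f"{var_lo:.4f} <= sigma^2 <= {var_hi:.4f}")
print(f"{sd_lo:.3f} <= sigma <= {sd_hi:.3f}")
```

Note that the larger χ² value produces the lower limit, which is why the two table values are named explicitly rather than passed positionally.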

• One-Sided Confidence Bounds on the Variance

The 100(1 − α)% lower and upper confidence bounds on σ² are, respectively,

$$\frac{(n-1)s^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \quad\text{and}\quad \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}$$

Example 1. An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of s² = 0.0153 (fluid ounce)². If the variance of fill volume is too large, an unacceptable proportion of bottles will be under- or overfilled. We will assume that the fill volume is approximately normally distributed. Find a 95% upper confidence bound on σ².

Solution:

A 95% upper confidence bound is found from

$$\sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha,n-1}}$$

With $\chi^2_{0.95,19}$ = 10.117,

$$\sigma^2 \le \frac{(20-1)(0.0153)}{\chi^2_{0.95,19}} = \frac{(19)(0.0153)}{10.117} = 0.0287 \ (\text{fluid ounce})^2$$

This last expression may be converted into a confidence bound on the standard deviation σ by taking the square root of both sides, resulting in

$$\sigma \le 0.17$$

Therefore, at the 95% level of confidence, the data indicate that the process standard deviation could be as large as 0.17 fluid ounce. The process engineer or manager now needs to determine whether a standard deviation this large could lead to an operational problem with under- or overfilled bottles.
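The arithmetic of the upper bound can be sketched in a few lines of Python. The table value $\chi^2_{0.95,19}$ = 10.117 is entered by hand, since the standard library has no chi-square quantile function.

```python
from math import sqrt

n, s2 = 20, 0.0153
chi2_095_19 = 10.117  # chi^2_{0.95,19}: lower-tail 5% point of chi-square with 19 df (table value)

var_upper = (n - 1) * s2 / chi2_095_19  # 95% upper confidence bound on sigma^2
sd_upper = sqrt(var_upper)              # corresponding upper bound on sigma
print(f"sigma^2 <= {var_upper:.4f}, sigma <= {sd_upper:.2f}")
```

A lower bound would follow the same pattern with $\chi^2_{\alpha,n-1}$ (the upper-tail point) in the denominator.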

7.4 Two Samples: Estimating the Difference between Two Means

If we have two populations with means μ1 and μ2 and variances σ1² and σ2², respectively, a point estimator of the difference between μ1 and μ2 is given by the statistic X̄1 − X̄2. Therefore, to obtain a point estimate of μ1 − μ2, we select two independent random samples, one from each population, of sizes n1 and n2, and compute x̄1 − x̄2, the difference of the sample means. Clearly, we must consider the sampling distribution of X̄1 − X̄2.

• Confidence Interval for the Difference between Two Means, Variances Known

If x̄1 and x̄2 are means of independent random samples of sizes n1 and n2 from populations with known variances σ1² and σ2², respectively, a 100(1 − α)% confidence interval for μ1 − μ2 is given by

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.
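No numerical example accompanies this interval, so the sketch below uses hypothetical numbers (x̄1 = 80, x̄2 = 75, σ1² = 36, σ2² = 64, n1 = 25, n2 = 36) purely to show the mechanics. The helper name `diff_means_ci` is also hypothetical, and Python 3.8+ is assumed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """CI on mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of Xbar1 - Xbar2
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


# Hypothetical numbers, not taken from the text
lo, hi = diff_means_ci(80.0, 75.0, var1=36.0, var2=64.0, n1=25, n2=36)
print(f"{lo:.3f} < mu1 - mu2 < {hi:.3f}")
```

The variances add under the square root because the two samples are independent; the interval is centred on the observed difference x̄1 − x̄2.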

7.5 Large-Sample Confidence Interval for a Population Proportion

It is often necessary to construct confidence intervals on a population proportion.

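• Normal Approximation for a Binomial Proportion

If n is large, the distribution of

$$Z = \frac{X - np}{\sqrt{np(1-p)}} = \frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

is approximately standard normal. Here X is the number of observations in a random sample of size n that belong to the class of interest, and P̂ = X/n is the sample proportion.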

• Approximate Confidence Interval on a Binomial Proportion

If p̂ is the proportion of observations in a random sample of size n that belongs to a class of interest, an approximate 100(1 − α)% confidence interval on the proportion p of the population that belongs to this class is

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $z_{\alpha/2}$ is the upper α/2 percentage point of the standard normal distribution.
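The approximate interval can be sketched in Python as follows. The data (10 items in the class of interest out of n = 85) and the helper name `proportion_ci` are hypothetical, and Python 3.8+ is assumed for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, conf=0.95):
    """Approximate (normal-approximation) two-sided CI on a binomial proportion p."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 10 of n = 85 observations belong to the class of interest
lo, hi = proportion_ci(10, 85)
print(f"{lo:.4f} <= p <= {hi:.4f}")
```

Because the approximation relies on n being large, the interval can extend below 0 or above 1 when p̂ is very close to either end.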

• Sample Size for a Specified Error on a Binomial Proportion

In situations when the sample size can be selected, we may choose n so that we are 100(1 − α)% confident that the error is less than some specified value E. If we set

$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

and solve for n, the appropriate sample size is

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1-p)$$
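The sample-size formula is a one-liner once $z_{\alpha/2}$ is available. In this sketch the helper name `sample_size_proportion` is hypothetical, the result is rounded up to the next whole observation, and the values of E and p are illustrative. When no estimate of p is available, p = 0.5 is a common conservative choice because it maximizes p(1 − p).

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(E, p, conf=0.95):
    """n = (z_{alpha/2} / E)^2 * p * (1 - p), rounded up to the next integer."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return ceil((z / E) ** 2 * p * (1.0 - p))


n_conservative = sample_size_proportion(0.05, 0.5)  # worst case: p(1 - p) is largest at p = 0.5
n_small_p = sample_size_proportion(0.05, 0.1)       # a prior guess p = 0.1 needs fewer observations
print(n_conservative, n_small_p)
```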

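• Approximate One-Sided Confidence Bounds on a Binomial Proportion

The approximate 100(1 − α)% lower and upper confidence bounds are, respectively,

$$\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \quad\text{and}\quad p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$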
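The one-sided bounds differ from the two-sided interval only in using $z_{\alpha}$ instead of $z_{\alpha/2}$. The sketch below assumes hypothetical data (10 of n = 85 observations in the class of interest) and a hypothetical helper name, `proportion_bounds`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_bounds(x, n, conf=0.95):
    """Approximate one-sided bounds on p; each uses z_alpha, not z_{alpha/2}."""
    p_hat = x / n
    z = NormalDist().inv_cdf(conf)  # upper alpha point z_alpha (about 1.645 at 95%)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin  # (lower bound, upper bound)


# Hypothetical data: 10 of n = 85 observations belong to the class of interest
lower, upper = proportion_bounds(10, 85)
print(f"lower bound: {lower:.4f} <= p;  upper bound: p <= {upper:.4f}")
```

Each bound is used on its own: one states that p is at least `lower`, the other that p is at most `upper`, each with 95% confidence.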

7.6 Prediction Interval for Future Observation

In some problem situations, we may be interested in predicting a future observation of a variable. This is a different problem than estimating the mean of that variable, so a confidence interval is not appropriate. In this section, we show how to obtain a 100(1 − α)% prediction interval on a future value of a normal random variable.

A prediction interval provides bounds on one (or more) future observations from the population. For example, a prediction interval could be used to bound a single new measurement of viscosity.

For a normal distribution of measurements with unknown mean μ and known variance σ², a 100(1 − α)% prediction interval of a future observation x0 is

$$\bar{x} - z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + z_{\alpha/2}\,\sigma\sqrt{1 + \frac{1}{n}}$$

where $z_{\alpha/2}$ is the z-value leaving an area of α/2 to the right.
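A Python sketch of the known-σ prediction interval follows, using hypothetical numbers (x̄ = 50, σ = 2, n = 16) and a hypothetical helper name. The factor √(1 + 1/n) appears because the prediction error X0 − X̄ has variance σ² + σ²/n: the first term is the scatter of the new observation itself, the second the uncertainty in x̄.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval_known_sigma(xbar, sigma, n, conf=0.95):
    """PI for a single future observation X0 from a normal population with known sigma."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    half_width = z * sigma * sqrt(1.0 + 1.0 / n)  # Var(X0 - Xbar) = sigma^2 (1 + 1/n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers, not taken from the text
lo, hi = prediction_interval_known_sigma(50.0, 2.0, 16)
print(f"{lo:.3f} < x0 < {hi:.3f}")
```

As n grows, √(1 + 1/n) tends to 1, so the interval never shrinks below x̄ ± $z_{\alpha/2}$σ, unlike a confidence interval on μ.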

For a normal distribution of measurements with unknown mean μ and unknown variance σ², a 100(1 − α)% prediction interval of a future observation x0 is

$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$

where $t_{\alpha/2}$ is the t-value with v = n − 1 degrees of freedom, leaving an area of α/2 to the right.

Example 1. A meat inspector has randomly selected 30 packs of 95% lean beef. The sample resulted in a mean of 96.2% with a sample standard deviation of 0.8%. Find a 99% prediction interval for the leanness of a new pack. Assume normality.

Solution:

For a 99% interval with v = 29 degrees of freedom, $t_{0.005}$ = 2.756. Then

$$\bar{x} - t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}} < x_0 < \bar{x} + t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n}}$$

$$96.2 - (2.756)(0.8)\sqrt{1 + \frac{1}{30}} < x_0 < 96.2 + (2.756)(0.8)\sqrt{1 + \frac{1}{30}}$$

$$93.96 < x_0 < 98.44$$

Notice that the prediction interval is considerably longer than the corresponding 99% CI on the mean, 96.2 ± (2.756)(0.8/√30), that is, 95.80 < μ < 96.60. This is because the CI is an estimate of a parameter, but the PI is an interval estimate of a single future observation.
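The comparison between the two intervals can be checked with a short Python sketch. The value $t_{0.005,29}$ = 2.756 is entered from a t table, as in the solution.

```python
from math import sqrt

n, xbar, s = 30, 96.2, 0.8
t_crit = 2.756  # t_{0.005,29}: 99% two-sided, 29 degrees of freedom (table value)

pi_half = t_crit * s * sqrt(1 + 1 / n)  # prediction interval half-width
ci_half = t_crit * s / sqrt(n)          # confidence interval half-width, for comparison
print(f"99% PI: {xbar - pi_half:.2f} < x0 < {xbar + pi_half:.2f}")
print(f"99% CI: {xbar - ci_half:.2f} < mu < {xbar + ci_half:.2f}")
```

The ratio of the two half-widths is √(n + 1), about 5.6 here, which is why the PI is so much longer.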

Practice Problem:

Due to the decrease in interest rates, the First Citizens Bank received many mortgage applications. A recent sample of 50 mortgage loans resulted in an average loan amount of $257,300. Assume a population standard deviation of $25,000. For the next customer who fills out a mortgage application, find a 95% prediction interval for the loan amount.

7.7 Tolerance Interval

A tolerance interval is another important type of interval estimate. For example, the chemical product viscosity data might be assumed to be normally distributed. We might like to calculate limits that bound 95% of the viscosity values.

A tolerance interval for capturing at least γ% of the values in a normal distribution with confidence level 100(1 − α)% is

$$(\bar{x} - ks,\ \bar{x} + ks) \quad\text{or}\quad \bar{x} \pm ks$$

where k is a tolerance interval factor found in Table I. Values are given for γ = 90%, 95%, and 99%, and for 90%, 95%, and 99% confidence.

This interval is very sensitive to the normality assumption. One-sided tolerance bounds can also be computed. The tolerance factors for these bounds are also given in Table I.

Table I. Tolerance Factors for Normal Distributions



Example 1. Consider Example 1 of Section 7.6 (the meat inspector). With the information given, find a tolerance interval that gives two-sided 95% bounds on 90% of the distribution of packages of 95% lean beef. Assume the data came from an approximately normal distribution.

Solution:

Recall from that example that n = 30, the sample mean is 96.2%, and the sample standard deviation is 0.8%. From Table I, k = 2.14. Using

$$\bar{x} \pm ks$$

$$96.2 \pm (2.14)(0.8)$$

we find that the lower and upper bounds are 94.5 and 97.9. We are 95% confident that the above range covers the central 90% of the distribution of 95% lean beef packages.
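Table I itself is not reproduced here, but as a hedged cross-check the tabulated factor can be approximated in closed form. The sketch below uses Howe's approximation for the two-sided normal tolerance factor, k ≈ $z_{(1+\gamma)/2}\sqrt{(n-1)(1 + 1/n)/\chi^2}$, where χ² is the lower-tail α point of the chi-square distribution with n − 1 degrees of freedom. This approximation is not described in this chapter and is not necessarily how Table I was built; the helper name `howe_k` is hypothetical, and the value 17.708 (lower 5% point of χ² with 29 df) is entered from a chi-square table.

```python
from math import sqrt
from statistics import NormalDist


def howe_k(n, gamma, chi2_lower_alpha):
    """Howe's approximation to the two-sided normal tolerance factor k.

    gamma: proportion of the population to be covered (e.g. 0.90)
    chi2_lower_alpha: lower-tail alpha point of chi-square with n - 1 df
        (the value exceeded with probability 1 - alpha), from a table
    """
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)  # z_{(1+gamma)/2}, about 1.645 for 90%
    return z * sqrt((n - 1) * (1.0 + 1.0 / n) / chi2_lower_alpha)


n, xbar, s = 30, 96.2, 0.8
k = howe_k(n, 0.90, chi2_lower_alpha=17.708)  # 95% confidence, 29 degrees of freedom
k_table = 2.14                                # value read from Table I in the example
lower, upper = xbar - k_table * s, xbar + k_table * s
print(f"Howe k = {k:.3f}; Table I bounds: {lower:.1f} to {upper:.1f}")
```

The approximation lands within a few thousandths of the tabulated k = 2.14, which is consistent with the bounds 94.5 and 97.9 found above.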

REFERENCES:

Walpole, Ronald E., et al., Probability and Statistics for Engineers and Scientists, 9th ed., Pearson Education Inc., 2016.

Montgomery, Douglas C., et al., Applied Statistics and Probability for Engineers, 7th ed., John Wiley & Sons (Asia) Pte Ltd, 2018.

Spiegel, Murray R., et al., Probability and Statistics, 4th ed., McGraw Hill Companies Inc., 2013.
