
MATH 105 ENGINEERING DATA ANALYSIS FOR CE

Chapter 6

FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTION


In this chapter, we focus on sampling from distributions or populations and study
such important quantities as the sample mean and sample variance, which will be of
vital importance in future chapters. In addition, we attempt to give the reader an
introduction to the role that the sample mean and variance will play in statistical
inference in later chapters. The use of modern high-speed computers allows the scientist
or engineer to greatly enhance his or her use of formal statistical inference with graphical
techniques. Much of the time, formal inference appears quite dry and perhaps even
abstract to the practitioner or to the manager who wishes to let statistical analysis be a
guide to decision-making.

Objectives:
After careful study of this chapter, you should be able to do the following:
1. Differentiate sample and population.
2. Define the concept of random sample.
3. Understand how inferences are made using sampling distributions.
4. Understand the sampling distribution of means and the Central Limit Theorem.

6.1 RANDOM SAMPLING


The outcome of a statistical experiment may be recorded either as a numerical
value or as a descriptive representation. When a pair of dice is tossed and the total is the
outcome of interest, we record a numerical value. However, if the students of a certain
school are given blood tests and the type of blood is of interest, then a descriptive
representation might be more useful. A person’s blood can be classified in 8 ways: AB, A,
B, or O, each with a plus or minus sign, depending on the presence or absence of the Rh
antigen.
Population and Samples
The totality of observations with which we are concerned, whether their number
be finite or infinite, constitutes what we call a population. There was a time when the
word population referred to observations obtained from statistical studies about people.
Today, statisticians use the term to refer to observations relevant to anything of interest,

ENGR. JOSHUA C. JUNIO 1



whether it be groups of people, animals, or all possible outcomes from some complicated
biological or engineering system.

Definition 6.1 Population


A population consists of the totality of the observations with which we are
concerned.

In the field of statistical inference, statisticians are interested in arriving at
conclusions concerning a population when it is impossible or impractical to observe the
entire set of observations that make up the population. For example, in attempting to
determine the average length of life of a certain brand of light bulb, it would be
impossible to test all such bulbs if we are to have any left to sell. Exorbitant costs can also
be a prohibitive factor in studying an entire population. Therefore, we must depend on a
subset of observations from the population to help us make inferences concerning that
same population. This brings us to consider the notion of sampling.

Definition 6.2 Sample


A sample is a subset of a population.

If our inferences from the sample to the population are to be valid, we must obtain
samples that are representative of the population. All too often we are tempted to
choose a sample by selecting the most convenient members of the population. Such a
procedure may lead to erroneous inferences concerning the population. Any sampling
procedure that produces inferences that consistently overestimate or consistently
underestimate some characteristic of the population is said to be biased. To eliminate
any possibility of bias in the sampling procedure, it is desirable to choose a random
sample in the sense that the observations are made independently and at random.
Definition 6.3 Random Sample
Let 𝑋1, 𝑋2, . . . , 𝑋𝑛 be 𝑛 independent random variables, each having the same probability
distribution 𝑓(𝑥). Define 𝑋1, 𝑋2, . . . , 𝑋𝑛 to be a random sample of size 𝑛 from the population
𝑓(𝑥) and write its joint probability distribution as
𝒇(𝒙𝟏, 𝒙𝟐, . . . , 𝒙𝒏) = 𝒇(𝒙𝟏)𝒇(𝒙𝟐) · · · 𝒇(𝒙𝒏).
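
As a minimal sketch of Definition 6.3, the snippet below evaluates the joint density of a random sample as the product of the individual densities. The normal population (mean 10, standard deviation 2) and the three observed values are assumptions chosen only for illustration; they do not come from the text.

```python
from math import prod
from statistics import NormalDist

# Hypothetical population f(x): normal with mean 10 and standard deviation 2.
population = NormalDist(mu=10, sigma=2)

# Hypothetical observed values x1, x2, x3 of a random sample of size n = 3.
observations = [9.5, 10.0, 11.2]

# Because the X_i are independent with the same distribution f(x),
# the joint density factors: f(x1, x2, x3) = f(x1) * f(x2) * f(x3).
joint_density = prod(population.pdf(x) for x in observations)

print(joint_density)
```

The factorization is valid only because the observations are independent and identically distributed; a convenience sample gives no such guarantee.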


6.2 SOME IMPORTANT STATISTICS


Our main purpose in selecting random samples is to elicit information about the
unknown population parameters. Suppose, for example, that we wish to arrive at a
conclusion concerning the proportion of coffee drinkers in the United States who prefer
a certain brand of coffee. It would be impossible to question every coffee-drinking
American in order to compute the value of the parameter 𝑝 representing the population
proportion. Instead, a large random sample is selected and the proportion 𝑝̂ of people
in this sample favoring the brand of coffee in question is calculated. The value 𝑝̂ is now
used to make an inference concerning the true proportion 𝑝.
Now, 𝑝̂ is a function of the observed values in the random sample; since many
random samples are possible from the same population, we would expect 𝑝̂ to vary
somewhat from sample to sample. That is, 𝑝̂ is a value of a random variable that
we represent by 𝑃̂. Such a random variable is called a statistic.

The Sample Mean, Median, and Mode


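Let 𝑋1, 𝑋2, . . . , 𝑋𝑛 represent 𝑛 random variables.
a. Sample Mean
𝑋̅ = (1/𝑛) ∑ 𝑋𝑖, with the sum taken over 𝑖 = 1, 2, . . . , 𝑛.

b. Sample Median
With the observations arranged in increasing order as 𝑥(1) ≤ 𝑥(2) ≤ · · · ≤ 𝑥(𝑛),
𝑥̃ = 𝑥((𝑛 + 1)/2) if 𝑛 is odd, and 𝑥̃ = ½ [𝑥(𝑛/2) + 𝑥(𝑛/2 + 1)] if 𝑛 is even.

c. Sample Mode
- the value of the sample that occurs most often.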
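
The three measures of location can be sketched in Python with the standard `statistics` module. The data set below is hypothetical and chosen only for illustration; `multimode` is used rather than `mode` so that ties would be reported instead of hidden.

```python
from statistics import mean, median, multimode

# Hypothetical observations, chosen only for illustration.
data = [2, 5, 3, 5, 8, 4, 5]

sample_mean = mean(data)        # sum of the observations divided by n
sample_median = median(data)    # middle value after sorting (n = 7 is odd)
sample_modes = multimode(data)  # list of the most frequent value(s)

print(sample_mean, sample_median, sample_modes)
```

If two values were tied for the highest frequency, `multimode` would return both, which matches the usual convention that a sample may have more than one mode.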

The Sample Variance, Standard Deviation, and Range


Let 𝑋1, 𝑋2, . . . , 𝑋𝑛 represent 𝑛 random variables.
a. Sample Variance
𝑆² = [1/(𝑛 − 1)] ∑ (𝑋𝑖 − 𝑋̅)², with the sum taken over 𝑖 = 1, 2, . . . , 𝑛.


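b. Sample Standard Deviation
𝑆 = √𝑆², the positive square root of the sample variance.

c. Sample Range
𝑅 = 𝑋max − 𝑋min, the difference between the largest and smallest observations.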

Example 6.1
1. Suppose a data set consists of the following observations:

Solve for the sample mean, sample median, and sample mode.


2. A comparison of coffee prices at 4 randomly selected grocery stores in San Diego
showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound
bag. Find the variance of this random sample of price increases.
3. Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout
caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.
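
Items 2 and 3 can be checked numerically. The sketch below assumes the usual sample variance with divisor 𝑛 − 1; the helper name `sample_variance` is introduced here only for illustration, and the result is cross-checked against `statistics.variance`.

```python
from statistics import mean, variance

def sample_variance(values):
    """Return s^2 = sum((x - xbar)^2) / (n - 1)."""
    xbar = mean(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

price_increases = [12, 15, 17, 20]   # cents, item 2
trout_caught = [3, 4, 5, 6, 6, 7]    # fish per fisherman, item 3

s2_prices = sample_variance(price_increases)   # 34/3, about 11.33
s2_trout = sample_variance(trout_caught)       # 13/6, about 2.17

print(round(s2_prices, 2), round(s2_trout, 2))
print(variance(price_increases), variance(trout_caught))  # library cross-check
```

For the coffee prices the sample mean is 16 cents, so the squared deviations are 16, 1, 1, and 16, and their sum divided by 3 gives 34/3.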

6.3 SAMPLING DISTRIBUTIONS


The field of statistical inference is basically concerned with generalizations and
predictions.
For example, we might claim, based on the opinions of several people interviewed
on the street, that in a forthcoming election 60% of the eligible voters in the city of Manila
favor a certain candidate. In this case, we are dealing with a random sample of opinions
from a very large finite population.
As a second illustration we might state that the average cost to build a residence
in Dagupan, Philippines, is between 1 million and 3 million pesos, based on the estimates
of 3 contractors selected at random from the 30 now building in this city. The population
being sampled here is again finite but very small.


Finally, let us consider a soft-drink machine designed to dispense, on average, 240
milliliters per drink. A company official who computes the mean of 40 drinks obtains 𝑥̅ =
236 milliliters and, on the basis of this value, decides that the machine is still dispensing
drinks with an average content of 𝜇 = 240 milliliters. The 40 drinks represent a sample from
the infinite population of possible drinks that will be dispensed by this machine.
Inference about the Population from Sample Information
In each of the examples above, we computed a statistic from a sample selected
from the population, and from this statistic we made various statements concerning the
values of population parameters that may or may not be true. The company official
made the decision that the soft-drink machine dispenses drinks with an average content
of 240 milliliters, even though the sample mean was 236 milliliters, because he knows from
sampling theory that, if 𝜇 = 240 milliliters, such a sample value could easily occur. In fact,
if he ran similar tests, say every hour, he would expect the values of the statistic 𝑥̅ to
fluctuate above and below 𝜇 = 240 milliliters. Only when the value of 𝑥̅ is substantially
different from 240 milliliters will the company official initiate action to adjust the machine.
Since a statistic is a random variable that depends only on the observed sample,
it must have a probability distribution.

Definition 6.4 Sampling Distribution


The probability distribution of a statistic is called a sampling distribution.

The sampling distribution of a statistic depends on the distribution of the
population, the size of the samples, and the method of choosing the samples. In the
remainder of this chapter, we study several of the important sampling distributions of
frequently used statistics. Applications of these sampling distributions to problems of
statistical inference are considered throughout most of the remaining chapters. The
probability distribution of 𝑋̅ is called the sampling distribution of the mean.

What is the Sampling Distribution of 𝑿̅?

We should view the sampling distributions of 𝑋̅ and 𝑆² as the mechanisms from
which we will be able to make inferences on the parameters 𝜇 and 𝜎².


The sampling distribution of 𝑋̅ with sample size 𝑛 is the distribution that results when
an experiment is conducted over and over (always with sample size 𝑛) and the many
values of 𝑋̅ result. This sampling distribution, then, describes the variability of sample
averages around the population mean 𝜇.


In the case of the soft-drink machine, knowledge of the sampling distribution of 𝑋̅
arms the analyst with the knowledge of a “typical” discrepancy between an observed
𝑥̅ value and true 𝜇. The same principle applies in the case of the distribution of 𝑆². The
sampling distribution produces information about the variability of 𝑠² values around 𝜎² in
repeated experiments.

6.4 SAMPLING DISTRIBUTION OF MEANS AND THE CENTRAL LIMIT THEOREM


The first important sampling distribution to be considered is that of the mean 𝑋̅.
Suppose that a random sample of 𝑛 observations is taken from a normal population with
mean 𝜇 and variance 𝜎². Each observation 𝑋𝑖, 𝑖 = 1, 2, . . . , 𝑛, of the random sample will
then have the same normal distribution as the population being sampled. Hence,

𝑋̅ = (1/𝑛)(𝑋1 + 𝑋2 + · · · + 𝑋𝑛)

has a normal distribution with mean

𝜇_𝑋̅ = (1/𝑛)(𝜇 + 𝜇 + · · · + 𝜇) = 𝜇

and variance

𝜎²_𝑋̅ = (1/𝑛²)(𝜎² + 𝜎² + · · · + 𝜎²) = 𝜎²/𝑛.

The Central Limit Theorem

Theorem 6.1 Central Limit Theorem


If 𝑋̅ is the mean of a random sample of size 𝑛 taken from a population with
mean 𝜇 and finite variance 𝜎², then the limiting form of the distribution of

𝑍 = (𝑋̅ − 𝜇) / (𝜎/√𝑛),

as 𝑛 → ∞, is the standard normal distribution 𝑛(𝑧; 0, 1).


The normal approximation for 𝑋̅ will generally be good if 𝑛 ≥ 30, provided the
population distribution is not terribly skewed. If 𝑛 < 30, the approximation is good only if
the population is not too different from a normal distribution and, as stated above, if the
population is known to be normal, the sampling distribution of 𝑋̅ will follow a normal
distribution exactly, no matter how small the size of the samples. The sample size 𝑛 = 30 is
a guideline to use for the Central Limit Theorem.

Figure 6.1 Illustration of the Central Limit Theorem (distribution of 𝑋̅ for 𝑛 = 1, moderate
𝑛, and large 𝑛)
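
The behavior illustrated in Figure 6.1 can be sketched with a small simulation. The exponential population with rate 1 (so 𝜇 = 1 and 𝜎² = 1), the number of repetitions, and the fixed seed are all assumptions made for this illustration. A skewed population is chosen on purpose so that the shrinking spread of 𝑋̅ around 𝜇 is visible even though the population is far from normal.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the sketch is reproducible

MU = 1.0       # mean of the assumed exponential population (rate 1)
SIGMA2 = 1.0   # variance of the same population

def simulate_sample_means(n, repetitions=5000):
    """Draw `repetitions` random samples of size n and return their means."""
    return [fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]

results = {}
for n in (1, 5, 30):
    means = simulate_sample_means(n)
    # Store the observed mean and variance of the simulated X-bar values.
    results[n] = (fmean(means), pvariance(means))
    print(n, round(results[n][0], 3), round(results[n][1], 4),
          "theory:", MU, round(SIGMA2 / n, 4))
```

For each 𝑛 the simulated values of 𝑋̅ should center near 𝜇, with variance close to 𝜎²/𝑛; a histogram of `means` for 𝑛 = 30 would look roughly bell-shaped.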

Example 6.2
An electrical firm manufactures light bulbs that have a length of life that is approximately
normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours.
Find the probability that a random sample of 16 bulbs will have an average life of less
than 775 hours.
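
One way to work Example 6.2 is sketched below. Because the population is approximately normal, 𝑋̅ is treated as normal with mean 800 and standard deviation 𝜎/√𝑛 = 40/√16 = 10 hours. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16

standard_error = sigma / sqrt(n)     # 40 / 4 = 10 hours
z = (775 - mu) / standard_error      # (775 - 800) / 10 = -2.5
probability = NormalDist().cdf(z)    # P(Z < -2.5)

print(round(probability, 4))  # 0.0062
```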

Inferences on the Population Mean


One very important application of the Central Limit Theorem is the determination
of reasonable values of the population mean 𝜇. Topics such as hypothesis testing,
estimation, quality control, and many others make use of the Central Limit Theorem.


Case Study 6.1


Automobile Parts: An important manufacturing process produces cylindrical component
parts for the automotive industry. It is important that the process produce parts having a
mean diameter of 5.0 millimeters. The engineer involved conjectures that the population
mean is 5.0 millimeters. An experiment is conducted in which 100 parts produced by the
process are selected randomly and the diameter measured on each. It is known that the
population standard deviation is 𝜎 = 0.1 millimeter. The experiment indicates a sample
average diameter of 𝑥̅ = 5.027 millimeters. Does this sample information appear to
support or refute the engineer’s conjecture?
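
A possible way to approach Case Study 6.1 is to ask how likely a sample mean at least as far from 5.0 as the observed 5.027 would be if the conjecture 𝜇 = 5.0 were true. The two-sided reading used below is an assumption about how the question is meant to be answered.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar = 5.0, 0.1, 100, 5.027

standard_error = sigma / sqrt(n)            # 0.1 / 10 = 0.01 mm
z = (xbar - mu0) / standard_error           # about 2.7

# P(|X-bar - 5.0| >= 0.027) = 2 * P(Z >= 2.7) when mu = 5.0
two_sided_probability = 2 * (1 - NormalDist().cdf(z))

print(round(two_sided_probability, 4))  # 0.0069
```

A deviation this large would occur by chance in roughly 7 of 1000 experiments if 𝜇 were 5.0, so the sample gives little support to the conjecture.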

Example 6.3
Traveling between two campuses of a university in a city via shuttle bus takes, on
average, 28 minutes with a standard deviation of 5 minutes. In a given week, a bus
transported passengers 40 times. What is the probability that the average transport time
was more than 30 minutes? Assume the mean time is measured to the nearest minute.
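
Example 6.3 can be sketched as follows. Since 𝑛 = 40, the Central Limit Theorem is used to treat 𝑋̅ as approximately normal. The statement that the mean is measured to the nearest minute is read here as meaning that "more than 30" corresponds to 𝑥̅ ≥ 30.5; that interpretation is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 28, 5, 40

standard_error = sigma / sqrt(n)        # 5 / sqrt(40), about 0.79 minutes
threshold = 30.5                        # "more than 30" to the nearest minute
z = (threshold - mu) / standard_error   # about 3.16

probability = 1 - NormalDist().cdf(z)   # P(Z >= 3.16)

print(round(probability, 4))  # 0.0008
```

Without the half-minute adjustment the cutoff would be 30 and 𝑧 would be about 2.53, which gives a noticeably larger probability.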

Sampling Distribution of the Difference between Two Means


The illustration in Case Study 6.1 deals with notions of statistical inference on a
single mean 𝜇. The engineer was interested in supporting a conjecture regarding a single
population mean. A far more important application involves two populations. A scientist
or engineer may be interested in a comparative experiment in which two manufacturing
methods, 1 and 2, are to be compared. The basis for that comparison is 𝜇1 − 𝜇2, the
difference in the population means.
The Central Limit Theorem can be easily extended to the two-sample, two-
population case.


Theorem 6.2
If independent samples of size 𝑛₁ and 𝑛₂ are drawn at random from two populations,
discrete or continuous, with means 𝜇₁ and 𝜇₂ and variances 𝜎₁² and 𝜎₂², respectively, then
the sampling distribution of the differences of means, 𝑋̅₁ − 𝑋̅₂, is approximately normally
distributed with mean and variance given by

mean = 𝜇₁ − 𝜇₂ and variance = 𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂.

Hence,

𝑍 = [(𝑋̅₁ − 𝑋̅₂) − (𝜇₁ − 𝜇₂)] / √(𝜎₁²/𝑛₁ + 𝜎₂²/𝑛₂)

is approximately a standard normal variable.

Case Study 6.2


Paint Drying Time: Two independent experiments are run in which two different types of
paint are compared. Eighteen specimens are painted using type A, and the drying time,
in hours, is recorded for each. The same is done with type B. The population standard
deviations are both known to be 1.0. Assuming that the mean drying time is equal for the
two types of paint, find 𝑃(𝑋̅𝐴 − 𝑋̅𝐵 > 1.0), where 𝑋̅𝐴 and 𝑋̅𝐵 are average drying times for
samples of size 𝑛𝐴 = 𝑛𝐵 = 18.
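
Case Study 6.2 can be worked with the two-sample result above. Under the assumption of equal mean drying times, the difference of sample means has mean 0 and variance 1/18 + 1/18 = 1/9. The variable names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma_a = sigma_b = 1.0
n_a = n_b = 18
mean_difference = 0.0   # mu_A - mu_B under the equal-means assumption

# Standard deviation of X-bar_A - X-bar_B: sqrt(1/18 + 1/18) = 1/3
standard_error = sqrt(sigma_a**2 / n_a + sigma_b**2 / n_b)
z = (1.0 - mean_difference) / standard_error   # about 3.0

probability = 1 - NormalDist().cdf(z)          # P(Z > 3.0)

print(round(probability, 4))  # 0.0013
```

A difference of more than one hour would be very unlikely if the two paints really had the same mean drying time.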

Example 6.4
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a
standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of
6.0 years and a standard deviation of 0.8 year. What is the probability that a random
sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year
more than the mean lifetime of a sample of 49 tubes from manufacturer B?
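
Example 6.4 follows the same pattern. The helper `prob_difference_at_least` is a hypothetical name introduced only for this sketch; it applies the normal approximation for the difference of two independent sample means.

```python
from math import sqrt
from statistics import NormalDist

def prob_difference_at_least(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Approximate P(X-bar_1 - X-bar_2 >= d) for independent samples."""
    mean_difference = mu1 - mu2
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (d - mean_difference) / standard_error
    return 1 - NormalDist().cdf(z)

# Manufacturer A: mean 6.5, sd 0.9, n = 36; manufacturer B: mean 6.0, sd 0.8, n = 49
probability = prob_difference_at_least(1.0, 6.5, 0.9, 36, 6.0, 0.8, 49)

print(round(probability, 4))
```

Here the mean difference is 0.5 year and the standard error is about 0.189 year, so 𝑧 is about 2.65 and the probability comes out near 0.004.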
