
Chapter 9: Parameters, Statistics, Estimates, Hypothesis Testing, Inference

Definitions

 Population: set of all subjects or elements about which we are interested in making
inferences.
 Frame: a list containing all members of the population.
 Census: a survey that includes all the elements or units in the frame.
 Population Parameters: facts about the population. Since parameters are descriptions
of the population, a population can have many parameters.
 Sample: a subset of the population which is used to gain insight about the population.
Samples are used to represent a larger group, the population.
 Statistic: a fact or characteristic about a sample.
 Descriptive Statistics: the collection, organization, analysis, and presentation of data.
 Inferential Statistics: making reasonable estimates about population characteristics using
sample data.

 Population (parameters, estimators (point or interval))


 Sample (statistics)
 Central tendency (mean, median, mode)
 Dispersion or Spread (e.g., range, mean absolute deviation, variance, standard
deviation)
 Probability Theory:
 Normal Distribution + Random Samples
 Equal Probability of Selection Method
 Inference

Normal Distribution

Properties of the Normal Distribution


1. The normal distribution is symmetric. That is, the curve’s shape to the left of the mean is
the mirror image of the curve’s shape to the right of the mean.
2. The highest point on the normal curve is located at the mean, which is also the location
of the median and the mode of the distribution.
3. The area under the curve of the normal distribution equals 1.
4. Due to symmetry, the area to the right of the mean equals the area to the left of the
mean, and each of these areas equals 0.5.
5. The shape of the normal distribution is defined by its two parameters, the mean (μ) and
the standard deviation (σ).
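
The properties above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the values of `mu` and `sigma` are arbitrary illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist

# Illustrative parameters (assumed for this sketch, not from the text)
mu, sigma = 50.0, 8.0
dist = NormalDist(mu, sigma)

# Property 4: half of the total area lies on each side of the mean
area_left_of_mean = dist.cdf(mu)

# Property 1: symmetry, so tails at equal distances from the mean have equal area
d = 1.5 * sigma
left_tail = dist.cdf(mu - d)
right_tail = 1.0 - dist.cdf(mu + d)

# Property 2: the density is highest at the mean
peak_is_at_mean = (dist.pdf(mu) > dist.pdf(mu - 0.1)
                   and dist.pdf(mu) > dist.pdf(mu + 0.1))

print(area_left_of_mean, left_tail, right_tail, peak_is_at_mean)
```

Changing `mu` shifts the curve and changing `sigma` widens or narrows it, but the checks still hold, which is what property 5 implies.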

Population Mean
Confidence Interval of the Population Mean
 A 100(1 – α)% confidence interval for the population mean is given by:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
if either of the following conditions is true:
1. n ≥ 30 (s may be used as an approximation for σ when σ is unknown), or
2. σ is known and the population being studied is normally distributed.

Example: Construct 80%, 90%, 95%, and 99% confidence intervals for the population mean if
the standard deviation of the population is 900. Use the following sample data.
n = 100
x̄ = 425

Solution:
80% Confidence Interval: $425 \pm 1.28 \times \frac{900}{\sqrt{100}}$, or 310 to 540
90% Confidence Interval: $425 \pm 1.645 \times \frac{900}{\sqrt{100}}$, or 277 to 573
95% Confidence Interval: $425 \pm 1.96 \times \frac{900}{\sqrt{100}}$, or 249 to 601
99% Confidence Interval: $425 \pm 2.575 \times \frac{900}{\sqrt{100}}$, or 193 to 657
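
The four intervals in this example can be reproduced with a short script. This is a sketch, not part of the original notes: the function name `z_confidence_interval` is hypothetical, and the critical values come from `statistics.NormalDist().inv_cdf` rather than from a printed table, so they differ slightly from the rounded table values (for example, about 2.576 instead of 2.575) while giving the same rounded endpoints.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)                 # z times the standard error
    return x_bar - margin, x_bar + margin

# Sample data from the example: n = 100, sample mean 425, sigma = 900
for level in (0.80, 0.90, 0.95, 0.99):
    lo, hi = z_confidence_interval(425, 900, 100, level)
    print(f"{level:.0%} CI: {lo:.0f} to {hi:.0f}")
```

Note how the interval widens as the confidence level rises: greater confidence requires a larger critical value and therefore a larger margin of error.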

Degrees of Freedom
The degrees of freedom for any t-distribution are computed in the following manner:
df = number of sample observations – 1 = n – 1

Confidence Interval for the Population Mean: Small Samples, σ Unknown


If σ is unknown and the sample is drawn from a normal population, a 100(1 – α)% confidence
interval for the population mean is given by
$$\bar{x} \pm t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}}$$

A manufacturing company is interested in the amount of time it takes to complete a certain stage
of the production process. The project manager randomly samples 10 products as they come
from the production line and notes the time of completion. The average completion time is 23.45
minutes with a sample standard deviation of 4.32 minutes. Based on this sample, construct a 95%
confidence interval for the average completion time for that stage in the production process.
Assume that the population distribution of the completion times is approximately normal.
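With df = n – 1 = 9 and α/2 = 0.025, the critical value is t = 2.262, so the interval is
$$\bar{x} \pm t_{\alpha/2,\,df}\,\frac{s}{\sqrt{n}} = 23.45 \pm 2.262 \times \frac{4.32}{\sqrt{10}}$$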
Thus, we are 95% confident that the true average completion time of that stage of the process is
between 20.36 minutes and 26.54 minutes.
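
The arithmetic behind this interval can be sketched as follows. Python's standard library has no t-distribution quantile function, so the critical value 2.262 is copied from the worked example above rather than computed; the function name `t_confidence_interval` is a hypothetical helper for illustration.

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for the mean when sigma is unknown (small sample)."""
    margin = t_crit * s / sqrt(n)  # t critical value times the estimated standard error
    return x_bar - margin, x_bar + margin

n = 10
df = n - 1        # degrees of freedom = n - 1 = 9
t_crit = 2.262    # t_{0.025, 9}, taken from the example in the text

lower, upper = t_confidence_interval(23.45, 4.32, n, t_crit)
print(f"df = {df}: {lower:.2f} to {upper:.2f} minutes")
```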
Chapter 10

Example 10.5: Testing a Hypothesis about a Population Mean


Suppose that the average amount of money a student spends on textbooks per semester at college
campuses in the U.S. is $500 with a standard deviation of $100. A local university wants to know
if its students are spending that amount on textbooks. The level of the test is to be set at 0.05. A
random sample of 75 college students has been selected, and the resulting average is $540 spent
per semester on textbooks.

Step 1: Define the hypothesis in plain English.


The hypotheses are fairly straightforward.
 Null Hypothesis: The local university’s students are spending an average of $500 per
semester on textbooks. (like the status quo)
 Alternative Hypothesis: The students are not spending an average of $500 per semester
on textbooks. (the claim accepted only if the data give sufficient evidence against the null)
Step 2: Select the appropriate statistical measure. Since the problem states that the university
wants to know how much its students spend on textbooks per semester in relation to the national
average, the hypotheses will concern the population mean. Let μ = average amount spent on
textbooks per semester by college students at the local university.

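Step 3: Determine whether the hypothesis should be one-sided or two-sided. The university asks
only whether its students' spending differs from $500, with no direction specified, so the
alternative is two-sided.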

Step 4: State the hypotheses using the appropriate statistical measure.


H0: μ = $500
Ha: μ ≠ $500
 The null hypothesis states that the average amount spent per semester on textbooks at
the local university is equal to the national average (the standard value). The alternative
states that it is not. This is an application of testing against the standard value.

Step 5: Specify the level of the test.


 The level of the test was given in the problem statement as the α = 0.05 level. If it were
not specified in the problem statement, then it would be necessary to select the value.
Typical values for the level of the test are 0.10, 0.05, and 0.01. It is important to
remember that the level of the test specifies the probability of a Type I error.
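
One way to see what "the level of the test specifies the probability of a Type I error" means is to simulate many samples from a population where the null hypothesis is actually true and count how often a two-sided z-test rejects it. This is only an illustrative sketch: the function name, the number of trials, and the random seed are assumptions, and the population is assumed normal with the textbook example's values (mean 500, standard deviation 100, n = 75).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of samples, drawn with the null TRUE, for which H0 is rejected."""
    rng = random.Random(seed)                          # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / std_error
        if abs(z) > z_crit:                            # a false rejection (Type I error)
            rejections += 1
    return rejections / trials

rate = type_one_error_rate(mu0=500, sigma=100, n=75, alpha=0.05, trials=2000)
print(f"rejection rate under a true null: {rate:.3f}")
```

The printed rate should land near 0.05, varying somewhat from run to run if the seed is changed.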

Step 6: Select the appropriate test statistic.


 What we know about the population will affect the methods used to perform the test of
the hypothesis. There are three key questions:
1. Is the standard deviation of the population known?
2. Is the distribution of the population normal?
3. Is the sample size sufficiently large?

Is the standard deviation of the population known?


In most instances, the standard deviation of the population will not be known.

Is the distribution of the population normal?

If we have a large sample size (n ≥ 30), we will not have to be concerned about the distribution
of the population. If the sample size is small (n < 30), then the underlying population will need to
be normally distributed. When the population is normal, the histogram of the sample data should
be approximately bell-shaped.

However, should the frequency distribution show substantial deviation from normality, a
nonparametric procedure may be warranted. These methods are discussed in Chapter 16.

Is the sample size sufficiently large?

Fortunately, the central limit theorem enables us to define the following rule to make this
decision: a sample of n ≥ 30 is treated as sufficiently large.
We will continue with steps 7 through 10 after we discuss the test statistic.

Formula
z-Test Statistic
If n ≥ 30 or if the standard deviation of the population, σ, is known and the sample is drawn from
a normal population, then by the central limit theorem the test statistic is approximately given by
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}, \quad \text{where } \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$
 z has a standard normal distribution, and if σ is unknown and n is greater than or equal
to 30, s can be used as an approximation of σ.

Step 7: Determine the critical value.


 In determining the critical value of the test statistic, we must take into account whether
the alternative hypothesis is one-sided or two-sided, the level of the test, and the
distribution of the test statistic.

Table 10.1 – Critical Values of the z-Test Statistic for Two-Sided Alternatives

Level of the Test (α)   Definition of Ordinary Variability        zα/2
0.20                    80% interval around hypothesized mean     1.28
0.10                    90% interval around hypothesized mean     1.645
0.05                    95% interval around hypothesized mean     1.96
0.01                    99% interval around hypothesized mean     2.575
 Notice that the rejection region is divided into two parts. The right-hand rejection
region indicates above average amounts of money spent on textbooks per semester
and, conversely, the left-hand rejection region indicates below average amounts of
money spent on textbooks per semester. Also observe that the probability associated
with the level of the test is divided equally between the two rejection regions, each
region receiving 0.025. If the probabilities in these two regions are added together,
0.025 + 0.025 = 0.05, the sum equals the level of the test (α).
 The level of the test can be thought of as a tolerance for rareness. The level of the test
defines a rejection region which can be thought of as an intolerance zone. If the test
statistic falls in this “intolerance” region, then the data have produced a sample mean
that has in turn produced a test statistic that is too rare to have occurred by chance if
the null were true.
 Essentially, we lose faith in the null hypothesis; the data have cast too much doubt.
Consequently, the null hypothesis is rejected.
 The “fail to reject” zone can be interpreted as a zone of ordinary sampling variation
given the null is true. For two-tailed tests with the level of the test set to
α = 0.05, the investigator is stating that any value of the test statistic which falls in a 95%
interval around the hypothesized mean represents ordinary sampling variability. For the
z-test statistic, this corresponds to a critical value of 1.96 standard deviation units (see
Table 10.1). If the test statistic falls into the ordinary sampling variation zone, the null
hypothesis will not be rejected.
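
The critical values in Table 10.1 and the two-part rejection region can be sketched in code. The helper names `two_sided_critical_value` and `in_rejection_region` are hypothetical, and the values are computed from the standard normal quantile function rather than read from a table, so the last entry comes out as roughly 2.576 where the table rounds to 2.575.

```python
from statistics import NormalDist

def two_sided_critical_value(alpha):
    """z_{alpha/2}: leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def in_rejection_region(z, alpha):
    """True if z falls in either tail beyond the critical values."""
    return abs(z) > two_sided_critical_value(alpha)

for alpha in (0.20, 0.10, 0.05, 0.01):
    z_crit = two_sided_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: each tail = {alpha / 2:.3f}, critical z = {z_crit:.3f}")
```

For α = 0.05, any test statistic between the negative and positive critical values falls in the zone of ordinary sampling variation, and anything outside falls in one of the two rejection regions of probability 0.025 each.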

Step 8: Compute the test statistic.


A random sample of 75 students revealed a mean of $540 per semester spent on textbooks at the
local university. As discussed earlier, the z-statistic is given by
$$z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{540 - 500}{100 / \sqrt{75}} \approx 3.46.$$

 The sample mean is 3.46 standard deviations from the hypothesized value. Is the sample
mean too far away from the value of the mean specified in the null hypothesis for us to
believe that the null is true?
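
The computation in Step 8 can be sketched as a small function. The name `z_statistic` is a hypothetical helper; the inputs are the values from the example (sample mean 540, hypothesized mean 500, σ = 100, n = 75).

```python
from math import sqrt

def z_statistic(x_bar, mu0, sigma, n):
    """Distance of the sample mean from mu0, in standard errors."""
    standard_error = sigma / sqrt(n)
    return (x_bar - mu0) / standard_error

z = z_statistic(x_bar=540, mu0=500, sigma=100, n=75)
print(f"z = {z:.2f}")
```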

Step 9: Make the decision.


 The critical values of the test statistic are ±1.96. If the null were true, observing a value of
z larger in absolute value than 1.96 would occur only 5% of the time.

 The test statistic z = 3.46 implies x̄ is 3.46 standard deviation units from the hypothesized
mean, which is substantially more than the critical value of 1.96 standard deviation units. The
decision must be to reject H0. The sample mean is too far from the hypothesized value
for us to believe the difference is caused by ordinary sampling variation. Essentially, x̄
exceeds the tolerance for rareness that we have imposed by setting α= 0.05.
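
The decision rule in Step 9 can be sketched as a comparison of the test statistic with the critical value. The two-sided p-value computed at the end is not used in the text; it is included here only as an assumed, complementary way of expressing how rare the observed statistic would be if the null were true.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example
x_bar, mu0, sigma, n, alpha = 540, 500, 100, 75, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))              # test statistic from Step 8
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
reject_null = abs(z) > z_crit                      # decision rule from Step 9

# Not in the original text: probability of a |z| at least this large under H0
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}, reject H0: {reject_null}")
print(f"two-sided p-value = {p_value:.5f}")
```

Rejecting when |z| exceeds the critical value and rejecting when the p-value is below α are equivalent rules, so both lead to the same decision here.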

Step 10: State the conclusion in terms of the original question. There is significant evidence at
the 0.05 level that the students at the local university do not spend, on average, $500 per
semester on textbooks. It would be tempting to conclude that the students at the local university
spend more since the sample mean is greater than the national average. However, we did not test
a hypothesis for spending more; we tested whether the students at the local university were
spending more or less than the national average, and the conclusion must be consistent with the
hypothesis tested.
