
BUSINESS STATISTICS

CHAPTER 7
• The sample results provide only estimates of the values of the population characteristics.
• The reason is simply that the sample contains only a portion of the population.
• With proper sampling methods, the sample results can provide "good" estimates of the population characteristics.

Sampling from a Finite Population

• Finite populations are often defined by lists such as:
o Organization membership roster
o Credit card account numbers
o Inventory product numbers
• A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
• Sampling with replacement = replacing each sampled element before selecting subsequent elements.
• Sampling without replacement = the procedure used most often.
• In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.

Sampling from an Infinite Population

• Sometimes we want to select a sample, but obtaining a list of all elements in the population is impossible.
• Populations are often generated by an ongoing process where there is no upper limit on the number of units that can be generated.
• Some examples of ongoing processes, with infinite populations, are:
o Parts being manufactured on a production line
o Transactions occurring at a bank
o Telephone calls arriving at a technical help desk
o Customers entering a store
• In the case of an infinite population, we must select a random sample in order to make valid statistical inferences about the population from which the sample is taken.
• A random sample from an infinite population is a sample selected such that the following conditions are satisfied:
1. Each element selected comes from the population of interest.
2. Each element is selected independently.

Point Estimation

• Point estimation is a form of statistical inference.
• In point estimation we use the data from the sample to compute a value of a sample statistic that serves as an estimate of a population parameter.
o x̄ is the point estimator of the population mean μ.
o s is the point estimator of the population standard deviation σ.
o p̄ is the point estimator of the population proportion p.
• To estimate the value of a population parameter, we compute the corresponding characteristic of the sample, referred to as a sample statistic.
• Note: Different random numbers will identify different samples, which will result in different point estimates.

• The target population is the population we want to make inferences about.
• The sampled population is the population from which the sample is actually taken.
• Whenever a sample is used to make inferences about a population, we should make sure that the target population and the sampled population are in close agreement.

Sampling Distribution of x̄

• The sampling distribution of x̄ is the probability distribution of all possible values of the sample mean x̄.
• The sampling distribution of x̄ can be used to provide probability information about how close the sample mean x̄ is to the population mean μ.
• Expected Value of x̄

$E(\bar{x}) = \mu$

where: μ = the population mean
• When the expected value of the point estimator equals the population parameter, we say the point estimator is unbiased.

Standard Deviation of x̄

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

where: σ = the population standard deviation

Central Limit Theorem

• When the population from which we are selecting a random sample does not have a normal distribution, the central limit theorem is helpful in identifying the shape of the sampling distribution of x̄.
• In selecting random samples of size n from a population, the sampling distribution of the sample mean x̄ can be approximated by a normal distribution as the sample size becomes large.
• In most applications, the sampling distribution of x̄ can be approximated by a normal distribution whenever the sample size is 30 or more.
• In cases where the population is highly skewed or outliers are present, samples of size 50 may be needed.

Use of Excel to Compute Sampling Distribution of x̄

Function used: NORM.DIST
• We do not have to make a separate computation of the z value.
• Evaluating the NORM.DIST function at each end point of the interval provides the cumulative probability at that end point.
• The result obtained using NORM.DIST is more accurate than the result obtained by looking up a rounded z value in the standard normal table.
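Outside Excel, the same idea can be written as a minimal Python sketch. It evaluates a normal CDF at each end point of the interval and subtracts the two values, so no separate z value is computed. The sketch assumes that the sampling distribution of x̄ is approximately normal. It uses σ_x̄ = σ/√n, with no finite population correction. The function name `prob_xbar_between` and the example figures are illustrative and do not come from these notes.

```python
from math import sqrt
from statistics import NormalDist

def prob_xbar_between(lower, upper, mu, sigma, n):
    """P(lower <= x̄ <= upper), assuming x̄ is approximately normal.

    Like evaluating Excel's NORM.DIST (cumulative) at each end point,
    this needs no separate z-value computation.
    """
    se = sigma / sqrt(n)                      # standard deviation of x̄, σ/√n
    xbar_dist = NormalDist(mu=mu, sigma=se)   # approximate sampling distribution of x̄
    return xbar_dist.cdf(upper) - xbar_dist.cdf(lower)

# Hypothetical example: μ = 51,800, σ = 4,000, n = 30;
# probability that x̄ falls within ±1,000 of μ.
p = prob_xbar_between(50_800, 52_800, mu=51_800, sigma=4_000, n=30)
print(round(p, 4))
```

Working with the distribution of x̄ directly (mean μ, standard deviation σ/√n) keeps the code close to the NORM.DIST workflow. Standardising to z first would give the same answer.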

• Form of the sampling distribution of x̄: when the population itself has a normal distribution, the sampling distribution of x̄ is normally distributed for any sample size.

Sampling Distribution of p̄

• The sampling distribution of p̄ is the probability distribution of all possible values of the sample proportion p̄.
• Expected Value of p̄

$E(\bar{p}) = p$

where: p = the population proportion
▪ The sampling distribution of p̄ can be approximated by a normal distribution whenever the sample size is large enough to satisfy the two conditions:

$np \ge 5 \quad \text{and} \quad n(1-p) \ge 5$

▪ When these conditions are satisfied, the probability distribution of x in the sample proportion, p̄ = x/n, can be approximated by a normal distribution. Because n is a constant, the sampling distribution of p̄ can also be approximated by a normal distribution.

Stratified Random Sampling

• The population is first divided into groups of elements called strata.
• Each element in the population belongs to one and only one stratum.
• Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a homogeneous group).
• A simple random sample is taken from each stratum.
• Formulas are available for combining the stratum sample results into one population parameter estimate.
o Advantage: If strata are homogeneous, this method is as "precise" as simple random sampling but with a smaller total sample size.
o Example: The basis for forming the strata might be department, location, age, industry type, and so on.

Cluster Sampling

• The population is first divided into separate groups of elements called clusters.
• Ideally, each cluster is a representative small-scale version of the population (i.e. a heterogeneous group).
• A simple random sample of the clusters is then taken.
• All elements within each sampled (chosen) cluster form the sample.
o Advantage: The close proximity of elements can be cost-effective (i.e. many sample observations can be obtained in a short time).
o Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
o Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas.

Systematic Sampling

• If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
• We randomly select one of the first N/n elements from the population list.
• We then select every (N/n)th element that follows in the population list.
• This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
o Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
o Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.

Convenience Sampling

• It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
• The sample is identified primarily by convenience.
o Example: A professor conducting research might use student volunteers to constitute a sample.
o Advantage: Sample selection and data collection are relatively easy.
o Disadvantage: It is impossible to determine how representative of the population the sample is.

Judgment Sampling

• The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
• It is a nonprobability sampling technique.
o Example: A reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
o Advantage: It is a relatively easy way of selecting a sample.
o Disadvantage: The quality of the sample results depends on the judgment of the person selecting the sample.

Recommendation

• It is recommended that probability sampling methods (simple random, stratified, cluster, or systematic) be used.
• For these methods, formulas are available for evaluating the "goodness" of the sample results in terms of the closeness of the results to the population parameters being estimated.
• An evaluation of the goodness cannot be made with non-probability (convenience or judgment) sampling methods.

Errors in Sampling

• The difference between the value of a sample statistic and the corresponding value of the population parameter is called the sampling error.
• Deviations of the sample from the population that occur for reasons other than random sampling are referred to as nonsampling errors.
• Nonsampling error can occur in a sample or a census.
• Reasons for Nonsampling Errors
o Coverage error
o Non-response error
o Interviewer error
o Processing error

Steps to Minimise Nonsampling Errors

• Carefully define the target population and design the data collection procedure.
• Carefully design the data collection process and train the data collectors.
• Pretest the data collection procedure.
• Use stratified random sampling when population-level information about an important qualitative characteristic is available.
• Use systematic sampling when population-level information about an important quantitative characteristic is available.

CHAPTER 8

Margin of Error and the Interval Estimate

• A point estimator cannot be expected to provide the exact value of the population parameter.
• An interval estimate can be computed by adding a margin of error to, and subtracting it from, the point estimate.

Point Estimate ± Margin of Error

• The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
• The general form of an interval estimate of a population mean is:

x̄ ± Margin of Error

Interval Estimate of a Population Mean: σ Known

• In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
o the population standard deviation σ, or
o the sample standard deviation s.
• σ is rarely known exactly, but often a good estimate can be obtained based on historical data or other information.
• We refer to such cases as the σ known case.
• Interval estimate of μ:

$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

where 1 − α is the confidence coefficient and z_{α/2} is the z value providing an area of α/2 in the upper tail of the standard normal distribution.

Interval Estimate of a Population Mean: σ Unknown

• If an estimate of the population standard deviation σ cannot be developed prior to sampling, we use the sample standard deviation s to estimate σ.
• This is the σ unknown case.
o In this case, the interval estimate for μ is based on the t distribution.
o (We'll assume for now that the population is normally distributed.)
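For the σ known case, the interval x̄ ± z_{α/2} σ/√n can be computed with a minimal Python sketch. It obtains z_{α/2} from the standard library's `statistics.NormalDist.inv_cdf` rather than from a printed table. The function name `z_interval` and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Interval estimate of μ when σ is known: x̄ ± z_{α/2} σ/√n."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{α/2}: upper-tail area of α/2
    margin = z * sigma / sqrt(n)               # margin of error
    return xbar - margin, xbar + margin

# Hypothetical example: x̄ = 82, σ = 20, n = 100, 95% confidence.
lo, hi = z_interval(xbar=82, sigma=20, n=100, confidence=0.95)
print(f"{lo:.2f} to {hi:.2f}")
```

Raising the confidence level (for example from .90 to .95) increases z_{α/2}, which widens the interval for the same x̄, σ and n.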

Meaning of Confidence

• Because 90% of all the intervals constructed using $\bar{x} \pm 1.645\,\sigma_{\bar{x}}$ will contain the population mean, we say we are 90% confident that the interval $\bar{x} \pm 1.645\,\sigma_{\bar{x}}$ includes the population mean μ.
• We say that this interval has been established at the 90% confidence level.
• The value .90 is referred to as the confidence coefficient.

Adequate Sample Size

• In most applications, a sample size of n ≥ 30 is adequate.
• If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
• If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.
• If the population is believed to be at least approximately normal, a sample size of less than 15 can be used.

t Distribution

• William Gosset, writing under the name "Student", is the founder of the t distribution.
• Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin.
• He developed the t distribution while working on small-scale materials and temperature experiments.
• The t distribution is a family of similar probability distributions.
• A specific t distribution depends on a parameter known as the degrees of freedom.
• Degrees of freedom refer to the number of independent pieces of information that go into the computation of s.
• A t distribution with more degrees of freedom has less dispersion.
• As the degrees of freedom increase, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller.

(Figure: Comparison of the standard normal distribution with t distributions having 10 and 20 degrees of freedom.)

• For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value.
• The standard normal z values can be found in the infinite degrees of freedom row (labeled ∞) of the t distribution table.

Summary of Interval Estimation Procedures for a Population Mean

Interval estimate of μ, σ unknown:

$\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
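A minimal Python sketch of the σ unknown interval x̄ ± t_{α/2} s/√n follows. Python's standard library has no t distribution, so the sketch assumes t_{α/2} is read from a t table and passed in. With SciPy installed, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` returns the same quantity. The data and the function name `t_interval` are hypothetical. The value 2.262 is the t-table entry for 9 degrees of freedom and an upper-tail area of .025.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Interval estimate of μ when σ is unknown: x̄ ± t_{α/2} s/√n.

    t_crit is t_{α/2} with n - 1 degrees of freedom, taken from a t table.
    """
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                 # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)   # margin of error
    return xbar - margin, xbar + margin

# Hypothetical sample of n = 10 (df = 9); t_.025 = 2.262 gives 95% confidence.
sample = [12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.0]
lo, hi = t_interval(sample, t_crit=2.262)
print(f"{lo:.3f} to {hi:.3f}")
```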

Using Excel's Descriptive Statistics Tool

1. Click the Data tab on the Ribbon.
2. In the Analysis group, click Data Analysis.
3. Choose Descriptive Statistics from the list of Analysis Tools.
4. When the Descriptive Statistics dialog box appears:
▪ Enter the Input Range
▪ Select Grouped By: Columns
▪ Select Labels in First Row
▪ Select Output Range:
▪ Enter C1 in the Output Range box
▪ Select Summary statistics
▪ Select Confidence Level for Mean
▪ Enter 95 in the Confidence Level for Mean box
▪ Click OK

Adequate Sample Size

• Usually, a sample size of n ≥ 30 is adequate when using the expression $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$ to develop an interval estimate of a population mean.
• If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended.
• If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice.

Sample Size for an Interval Estimate of a Population Mean

Let E = the desired margin of error.
• E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
• If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined.
• Margin of Error

$E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

• Necessary Sample Size

$n = \dfrac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$

• The Necessary Sample Size equation requires a value for the population standard deviation σ.
• If σ is unknown, a preliminary or planning value for σ can be used in the equation:
1. Use the estimate of the population standard deviation computed in a previous study.
2. Use a pilot study to select a preliminary sample and use the sample standard deviation from that study.
3. Use judgment or a "best guess" for the value of σ.

Interval Estimate of a Population Proportion

• The general form of an interval estimate of a population proportion is:

p̄ ± Margin of Error

• The sampling distribution of p̄ plays a key role in computing the margin of error for this interval estimate.
• The sampling distribution of p̄ can be approximated by a normal distribution whenever np ≥ 5 and n(1 − p) ≥ 5.

(Figure: Normal Approximation of Sampling Distribution of p̄.)

• Interval estimate of p:

$\bar{p} \pm z_{\alpha/2}\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$

Sample Size for an Interval Estimate of a Population Proportion

• Margin of Error

$E = z_{\alpha/2}\sqrt{\dfrac{\bar{p}(1-\bar{p})}{n}}$

• Solving for the necessary sample size n, we get

$n = \dfrac{(z_{\alpha/2})^2\,\bar{p}(1-\bar{p})}{E^2}$

• However, p̄ will not be known until after we have selected the sample. We will use the planning value p* for p̄.
• Necessary Sample Size

$n = \dfrac{(z_{\alpha/2})^2\,p^*(1-p^*)}{E^2}$

• The planning value p* can be chosen by:
1. Using the sample proportion from a previous sample of the same or similar units, or
2. Selecting a preliminary sample and using the sample proportion from this sample.
3. Using judgment or a "best guess" for a p* value.
4. Otherwise, using .50 as the p* value.

Implications of Big Data

• As the sample size becomes extremely large, the margin of error becomes extremely small and the resulting confidence intervals become extremely narrow.
• No interval estimate will accurately reflect the parameter being estimated unless the sample is relatively free of nonsampling error.
• Statistical inference along with information collected from other sources can help in making the most informed decision.
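The necessary-sample-size formulas and the interval estimate of a proportion from this chapter can be sketched in Python. The sketch makes three assumptions:
- z_{α/2} comes from `statistics.NormalDist`.
- n is rounded up to the next whole unit. This is the usual convention, so the desired margin of error is not exceeded.
- The planning values are hypothetical.

The closing loop illustrates the Big Data point. For a fixed p̄, the interval width shrinks in proportion to 1/√n.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_alpha_half(confidence):
    """z_{α/2}: the z value with an upper-tail area of α/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def sample_size_mean(sigma_planning, E, confidence=0.95):
    """n = (z_{α/2})^2 σ^2 / E^2, rounded up to a whole number."""
    return ceil((z_alpha_half(confidence) * sigma_planning / E) ** 2)

def sample_size_proportion(p_star, E, confidence=0.95):
    """n = (z_{α/2})^2 p*(1 - p*) / E^2, rounded up to a whole number."""
    return ceil(z_alpha_half(confidence) ** 2 * p_star * (1 - p_star) / E ** 2)

def proportion_interval(pbar, n, confidence=0.95):
    """p̄ ± z_{α/2} sqrt(p̄(1 - p̄)/n); assumes np ≥ 5 and n(1 - p) ≥ 5."""
    margin = z_alpha_half(confidence) * sqrt(pbar * (1 - pbar) / n)
    return pbar - margin, pbar + margin

# Hypothetical planning values.
print(sample_size_mean(sigma_planning=20, E=3))     # mean: σ planning value 20, E = 3
print(sample_size_proportion(p_star=0.5, E=0.03))   # proportion: p* = .50, E = .03

# Big data: the interval for p̄ = .50 narrows as n grows.
for n in (100, 10_000, 1_000_000):
    lo, hi = proportion_interval(0.5, n)
    print(n, round(hi - lo, 5))
```

Using p* = .50 gives the largest value of p*(1 − p*). The resulting n is therefore the most conservative choice when no better planning value is available.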
CHAPTER 9

Hypothesis Testing

Hypothesis testing = used to determine whether a statement about the value of a population parameter should or should not be rejected.

Null hypothesis = denoted by H0, is a tentative assumption about a population parameter.

Alternative hypothesis = denoted by Ha, is the opposite of what is stated in the null hypothesis.

• The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by H0 and Ha.

Alternative Hypothesis as a Research Hypothesis

• Many applications of hypothesis testing involve an attempt to gather evidence in support of a research hypothesis.
• In such cases, it is often best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support.
• The conclusion that the research hypothesis is true is made if the sample data provide sufficient evidence to show that the null hypothesis can be rejected.
Developing Null and Alternative Hypotheses

• It is not always obvious how the null and alternative hypotheses should be formulated.
• Care must be taken to structure the hypotheses appropriately so that the test conclusion provides the information the researcher wants.
• The context of the situation is very important in determining how the hypotheses should be stated.
• In some cases it is easier to identify the alternative hypothesis first. In other cases the null is easier.
• Correct hypothesis formulation will take practice.
• We might begin with a belief or assumption that a statement about the value of a population parameter is true.
• We then use a hypothesis test to challenge the assumption and determine if there is statistical evidence to conclude that the assumption is incorrect.
• In these situations, it is helpful to develop the null hypothesis first.

Summary of Forms for Null and Alternative Hypotheses about a Population Mean

• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean).
• One-tailed (lower tail)
$H_0: \mu \ge \mu_0 \qquad H_a: \mu < \mu_0$
• One-tailed (upper tail)
$H_0: \mu \le \mu_0 \qquad H_a: \mu > \mu_0$
• Two-tailed
$H_0: \mu = \mu_0 \qquad H_a: \mu \ne \mu_0$

Type I Error

• Because hypothesis tests are based on sample data, we must allow for the possibility of errors.
• A Type I error is rejecting H0 when it is true.
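The level of significance is the probability of a Type I error when H0 is true. A quick simulation can make this concrete. The sketch is illustrative and rests on assumptions that are not part of these notes:
- a normal population with known σ;
- a two-tailed z test using z = (x̄ − μ0)/(σ/√n);
- made-up values for μ0, σ and n.

Every sample comes from a population in which H0 is true, so every rejection is a Type I error. The rejection rate should therefore land near α = .05.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=50.0, sigma=8.0, n=30, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated samples in which a TRUE H0: μ = μ0 is rejected."""
    rng = random.Random(seed)                      # seeded for reproducibility
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value z_{α/2}
    rejections = 0
    for _ in range(trials):
        # Every sample is drawn from a population whose mean really is μ0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z <= -z_half or z >= z_half:            # rejecting a true H0 is a Type I error
            rejections += 1
    return rejections / trials

print(type_i_error_rate())  # should be close to alpha = .05
```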
Level of Significance = the probability of making a Type I error when the null hypothesis is true as an equality.

Significance Tests = applications of hypothesis testing that only control the Type I error.

Type II Error

• A Type II error is accepting H0 when it is false.
• It is difficult to control the probability of making a Type II error.
• Statisticians avoid the risk of making a Type II error by using "do not reject H0" and not "accept H0".

Steps of Hypothesis Testing

1. Develop the null and alternative hypotheses.
2. Specify the level of significance α.
3. Collect the sample data and compute the value of the test statistic.

p-Value Approach
4. Use the value of the test statistic to compute the p-value.
5. Reject H0 if p-value ≤ α.

Critical Value Approach
4. Use the level of significance α to determine the critical value and the rejection rule.
5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

p-Value Approach to One-Tailed Hypothesis Testing

• The p-value is the probability, computed using the test statistic, that measures the support (or lack of support) provided by the sample for the null hypothesis.
• If the p-value is less than or equal to the level of significance α, the value of the test statistic is in the rejection region.
• Reject H0 if the p-value ≤ α.

Suggested Guidelines for Interpreting p-Values

• Less than .01: Overwhelming evidence to conclude Ha is true.
• Between .01 and .05: Strong evidence to conclude Ha is true.
• Between .05 and .10: Weak evidence to conclude Ha is true.
• Greater than .10: Insufficient evidence to conclude Ha is true.

Critical Value Approach to One-Tailed Hypothesis Testing

• The test statistic z has a standard normal probability distribution.
• We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution.
• The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test.
• The rejection rule is:
Lower tail: Reject H0 if $z \le -z_{\alpha}$
Upper tail: Reject H0 if $z \ge z_{\alpha}$

p-Value Approach to Two-Tailed Hypothesis Testing

• Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the probability that z is greater than or equal to the value of the test statistic. If z is in the lower tail (z < 0), compute the probability that z is less than or equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain the p-value.
• The rejection rule: Reject H0 if the p-value ≤ α.

Critical Value Approach to Two-Tailed Hypothesis Testing

• The critical values will occur in both the lower and upper tails of the standard normal curve.
• Use the standard normal probability distribution table to find z_{α/2} (the z-value with an area of α/2 in the upper tail of the distribution).
• The rejection rule is: Reject H0 if $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$.

Relationship between Interval Estimation and Hypothesis Testing

• Select a simple random sample from the population and use the value of the sample mean x̄ to develop the confidence interval for the population mean μ. (Confidence intervals are covered in Chapter 8.)
• If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0. (Actually, H0 should be rejected if μ0 happens to be equal to one of the end points of the confidence interval.)

Tests About a Population Mean: σ Unknown

• Test Statistic:

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

• This test statistic has a t distribution with n − 1 degrees of freedom.

Rejection Rule: p-Value Approach
Reject H0 if p-value ≤ α

Rejection Rule: Critical Value Approach
$H_0: \mu \ge \mu_0$: Reject H0 if $t \le -t_{\alpha}$
$H_0: \mu \le \mu_0$: Reject H0 if $t \ge t_{\alpha}$
$H_0: \mu = \mu_0$: Reject H0 if $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$

p-Values and the t Distribution

• The format of the t distribution table provided in most statistics textbooks does not have sufficient detail to determine the exact p-value for a hypothesis test.
• However, we can still use the t distribution table to identify a range for the p-value.
• An advantage of computer software packages is that the computer output will provide the p-value for the t distribution.

A Summary of Forms for Null and Alternative Hypotheses About a Population Proportion

• The equality part of the hypotheses always appears in the null hypothesis.
• In general, a hypothesis test about the value of a population proportion p must take one of the following three forms (where p0 is the hypothesized value of the population proportion).
• One-tailed (lower tail)
$H_0: p \ge p_0 \qquad H_a: p < p_0$
• One-tailed (upper tail)
$H_0: p \le p_0 \qquad H_a: p > p_0$
• Two-tailed
$H_0: p = p_0 \qquad H_a: p \ne p_0$
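The steps, rejection rules and p-value calculations for a test about a population mean can be collected into one small Python function. The sketch assumes the σ known case, with the usual test statistic z = (x̄ − μ0)/(σ/√n), which these notes use but do not write out. For the σ unknown case, s would replace σ and the critical values would come from a t table with n − 1 degrees of freedom. The standard library cannot supply those values. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Hypothesis test about μ with σ known.

    tail = "lower" (H0: μ ≥ μ0), "upper" (H0: μ ≤ μ0) or "two" (H0: μ = μ0).
    Returns (z, p_value, reject_by_p_value, reject_by_critical_value).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))                 # step 3: test statistic
    if tail == "lower":
        p_value = STD_NORMAL.cdf(z)                       # area to the left of z
        reject_cv = z <= -STD_NORMAL.inv_cdf(1 - alpha)   # z ≤ -z_α
    elif tail == "upper":
        p_value = 1 - STD_NORMAL.cdf(z)                   # area to the right of z
        reject_cv = z >= STD_NORMAL.inv_cdf(1 - alpha)    # z ≥ z_α
    elif tail == "two":
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))        # double the tail area
        z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
        reject_cv = z <= -z_half or z >= z_half           # z ≤ -z_{α/2} or z ≥ z_{α/2}
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    reject_p = p_value <= alpha                           # p-value approach
    return z, p_value, reject_p, reject_cv

# Hypothetical lower-tail example: H0: μ ≥ 3, x̄ = 2.92, σ = .18, n = 36, α = .01.
print(z_test_mean(xbar=2.92, mu0=3, sigma=0.18, n=36, alpha=0.01, tail="lower"))
```

Both decisions are returned so the two approaches can be compared. For the same α they always agree, which is why the notes present them as interchangeable.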

Tests About a Population Proportion

Test Statistic:

$z = \dfrac{\bar{p} - p_0}{\sigma_{\bar{p}}}$

where:

$\sigma_{\bar{p}} = \sqrt{\dfrac{p_0(1-p_0)}{n}}$

assuming np ≥ 5 and n(1 − p) ≥ 5

Rejection Rule: p-Value Approach
Reject H0 if p-value ≤ α

Rejection Rule: Critical Value Approach
$H_0: p \le p_0$: Reject H0 if $z \ge z_{\alpha}$
$H_0: p \ge p_0$: Reject H0 if $z \le -z_{\alpha}$
$H_0: p = p_0$: Reject H0 if $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$
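A minimal Python sketch of the test about a population proportion follows, using the test statistic and σ_p̄ given above. Because p itself is unknown, the sketch checks the normal-approximation conditions with the hypothesized value p0 in place of p (an assumption). The function name and the example figures are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(pbar, p0, n, alpha=0.05, tail="two"):
    """Hypothesis test about a population proportion p.

    tail = "lower" (H0: p ≥ p0), "upper" (H0: p ≤ p0) or "two" (H0: p = p0).
    Returns (z, p_value, reject_H0) using the p-value approach.
    """
    # Normal-approximation check, with the hypothesized p0 standing in for p.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("need n*p0 >= 5 and n*(1 - p0) >= 5")
    sigma_pbar = sqrt(p0 * (1 - p0) / n)   # σ_p̄ computed under H0
    z = (pbar - p0) / sigma_pbar           # test statistic
    cdf = NormalDist().cdf
    if tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1 - cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))    # double the tail area
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z, p_value, p_value <= alpha

# Hypothetical upper-tail example: H0: p ≤ .20, with p̄ = .25 from n = 400.
print(z_test_proportion(pbar=0.25, p0=0.20, n=400, alpha=0.05, tail="upper"))
```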
