
Business Statistics - Session 10

Hypothesis Formulation and Hypothesis Testing


Type I Errors, Type II Errors and Statistical Power

• Type I error (): the probability of rejecting the null


hypothesis when it is actually true.

• Type II error (): the probability of failing to reject the


null hypothesis given that the alternative hypothesis is
actually true.

• Statistical power (1 - ): the probability of correctly


rejecting the null hypothesis.
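
The relation between α, β, and power can be made concrete with a small calculation. The sketch below assumes an upper-tailed one-sample z test with a known σ; the values μ0 = 1300, μ1 = 1340, σ = 160, and n = 85 are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z test (sigma assumed known)."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)                      # standard error of the sample mean
    z_crit = std_normal.inv_cdf(1 - alpha)    # rejection cut-off under H0
    # beta = P(fail to reject H0 | true mean is mu1)
    beta = std_normal.cdf(z_crit - (mu1 - mu0) / se)
    return 1 - beta


# Hypothetical inputs, chosen only for illustration
power = z_test_power(mu0=1300, mu1=1340, sigma=160, n=85)
print(f"beta = {1 - power:.3f}, power = {power:.3f}")
```

Lowering α moves the cut-off further into the tail, which raises β and lowers power unless the sample size grows.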
Choosing the Appropriate Statistical Technique
Estimating the Population Mean using the z Statistic (σ Known)

A point estimate is a statistic taken from a sample that is used to estimate a population parameter.

An interval estimate (confidence interval) is a range of values within which the analyst can declare, with some confidence, the population parameter lies.

• A point estimate is only as good as the representativeness of its sample.

• Because of variation in sample statistics, estimating a population parameter with an interval estimate is often preferable to using a point estimate.
Estimating the Population Mean Using the z Statistic (σ Known)

100(1 − α)% Confidence Interval to Estimate μ: σ known

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

or

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where
α = the area under the normal curve outside the confidence interval area

α/2 = the area in one end (tail) of the distribution outside the confidence interval
Estimating the Population Mean Using the z Statistic (σ Known)

Example: A cellular telephone company would like to estimate the population mean number of texts per month in the 18-to-24-year-old age category.
• From a sample of 85 bills it is determined that the sample mean is 1300 texts.
• Using this sample mean, a confidence interval can be calculated within which the researcher is relatively confident that the actual population mean is located.
• Suppose that, from previous studies, the population standard deviation is known to be about 160.

• The value of z is chosen based on the desired level of confidence.

– Common levels of confidence are 90%, 95%, 98%, and 99%.
– There are trade-offs between sample size, interval width, and level of confidence.
Estimating the Population Mean Using the z Statistic (σ Known)

Example, continued.

• The researcher chooses a 95% confidence level.

• α = .05; α/2 = .025, the area in each tail.

• Looking up .4750 (= .5 − .025) in the z table gives a z value of 1.96.
• Since the distribution is symmetric, the z value for the lower bound is −1.96.
• Now the confidence interval can be estimated using the formula.
Estimating the Population Mean Using the z Statistic (σ Known)

Example, continued.

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$1300 \pm 1.96\,\frac{160}{\sqrt{85}}$$

$$1300 \pm 34.01$$

$$1265.99 \le \mu \le 1334.01$$

• The research company can conclude with 95% confidence that the population mean number of texts per month for an American in the 18-to-24-years-of-age category is between 1265.99 texts and 1334.01 texts.
• 34.01 (= $z_{\alpha/2}\,\sigma/\sqrt{n}$) is the margin of error.
• The margin of error defines the upper and lower bounds of the confidence interval.
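
The texting example can be reproduced in a few lines of Python. This is a minimal sketch: the function name `z_confidence_interval` is an illustrative choice, and the z value is taken from the standard library's `NormalDist` rather than from a printed z table, so it is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * sigma / sqrt(n)              # margin of error
    return x_bar - margin, x_bar + margin, margin


# Values from the example: sample mean 1300, sigma 160, n = 85
lower, upper, margin = z_confidence_interval(1300, 160, 85)
print(f"margin = {margin:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

Passing `confidence=0.99` widens the interval and `confidence=0.90` narrows it, which illustrates the trade-off between interval width and level of confidence.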
Estimating the Population Mean Using the z Statistic (σ Known)

What does being 95% confident mean?

• If 100 random samples of 85 bills were drawn and an interval constructed from each, approximately 95 of the 100 intervals would contain the population mean.
– With a 90% confidence level, only about 90 of the 100 intervals would be likely to contain the population mean.
– In practice, the researcher usually takes only a single sample, but can be 95% confident that the interval that is found includes the population mean.

• The figure shows 20 random samples and the intervals found.
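
The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a normally distributed population whose true mean is 1300 and whose standard deviation is 160; the true mean is a hypothetical value chosen for illustration, since in practice it is unknown.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(true_mean=1300, sigma=160, n=85, confidence=0.95,
             trials=2000, seed=1):
    """Fraction of simulated z intervals that contain the true mean."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n observations and compute its mean
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to the chosen confidence level, not exactly equal to it, because the simulation itself is subject to sampling variation.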
Estimating the Population Mean Using the t Statistic (σ Unknown)

Characteristics of the t Distribution

• The t distributions form a family of curves, each of which is symmetric and unimodal.

• The t distributions are flatter in the middle and have more area in their tails than the standard normal distribution.
• As the sample size becomes large, the t distribution approaches the z distribution.
Estimating the Population Mean Using the t Statistic (σ Unknown)

Reading the t Distribution Table


• The t distribution table is a compilation of many t distributions, with each line of the table
having different degrees of freedom and containing t values for different t distributions.

• The degrees of freedom for the t statistic presented in this section are computed by n - 1.

• The term degrees of freedom refers to the number of independent observations for a source
of variation minus the number of independent parameters estimated in computing the
variation.
Estimating the Population Mean Using the t Statistic (σ Unknown)

Example: If a 90% confidence interval is being computed, the total area in the two tails is
10%.
• Thus, α is .10 and α/2 is .05.
• If the degrees of freedom (df) are 24, the t value is located at the intersection of the df
value and the selected α/2 value.
• In the excerpt from the t table below, the t value is 1.711.
Estimating the Population Mean Using the t Statistic (σ Unknown)

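Confidence Intervals to Estimate the Population Mean Using the t Statistic

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

df = n − 1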

Example: Suppose a researcher wants to estimate the average amount of comp time accumulated per week for managers in the aerospace industry. He randomly samples 18 managers and measures the amount of extra time they work during a specific week.
• The sample mean is 13.56 hours, and the sample standard deviation is 7.80 hours.
• The researcher would like a 90% confidence interval.
• Since n < 30 and σ is unknown, use the t distribution.
Estimating the Population Mean Using the t Statistic (σ Unknown)

Example, continued.
• With a sample size of 18, there are n − 1 = 17 degrees of freedom.
• For a 90% confidence interval, α/2 = .05.
• t value = 1.740

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

$$13.56 \pm 1.740\,\frac{7.80}{\sqrt{18}}$$

$$10.36 \le \mu \le 16.76$$

• The researcher is 90% confident that the average amount of comp time accumulated by a manager per week in this industry is between 10.36 and 16.76 hours.
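
The comp-time interval can be verified with a short calculation. This sketch takes the critical t value (1.740 for df = 17 and α/2 = .05) as an input read from the t table, because the Python standard library has no t distribution; the function name is an illustrative choice.

```python
from math import sqrt


def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2, n-1}, supplied by the caller.
    """
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the example: mean 13.56, s = 7.80, n = 18, t = 1.740
lower, upper = t_confidence_interval(13.56, 7.80, 18, t_crit=1.740)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```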
Estimating the Population Mean Using the t Statistic (σ Unknown)

Using the Computer to Construct t Confidence Intervals for the Mean


• The Excel output includes the mean, the standard error, the sample standard deviation, and the margin of error of the confidence interval, which Excel labels the “confidence level.”

• The confidence interval must be computed from the sample mean and this “confidence level” value.

• Minitab gives the confidence interval endpoints.
Estimating the Population Proportion

Confidence Interval to Estimate p

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}$$

where
p̂ = sample proportion
q̂ = 1 − p̂
p = population proportion
n = sample size
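
As an illustration of the proportion formula, the sketch below computes a 95% interval for a hypothetical sample in which 80 of 200 respondents have the characteristic of interest. These counts are invented for the example and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for a population proportion p."""
    p_hat = successes / n                     # sample proportion
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 80 successes in a sample of 200
p_lower, p_upper = proportion_confidence_interval(80, 200)
print(f"{p_lower:.4f} <= p <= {p_upper:.4f}")
```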
Estimating the Population Variance

The researcher may be more interested in the population variance than in the mean or the proportion.
• The sample variance is the point estimate of the population variance.
• The ratio of the sample variance (s²), multiplied by n − 1, to the population variance (σ²) follows a chi-square distribution (χ²).
– This relationship is NOT robust to violations of the assumption that the population is normally distributed and should not be used if normality cannot be assumed.

χ² Formula for Single Variance

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

df = n − 1
Estimating the Population Variance

Confidence Interval to Estimate the Population Variance

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

df = n − 1

• The χ² distribution is not symmetrical, and its shape varies with the degrees of freedom.
Estimating the Population Variance

Example: Suppose eight purportedly 7-centimeter aluminum cylinders in a sample are measured in diameter, resulting in the following values:

• The sample variance is s² = .0022125, the point estimate.

• For a 90% confidence interval with n − 1 = 7 degrees of freedom, the values from the χ² table are χ²(.95, 7) = 2.16735 and χ²(.05, 7) = 14.0671.
Estimating the Population Variance

Example, continued:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$$

$$\frac{7(.0022125)}{14.0671} \le \sigma^2 \le \frac{7(.0022125)}{2.16735}$$

$$.001101 \le \sigma^2 \le .007146$$

• The confidence interval says that with 90% confidence, the population variance is somewhere between .001101 and .007146.
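
The cylinder example can be checked numerically. This is a sketch that takes the two χ² table values as inputs, because the Python standard library has no chi-square quantile function; the function and parameter names are illustrative.

```python
def variance_confidence_interval(s_squared, n, chi2_upper_tail, chi2_lower_tail):
    """Return (lower, upper) for the population variance.

    chi2_upper_tail is the table value with alpha/2 in the right tail,
    chi2_lower_tail the value with 1 - alpha/2 in the right tail.
    """
    numerator = (n - 1) * s_squared
    # Dividing by the larger chi-square value gives the lower bound
    return numerator / chi2_upper_tail, numerator / chi2_lower_tail


# Values from the example: s^2 = .0022125, n = 8, table values for df = 7
var_lower, var_upper = variance_confidence_interval(
    0.0022125, 8, chi2_upper_tail=14.0671, chi2_lower_tail=2.16735
)
print(f"{var_lower:.6f} <= sigma^2 <= {var_upper:.6f}")
```

The interval is not centered on the point estimate .0022125, which reflects the asymmetry of the χ² distribution.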
Testing Hypotheses on a Single Mean

• One-sample t-test: a statistical technique used to test the hypothesis that the mean of the population from which a sample is drawn is equal to a comparison standard.
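
A minimal sketch of the one-sample t statistic follows. It reuses the comp-time summary statistics from the earlier example (mean 13.56, s = 7.80, n = 18) and tests them against a comparison standard of 10 hours; that standard is hypothetical and chosen only for illustration. The critical value 1.740 is the table value for df = 17 with .05 in the upper tail.

```python
from math import sqrt


def one_sample_t(x_bar, s, n, mu0):
    """t statistic for H0: population mean equals mu0."""
    return (x_bar - mu0) / (s / sqrt(n))


# mu0 = 10.0 is a hypothetical comparison standard
t_stat = one_sample_t(x_bar=13.56, s=7.80, n=18, mu0=10.0)
t_crit = 1.740                      # t_{.05, 17}, upper-tailed test
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```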
Testing Hypotheses about Two Related Means

• Paired samples t-test: examines differences within the same group before and after a treatment.
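
The paired test reduces to a one-sample t-test on the within-subject differences. The sketch below uses invented before and after scores for six subjects; the data are hypothetical and serve only to show the mechanics.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic for H0: mean within-pair difference is zero."""
    diffs = [a - b for a, b in zip(after, before)]
    # stdev() is the sample standard deviation (divides by n - 1)
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical scores for the same six subjects
before = [72, 68, 75, 80, 66, 71]
after = [75, 70, 74, 85, 70, 73]
t_paired = paired_t(before, after)
print(f"t = {t_paired:.3f} with df = {len(before) - 1}")
```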
Testing Hypotheses about Several Means

• Analysis of variance (ANOVA) tests for significant mean differences among more than two groups on an interval- or ratio-scaled dependent variable.
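
The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below uses three small invented groups; the data are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic = mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical data for three groups
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat = one_way_anova_f(groups)
print(f"F = {f_stat:.2f}")
```

The resulting F is compared with the F table value for (number of groups − 1) and (total observations − number of groups) degrees of freedom.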
Model Significance

• H0: 0 = 1 = ... = m = 0 (all parameters are zero)


H1: Not H0
Conceptual Model

• Physical Attractiveness → Likelihood to Date (+)
Multiple Regression Analysis

• We use more than one (metric or non-metric) independent variable to explain variance in a (metric) dependent variable.
Conceptual Model

• Perceived Intelligence → Likelihood to Date (+)
• Physical Attractiveness → Likelihood to Date (+)
Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .844  .712      .706               5.895

ANOVA

Model         Sum of Squares  df  Mean Square  F        Sig.
1 Regression  8257.731        2   4128.866     118.808  .000
  Residual    3336.228        96  34.752
  Total       11593.960       98

Coefficients

Model          Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)   31.575            3.130                          10.088  .000
  PERC_INTGCE  .050              .037        .074               1.340   .183
  PHYS_ATTR    .523              .034        .846               15.413  .000
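
The summary figures in this output follow directly from the sums of squares and degrees of freedom in the ANOVA table. The sketch below recomputes R Square, Adjusted R Square, F, and the standard error of the estimate from those table entries; the variable names are illustrative.

```python
from math import sqrt

# Entries copied from the ANOVA table above
ss_regression, df_regression = 8257.731, 2
ss_residual, df_residual = 3336.228, 96

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual            # n - 1 = 98

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual              # overall model F test
std_error = sqrt(ms_residual)                     # std. error of the estimate

print(f"R^2 = {r_squared:.3f}, adj. R^2 = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, std. error = {std_error:.3f}")
```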
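Conceptual Model

• Perceived Intelligence → Likelihood to Date (+)
• Physical Attractiveness → Likelihood to Date (+)
• Gender moderates the Perceived Intelligence → Likelihood to Date relation (+)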
Moderators
• A moderator is a qualitative (e.g., gender, race, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between the independent and dependent variables.

• Analytical representation

Y = β0 + β1X1 + β2X2 + β3X1X2

with Y = DV
X1 = IV
X2 = Moderator
Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .910  .828      .821               4.601

ANOVA

Model         Sum of Squares  df  Mean Square  F        Sig.
1 Regression  9603.938        4   2400.984     113.412  .000
  Residual    1990.022        94  21.170
  Total       11593.960       98

Coefficients

Model          Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)   32.603            3.163                          10.306  .000
  PERC_INTGCE  .000              .043        .000               .004    .997
  PHYS_ATTR    .496              .027        .802               18.540  .000
  GENDER       -.420             3.624       -.019              -.116   .908
  PI_GENDER    .127              .058        .369               2.177   .032

The interaction term (PI_GENDER) has a significant effect on the dependent variable.
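
To see what a significant interaction means, the fitted equation can be evaluated separately for each gender group. The sketch below plugs in the unstandardized coefficients from the table. It assumes GENDER is a 0/1 dummy variable and that PI_GENDER is the product of PERC_INTGCE and GENDER; neither coding is stated in the output, so both are assumptions.

```python
# Unstandardized coefficients copied from the Coefficients table
B0, B_PI, B_PA, B_GENDER, B_PI_GENDER = 32.603, 0.000, 0.496, -0.420, 0.127


def predict(perc_int, phys_attr, gender):
    """Predicted likelihood to date; gender assumed to be coded 0 or 1."""
    return (B0 + B_PI * perc_int + B_PA * phys_attr
            + B_GENDER * gender + B_PI_GENDER * perc_int * gender)


def perc_int_slope(gender):
    """Effect of one extra unit of perceived intelligence, by gender code."""
    return B_PI + B_PI_GENDER * gender


print(f"slope for gender = 0: {perc_int_slope(0):.3f}")
print(f"slope for gender = 1: {perc_int_slope(1):.3f}")
```

Under these assumptions, perceived intelligence has essentially no effect in the group coded 0 and a positive effect in the group coded 1, which is what the moderation hypothesis describes.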


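Conceptual Model

• Perceived Intelligence → Likelihood to Date (+)
• Physical Attractiveness → Likelihood to Date (+)
• Gender moderates the Perceived Intelligence → Likelihood to Date relation (+)
• Communality of Interests → Perceived Fit (+)
• Perceived Fit → Likelihood to Date (+)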
Mediating/intervening variable
• Accounts for the relation between the independent and dependent variable

• Analytical representation
1. Y = β0 + β1X
=> β1 is significant

2. M = β2 + β3X
=> β3 is significant

3. Y = β4 + β5X + β6M
=> β5 is not significant
=> β6 is significant

with Y = DV
X = IV
M = mediator
Step 1
Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .963  .927      .923               3.020

ANOVA

Model         Sum of Squares  df  Mean Square  F        Sig.
1 Regression  10745.603       5   2149.121     235.595  .000
  Residual    848.357         93  9.122
  Total       11593.960       98
Step 1 cont’d

Coefficients

Model          Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)   17.094            2.497                          6.846   .000
  PERC_INTGCE  .030              .029        .044               1.039   .301
  PHYS_ATTR    .517              .018        .836               29.269  .000
  GENDER       -.783             2.379       -.036              -.329   .743
  PI_GENDER    .122              .038        .356               3.201   .002
  COMM_INTER   .212              .019        .319               11.187  .000

COMM_INTER has a significant effect on the dependent variable.


Step 2
Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .977  .955      .955               2.927

ANOVA

Model         Sum of Squares  df  Mean Square  F         Sig.
1 Regression  17720.881       1   17720.881    2068.307  .000
  Residual    831.079         97  8.568
  Total       18551.960       98
Step 2 cont’d

Coefficients

Model         Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)  8.474             1.132                          7.484   .000
  COMM_INTER  .820              .018        .977               45.479  .000

COMM_INTER has a significant effect on the mediator.


Step 3
Model Summary

Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .966  .934      .930               2.885

ANOVA

Model         Sum of Squares  df  Mean Square  F        Sig.
1 Regression  10828.336       6   1804.723     216.862  .000
  Residual    765.624         92  8.322
  Total       11593.960       98
Step 3 cont’d

Coefficients

Model          Unstandardized B  Std. Error  Standardized Beta  t       Sig.
1 (Constant)   14.969            2.478                          6.041   .000
  PERC_INTGCE  .019              .028        .028               .688    .493
  PHYS_ATTR    .518              .017        .839               30.733  .000
  GENDER       -2.040            2.307       -.094              -.884   .379
  PI_GENDER    .142              .037        .412               3.825   .000
  COMM_INTER   -.051             .085        -.077              -.596   .553
  PERC_FIT     .320              .102        .405               3.153   .002

The independent variable (COMM_INTER) has an insignificant effect on the dependent variable.

The mediator (PERC_FIT) has a significant effect on the dependent variable.
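
The three-step decision rule can be written as a small function that takes the Sig. values from the three regressions. This is a sketch under stated assumptions: the 5% significance level is assumed, the label "partial mediation" for the case where the independent variable stays significant in step 3 is a common extension that does not appear on the slides, and the function name is illustrative.

```python
ALPHA = 0.05  # assumed significance level


def classify_mediation(p_x_on_y, p_x_on_m, p_x_given_m, p_m_given_x,
                       alpha=ALPHA):
    """Apply the three regression steps to the reported p-values."""
    step1 = p_x_on_y < alpha       # step 1: X affects Y
    step2 = p_x_on_m < alpha       # step 2: X affects M
    step3_m = p_m_given_x < alpha  # step 3: M affects Y, controlling for X
    if not (step1 and step2 and step3_m):
        return "no mediation established"
    if p_x_given_m >= alpha:       # step 3: X no longer significant
        return "full mediation"
    return "partial mediation"     # label is an assumption, see lead-in


# Sig. values as reported for COMM_INTER and PERC_FIT in steps 1 to 3
result = classify_mediation(p_x_on_y=0.000, p_x_on_m=0.000,
                            p_x_given_m=0.553, p_m_given_x=0.002)
print(result)
```

With the reported values, the effect of COMM_INTER on the dependent variable disappears once PERC_FIT enters the model, which matches the pattern the three steps describe.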
