
3/16/2011

Hypothesis Testing 2
CE 21. Engineering Statistics


Summary:
Let X1,…,Xn be a large (n>30) sample from a population with
mean µ and standard deviation σ.
To test a null hypothesis of the form H0: µ ≤ µ0, H0: µ ≥ µ0, or H0: µ = µ0:
- Compute the z-score:
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
If σ is unknown, it can be approximated with s.
- Compute the P-value. The P-value is an area under the
normal curve, which depends on the alternate hypothesis as
follows:
Alternate Hypothesis P-value
H1: µ > µ0 Area to the right of z
H1: µ < µ0 Area to the left of z
H1: µ ≠ µ0 Sum of the areas in the tails cut off by z and -z
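The two steps above (compute z, then take the tail area that matches H1) can be sketched in Python as below. This is a minimal illustration, not a prescribed implementation: the function name `z_test_mean` and the `alternative` labels `"greater"`, `"less"`, `"two-sided"` are hypothetical choices made here, and the standard normal CDF comes from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist
import math


def z_test_mean(xbar, mu0, sigma, n, alternative="two-sided"):
    """Large-sample z-test for a population mean (hypothetical helper).

    sigma may be replaced by the sample standard deviation s when the
    population standard deviation is unknown and n > 30.
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF: area to the left of z
    if alternative == "greater":      # H1: mu > mu0 -> area to the right of z
        p = 1 - phi(z)
    elif alternative == "less":       # H1: mu < mu0 -> area to the left of z
        p = phi(z)
    elif alternative == "two-sided":  # H1: mu != mu0 -> both tails beyond |z|
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p
```

Returning both z and P keeps the intermediate z-score visible, which is what the hand procedure asks you to compute first.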

- The smaller the P-value, the more certain we can be that H0 is false.
- The larger the P-value, the more plausible H0 becomes, but we can never be certain that H0 is true.
- A common rule of thumb is to reject H0 whenever P ≤ 0.05. While this rule is convenient, it has no scientific basis.
- If P ≤ 0.05, the result is statistically significant at the 5% level; equivalently, the null hypothesis is rejected at the 5% level.


The Relationship Between Hypothesis Tests and Confidence Intervals

Example: The sample mean lifetime of 50 microdrills was X̄ = 12.68 holes drilled, with s = 6.83. Setting α to 0.05 (5%), the 95% confidence interval for µ was computed to be (10.79, 14.57).

Test H0: µ = 10.79 versus H1: µ ≠ 10.79. Do a similar test for 14.57.

What conclusion can you make from the results?

What about values inside the interval?
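A worked sketch of the two endpoint tests, using the large-sample z-score with s in place of σ and tail areas from the standard normal table (the arithmetic is computed here, not taken from the slides):

```latex
\frac{s}{\sqrt{n}} = \frac{6.83}{\sqrt{50}} \approx 0.966

\mu_0 = 10.79: \quad z = \frac{12.68 - 10.79}{0.966} \approx 1.96, \qquad P \approx 2(0.025) = 0.05

\mu_0 = 14.57: \quad z = \frac{12.68 - 14.57}{0.966} \approx -1.96, \qquad P \approx 0.05
```

Both endpoints sit right at the 5% boundary. For any µ0 strictly inside the interval, |z| < 1.96, so P > 0.05 and H0 is not rejected at the 5% level.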


- The 95% confidence interval consists of precisely those values µ0 whose P-values are greater than 0.05 in a two-sided test of H0: µ = µ0.
- The confidence interval contains all the values that are plausible for the population mean µ.
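The correspondence can be checked numerically by "inverting" the test: run the two-sided z-test for every candidate µ0 on a fine grid and keep the values with P > 0.05. This is a sketch under stated assumptions: the helper name `two_sided_p`, the 0.01 grid spacing, and the 9 to 16 search range are arbitrary illustrative choices; the data are the microdrill summary statistics above.

```python
from statistics import NormalDist
import math


def two_sided_p(xbar, mu0, s, n):
    """Two-sided large-sample z-test P-value for H0: mu = mu0 (hypothetical helper)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Microdrill example: xbar = 12.68, s = 6.83, n = 50
xbar, s, n = 12.68, 6.83, 50

# Candidate mu0 values 9.00, 9.01, ..., 16.00 (grid chosen for illustration)
grid = [round(9.0 + i * 0.01, 2) for i in range(701)]

# Keep every mu0 that the 5%-level two-sided test does NOT reject
plausible = [mu0 for mu0 in grid if two_sided_p(xbar, mu0, s, n) > 0.05]

print(min(plausible), max(plausible))  # prints 10.79 14.57
```

The smallest and largest non-rejected values reproduce the 95% confidence interval (10.79, 14.57) to the grid resolution.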

Quiz:
1. For which value is the null hypothesis more plausible:
P=0.5 or P=0.05?
2. If P=0.01, which is the best conclusion?
a. H0 is definitely false.
b. H0 is definitely true.
c. There is a 1% probability that H0 is true.
d. H0 might be true, but it’s unlikely.
e. H0 might be false, but it’s unlikely.
f. H0 is plausible.
3. True or False: If P=0.02, then
a. The result is statistically significant at the 5% level.
b. The result is statistically significant at the 1% level.
c. The null hypothesis is rejected at the 5% level.
d. The null hypothesis is rejected at the 1% level.


Tests for a Population Proportion

The same procedures essentially apply when dealing with population proportions. However, as discussed in previous lessons, the sample proportion p̂ = X/n (with X ~ Bin(n, p)) has mean and variance
$$\mu_{\hat{p}} = p, \qquad \sigma^2_{\hat{p}} = \frac{p(1-p)}{n}$$
This test requires that the sample proportion be approximately normally distributed.
- This assumption is justified when both np0 > 10 and n(1 − p0) > 10.
- p0 is the population proportion specified in the null hypothesis.


Example 7:
A supplier of semiconductor wafers claims
that of all the wafers he supplies, no more than
10% are defective. A sample of 400 wafers is
tested and 50 of them, or 12.5%, are defective.
Can we conclude that the claim is false?
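A worked sketch, assuming the supplier's claim is taken as the null hypothesis, H0: p ≤ 0.10 versus H1: p > 0.10 (the arithmetic is computed here). The normality condition holds, since np0 = 40 and n(1 − p0) = 360 are both greater than 10:

```latex
\hat{p} = \frac{50}{400} = 0.125

z = \frac{0.125 - 0.10}{\sqrt{(0.10)(0.90)/400}} = \frac{0.025}{0.015} \approx 1.67

P = \text{area to the right of } 1.67 \approx 1 - 0.9525 = 0.0475
```

P is only slightly below 0.05, so the claim would be rejected at the 5% level but not at the 1% level.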


Example 8:
The article “Refinement of Gravimetric Geoid
Using GPS and Leveling Data” presents a
method for measuring orthometric heights
above sea level. For a sample of 1225 baselines,
926 gave results that were within the class C
spirit leveling tolerance limits. Can we conclude
that this method produces results within the
tolerance limits more than 75% of the time?
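A worked sketch with H0: p ≤ 0.75 versus H1: p > 0.75 (arithmetic computed here). The normality condition holds, since np0 = 918.75 and n(1 − p0) = 306.25 are both greater than 10:

```latex
\hat{p} = \frac{926}{1225} \approx 0.7559

z = \frac{0.7559 - 0.75}{\sqrt{(0.75)(0.25)/1225}} \approx \frac{0.0059}{0.01237} \approx 0.48

P = \text{area to the right of } 0.48 = 1 - 0.6844 = 0.3156
```

With P ≈ 0.32, H0 is quite plausible: we cannot conclude that the method produces results within the tolerance limits more than 75% of the time.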

Summary:
Let X be the number of successes in n independent Bernoulli trials, each with success probability p; in other words, let X ~ Bin(n, p).
To test a null hypothesis of the form H0: p ≤ p0, H0: p ≥ p0, or H0: p = p0, assuming that both np0 and n(1 − p0) are greater than 10:
- Compute the z-score:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}, \qquad \hat{p} = \frac{X}{n}$$
- Compute the P-value. The P-value is an area under the normal curve, which depends on the alternate hypothesis as follows:
Alternate Hypothesis P-value
H1: p > p0 Area to the right of z
H1: p < p0 Area to the left of z
H1: p ≠ p0 Sum of the areas in the tails cut off by z and −z
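The proportion procedure above can be sketched as a single function that first checks the normality condition and then applies the tail rule from the table. The name `proportion_z_test` and the `alternative` labels are hypothetical; the normal CDF is the standard-library `statistics.NormalDist`. The usage lines apply it to Examples 7 and 8.

```python
from statistics import NormalDist
import math


def proportion_z_test(x, n, p0, alternative="two-sided"):
    """Large-sample z-test for a population proportion, X ~ Bin(n, p) (hypothetical helper)."""
    # The normal approximation is justified only when np0 > 10 and n(1 - p0) > 10
    if not (n * p0 > 10 and n * (1 - p0) > 10):
        raise ValueError("normal approximation not justified: need n*p0 > 10 and n*(1-p0) > 10")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE computed under H0, using p0
    phi = NormalDist().cdf
    if alternative == "greater":      # H1: p > p0 -> area to the right of z
        p = 1 - phi(z)
    elif alternative == "less":       # H1: p < p0 -> area to the left of z
        p = phi(z)
    elif alternative == "two-sided":  # H1: p != p0 -> both tails
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


# Example 7: 50 defective wafers out of 400, H1: p > 0.10
z7, p7 = proportion_z_test(50, 400, 0.10, "greater")
# Example 8: 926 of 1225 baselines within tolerance, H1: p > 0.75
z8, p8 = proportion_z_test(926, 1225, 0.75, "greater")
print(f"Example 7: z = {z7:.2f}, P = {p7:.4f}")
print(f"Example 8: z = {z8:.2f}, P = {p8:.4f}")
```

The exact normal CDF gives slightly different fourth decimals than a table lookup at z rounded to two places (for Example 7, about 0.0478 rather than 0.0475).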


Small-Sample Tests for a Population Mean

- Uses the t-test, rather than the z-test.
- If σ is known, use z, not t.
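For reference, the standard one-sample t statistic (background added here; the slide does not write it out), for a sample of size n from an approximately normal population and H0 of the same forms as in the large-sample case:

```latex
t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}, \qquad \nu = n - 1 \text{ degrees of freedom}
```

The P-value is the corresponding area under the Student's t curve with n − 1 degrees of freedom, using the same right-tail, left-tail, or two-tail rule as for z.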

Example 9: Spacer collars for a transmission countershaft have a thickness specification of 38.98 to 39.02 mm. The process that manufactures the collars is supposed to be calibrated so that the mean thickness is 39.00 mm.

A sample of six collars is drawn and measured for thickness. The six thicknesses are 39.030, 38.997, 39.012, 39.008, 39.019, and 39.002. Assume that the population of thicknesses is approximately normal. Can we conclude that the process needs recalibration?
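A sketch of Example 9 as a two-sided t-test of H0: µ = 39.00 versus H1: µ ≠ 39.00 with 5 degrees of freedom. Python's standard library has no Student's t CDF, so this sketch swaps in the finite closed-form series that exists for integer degrees of freedom (as given in Abramowitz and Stegun, Handbook of Mathematical Functions, section 26.7); the helper name `t_two_sided_p` is hypothetical. In class, the same P-value would be bracketed with a t table.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with an integer number of degrees of freedom.

    Computes A = P(|T| < |t|) from the closed-form series for integer df,
    with theta = atan(|t| / sqrt(df)), then returns 1 - A.
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    series, coef = 0.0, 1.0
    if df % 2 == 1:
        # odd df: A = (2/pi) * [theta + sin(theta) * (c + (2/3)c^3 + (2*4)/(3*5)c^5 + ...)]
        for k in range((df - 1) // 2):
            if k > 0:
                coef *= (2 * k) / (2 * k + 1)
            series += coef * c ** (2 * k + 1)
        a = (2 / math.pi) * (theta + s * series)
    else:
        # even df: A = sin(theta) * (1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        for k in range(df // 2):
            if k > 0:
                coef *= (2 * k - 1) / (2 * k)
            series += coef * c ** (2 * k)
        a = s * series
    return 1.0 - a


thickness = [39.030, 38.997, 39.012, 39.008, 39.019, 39.002]
n = len(thickness)
xbar, s = mean(thickness), stdev(thickness)  # stdev uses the n - 1 divisor
t = (xbar - 39.00) / (s / math.sqrt(n))
p = t_two_sided_p(t, n - 1)  # H1: mu != 39.00, so both tails
print(f"xbar = {xbar:.4f}, s = {s:.4f}, t = {t:.3f}, P = {p:.3f}")
```

The result, t ≈ 2.33 with P between 0.05 and 0.10, is consistent with a table lookup (t0.05,5 = 2.015 and t0.025,5 = 2.571): the evidence that the process needs recalibration is suggestive but not significant at the 5% level.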


Example 10: Before a substance can be deemed safe for landfilling, its chemical properties must be characterized. An article reports that in a sample of six replicates of sludge from a New Hampshire wastewater treatment plant, the mean pH was 6.68 with a standard deviation of 0.20. Can we conclude that the mean pH is less than 7.0?
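A worked sketch with H0: µ ≥ 7.0 versus H1: µ < 7.0, assuming (as the example does not state explicitly) that the pH values are approximately normal. There are n − 1 = 5 degrees of freedom:

```latex
t = \frac{6.68 - 7.0}{0.20/\sqrt{6}} = \frac{-0.32}{0.0816} \approx -3.92

t_{0.01,5} = 3.365 < 3.92 < 4.032 = t_{0.005,5} \quad\Longrightarrow\quad 0.005 < P < 0.01
```

Since P < 0.01, H0 is rejected at the 1% level: we can conclude that the mean pH is less than 7.0.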


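Tests for the Difference Between Two Means

For Large Samples (nX > 30 and nY > 30):

To test a null hypothesis of the form
H0: µX − µY ≤ ∆0, H0: µX − µY ≥ ∆0, or H0: µX − µY = ∆0:

- Compute the z-score:
$$z = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}$$
If σX and σY are unknown, they may be approximated with sX and sY, respectively.
- Compute the P-value as an area under the normal curve, using the same tail rule as for a single mean: the right tail for H1: µX − µY > ∆0, the left tail for H1: µX − µY < ∆0, and both tails for H1: µX − µY ≠ ∆0.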


Example 11:
An engineer claims that a new type of power supply for home computers lasts longer than the old type. Independent random samples of 75 of each of the two types are chosen, and the sample means and standard deviations of their lifetimes are computed:
New: X̄ = 4387 h, s1 = 252 h
Old: Ȳ = 4260 h, s2 = 231 h

Can you conclude that the mean lifetime of new power supplies is greater than that of the old power supplies?
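A sketch of the large-sample two-mean z-test applied to Example 11, with X the new type and Y the old type, so the claim corresponds to H1: µX − µY > 0. The function name `two_sample_z` and its parameter names are hypothetical; sample standard deviations stand in for σX and σY as the summary allows for large samples.

```python
from statistics import NormalDist
import math


def two_sample_z(xbar, sx, nx, ybar, sy, ny, delta0=0.0, alternative="greater"):
    """Large-sample z-test for mu_X - mu_Y (hypothetical helper)."""
    se = math.sqrt(sx ** 2 / nx + sy ** 2 / ny)  # standard error of Xbar - Ybar
    z = ((xbar - ybar) - delta0) / se
    phi = NormalDist().cdf
    if alternative == "greater":      # H1: mu_X - mu_Y > delta0
        p = 1 - phi(z)
    elif alternative == "less":       # H1: mu_X - mu_Y < delta0
        p = phi(z)
    elif alternative == "two-sided":  # H1: mu_X - mu_Y != delta0
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


# Example 11: new (X) versus old (Y) power supplies, H0: mu_X - mu_Y <= 0
z, p = two_sample_z(4387, 252, 75, 4260, 231, 75, delta0=0.0, alternative="greater")
print(f"z = {z:.3f}, P = {p:.5f}")
```

Here z ≈ 3.22 and P is well under 0.001, so the data support the engineer's claim.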

For Small Samples:

The samples should come from normal populations with means µX and µY and standard deviations σX and σY. If σX and σY are not known to be equal, use the following procedure to test the null hypothesis.

i. Compute the degrees of freedom, ν, rounded down to the nearest integer:
$$\nu = \frac{\left(\dfrac{s_X^2}{n_X} + \dfrac{s_Y^2}{n_Y}\right)^2}{\dfrac{(s_X^2/n_X)^2}{n_X - 1} + \dfrac{(s_Y^2/n_Y)^2}{n_Y - 1}}$$

ii. Compute the test statistic:
$$t = \frac{(\bar{X} - \bar{Y}) - \Delta_0}{\sqrt{s_X^2/n_X + s_Y^2/n_Y}}$$

iii. Compute the P-value. The P-value is an area under the Student's t curve with ν degrees of freedom, which depends on the alternate hypothesis as follows:

Alternate Hypothesis P-value
H1: µX − µY > ∆0 Area to the right of t
H1: µX − µY < ∆0 Area to the left of t
H1: µX − µY ≠ ∆0 Sum of the areas in the tails cut off by t and −t
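Steps i and ii can be sketched as one small function; the P-value in step iii is then read from a t table with ν degrees of freedom. The name `welch_t` and its parameters are hypothetical. A tiny tolerance is added before rounding down so that floating-point round-off (for example 9.999999999 instead of 10) does not lose a whole degree of freedom.

```python
import math


def welch_t(xbar, sx, nx, ybar, sy, ny, delta0=0.0):
    """Return (t, nu) for the unequal-variance two-sample t procedure (hypothetical helper)."""
    vx, vy = sx ** 2 / nx, sy ** 2 / ny  # estimated variances of Xbar and Ybar
    nu = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    t = ((xbar - ybar) - delta0) / math.sqrt(vx + vy)
    # Round nu down; the 1e-9 guards against floating-point round-off just below an integer
    return t, math.floor(nu + 1e-9)
```

A quick sanity check on the design: when sX = sY and nX = nY = n, the formula for ν reduces to 2(n − 1), which is what the pooled procedure would use.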

READING ASSIGNMENT:
Tests for the difference between two proportions
(pages 425-428)
Tests with paired data (pages 439-441)


Schedule
- March 16 – Chi-Square Tests
- March 18 – F Tests, Power
- March 23 – Third Long Exam, with cheat sheet (Wednesday)
  - 4-6 PM
  - Conflict: 25, 1 PM
- April 1 – Final Presentation (Friday)

The Chi-Square Test


- Used when the data consist of nominal or ordinal variables.

Nominal variables:
Variables with no inherent order or ranking sequence, e.g. numbers used as names (group 1, group 2, ...), gender.

Ordinal variables:
Variables with an ordered series, e.g. "greatly dislike, moderately dislike, indifferent, moderately like, greatly like".
***Numbers assigned to such variables indicate rank order only; the "distance" between the numbers has no meaning.

Multinomial trial: an experiment that can result in k outcomes, where k ≥ 2.
- A generalization of the Bernoulli trial.
- Example: roll of a fair die (6 outcomes).


The chi-square test has two main uses:
- Comparing the distribution of one categorical variable (nominal or ordinal) with another.
- Comparing an observed distribution with a theoretically expected one.

- Comparing the distribution of one categorical variable with another.
Example:
Of 120 male and 100 female applicants to university, 90 male and 40 female applicants had work experience.
Does the gender of an applicant to university correspond to whether or not they have prior work experience?

- Comparing an observed distribution with a theoretically expected one.
Example:
In a population of mice, do the proportions differ from those expected?

Examples


Example:
A gambler wants to test whether a die is fair.
H0: The die is fair (p01 = … = p06 = 1/6).
He rolls the die 600 times and obtains the following results:

Category   Observed   Expected
1          115        100
2          97         100
3          91         100
4          101        100
5          110        100
6          86         100
Total      600        600

- Expected value: the mean number of trials that would result in a specific outcome if H0 were true (Ei = n·p0i, here 600 × 1/6 = 100).
- Chi-square statistic: measures how far the observed values are from the expected values:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$


- If χ² is large, there is stronger evidence against H0.
- For k outcomes, there are k − 1 degrees of freedom.
- The chi-square test provides a good approximation when all the expected values are ≥ 5.
- The chi-square statistic for the example is 6.12.

P-value:
- Check that all expected values are ≥ 5.
- Look up the chi-square value in the table. The areas given across the top are the areas to the right of the critical value.
- The P-value for the example is > 0.10 (with 5 degrees of freedom, χ²0.10,5 = 9.236 > 6.12). We therefore do not reject H0.
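The goodness-of-fit procedure for the die example can be sketched as follows. The helper names `chi2_sf` and `chi2_goodness_of_fit` are hypothetical. Because the standard library has no chi-square CDF, `chi2_sf` computes the right-tail area as the regularized upper incomplete gamma function Q(df/2, x/2), built up from Q(1/2, y) = erfc(√y) or Q(1, y) = e^(−y) via the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(−y) / Γ(a + 1); this works for integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(chi-square with df degrees of freedom > x), integer df only."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while a < df / 2.0:
        # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), evaluated in log space
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi2_goodness_of_fit(observed, probs):
    """Chi-square statistic, degrees of freedom and P-value for H0: outcome probabilities = probs."""
    n = sum(observed)
    expected = [n * p for p in probs]
    if min(expected) < 5:
        raise ValueError("some expected values are below 5; the approximation is unreliable")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k outcomes -> k - 1 degrees of freedom
    return stat, df, chi2_sf(stat, df)


observed = [115, 97, 91, 101, 110, 86]
stat, df, p = chi2_goodness_of_fit(observed, [1 / 6] * 6)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.3f}")
```

The exact tail area, about 0.29, agrees with the table conclusion that P > 0.10.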


Example 1:
Rivets are manufactured for a certain purpose. The length specification is 1.20-1.30 cm. It is thought that 90% of the rivets manufactured meet the specification, while 5% are too short, and 5% are too long.

In a random sample of 1000 rivets, 860 met the specs, 60 were too short, and 80 were too long. Can you conclude that the true percentages differ from 90%, 5%, and 5%?
State the appropriate null hypothesis.
Compute the expected values under the null hypothesis.
Compute the value of the chi-square statistic.
Find the P-value. What do you conclude?
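A worked sketch (arithmetic computed here). H0: p1 = 0.90 (meets spec), p2 = 0.05 (too short), p3 = 0.05 (too long). The expected values are 1000 × 0.90 = 900, 1000 × 0.05 = 50, and 50, all ≥ 5:

```latex
\chi^2 = \frac{(860 - 900)^2}{900} + \frac{(60 - 50)^2}{50} + \frac{(80 - 50)^2}{50}
       = 1.78 + 2.00 + 18.00 = 21.78, \qquad \nu = 3 - 1 = 2

\chi^2_{0.005,2} = 10.597 < 21.78 \quad\Longrightarrow\quad P < 0.005
```

(For 2 degrees of freedom the tail area is exactly e^(−χ²/2) ≈ 1.9 × 10⁻⁵.) We conclude that the true percentages differ from 90%, 5%, and 5%.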

Chi-Square Test for Homogeneity

- Used when several multinomial experiments are conducted, each with the same set of possible outcomes; the test determines whether the outcome probabilities are the same for every experiment.

H0: The probabilities of the outcomes are the same for each experiment.


Example:
Four machines manufacture cylindrical steel pins. The pins are subject to a diameter specification. A pin may meet the specification, or it may be too thin or too thick. Pins are sampled for each machine, and the number of pins in each category is counted. The results are shown in the contingency table:

             Too thin    OK    Too thick    Total
Machine 1       10       102       8         120
Machine 2       34       161       5         200
Machine 3       12        79       9         100
Machine 4       10        60      10          80
Total           66       402      32         500

Cell: each row-column intersection.
Marginal totals: the row totals (Total column) and the column totals (Total row).


Notation for Observed values (i  rows, j 


columns)
H0: For each column j, p1j=…= pIj
O1. = sum of observed values in row i
O.j = sum of observed values in column j

Column 1 Column 2 … Column J Total

Row 1 O11 O12 … O1J O1.


Row 2 O21 O22 … O2J O2.
: : : : : :
Row I OI1 OI2 … OIJ OI.
Total O.1 O.2 … O.J O..

Computing the expected value:


For cell ij,
𝑂𝑖. 𝑂.𝑗
𝐸𝑖𝑗 =
𝑂..
𝐼 𝐽
2
(𝑂𝑖𝑗 − 𝐸𝑖𝑗 )2
𝜒 =
𝐸𝑖𝑗
𝑖=1 𝑗=1

Degrees of freedom = (I-1)(J-1)
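The expected-value and chi-square formulas for an I × J table translate directly into code. The function name `contingency_chi2` is hypothetical; it returns the table of expected values, the statistic, and (I − 1)(J − 1) degrees of freedom, and the usage applies it to the four-machine pin table.

```python
def contingency_chi2(observed):
    """Expected values E_ij = O_i. * O_.j / O.., chi-square statistic and df for an I x J table."""
    row_tot = [sum(row) for row in observed]          # O_i.
    col_tot = [sum(col) for col in zip(*observed)]    # O_.j
    grand = sum(row_tot)                              # O..
    expected = [[r * c / grand for c in col_tot] for r in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(col_tot) - 1)
    return expected, stat, df


# Pin diameters by machine: columns are too thin, OK, too thick
pins = [
    [10, 102, 8],
    [34, 161, 5],
    [12, 79, 9],
    [10, 60, 10],
]
expected, stat, df = contingency_chi2(pins)
print(f"smallest expected = {min(min(r) for r in expected):.2f}, chi2 = {stat:.3f}, df = {df}")
```

Checking the smallest expected value against 5 before trusting the statistic mirrors the "all expected values ≥ 5" rule.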


Example 2: Given the table below, test the null hypothesis that the proportions of pins that are too thin, OK, or too thick are the same for all the machines.

             Too thin    OK    Too thick    Total
Machine 1       10       102       8         120
Machine 2       34       161       5         200
Machine 3       12        79       9         100
Machine 4       10        60      10          80
Total           66       402      32         500
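A worked sketch of Example 2 (arithmetic computed here). Every expected value E_ij = O_i. O_.j / O.. is at least 5 (the smallest is E_43 = 80 × 32 / 500 = 5.12), so the test is appropriate:

```latex
E_{11} = \frac{(120)(66)}{500} = 15.84, \quad \ldots, \quad E_{43} = \frac{(80)(32)}{500} = 5.12

\chi^2 = \sum_{i=1}^{4} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 15.58, \qquad \nu = (4 - 1)(3 - 1) = 6

\chi^2_{0.025,6} = 14.449 < 15.58 < 16.812 = \chi^2_{0.01,6} \quad\Longrightarrow\quad 0.01 < P < 0.025
```

H0 is rejected at the 5% level but not at the 1% level: there is fairly strong evidence that the proportions differ among the machines.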

Chi-Square Test for Independence

- In the previous example, the row totals (the number of pins sampled from each machine) were fixed in advance, while the column totals were random.
- When both row and column totals are random, for example when a single sample is classified according to two categorical variables, the same procedure applies. The null hypothesis is then that the row and column classifications are independent.



A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender and by voting preference. Results are shown in the contingency table below.

Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences?

Example 3:
Cylindrical steel pins are subject to a length and a diameter specification. With respect to length, a pin may meet the specification, or it may be too long or too short.

A total of 1021 pins are sampled and categorized with respect to both the length and the diameter specification. The results are presented in the table below.

Test the null hypothesis that the proportions of pins that are too thin, OK, or too thick with respect to the diameter specification do not depend on the classification with respect to the length specification.


Observed values for 1021 steel pins

                            Diameter
Length        Too thin    OK    Too thick    Total
Too short        13       117       4         134
OK               62       664      80         806
Too long          5        68       8          81
Total            80       849      92        1021
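A worked sketch of Example 3 (arithmetic computed here), testing H0: the length and diameter classifications are independent. The smallest expected value is E_31 = 81 × 80 / 1021 ≈ 6.35 ≥ 5, so the test is appropriate:

```latex
\chi^2 = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 7.46, \qquad \nu = (3 - 1)(3 - 1) = 4

\chi^2_{0.10,4} = 7.779 > 7.46 \quad\Longrightarrow\quad P > 0.10
```

H0 is not rejected: the data give no convincing evidence that the diameter classification depends on the length classification.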

Example 4:
For the given table of observed values:
Construct the corresponding table of expected values.
If appropriate, perform the chi-square test for the null hypothesis that the row and column outcomes are independent. If not appropriate, explain why.

Observed Values
      1     2     3
A    15    10    12
B     3    11    11
C     9    14    12
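A worked sketch of Example 4 (arithmetic computed here). The row totals are 37, 25, 35, the column totals are 27, 35, 35, and O.. = 97. The expected values E_ij = O_i. O_.j / O.. are:

```latex
\begin{array}{c|ccc}
  & 1 & 2 & 3 \\ \hline
A & 10.30 & 13.35 & 13.35 \\
B & 6.96 & 9.02 & 9.02 \\
C & 9.74 & 12.63 & 12.63
\end{array}

\chi^2 \approx 6.48, \qquad \nu = (3 - 1)(3 - 1) = 4, \qquad \chi^2_{0.10,4} = 7.779 > 6.48 \quad\Longrightarrow\quad P > 0.10
```

All expected values are ≥ 5, so the test is appropriate; with P > 0.10, the hypothesis of independence is not rejected.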


End.
