Professional Documents
Culture Documents
Chi Square
Chi Square
Department of Mathematics
University of Houston
Lecture 14 - 3339
2 Chi-Square
3 Examples
4 χ2 Test of Independence
6 Examples
Mars Inc. claims that they produce M&Ms with the following
distributions:
Brown 30% Red 20% Yellow 20%
Orange 10% Green 10% Blue 10%
A bag of M&Ms was randomly selected from the grocery store shelf,
and the color counts were:
Brown 14 Red 14 Yellow 5
Orange 7 Green 6 Blue 10
Hypotheses:
I H0 : The proportions are the same as what is claimed.
I Ha : At least one proportion is different as what is claimed.
This would be better in context of the problem. For example in our
M&Ms test;
I H0 : The distribution of candy colors is as the manufacturer claims.
I Ha : The distribution of candy colors is not what the manufacturer
claims.
Hypotheses:
I H0 : The proportions are the same as what is claimed.
I Ha : At least one proportion is different as what is claimed.
This would be better in context of the problem. For example in our
M&Ms test;
I H0 : The distribution of candy colors is as the manufacturer claims.
I Ha : The distribution of candy colors is not what the manufacturer
claims.
Hypotheses:
I H0 : The proportions are the same as what is claimed.
I Ha : At least one proportion is different as what is claimed.
This would be better in context of the problem. For example in our
M&Ms test;
I H0 : The distribution of candy colors is as the manufacturer claims.
I Ha : The distribution of candy colors is not what the manufacturer
claims.
Hypotheses:
I H0 : The proportions are the same as what is claimed.
I Ha : At least one proportion is different as what is claimed.
This would be better in context of the problem. For example in our
M&Ms test;
I H0 : The distribution of candy colors is as the manufacturer claims.
I Ha : The distribution of candy colors is not what the manufacturer
claims.
The sum of the third column is called the Chi-square test statistic, χ2 .
X (observed − expected)2
χ2 =
expected
Red 14 0.2
Yellow 5 0.2
Orange 7 0.1
Green 6 0.1
Blue 10 0.1
> chisq.test(c(14,14,5,7,6,10),correct=FALSE,p=c(.3,.2,.2,
.1,.1,.1))
> chisq.test(c(14,14,5,7,6,10),correct=FALSE,p=c(.3,.2,.2,
.1,.1,.1))
The following table shows three different airlines row variable and the
number of delayed or on-time flights column variable from
lightstats.com.
2. Hypotheses
I Null hypothesis: There is no association (independence) between
the row variable and column variable.
I Alternative hypothesis: There is an association (dependence)
between the row variable and column variable.
I In the previous example:
The following table shows three different airlines row variable and the
number of delayed or on-time flights column variable from
flightstats.com.
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
1530×287 1530×3155
Southwest 3442 = 127.5741 3442 = 1402.4259 1530
957×287 957×3155
United 3442 = 79.7963 3442 = 877.20367 957
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
Delayed On-time
(112−79.6296)2 (843−875.3704)2
American 79.6296 = 13.159 875.3704 = 1.197
(114−127.5741)2 (1416−1402.4259)2
Southwest 127.5741 = 1.4443 1402.4259 = 0.1314
(61−79.7963)2 (896−877.20367)2
United 79.7963 = 4.428 877.20367 = 0.4028
Test statistic:
> airline<-matrix(c(112,114,61,843,1416,896),nrow=3,ncol=2)
> chisq.test(airline,correct = FALSE)
data: airline
X-squared = 20.762, df = 2, p-value =3.102e-05
> eat<-matrix(c(100,900,120,880,280,720,390,610,570,430),nrow = 2
,ncol = 5)
> eat
[,1] [,2] [,3] [,4] [,5]
[1,] 100 120 280 390 570
[2,] 900 880 720 610 430
> chisq.test(eat,correct = FALSE)
data: eat
X-squared = 742.4, df = 4, p-value < 2.2e-16
c) The male students have less variability than the female students.
d) About 50% of the male students have weights between 150 and
185 lbs.
X 0 1 2 3 4
P(X ) 2k 3k 5k 3k 4k
Let X be the amount of time (in hours) the wait is to get a table at a
restaurant. Suppose the cdf is represented by
0
x <0
F (x) = 4 x 1 2 0≤x ≤2
1 x >2
2. Find P(X ≥ 1)
2. P(Z ≤ −1.9)
2. P(X ≥ 7.1)
Identify the most appropriate test to use for the following situation: A
national computer retailer believes that the average sales are greater
for salespersons with a college degree. A random sample of 14
salespersons with a degree had an average weekly sale of $3542 last
year, while 17 salespersons without a college degree averaged $3301
in weekly sales. The standard deviations were $468 and $642
respectively. Is there evidence to support the retailer’s belief?
a) One sample t test
b) Matched pairs
c) Two sample t test
d) Two sample p test
Data for gas mileage (in mpg) for different vehicles was entered into a
software package and part of the ANOVA table is shown below:
Source DF SS MS
Vehicle 2 440 220.00
Error 17 318 18.71
Total 19 758
IT 295 152 214 171 131 178 225 141 116 173
GPA 2.4 .6 .2 0 1 .6 1 .4 0 2.6
RR 41 18 45 29 28 38 25 26 22 37
a) Calculate the line of best fit that predicts the GPA on the basis of IT
scores.
b) Calculate the line of best fit that predicts the GPA on the basis of
RR scores.
c) Which of the two lines calculated in parts a and b best fits the data?
Provided
I Online calculator; it will be a link you see in the exam.
I R; it will be a link you see in the exam.
I Formula sheet and z, t, chi-square tables.
Can bring
I Pencil; you will need something to write with for the free response
questions.
I Your Cougar Card.