
Lecture Statistics

One-Sample Non-Parametric Tests

Chi-Square Test
6 Steps in Hypothesis Testing

1. State the null hypothesis, H0, and the alternative hypothesis, H1
2. Choose the level of significance, α, and the sample size, n
3. Determine the appropriate test statistic and sampling distribution
4. Determine the critical values that divide the rejection and nonrejection regions
6 Steps in Hypothesis Testing (continued)

5. Collect data and compute the value of the test statistic
6. Make the statistical decision and state the managerial conclusion. If the test statistic falls into the nonrejection region, do not reject the null hypothesis H0. If the test statistic falls into the rejection region, reject the null hypothesis. Express the managerial conclusion in the context of the problem.
One-Sample Hypothesis Testing

• Nominal Data: Chi-Square Test (non-parametric test)

• Ordinal Data: Kolmogorov-Smirnov (K-S) Test (non-parametric test)

• Interval/Ratio Data: Z-test or t-test (parametric test)

Chi-Square Test for Nominal (Categorical) Data

A chi-square statistic is used to investigate whether distributions differ from one another.

• One variable: how its distribution compares to a second, given distribution (a test of goodness of fit)

• Two variables: whether the two variables are statistically related to each other (a test for independence)
Chi-Square Statistic

χ² = Σ (observed frequency − expected frequency)² / expected frequency

Note that chi-square tests can only be used on actual counts, not on percentages, proportions, etc.

For reasonably large n, the above statistic under H0 has an approximate chi-squared distribution with (k − 1) degrees of freedom, where k is the number of categories.
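
The slides do not include software, but as a minimal sketch (assuming Python with numpy and scipy available), the statistic can be computed directly from the formula above; the counts here are hypothetical, chosen only to illustrate the computation:

```python
# Minimal sketch (not from the slides): chi-square goodness-of-fit statistic
# computed directly from the formula above, on hypothetical counts.
import numpy as np
from scipy.stats import chi2

observed = np.array([30, 14, 34, 45, 27])   # hypothetical counts, k = 5 categories
expected = np.full(len(observed), observed.sum() / len(observed))  # H0: all equal

chi_sq = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 1                       # k - 1 degrees of freedom
p_value = chi2.sf(chi_sq, df)                # area to the right of the statistic

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```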
Chi-Square (cont.)

• Here's the distribution (figure): it is actually a family of distributions depending on the degrees of freedom.

• If there is in fact no relationship between two variables, and you draw repeated samples and calculate the formula on the last slide, you'll get this kind of distribution simply because of chance variation.

• What we do in practice is to draw one sample and make the calculation. If the number we get is large enough, it tells us that we almost certainly didn't get this number by chance, i.e., there really is a relationship between these variables in the population.
Chi-Square Statistical Table

[Figure: chi-square critical values for chi-square distributions; the tabulated area lies to the right of the critical value.]
Example 1: Car Accidents and Day of the Week

A study of 667 drivers who were using a cell phone when they were involved in a collision on a weekday examined the relationship between these accidents and the day of the week.

Are the accidents equally likely to occur on any day of the working week? To answer this question we use the chi-square goodness-of-fit test.

Data for n observations on a categorical variable (for example, day of the week) with k possible outcomes (k = 5 weekdays) are summarized as observed counts n1, n2, . . . , nk in k cells.
The Chi-Square Statistic

χ² = Σ (observed frequency − expected frequency)² / expected frequency

where the expected frequencies are those implied if H0 is true, and the statistic follows a chi-square distribution with k − 1 degrees of freedom, where k is the number of categories.
Decision Rule

The χ²_STAT test statistic approximately follows a chi-squared distribution with (k − 1) degrees of freedom.

Decision Rule: If χ²_STAT > χ²_α, reject H0; otherwise, do not reject H0.

[Figure: do-not-reject region to the left of the critical value χ²_α; rejection region of area α in the right tail.]


The Chi-Square Test Statistic

• The χ² test statistic approximately follows a chi-squared distribution with k − 1 degrees of freedom, where k is the number of categories.
• If the χ² test statistic is large, this is evidence against the null hypothesis.

χ² = Σ over all cells of (Obs − Exp)² / Exp

Decision Rule (α = .05): If χ² > χ²_.05, reject H0; otherwise, do not reject H0.

[Figure: do-not-reject region to the left of χ²_.05; rejection region of area .05 in the right tail.]
Car Accidents and Day of the Week (compare χ² to table value)

H0 specifies that all days are equally likely for car accidents ➔ each pi = 1/5.

The expected count for each of the five days is n·pi = 667(1/5) = 133.4.

χ² = Σ (observed − expected)² / expected = Σ over days of (count − 133.4)² / 133.4 = 8.49

following the chi-square distribution with 5 − 1 = 4 degrees of freedom.

Chi-square critical values (tail probability p = area to the right):

df   0.25   0.20   0.15   0.10   0.05   0.025  0.02   0.01   0.005  0.0025 0.001  0.0005
 1   1.32   1.64   2.07   2.71   3.84   5.02   5.41   6.63   7.88   9.14  10.83  12.12
 2   2.77   3.22   3.79   4.61   5.99   7.38   7.82   9.21  10.60  11.98  13.82  15.20
 3   4.11   4.64   5.32   6.25   7.81   9.35   9.84  11.34  12.84  14.32  16.27  17.73
 4   5.39   5.99   6.74   7.78   9.49  11.14  11.67  13.28  14.86  16.42  18.47  20.00
 5   6.63   7.29   8.12   9.24  11.07  12.83  13.39  15.09  16.75  18.39  20.51  22.11
 6   7.84   8.56   9.45  10.64  12.59  14.45  15.03  16.81  18.55  20.25  22.46  24.10
 7   9.04   9.80  10.75  12.02  14.07  16.01  16.62  18.48  20.28  22.04  24.32  26.02
 8  10.22  11.03  12.03  13.36  15.51  17.53  18.17  20.09  21.95  23.77  26.12  27.87
 9  11.39  12.24  13.29  14.68  16.92  19.02  19.68  21.67  23.59  25.46  27.88  29.67
10  12.55  13.44  14.53  15.99  18.31  20.48  21.16  23.21  25.19  27.11  29.59  31.42
11  13.70  14.63  15.77  17.28  19.68  21.92  22.62  24.72  26.76  28.73  31.26  33.14
12  14.85  15.81  16.99  18.55  21.03  23.34  24.05  26.22  28.30  30.32  32.91  34.82
13  15.98  16.98  18.20  19.81  22.36  24.74  25.47  27.69  29.82  31.88  34.53  36.48

Since the value 8.49 of the test statistic is less than the table value of 9.49 (df = 4, p = 0.05), we do not reject H0.

➔ There is no significant evidence of different car accident rates for different weekdays when the driver was using a cell phone.
Car Accidents and Day of the Week (bounds on the P-value)

H0 specifies that all days are equally likely for car accidents ➔ each pi = 1/5.

The expected count for each of the five days is n·pi = 667(1/5) = 133.4.

χ² = Σ (observed − expected)² / expected = Σ over days of (count − 133.4)² / 133.4 = 8.49

following the chi-square distribution with 5 − 1 = 4 degrees of freedom.

From the df = 4 row of the chi-square table above: 7.78 (p = 0.1) < χ² = 8.49 < 9.49 (p = 0.05). Thus the bounds on the P-value are 0.05 < P-value < 0.1.

We don't know the exact P-value, but we DO know that P-value > 0.05, thus we conclude that

➔ There is no significant evidence of different car accident rates for different weekdays when the driver was using a cell phone.
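
For comparison (not on the slide), the exact P-value behind these bounds can be computed; this sketch assumes scipy:

```python
# Sketch (not on the slide): the exact P-value for the observed statistic.
from scipy.stats import chi2

p_value = chi2.sf(8.49, df=4)  # area to the right of the observed statistic
print(p_value)                 # about 0.075, consistent with 0.05 < P < 0.1
```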
Example 2: M&M Colors

❑ Mars, Inc. periodically changes the M&M (milk chocolate) color proportions. Last year the proportions were: yellow 20%; red 20%; orange, blue, green 10% each; brown 30%.

• In a recent bag of 106 M&M's I had the following numbers of each color:

Yellow       Red          Orange       Blue         Green        Brown
29 (27.4%)   23 (21.7%)   12 (11.3%)   14 (13.2%)   8 (7.5%)     20 (18.9%)

• Is this evidence that Mars, Inc. has changed the color distribution of M&M's?
Example 2: M&M Colors

• H0: p_yellow = .20, p_red = .20, p_orange = .10, p_blue = .10, p_green = .10, p_brown = .30

       Yellow   Red     Orange   Blue    Green   Brown   Total
Obs.   29       23      12       14      8       20      106
Exp.   21.2     21.2    10.6     10.6    10.6    31.8    106

• Expected yellow = 106 × .20 = 21.2, etc. for the other expected counts.

χ² = Σ over all cells of (Obs − Exp)² / Exp
   = (29 − 21.2)²/21.2 + (23 − 21.2)²/21.2 + (12 − 10.6)²/10.6 + (14 − 10.6)²/10.6 + (8 − 10.6)²/10.6 + (20 − 31.8)²/31.8
   = 2.870 + 0.153 + 0.185 + 1.091 + 0.638 + 4.379
   = 9.316
Example 2: M&M Colors (cont.)

χ² = 9.316; degrees of freedom = 6 − 1 = 5

The test statistic is χ² = 9.316; χ²_0.05 with 5 d.f. = 11.070.

Decision Rule: If χ² > χ²_.05, reject H0; otherwise, do not reject H0.

Here, χ² = 9.316 < χ²_.05 = 11.070, so we do not reject H0 and conclude that there is not sufficient evidence to conclude that Mars has changed the color proportions.

[Figure: do-not-reject region to the left of χ²_0.05 = 11.070; rejection region of area 0.05 in the right tail.]
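
The same test can be reproduced with a library routine; a sketch assuming scipy is available:

```python
# Sketch (assuming scipy): the M&M goodness-of-fit test via the library.
from scipy.stats import chisquare

observed = [29, 23, 12, 14, 8, 20]
expected = [21.2, 21.2, 10.6, 10.6, 10.6, 31.8]  # 106 x (.20, .20, .10, .10, .10, .30)

stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)  # statistic about 9.32; p-value about 0.097, above 0.05
```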
Example

Computer systems crash for many reasons, among them software failure, hardware failure, operator error, and system overloading. It is thought that 10% of the crashes are due to software failure, 5% to hardware failure, 25% to operator error, 40% to system overloading, and the rest to other causes. Over an extended period of study, 150 crashes are observed and each is classified according to its probable cause. It is found that 13 are due to software failure, 10 to hardware failure, 42 to operator error, 65 to system overloading, and the rest to other causes. Do these data lead us to suspect the accuracy of the stated percentages?
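
The slide poses this as an exercise; purely as an illustrative sketch (assuming scipy, and taking "other causes" to be 150 − 13 − 10 − 42 − 65 = 20 observed, with an expected share of 100% − 10% − 5% − 25% − 40% = 20%), the check could be run as:

```python
# Illustrative sketch only (the slide leaves the computation as an exercise).
from scipy.stats import chisquare

observed = [13, 10, 42, 65, 20]  # last entry: "other causes", by subtraction
expected = [0.10 * 150, 0.05 * 150, 0.25 * 150, 0.40 * 150, 0.20 * 150]

stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)  # statistic about 5.4, below the df = 4 critical value 9.49
```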
Contingency Tables

• A contingency table is a method of summarizing the relationship between variables. It is a table of frequencies classified according to the values of the variables in question, and it is used to summarize categorical data. What you find in the rows of a contingency table is contingent upon (dependent upon) what you find in the columns.

• Used to classify sample observations according to two or more characteristics.

• Also called a cross-classification table.
Contingency Table Example

Left-Handed vs. Gender
Dominant hand: left vs. right
Gender: male vs. female

▪ 2 categories for each variable, so this is called a 2 x 2 table

▪ Suppose we examine a sample of 300 children
Contingency Table Example (continued)

Sample results organized in a contingency table (sample size n = 300; of 120 females, 12 were left-handed; of 180 males, 24 were left-handed):

              Hand Preference
Gender     Left    Right   Total
Female     12      108     120
Male       24      156     180
Total      36      264     300
Testing for Independence

The null hypothesis is that the row and column variables are independent. The alternative hypothesis is that the row and column variables are dependent.

H0: The two categorical variables are independent (i.e., there is no relationship between them)
H1: The two categorical variables are dependent (i.e., there is a relationship between them)
Test for the Equality Between Proportions

H0: π1 = π2 (the proportion of females who are left-handed is equal to the proportion of males who are left-handed)
H1: π1 ≠ π2 (the two proportions are not the same; hand preference is not independent of gender)

• If H0 is true, then the proportion of left-handed females should be the same as the proportion of left-handed males. Left-handedness is then independent of gender.
Chi-Square Tests for Independence

χ² = Σ over all cells of (Obs − Exp)² / Exp

❑ Expected cell frequencies:

Exp = (row total × column total) / n

where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size

df = (r − 1)(c − 1), where r is the number of rows and c the number of columns
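
A minimal numpy sketch (not from the slides) of the expected-frequency formula, using the hand-preference table introduced earlier:

```python
# Sketch: expected frequencies for a contingency table from the formula above.
import numpy as np

observed = np.array([[12, 108],
                     [24, 156]])                 # hand-preference table
row_totals = observed.sum(axis=1)                # [120, 180]
col_totals = observed.sum(axis=0)                # [ 36, 264]
n = observed.sum()                               # 300

expected = np.outer(row_totals, col_totals) / n  # row total x column total / n
print(expected)                                  # [[ 14.4 105.6] [ 21.6 158.4]]
```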
Computing the Average Proportion

The average proportion is: p̄ = (X1 + X2) / (n1 + n2) = X / n

Here, with 120 females of whom 12 were left-handed and 180 males of whom 24 were left-handed:

p̄ = (12 + 24) / (120 + 180) = 36 / 300 = 0.12

i.e., of all the children, the proportion of left-handers is 0.12, that is, 12%.
Finding Expected Frequencies

• To obtain the expected frequency for left-handed females, multiply the average proportion left-handed (p̄) by the total number of females.
• To obtain the expected frequency for left-handed males, multiply the average proportion left-handed (p̄) by the total number of males.

If the two proportions are equal, then
P(Left-Handed | Female) = P(Left-Handed | Male) = .12

i.e., we would expect (.12)(120) = 14.4 females and (.12)(180) = 21.6 males to be left-handed.
Observed vs. Expected Frequencies

              Hand Preference
Gender     Left               Right               Total
Female     Observed = 12      Observed = 108      120
           Expected = 14.4    Expected = 105.6
Male       Observed = 24      Observed = 156      180
           Expected = 21.6    Expected = 158.4
Total      36                 264                 300
The Chi-Square Test Statistic

Using the observed and expected frequencies above, the test statistic is:

χ²_STAT = Σ over all cells of (f_o − f_e)² / f_e
        = (12 − 14.4)²/14.4 + (108 − 105.6)²/105.6 + (24 − 21.6)²/21.6 + (156 − 158.4)²/158.4
        = 0.7576
Decision Rule

The test statistic is χ²_STAT = 0.7576; χ²_0.05 with 1 d.f. = 3.841.

Decision Rule: If χ²_STAT > 3.841, reject H0; otherwise, do not reject H0.

Here, χ²_STAT = 0.7576 < χ²_0.05 = 3.841, so we do not reject H0 and conclude that there is not sufficient evidence that the two proportions are different at α = 0.05.

[Figure: do-not-reject region to the left of χ²_0.05 = 3.841; rejection region of area 0.05 in the right tail.]
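
The same test can be run with a library routine; a sketch assuming scipy:

```python
# Sketch (assuming scipy): the hand-preference test via chi2_contingency.
# correction=False disables the Yates continuity correction so the statistic
# matches the hand computation above.
from scipy.stats import chi2_contingency

observed = [[12, 108],
            [24, 156]]
stat, p_value, df, expected = chi2_contingency(observed, correction=False)
print(stat, df, p_value)  # about 0.758 with df = 1; p about 0.38, above 0.05
```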
Example 2: Meal Plan Selection

• The meal plan selected by 200 students is shown below:

              Number of meals per week
Class
Standing   20/week   10/week   none   Total
Level 1    24        32        14     70
Level 2    22        26        12     60
Level 3    10        14        6      30
Level 4    14        16        10     40
Total      70        88        42     200
Example 2: Meal Plan Selection (cont.)

• The hypotheses to be tested are:

H0: Meal plan and class standing are independent (i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent (i.e., there is a relationship between them)
Example 2: Meal Plan Selection (cont.)
Expected Cell Frequencies

Observed frequencies are as in the table on the previous slide. Expected cell frequencies if H0 is true:

              Number of meals per week
Class
Standing   20/wk   10/wk   none   Total
Level 1    24.5    30.8    14.7   70
Level 2    21.0    26.4    12.6   60
Level 3    10.5    13.2    6.3    30
Level 4    14.0    17.6    8.4    40
Total      70      88      42     200

Example for one cell:
Exp = (row total × column total) / n = (30 × 70) / 200 = 10.5
Example 2: Meal Plan Selection (cont.)
The Test Statistic

• The test statistic value is:

χ² = Σ over all cells of (Obs − Exp)² / Exp
   = (24 − 24.5)²/24.5 + (32 − 30.8)²/30.8 + … + (10 − 8.4)²/8.4
   = 0.709

• χ²_0.05 = 12.592 from the chi-squared distribution with (4 − 1)(3 − 1) = 6 degrees of freedom
Example 2: Meal Plan Selection (cont.)
Decision and Interpretation

The test statistic is χ² = 0.709; χ²_0.05 with 6 d.f. = 12.592.

Decision Rule: If χ² > 12.592, reject H0; otherwise, do not reject H0.

Here, χ² = 0.709 < χ²_0.05 = 12.592, so do not reject H0.

Conclusion: there is not sufficient evidence that meal plan and class standing are related.

[Figure: do-not-reject region to the left of χ²_0.05 = 12.592; rejection region of area 0.05 in the right tail.]
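
As a sketch (assuming scipy), the whole meal-plan test collapses to one library call; for tables larger than 2 x 2 no continuity correction is applied, so the result matches the hand computation:

```python
# Sketch (assuming scipy): the meal-plan independence test via the library.
from scipy.stats import chi2_contingency

observed = [[24, 32, 14],
            [22, 26, 12],
            [10, 14,  6],
            [14, 16, 10]]
stat, p_value, df, expected = chi2_contingency(observed)
print(stat, df, p_value)  # about 0.709 with df = 6; p close to 1
```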
χ² Test of Independence

The chi-square test statistic is:

χ²_STAT = Σ over all cells of (f_o − f_e)² / f_e

where:
f_o = observed frequency in a particular cell of the r x c table
f_e = expected frequency in a particular cell if H0 is true

χ²_STAT for the r x c case has (r − 1)(c − 1) degrees of freedom.

(Assumed: each cell in the contingency table has an expected frequency of at least 1.)
Expected Cell Frequencies

• Expected cell frequencies:

f_e = (row total × column total) / n

where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
Decision Rule

• The decision rule is: If χ²_STAT > χ²_α, reject H0; otherwise, do not reject H0,

where χ²_α is from the chi-squared distribution with (r − 1)(c − 1) degrees of freedom.
Example

Suppose you have the following categorical data for 3 types of flu in three different regions:

          Asia   Europe   America   Totals
Flu A     30     15       45        90
Flu B     2      5        53        60
Flu C     53     45       2         100
Totals    85     65       100       250

Is there a relationship between location and type of flu?
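
One way to check (a sketch assuming scipy, not a worked answer from the slides):

```python
# Sketch (assuming scipy): testing the flu/region table for independence.
from scipy.stats import chi2_contingency

observed = [[30, 15, 45],
            [ 2,  5, 53],
            [53, 45,  2]]
stat, p_value, df, expected = chi2_contingency(observed)
print(stat, df, p_value)  # df = (3-1)(3-1) = 4; the statistic is very large,
                          # far above the 0.05 critical value 9.49, so the data
                          # strongly suggest a relationship
```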
