
CHI SQUARE

χ²
Mary Joy M. Manjares
MAED- Math
Monaliza D. Perez
MAED- Guidance

χ²
WHAT IS CHI-SQUARE?

• Common statistical test used for nominal data


• Non-parametric or distribution-free test
• Developed by Karl Pearson
A. Nominal Variables
-variables that have two or more categories, but which do not have an intrinsic order.
Example:
Classifying type of property into distinct categories such as houses, condos, co-ops, or bungalows.
Classifying where people live in the USA by state
CHI-SQUARE?

B. Dichotomous Variables
-nominal variables which have only two categories or levels.
Example:
Gender: classifying somebody as either "male" or "female".
Do you own a mobile phone? Classifying ownership as either "Yes" or "No".
Type of property: classifying a property as either residential or commercial.
CHI SQUARE

C. Requirements for the use of Chi-Square


• Data must be independent.
• Categories into which data are placed must be mutually exclusive.
• All data must be used.
When should you use a chi square test?

• The Chi Square Test is appropriate when the following conditions are met:
- the sampling method is simple random sampling
- the variable under study is categorical
- the expected value of the number of sample observations in each level of the variable is at least 5

GOODNESS OF FIT TESTS

• χ² is the random variable whose sampling distribution is closely approximated by the chi-square distribution with k-1 degrees of freedom, where k is the number of categories.
• Goodness of fit refers to how well the data follow a specified probability distribution such as the normal, binomial, hypergeometric, Poisson, or geometric distribution.
• To test how good the fit of the observed data to the theoretical distribution is, the following formula is used:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Where $O_i$ is the observed frequency in the ith category;
$E_i$ is the expected frequency in the ith category
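The formula can be sketched in Python as follows. The function name `chi_square_statistic` and the die-roll counts are illustrative assumptions and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, tested against a fair die (10 per face).
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

chi_sq = chi_square_statistic(observed, expected)
print(round(chi_sq, 2))
```

Each category contributes its own squared deviation scaled by the expected count, so a large gap in a category with a small expected frequency weighs more heavily than the same gap in a large one.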
DECISION RULE

→ Reject the null hypothesis if $\chi^2 \ge \chi^2_{\alpha,\,k-1}$.
→ Should not be used if any expected frequency is less than 5.

Observed frequencies are very close to expected frequencies → chi-square value is small → good fit → acceptance of the null hypothesis
Hypothesis Testing

Example:
The North Luzon Expressway, which has four lanes in each direction, was studied to see whether drivers preferred to drive on the inside lanes. A total of 1000 cars were observed during the early morning traffic and their respective lanes recorded. Results are as follows:
Do the data present sufficient evidence to indicate that some lanes are preferred over others?
Hypothesis Testing
• We assume that the lane counts follow a distribution in which every lane is equally likely (p = 1/4 for each lane).
• Using α = 0.05
Ho: p1 = p2 = p3 = p4 = 1/4
→ The data follow a distribution in which no lane is preferred over the others.
Ha: At least one of the p's is not equal to 1/4.
→ Some lanes are preferred over others.

LANE 1 2 3 4

OBSERVED 294 276 238 192


GOODNESS OF A FIT TEST

Step 1: Arrange the data in the form of a frequency distribution.

LANE OBSERVED FREQUENCY

1 294

2 276

3 238

4 192

TOTAL 1000
GOODNESS OF A FIT TEST

Step 2: Obtain the expected frequency for each category.


LANE     OBSERVED FREQUENCY    EXPECTED FREQUENCY
1        294                   250
2        276                   250
3        238                   250
4        192                   250
TOTAL    1000

fe1 = fe2 = fe3 = fe4 = 1000/4 = 250
GOODNESS OF A FIT TEST

Step 3: Set up a summary table to calculate the chi-square value


LANE     fo      fe     fo − fe    (fo − fe)²    (fo − fe)²/fe
1        294     250     44        1936           7.744
2        276     250     26         676           2.704
3        238     250    -12         144           0.576
4        192     250    -58        3364          13.456
TOTAL    1000                      6120          24.48
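Steps 1 to 3 can be reproduced with a short Python sketch. The variable names are illustrative; the counts are the ones given in the example.

```python
lanes = [1, 2, 3, 4]
observed = [294, 276, 238, 192]                      # cars counted per lane
total = sum(observed)                                # 1000 cars
expected = [total / len(observed)] * len(observed)   # 250 per lane under Ho

# Build one summary row per lane: fo, fe, fo - fe, (fo - fe)^2, (fo - fe)^2 / fe
rows = []
for lane, fo, fe in zip(lanes, observed, expected):
    diff = fo - fe
    rows.append((lane, fo, fe, diff, diff ** 2, diff ** 2 / fe))

chi_sq = sum(row[5] for row in rows)

for lane, fo, fe, diff, sq, quot in rows:
    print(f"{lane:>4} {fo:>5} {fe:>6.0f} {diff:>5.0f} {sq:>6.0f} {quot:>7.3f}")
print(f"chi-square = {chi_sq:.2f}")
```

The squared differences sum to 6120 and the quotients sum to 24.48.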

Step 4: Find the degrees of freedom.


→ df = k - 1, where k is the number of categories (rows)
df = k - 1
= 4 - 1
df = 3
GOODNESS OF A FIT TEST

Step 5: Compare the calculated χ² with the appropriate chi-square value from the distribution table. At α = 0.05 and df = 3, the table value is 7.815.

Step 6: Conclude.
Since the calculated χ² of 24.48 is larger than the table value, we reject the null hypothesis. These findings suggest that drivers prefer some lanes over others. Likewise, we conclude that the data do not fit a distribution in which all lanes are equally likely.
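The comparison in Steps 5 and 6 can be sketched in Python. The helper `chi2_sf_df3` is a hypothetical name; it uses the closed-form upper-tail probability of the chi-square distribution that holds for 3 degrees of freedom only. The critical value 7.815 is the standard table value for α = 0.05 and df = 3.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


chi_sq = 24.48      # calculated value from the summary table
critical = 7.815    # table value for alpha = 0.05, df = 3
alpha = 0.05

p_value = chi2_sf_df3(chi_sq)
reject = chi_sq >= critical
print(f"p-value = {p_value:.2e}, reject Ho: {reject}")
```

Comparing the statistic with the table value and comparing the p-value with α lead to the same decision; the p-value only adds a sense of how far into the tail the statistic falls.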
CONTINGENCY TABLE
Tests using Contingency Tables
Two tests use contingency tables: the test of independence of variables and the test of homogeneity of proportions.

The test of independence is used to determine whether two variables are independent of or related to each other when a single sample is selected.
The test of homogeneity is used to determine whether the proportions for a variable are equal when several samples are selected from different populations.

Both tests use the chi square distribution and a contingency table, and the test
value is found in the same way.
TEST OF INDEPENDENCE

To test whether two categorical variables are associated with each other, the formula employed is:

$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where $O_{ij}$ is the observed frequency in the ith row and jth column;
$E_{ij}$ is the expected frequency in the ith row and jth column.
For a contingency table that has r rows and c columns, the Chi-square test can be generalized as a test of independence. Thus, as a test of independence, the hypotheses are as follows:
$H_0$: There is no relationship between the two categorical variables. (The two variables are independent.)
$H_a$: There is a relationship between the two categorical variables. (The two variables are not independent.)
TEST OF INDEPENDENCE

Decision Rule:

→ Reject $H_0$ if $\chi^2 \ge \chi^2_{\alpha}$ with (r-1)(c-1) degrees of freedom; otherwise, do not reject $H_0$.
→ Reject the null hypothesis at a specified level of significance if the computed value of chi-square exceeds the table value.
TEST OF INDEPENDENCE

Example:
Consider a study that examines the effectiveness of hypnosis as a means of improving the memory of eyewitnesses to a crime. The results are shown below.
Hypotheses:
Ho: Hypnosis does not affect the recognition memory of eyewitnesses to a crime.
Ha: Hypnosis affects the recognition memory of eyewitnesses to a crime.
Hypnotized Control

Correct Identification 7 17

Incorrect Identification 33 23

Total 40 40
TEST OF INDEPENDENCE

Step 1: Rearrange the data in the form of a 2x2 table containing the observed frequencies for each cell.
Hypnotized Control Total
Correct Identification 7 17 24
Incorrect Identification 33 23 56
Total 40 40 80

Step 2: Obtain the expected frequencies from each cell.


                            Hypnotized              Control                 Total
Correct Identification      H1 = 24(40)/80 = 12     C1 = 24(40)/80 = 12     24
Incorrect Identification    H2 = 56(40)/80 = 28     C2 = 56(40)/80 = 28     56
Total                       40                      40                      80
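The expected frequencies in Step 2 follow from the row and column totals. A minimal Python sketch, with illustrative variable names and the counts from the example:

```python
observed = [
    [7, 17],     # correct identification: hypnotized, control
    [33, 23],    # incorrect identification: hypnotized, control
]

row_totals = [sum(row) for row in observed]          # 24 and 56
col_totals = [sum(col) for col in zip(*observed)]    # 40 and 40
grand_total = sum(row_totals)                        # 80

# Expected count for each cell = (row total)(column total) / grand total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]
print(expected)
```

Because both groups have 40 subjects, the expected counts are the same in the two columns: 12 correct and 28 incorrect identifications each.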
TEST OF INDEPENDENCE

Step 3: Subtract the expected frequencies from the observed frequencies.


                                fo − fe
H1 (Hypnotized/Correct)         7 - 12 = -5
H2 (Hypnotized/Incorrect)       33 - 28 = 5
C1 (Control/Correct)            17 - 12 = 5
C2 (Control/Incorrect)          23 - 28 = -5

Step 4: Square the difference.

                                fo − fe    (fo − fe)²
H1 (Hypnotized/Correct)         -5         25
H2 (Hypnotized/Incorrect)        5         25
C1 (Control/Correct)             5         25
C2 (Control/Incorrect)          -5         25
TEST OF INDEPENDENCE

Step 5: Divide the squared difference by the expected frequencies.


                            fo − fe    (fo − fe)²    (fo − fe)²/fe
(Hypnotized/Correct)        -5         25            25/12 = 2.08
(Hypnotized/Incorrect)       5         25            25/28 = 0.89
(Control/Correct)            5         25            25/12 = 2.08
(Control/Incorrect)         -5         25            25/28 = 0.89

Step 6: Add the quotients to obtain the chi-square value.


$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
χ² = 2.08 + 0.89 + 2.08 + 0.89
χ² = 5.94
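Steps 3 to 6 can be checked with a few lines of Python. The names are illustrative. Note that summing the unrounded quotients gives about 5.95, slightly above the 5.94 obtained by adding the quotients after rounding each to two decimals; the decision is the same either way.

```python
# Cell order: hypnotized/correct, hypnotized/incorrect,
#             control/correct, control/incorrect
observed = [7, 33, 17, 23]
expected = [12, 28, 12, 28]    # expected frequencies from Step 2

quotients = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi_sq = sum(quotients)

print([round(q, 2) for q in quotients])
print(round(chi_sq, 2))
```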
TEST OF INDEPENDENCE

Step 7: Find the degrees of freedom.


df= (r-1)(c-1)
df= (2-1)(2-1)
df= (1)(1)
df= 1

Step 8: Compare the obtained chi-square value with the table value at 0.05 level of significance (3.84 at df = 1).
TEST OF INDEPENDENCE

Step 9: Conclude.
Since the obtained chi-square value of 5.94 is greater than the tabular value of 3.84, we reject the null hypothesis. The result suggests a significant difference in the ability of hypnotized and control subjects to identify a thief. The hypnotized subjects were less, not more, accurate in identifying the thief.
TEST OF HOMOGENEITY

→ Used to test the homogeneity of respondents' responses with regard to certain issues and opinions, where the responses are put in a contingency table.
Examples:
Impeachment trial of Pres. Estrada: the reactions of the Filipino people
Opening of the second envelope: in favor, not in favor, or neutral
Survey of students from different colleges
TEST OF HOMOGENEITY

Example:
President Arroyo made a nationwide announcement on television about her conversation with the COMELEC Commissioner and offered a public apology. To determine the opinion of the public, a survey was conducted in 4 towns of La Union. The following table gives the opinions of 2000 parents from San Fernando, 1500 parents from Agoo, 1000 parents from Bacnotan, and 1000 parents from San Juan.
At the 0.01 level of significance, test for homogeneity of opinion among the 4 municipalities concerning the public apology of President Arroyo.
TEST OF HOMOGENEITY

• Step 1: Formulate the hypotheses.


• $H_0$: For each opinion, the proportions are the same across the municipalities.
• $H_a$: For at least one opinion, the proportions are not the same across the municipalities.
• Step 2: Determine the expected frequencies.
• The expected frequency in each cell is the product of the row total and the column total, divided by the grand total.

$$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
TEST OF HOMOGENEITY

OBSERVED FREQUENCIES
                       Municipalities
Opinion       San Fernando    Bacnotan    Agoo    San Juan    TOTAL
Agree         650             400         660     340         2050
Disagree      420             330         300     420         1470
No Opinion    930             270         540     240         1980
TOTAL         2000            1000        1500    1000        5500


TEST OF HOMOGENEITY

EXPECTED FREQUENCIES
                       Municipalities
Opinion       San Fernando    Bacnotan    Agoo      San Juan    TOTAL
Agree         745.45          372.73      559.09    372.73      2050
Disagree      534.54          267.27      400.91    267.27      1470
No Opinion    720.00          360.00      540.00    360.00      1980
TOTAL         2000            1000        1500      1000        5500
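The expected frequencies depend only on the row totals, column totals, and grand total, so they can be regenerated with a short Python sketch. The dictionary names are illustrative.

```python
row_totals = {"Agree": 2050, "Disagree": 1470, "No Opinion": 1980}
col_totals = {"San Fernando": 2000, "Bacnotan": 1000,
              "Agoo": 1500, "San Juan": 1000}
grand_total = sum(row_totals.values())    # 5500 respondents

# Expected count for each cell = (row total)(column total) / grand total
expected = {
    opinion: {town: r * c / grand_total for town, c in col_totals.items()}
    for opinion, r in row_totals.items()
}

for opinion, cells in expected.items():
    print(opinion, [round(v, 2) for v in cells.values()])
```

Every expected count is far above 5, so the requirement on expected frequencies is met for this table.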


TEST OF HOMOGENEITY

• Step 3: Subtract the expected frequencies from the observed frequencies, then square the differences.

OBSERVED    EXPECTED    fo − fe     (fo − fe)²
650         745.45       -95.45      9110.7025
420         534.54      -114.54     13119.4116
930         720.00       210.00     44100.00
400         372.73        27.27       743.6529
330         267.27        62.73      3935.0529
270         360.00       -90.00      8100.00
660         559.09       100.91     10182.8281
300         400.91      -100.91     10182.8281
540         540.00         0.00         0.00
340         372.73       -32.73      1071.2529
420         267.27       152.73     23326.4529
240         360.00      -120.00     14400.00
TEST OF HOMOGENEITY

• Step 4: Divide the squared difference by the expected frequency.

(fo − fe)²/fe
9110.7025/745.45 = 12.22
13119.4116/534.54 = 24.54
44100.00/720.00 = 61.25
743.6529/372.73 = 2.00
3935.0529/267.27 = 14.72
8100.00/360.00 = 22.50
10182.8281/559.09 = 18.21
10182.8281/400.91 = 25.40
0.00/540.00 = 0.00
1071.2529/372.73 = 2.87
23326.4529/267.27 = 87.28
14400.00/360.00 = 40.00
TEST OF HOMOGENEITY

• Step 5: Add the quotients to obtain the chi-square value.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
χ² = 12.22 + 24.54 + 61.25 + 2.00 + 14.72 + 22.50 + 18.21 + 25.40 + 0.00 + 2.87 + 87.28 + 40.00
χ² = 310.99
TEST OF HOMOGENEITY

Step 6: Find the degrees of freedom.

df = (r - 1)(c - 1)
df = (3 - 1)(4 - 1)
df = (2)(3)
df = 6
Step 7: Compare the obtained chi-square value with the table value at 0.01 level of significance.
χ² = 310.99; Table χ² = 16.812 at 0.01 and df = 6
TEST OF HOMOGENEITY

• Step 8: Conclude.
Since the tabular chi-square value of 16.812 is less than the computed value of 310.99, there is sufficient evidence to reject the null hypothesis and conclude that, for at least one opinion, the proportions are not the same across the municipalities. That is, people in different municipalities hold different views with regard to the public apology of Pres. Arroyo.
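The whole homogeneity test can be recomputed from the observed counts with the Python sketch below. The variable names are illustrative, and San Juan's No Opinion count is taken as 240, the value that agrees with the row total of 1980 and the column total of 1000. With 340 - 372.73 = -32.73 for San Juan's Agree cell, the recomputed statistic is about 310.99, far above the table value of 16.812 for α = 0.01 and df = 6.

```python
towns = ["San Fernando", "Bacnotan", "Agoo", "San Juan"]
observed = {
    "Agree":      [650, 400, 660, 340],
    "Disagree":   [420, 330, 300, 420],
    "No Opinion": [930, 270, 540, 240],
}

row_totals = {op: sum(counts) for op, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Accumulate (fo - fe)^2 / fe over all 12 cells
chi_sq = 0.0
for op, counts in observed.items():
    for fo, c in zip(counts, col_totals):
        fe = row_totals[op] * c / grand_total
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(towns) - 1)
critical = 16.812    # table value for alpha = 0.01, df = 6
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject Ho: {chi_sq >= critical}")
```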
THANK YOU
FOR
LISTENING!
MERRY
CHRISTMAS
TO ALL!!!