
The Chi-square Test for Independence

Dr. Tarek Tawfik


Indications

1- Type of descriptive statistics:
The frequency distribution contained in a bi-variate table. Nominal, ordinal and categorized interval/ratio data.

2- Number of samples:
The Chi-square test is basically the same, regardless of whether we have one, two or more than two samples.
12/07/21 Dr Tarek Amin 2
Chi-square test 21

For one sample it is called


.Goodness-of-fit-test

For two or more samples it is


called Chi-square for
.independence
Statistical Independence
Two variables are statistically independent if the classification of cases in terms of one variable is not related to the classification of those cases in terms of the other variable.

One variable cannot be interpreted, classified or changed in response to the other.
Income and Place of Residence
                      Income
Place of residence    Low, No. (%)    High, No. (%)    Total, No. (%)
Rural                 205 (55)        118 (34)         323 (45)
Urban                 167 (45)        230 (66)         397 (55)
Total                 372 (100)       348 (100)        720 (100)


Null hypothesis
H0: income and place of residence are independent of each other.

Ha: income and place of residence are not independent of (are dependent on) each other.

Under the null hypothesis of independence, the relative (%) frequencies for each group are expected to be the same as those for the groups combined.
Expected relative frequencies
                      Income
Place of residence    Low     High    Total
Rural                 45%     45%     45%
Urban                 55%     55%     55%
Total                 100%    100%    100%
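The expected relative frequencies can be recovered from the observed counts. The following is a minimal sketch, assuming the counts from the income and place of residence table; the variable names are illustrative and not part of the original slides.

```python
# Observed counts from the income and place of residence table.
# Each list holds [Low income, High income] for one place of residence.
observed = {"Rural": [205, 118], "Urban": [167, 230]}

grand_total = sum(sum(counts) for counts in observed.values())

# Share of each place of residence in the combined sample (the Total column).
# Under independence, every income group is expected to show the same shares.
combined_share = {
    place: 100 * sum(counts) / grand_total
    for place, counts in observed.items()
}

for place, share in combined_share.items():
    print(place, round(share), "%")
```

Rounded to whole percentages, these shares are the 45% and 55% expected in every income column under the null hypothesis.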

Even if the null hypothesis is true, we should not expect every random sample to reflect independence exactly.
Three different random samples

Residence    Low     High    Total
Rural        47%     44%     45%
Urban        53%     56%     55%
Total        100%    100%    100%

Residence    Low     High    Total
Rural        65%     30%     45%
Urban        35%     70%     55%
Total        100%    100%    100%

Residence    Low     High    Total
Rural        52%     40%     45%
Urban        48%     60%     55%
Total        100%    100%    100%

The Chi-square is a means by which we can capture this difference between observed and expected frequencies.
The Chi-square test checks whether this difference is real rather than random (due to chance).

The Chi-square statistic is calculated from the difference between the observed and expected frequencies in each cell of a bi-variate table.

The Chi-square distribution is the probability distribution of the Chi-square statistic for an infinite number of random samples of the same size drawn from populations where the two variables are independent of each other.

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
The Chi-square Distribution

Right skewness
Long single tail
Always positive
Squared value



The Chi-square Distribution

The greater the difference between the observed frequencies and the expected frequencies, the larger the value of chi-square.
Chi-square is calculated on squared differences
so it is always positive (the rule of tail direction
is not applicable).
It has a long tail, reflecting the fact that it is
possible to select random samples that yield a
very high value for Chi-square, even though the
variables are independent.



[Figure: Place of residence and income. Stacked bar chart of the percentage (%) of urban and rural residents by income level. Low income: 55% rural, 45% urban. High income: 34% rural, 66% urban.]
We can easily see that there is obviously a difference between low and high income earners in terms of where they live, but could this be due to random variation?
Calculating the expected frequencies to answer this question
                        Income
Place of residence      Low                     High                    Total
Rural                   372 x 45/100 = 167.4    348 x 45/100 = 156.6    323 (row total)
Urban                   372 x 55/100 = 204.6    348 x 55/100 = 191.4    397 (row total)
Total (column total)    372                     348                     720 (grand total)

To calculate the expected frequency for each cell, multiply the column total by the row total and divide the product by the grand total.
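The rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the observed counts from the income and place of residence table; the function name `expected_frequencies` is hypothetical and not part of the original slides.

```python
# Observed counts: rows are Rural, Urban; columns are Low income, High income.
observed = [
    [205, 118],  # Rural
    [167, 230],  # Urban
]


def expected_frequencies(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Rural", "Urban"], expected):
    print(label, [round(value, 1) for value in row])
```

Because the exact row and column totals are used rather than the rounded percentages 45% and 55%, the results (166.9, 156.1, 205.1, 191.9) differ slightly from values obtained with the rounded percentages.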
Observed and expected differences
                      Income
Place of residence    Low                        High                       Total
Rural    observed     205                        118                        323
         expected     323 x 372 / 720 = 166.9    323 x 348 / 720 = 156.1
Urban    observed     167                        230                        397
         expected     397 x 372 / 720 = 205.1    397 x 348 / 720 = 191.9
Total                 372                        348                        720

Expected value for any cell = column total x row total / grand total

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
Calculation of χ²
                      Income
Place of residence    Low                             High                            Total
Rural                 (205 - 166.9)² / 166.9 = 8.7    (118 - 156.1)² / 156.1 = 9.3    323
Urban                 (167 - 205.1)² / 205.1 = 7.1    (230 - 191.9)² / 191.9 = 7.6    397
Total                 372                             348                             720

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
Calculation of the Chi-square

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e} = 8.7 + 9.3 + 7.1 + 7.6 = 32.7$$

Refer to the table of critical values of Chi-square after calculating the degrees of freedom.
df = (r-1)(c-1) = 1
r = number of rows
c = number of columns
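The statistic and its degrees of freedom can be computed together. The sketch below assumes the observed counts of the income and place of residence example and uses unrounded expected frequencies; all names are illustrative.

```python
# Observed counts: rows are Rural, Urban; columns are Low income, High income.
observed = [
    [205, 118],
    [167, 230],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (Fo - Fe)^2 / Fe over every cell of the table.
statistic = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total
        statistic += (fo - fe) ** 2 / fe

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(statistic, 1), df)
```

The result agrees with the hand calculation of about 32.7 with 1 degree of freedom.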



Critical values for Chi-square
      Level of significance
df    0.99       0.90      0.70      0.50      0.30      0.20      0.10      0.05      0.01      0.001
1     0.00016    0.0158    0.148     0.455     1.074     1.642     2.706     3.841     6.635     10.827
2     0.0201     0.211     0.713     1.386     2.408     3.219     4.605     5.991     9.210     13.815
3     0.115      0.584     1.424     2.366     3.665     4.642     6.251     7.815     11.341    16.268
4     0.297      1.064     2.195     3.357     4.878     5.989     7.779     9.488     13.277    18.465
5     0.554      1.610     3.000     4.351     6.064     7.289     9.236     11.070    15.086    20.517
.
.
30    14.953     20.599    25.508    29.336    33.530    36.250    40.256    43.773    50.892    59.703

χ² critical = 3.841
(alpha = 0.05 and df = 1)

Is there a relation (dependence) between income and residence?
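One way to answer the question is to compare the calculated statistic with the critical value. The sketch below assumes the values 32.7 and 3.841 quoted on the slides. The p-value helper is an addition that is not in the original: for 1 degree of freedom, the upper-tail probability of the Chi-square distribution equals erfc(sqrt(x/2)).

```python
import math

chi_square_statistic = 32.7  # value calculated for the income example
critical_value = 3.841       # table value for alpha = 0.05 and df = 1


def p_value_df1(statistic):
    """Upper-tail probability of a Chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


reject_null = chi_square_statistic > critical_value
p_value = p_value_df1(chi_square_statistic)

print("Reject H0:", reject_null)
print("p-value: %.2e" % p_value)
```

Since 32.7 is far larger than 3.841, the null hypothesis of independence is rejected at the 0.05 level, which points to a relation between income and place of residence.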


Enterobacter species are a major cause of nosocomial (hospital-acquired) gram-negative bacteremia. Of interest is the ability of the organism to develop resistance to the antibiotics administered.
A study was conducted to determine the emergence of multi-drug resistance in Enterobacter bacteremia and the clinical setting in which the condition occurs (the previous intake of antibiotics).
The data can be tabulated in the following 2x2 table.

Antibiotics in the    Multi-resistant isolates
last two weeks        Yes    No     Total
Yes                   36     67     103
No                    1      25     26
Total                 37     92     129

Can we conclude that there is a relationship between multi-resistant Enterobacter status and the previous intake of antibiotics?
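One possible way to set up this exercise follows the same steps as the income example. The sketch below is only an illustration under that assumption; the variable names are hypothetical, and the critical value is the table value for alpha = 0.05 and df = 1.

```python
# Rows: antibiotics in the last two weeks (Yes, No).
# Columns: multi-resistant isolate (Yes, No).
observed = [
    [36, 67],
    [1, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

statistic = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        statistic += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # alpha = 0.05, df = 1

print(round(statistic, 2), df, statistic > critical_value)
```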
The Chi-square goodness-of-fit test

The chi-square goodness-of-fit test is a non-parametric test for the frequency distribution of cases across a range of values for a single variable.

Single variable with multiple categories.
* Seasons of a year
* Hours of a day
Is the crime rate affected by the seasons?
We are interested in the distribution of the crime rate across the range of seasons.
According to the null hypothesis, we assume that there is no relationship between crime rate and season.
So fe = total number of crimes / 4 seasons of the year,
where fe = the expected frequency in each category.
However, the crime rate might be affected by random events that cause the distribution to be a little bit different from this expected result.

Not every sample will conform with this expectation of an exactly equal number of crimes in each season.
This difference (between the expected and the observed) can be calculated using the Chi-square statistic.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Rate of crime over seasons
            Summer    Spring    Winter    Autumn    Total
Observed    300       270       200       250       1020
Expected    255       255       255       255       1020
Residual    45        15        -55       -5

The expected frequency is obtained by dividing the total number by the number of categories = 1020/4 = 255.
The residual is the difference between the observed and the expected frequency.
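The expected frequency and the residuals can be reproduced as follows. This is a minimal sketch that assumes the observed counts from the crime table; the names are illustrative.

```python
# Observed number of crimes in each season.
observed = {"Summer": 300, "Spring": 270, "Winter": 200, "Autumn": 250}

total = sum(observed.values())

# Under the null hypothesis every category receives an equal share.
expected = total / len(observed)

# Residual = observed frequency - expected frequency.
residuals = {season: count - expected for season, count in observed.items()}

print(expected)
print(residuals)
```

The residuals always sum to zero, which is why each one is squared before being added into the Chi-square statistic.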
Applying the formula

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

$$\chi^2 = \frac{(300-255)^2}{255} + \frac{(270-255)^2}{255} + \frac{(200-255)^2}{255} + \frac{(250-255)^2}{255} = 20.78$$

Look at the table for the critical score at an α level of 0.05 and the degrees of freedom:
df = k (number of categories) - 1
df = 4 - 1 = 3
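The goodness-of-fit statistic and its degrees of freedom can be checked with a short sketch. It assumes the seasonal crime counts given earlier and an equal expected frequency in every category; the names are illustrative.

```python
observed = [300, 270, 200, 250]  # Summer, Spring, Winter, Autumn

# Equal expected frequency per category under the null hypothesis.
expected = sum(observed) / len(observed)

# Sum (fo - fe)^2 / fe over the categories.
statistic = sum((fo - expected) ** 2 / expected for fo in observed)

# Degrees of freedom: number of categories - 1.
df = len(observed) - 1

print(round(statistic, 2), df)
```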
Critical values for Chi-square

      Level of significance
df    0.99       0.90      0.70      0.50      0.30      0.20      0.10      0.05      0.01      0.001
1     0.00016    0.0158    0.148     0.455     1.074     1.642     2.706     3.841     6.635     10.827
2     0.0201     0.211     0.713     1.386     2.408     3.219     4.605     5.991     9.210     13.815
3     0.115      0.584     1.424     2.366     3.665     4.642     6.251     7.815     11.341    16.268
4     0.297      1.064     2.195     3.357     4.878     5.989     7.779     9.488     13.277    18.465
5     0.554      1.610     3.000     4.351     6.064     7.289     9.236     11.070    15.086    20.517
.
.
30    14.953     20.599    25.508    29.336    33.530    36.250    40.256    43.773    50.892    59.703

χ² critical = 7.815
(alpha = 0.05 and df = 3)
What does this mean?

The value of the sample Chi-square (20.78) exceeds the critical value (7.815) and so falls into the critical region; therefore the null hypothesis of an even distribution of crimes across the seasons is rejected.

There is some association between crime rate and the seasons.
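The decision can also be expressed as a p-value. The helper below is an addition that is not in the original slides: for 3 degrees of freedom, the upper-tail probability of the Chi-square distribution has the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2). The statistic and critical value are the ones quoted in the crime example.

```python
import math

chi_square_statistic = 20.78  # value calculated for the crime example
critical_value = 7.815        # table value for alpha = 0.05 and df = 3


def p_value_df3(statistic):
    """Upper-tail probability of a Chi-square variable with 3 degrees of freedom."""
    half = statistic / 2
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * statistic / math.pi) * math.exp(-half)


reject_null = chi_square_statistic > critical_value
p_value = p_value_df3(chi_square_statistic)

print("Reject H0:", reject_null)
print("p-value: %.5f" % p_value)
```

As a sanity check, the same helper evaluated at the critical value 7.815 returns approximately 0.05, matching the table.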
A research team conducted a survey in which the subjects were adult smokers. Each subject in a sample of 200 was asked to indicate the extent to which he/she agreed with the statement 'I would like to quit smoking.' The results were as follows:

Responses            Strongly agree    Agree    Disagree    Strongly disagree
Responding number    102               30       60          8

Can one conclude on the basis of these data that, in the sampled population, opinions are not equally distributed over the four levels of agreement?
Let the probability of committing a type I error be 0.05 and find the P value.
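One possible way to set up this exercise is as a goodness-of-fit test with equal expected frequencies. The sketch below is only an illustration under that assumption; the names are hypothetical, and the p-value uses the closed form of the Chi-square upper tail for 3 degrees of freedom, which is not given in the slides.

```python
import math

# Strongly agree, Agree, Disagree, Strongly disagree.
observed = [102, 30, 60, 8]

# Equal expected frequency per response level under the null hypothesis.
expected = sum(observed) / len(observed)

statistic = sum((fo - expected) ** 2 / expected for fo in observed)
df = len(observed) - 1
critical_value = 7.815  # table value for alpha = 0.05 and df = 3

# Upper-tail probability for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
half = statistic / 2
p_value = math.erfc(math.sqrt(half)) + math.sqrt(2 * statistic / math.pi) * math.exp(-half)

print(round(statistic, 2), df, statistic > critical_value)
print("p-value: %.2e" % p_value)
```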
Thank you
