
Module 009 - Chi-Square

At the end of this module you are expected to:


1. Explain what Chi-Square is;
2. Describe how the Chi-Square statistic works;
3. Become familiar with the 2 x 2 contingency table;
4. Recognize the Chi-Square Goodness of Fit (One Sample) test;
5. Identify the Chi-Square Test of Independence.

Chi-Square

The Chi-Square statistic is commonly used to test relationships between categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; that is, they are independent. Random variables give rise to two broad types of data: categorical and numerical. The Chi-Square statistic (X²) is used to detect differences in the distributions of categorical variables under study. A categorical variable provides class (category) information, while a numerical variable provides information in numerical form. Survey questions such as "Do you own a vehicle?" or "What is your major?" are categorical because they produce responses such as "no" or "science." Responses to questions such as "What is your G.P.A.?" or "How tall are you?" are numerical. Numerical data can be either continuous or discrete. The table below may help you recognize the difference between the two types of variables.

Type of Data            Type of Question             Possible Responses

Categorical             What is your sex?            Male or Female

Discrete - Numerical    How many cars do you own?    two or three

Continuous - Numerical  How tall are you?            72 inches

Table 1. Categorical and Numerical Sample
Note that discrete data arise from a counting process, while continuous data arise from a measuring process. The Chi-Square statistic compares the counts of categorical responses between two (or more) independent groups. (Note: Chi-Square tests should be used on actual counts, not on percentages, proportions, means, etc.)

How does the Chi-Square statistic work?

The Chi-Square statistic is most commonly used to evaluate Tests of Independence when using a crosstabulation (also known as a bivariate table). A crosstabulation presents the distributions of two categorical variables simultaneously, with the intersections of the categories of the variables appearing in the cells of the table. The Test of Independence assesses whether an association exists between the two variables by comparing the observed pattern of responses in the cells with the pattern that would be expected if the variables were truly independent of each other. Calculating the Chi-Square statistic and comparing it against a critical value from the Chi-Square distribution allows the researcher to assess whether the observed cell counts are significantly different from the expected cell counts.

The computation of the Chi-Square statistic is straightforward and intuitive:

X² = ∑ (fo − fe)² / fe

where fo = the observed frequency (the observed counts in the cells)

and fe = the expected frequency if NO relationship existed between the variables

As the equation shows, the Chi-Square statistic is based on the difference between what is actually observed in the data and what would be expected if there were truly no relationship between the variables.
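To make the formula concrete, here is a minimal sketch in Python (not part of the original module) that applies X² = ∑ (fo − fe)² / fe to a small set of made-up counts; the numbers are purely illustrative.

def chi_square(observed, expected):
    # Sum of (fo - fe)^2 / fe over every cell.
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Hypothetical counts: 60 "yes" and 40 "no" responses, where 50/50 was expected.
print(chi_square([60, 40], [50, 50]))  # 4.0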

2 x 2 Contingency Table

There are several kinds of Chi-Square tests, depending on how the data were collected and which hypothesis is being tested. The simplest case to start with is a 2 x 2 contingency table. If we lay out a 2 x 2 table using the general notation in Table 2, with the letters a, b, c, and d denoting the contents of the cells, we have the following arrangement.
                 Variable 1

Variable 2       Data Type 1    Data Type 2    Totals

Category 1       a              b              a + b
Category 2       c              d              c + d
Total            a + c          b + d          a + b + c + d = N

Table 2. General notation for a 2 x 2 contingency table.

For a 2 x 2 contingency table the Chi Square statistic is calculated by the formula:
X² = (ad − bc)² (a + b + c + d) / [(a + b)(c + d)(b + d)(a + c)]

Note: The four factors in the denominator are the four totals that appear in the row and column margins of the table.
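As an illustration only, the shortcut formula can be written as a short Python function; this is a sketch, not part of the module, and the sample counts are the drug-trial cells used in the next example.

def chi_square_2x2(a, b, c, d):
    # (ad - bc)^2 (a + b + c + d) / [(a + b)(c + d)(b + d)(a + c)]
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (b + d) * (a + c))

print(round(chi_square_2x2(36, 14, 30, 25), 3))  # 3.418, as in the drug-trial example below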

Suppose you conducted a drug trial on a group of animals. You hypothesized that the animals receiving the drug would show increased heart rates compared with those that did not receive the drug. You conduct the study and collect the following data:

Ho: The proportion of animals with an increased heart rate is independent of drug treatment.

Ha: The proportion of animals with an increased heart rate is associated with drug treatment.

              Increased Heart Rate    Non-increased Heart Rate    Total

Treated       36                      14                          50
Not Treated   30                      25                          55
Total         66                      39                          105

Table 3. Hypothetical drug trial results.

Applying the formula above we get:

Chi-square = 105[(36)(25) − (14)(30)]² / [(50)(55)(39)(66)] = 3.418

Before we can continue, we need to know how many degrees of freedom we have. When one sample is compared with another, a basic rule is that the degrees of freedom equal (number of columns minus one) x (number of rows minus one), not counting the rows or columns for the totals. This gives (2 − 1) x (2 − 1) = 1.
We now have our Chi-Square statistic (x² = 3.418), our chosen alpha level of significance (0.05), and our degrees of freedom (df = 1). Entering the Chi-Square distribution table at 1 degree of freedom and reading along the row, we find that our value of x² (3.418) lies between 2.706 and 3.841. The corresponding probability is between 0.10 and 0.05, so the p-value is above 0.05 (approximately 0.065). Because a p-value of 0.065 is greater than the accepted significance level of 0.05 (i.e., p > 0.05), we fail to reject the null hypothesis. In other words, there is no statistically significant difference in the proportion of animals whose heart rate increased.
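If SciPy is available, the result above can be double-checked in a few lines. This is only a sketch; chi2.sf (the upper-tail probability) is used to recover the p-value.

from scipy.stats import chi2

a, b, c, d = 36, 14, 30, 25   # treated/increased, treated/not, untreated/increased, untreated/not
n = a + b + c + d
x2 = (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (b + d) * (a + c))
p_value = chi2.sf(x2, df=1)   # upper-tail probability with 1 degree of freedom
print(round(x2, 3), round(p_value, 3))   # ~3.418 and ~0.065: p > 0.05, so we fail to reject Ho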

What happens if the number of untreated (control) animals whose heart rate increased is 29 instead of 30, so that the number of controls whose heart rate did not increase changes from 25 to 26? Work it out. Note that the new x² value is 4.125, and that it exceeds the table value of 3.841 (at alpha = 0.05 with 1 degree of freedom). This means that p < 0.05 (it is actually about 0.04), and we reject the null hypothesis in favor of the alternative hypothesis: the heart-rate response differs between the treatment groups. When p < 0.05, we consider the difference significant.

df     0.50     0.10     0.05     0.02     0.01     0.001

1      0.455    2.706    3.841    5.412    6.635    10.827
2      1.386    4.605    5.991    7.824    9.210    13.815
3      2.366    6.251    7.815    9.837    11.345   16.268
4      3.357    7.779    9.488    11.668   13.277   18.467
5      4.351    9.236    11.070   13.388   15.086   20.517

Table 4. Chi-Square distribution table: probability level (alpha).
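Table 4 can also be reproduced programmatically. The following is a small sketch that assumes SciPy is installed; chi2.ppf(1 − alpha, df) returns the critical value for a given upper-tail probability alpha.

from scipy.stats import chi2

alphas = (0.50, 0.10, 0.05, 0.02, 0.01, 0.001)
for df in range(1, 6):
    print(df, [round(chi2.ppf(1 - a, df), 3) for a in alphas])
# The df = 1 row gives 0.455, 2.706, 3.841, 5.412, 6.635, 10.828, matching Table 4 up to rounding.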

To make the Chi-Square calculation a little easier, you can enter your observed and expected values into the interactive applet linked in the online supplementary readings (The Chi-Square Statistics, math.hws.edu). Click a cell and enter the value, then click the compute button in the bottom right corner to see the calculated Chi-Square value in the bottom left corner.

Chi Square Goodness of Fit (One Sample Test)

This test lets us compare a collection of categorical data with some theoretical expected distribution. It is often used in genetics to compare the results of a cross with the theoretical distribution predicted by genetic theory. Suppose you performed a simple monohybrid cross between two individuals that are heterozygous for the trait of interest:

𝐴𝑎 𝑥 𝐴𝑎

         A      a      Total

A        10     42     52
a        33     15     48
Total    43     57     100

Table 5. Results of a monohybrid cross between two heterozygotes for the 'a' gene.

In this cross between two monohybrid heterozygotes, 85 offspring show the dominant A phenotype and 15 show the homozygous recessive phenotype. Genetic theory predicts a 3:1 phenotypic ratio, so the expected results are 75 A-type and 25 a-type. Are the observed results different from the expected results?

X² = ∑ (observed − expected)² / expected

The Chi-Square statistic x² can be calculated by following these steps:

1. Subtract the corresponding expected number from each observed number (O − E).
2. Square the difference [(O − E)²].
3. Divide each squared difference by the expected number for that particular cell [(O − E)² / E].
4. Sum all the (O − E)² / E values. That sum is the Chi-Square statistic.
The calculation in this case is below:

          Observed    Expected    (O − E)    (O − E)²    (O − E)²/E

A-type    85          75          10         100         1.33
a-type    15          25          −10        100         4.0
Total     100         100                                5.33

Table 6. Chi-Square calculation for the monohybrid cross between two heterozygotes for the 'a' gene.

x2 = 5.33

We now have our Chi-Square statistic (x² = 5.33), our degrees of freedom (df = 1), and our predetermined alpha level of significance (0.05). Entering the Chi-Square distribution table at 1 degree of freedom and reading along the row, we find our value of x² (5.33) between 3.841 and 5.412; the corresponding probability is 0.02 < P < 0.05. This is less than the conventional 5 percent (0.05) level, so we reject the null hypothesis that the two distributions are the same. In other words, because the x² statistic (5.33) exceeds the critical value at the 0.05 probability level (3.841), we reject the null hypothesis that the observed results of our cross fit the hypothesized 3:1 distribution.
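For readers who want to verify this result, here is a hedged sketch using SciPy's chisquare helper (assuming SciPy is installed); it reproduces x² = 5.33 and the associated p-value for the 85:15 observations against the 75:25 expectation.

from scipy.stats import chisquare

statistic, p_value = chisquare(f_obs=[85, 15], f_exp=[75, 25])
print(round(statistic, 2), round(p_value, 3))   # 5.33 and ~0.021: p < 0.05, so we reject Ho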

Chi Square Test of Independence

When applied to a contingency table with r rows and c columns, the Chi-Square test can be used as a test of independence. The null and alternative hypotheses when testing for independence are:

Ho: The two categorical variables are independent.

Ha: The two categorical variables are related.

We use the equation Chi-Square = ∑ (fo − fe)² / fe, where fo represents the observed frequency and fe represents the expected frequency. The data are organized in a table like the one below:
                Category 1    Category 2    Category 3    Row Totals

Sample A        a             b             c             a + b + c
Sample B        d             e             f             d + e + f
Sample C        g             h             i             g + h + i
Column Totals   a + d + g     b + e + h     c + f + i     a + b + c + d + e + f + g + h + i = N

Table 7. Frequency table for a Chi-Square test of independence.

Next, it is necessary to calculate the expected value for each cell of the table. The expected value is the row total times the column total, divided by the grand total (N). For example, the expected value for the upper-left cell is (a + b + c)(a + d + g)/N. After the expected values have been calculated for each cell, we can use the same procedure as before for the simple 2 x 2 table.
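The expected-frequency rule generalizes to any r x c table. The sketch below is not from the module (the 3 x 3 counts are the hand/foot example from the video transcript at the end of this module); it computes each expected cell as row total x column total / N and then sums (fo − fe)² / fe.

def expected_frequencies(table):
    # Expected count for each cell = (row total x column total) / grand total.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    expected = expected_frequencies(table)
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))

observed = [[11, 3, 8], [2, 9, 14], [12, 13, 28]]
print(round(chi_square_independence(observed), 3))   # ~11.942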

Chi Distribution

The chi-square distribution is a special case of the gamma distribution: a chi-square distribution with n degrees of freedom is equivalent to a gamma distribution with rate b = 0.5 (or scale β = 2) and shape a = n/2.

Assume you draw random samples from a standard normal distribution. The chi-square distribution is the distribution of the sum of those squared random samples. The degrees of freedom (k) equal the number of samples: for example, if you drew 10 samples from the normal distribution, then df = 10. The mean of a chi-square distribution equals its degrees of freedom, so the mean of the distribution in this example would be 10. The chi-square distribution is always skewed; however, the greater the degrees of freedom, the more closely the chi-square distribution resembles the normal distribution.
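A quick way to see these two properties (a mean equal to the degrees of freedom, and skewness that decreases as the degrees of freedom grow) is the following minimal sketch, assuming SciPy is installed.

from scipy.stats import chi2

for df in (1, 10, 100):
    skewness = float(chi2.stats(df, moments='s'))   # equals sqrt(8 / df)
    print(df, chi2.mean(df), round(skewness, 3))
# The mean always equals df; the skewness shrinks toward 0 (a more normal shape) as df grows.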

The chi distribution is a closely related distribution: it describes the square root of a variable that follows a chi-square distribution, with degrees of freedom n > 0. Its probability density function is:

f(x) = 2^(1 − n/2) x^(n − 1) e^(−x²/2) / Γ(n/2)

For values where x is positive.
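As a sanity check (a sketch only, assuming SciPy is available), the density formula above can be coded directly and compared with scipy.stats.chi, which implements the same chi distribution.

import math
from scipy.stats import chi

def chi_pdf(x, n):
    # f(x) = 2^(1 - n/2) * x^(n - 1) * e^(-x^2 / 2) / Gamma(n / 2), for x > 0
    return 2 ** (1 - n / 2) * x ** (n - 1) * math.exp(-x ** 2 / 2) / math.gamma(n / 2)

x, n = 1.5, 3
print(round(chi_pdf(x, n), 6), round(chi.pdf(x, n), 6))   # the two values should agree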

Figure 1: Chi-Square Distribution


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

The cdf of this distribution does not have a simple closed form, but it can be approximated numerically, for example with series expansions or numerical integration.
Uses

The chi-square distribution has numerous statistical uses, including:

• Confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.

• Independence of two criteria of classification of qualitative variables.

• Relationships between categorical variables (contingency tables).

• Sample variance study when the underlying distribution is normal.

• Tests of deviations between expected and observed frequencies (one-way tables).

• The chi-square test (a goodness of fit test).

How to Calculate a Chi Square Statistic

The Chi-Square statistic is used to test hypotheses about whether observed frequencies differ from expected frequencies:

Xc² = ∑ (Oi − Ei)² / Ei

The chi-square formula can be tedious to work through, mostly because a large number of terms must be added up. One simple way to work through the equation is to build a table.

Sample question: 256 visual artists were surveyed to identify their zodiac signs. The results were: 23 Pisces, 20 Aquarius, 18 Capricorn, 23 Sagittarius, 20 Scorpio, 19 Libra, 18 Virgo, 21 Leo, 19 Cancer, 22 Gemini, 24 Taurus, and 29 Aries. Test the hypothesis that zodiac signs are evenly distributed among visual artists.

Step 1: Make a table with columns for "Categories," "Observed," "Expected," "Residual (Obs − Exp)," "(Obs − Exp)²," and "Component (Obs − Exp)²/Exp." Don't worry if these terms are unfamiliar; they are explained in the following steps.

Figure 2: Step 1 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

Step 2: Fill in the Categories column with the categories given in the question. There are 12 zodiac signs, so:

Figure 3: Step 2 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

Step 3: Fill in the Observed column (column 2) with the count given for each category. The counts are also stated in the question:

Figure 4: Step 3 Sample

URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/

Step 4: Calculate the expected value for column 3. Based on the question, the 256 people are expected to be evenly distributed across the 12 zodiac signs, so 256/12 = 21.333. Write this expected value in column 3.

Figure 5: Step 4 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

Step 5: Subtract the expected value from Step 4 from each observed value from Step 3 and place the result in the "Residual" column. For example, for Aries in the first row, 29 − 21.333 = 7.667.

Figure 6: Step 5 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

Step 6: Square each result from Step 5 and place the amounts in the (Obs − Exp)² column.

Figure 7: Step 6 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018
Step 7: Divide each amount from Step 6 by the expected value from Step 4, and place the results in the last column.

Figure 8: Step 7 Sample

URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018

Step 8: Add up all the values in the last column. This sum is the Chi-Square statistic.

Figure 9: Step 8 Sample


URL: http://www.statisticshowto.com/probability-and-statistics/chi-square/
Retrieved: September 08, 2018
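Putting Steps 1 through 8 together, here is a hedged sketch (assuming SciPy is installed) that runs the whole zodiac calculation at once; the observed counts are taken from the sample question above.

from scipy.stats import chisquare

# Observed counts for Aries, Taurus, Gemini, Cancer, Leo, Virgo,
# Libra, Scorpio, Sagittarius, Capricorn, Aquarius, Pisces.
observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
expected = [256 / 12] * 12   # about 21.333 per sign if the distribution were even
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)
print(round(statistic, 3), round(p_value, 3))
# Statistic ~5.09 with df = 11; the large p-value means the even-distribution hypothesis is not rejected.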

References and Supplementary Materials


Books and Journals
1. Thomas W. MacFarland, Jan M. Yates; 2016; Introduction to Nonparametric Statistics
for the Biological Sciences Using R; Switzerland; Springer International Publishing
2. David M. Diez, Christopher D. Barr, Mine Çetinkaya-Rundel; 2015; OpenIntro Statistics; USA; OpenIntro, Incorporated

Online Supplementary Reading Materials


1. The Chi-Square Statistics; http://math.hws.edu/javamath/ryan/ChiSquare.html;
September 08, 2018
2. Use of Chi-Square Statistics; ocw.jhsph.edu/courses/FundEpiII/PDFs/Lecture17.pdf;
September 08, 2018
3. The Chi-Square Test Statistics;
http://www.stat.wmich.edu/s216/book/node114.html; September 08, 2018
4. Chi-Square Tests;
https://newonlinecourses.science.psu.edu/statprogram/reviews/statistical-
concepts/chi-square-tests; September 08, 2018

Online Instructional Videos

1. Chi-square test for association (independence) [video transcript]: And we've already done some hypothesis testing with the chi-squared statistic, and
we've even done some hypothesis testing based on two-way tables. And now we're
going to extend that by thinking about a chi-squared test for association between
two variables. So let's say that we suspect that someone's foot length is related to
their hand length. That these things are not independent. Well, what we can do is set
up a hypothesis test. And remember, the null hypothesis in a hypothesis test, is to
always assume no news. So what we could say is here is that there is no association.
No association between, between foot and hand length. Another way to think about
it is that they are independent. And oftentimes what we're doing is called a chi-
squared test for independence. And then our alternative hypothesis would be our
suspicion there is an association. There is an association. So, foot and hand length
are not independent. So what we can then do is go to a population, and we can
randomly sample it. And so let's say we randomly sample 100 folks. And for all of
those 100 folks, we figure out whether their right hand is longer, their left hand is
longer, or both hands are the same. And we also do that for the feet, and we tabulate
all of the data. And this is the data that we actually get. Now it's worth thinking
about this for a second on how what we just did is different from a chi-squared test
for homogeneity. And a chi-squared test for homogeneity, we sample from two
different populations where we look at two different groups, and we see whether
the distribution of a certain variable amongst those two different groups is the
same. Here we are just sampling from one group, but we're thinking about two
different variables for that one group. We're thinking about feet length, and we're
thinking about hand length. And so you can see here, that 11 folks had both their
right hand longer and their right foot longer. Three folks had their right hand longer,
but their left foot was longer. And then eight folks had their right hand longer, but
both feet were the same. Likewise, we had nine people where their left foot and
hand was longer, but you had two people where the left hand was longer, but the
right foot was longer. And we can go through all of these. But to do our chi-squared
test, we would've said, what would be the expected value of each of these data
points if we assumed that the null hypothesis was true? That there was no
association between foot and hand length. So to help us do that, I'm going to make a
total of our columns here, and also a total of our rows here. Let me draw a line here,
so we know what's going on. And so, what are the total number of people who had a
longer right hand? Well, it's going to be 11 plus three plus eight, which is 22. The
total number of people who had a longer left hand is two plus nine plus 14, which is
25. And then the total number of people whose hands had the same length, 12 plus
13 plus 28, 25 plus 28, that is 53. And then if I were to total this column, 22 plus 25
is 47, plus 53, we get 100, right over here. And then if we total the number of people
who had a longer right foot, 11 plus two plus 12, is 13 plus 12, that is 25. Longer left
foot, three plus nine plus 13, that's also 25. And then we can either add these up, and
we would get 50, or we could say, hey 25 plus 25 plus what is 100? Well, that is
going to be equal to 50. Now to figure out these expected values, remember, we're
going to figure out the expected values assuming that the null hypothesis is true.
Assuming that these distributions are independent. That feet length and hand length
are independent variables. Well, if they are independent, which we are assuming,
then our best estimate is that 22% have a longer right hand, and our best estimate is
that 25% have a longer right foot. And so out of 100, you would expect 0.22 times
0.25 times 100 to have a longer right hand and foot. I'm just multiplying the
probabilities, which you would do if these were independent variables. And so 0.22
times 0.25, let's see, one fourth of 22 is 5 1/2, so this is going to be equal to 5.5. Now
what number would you expect to have a longer right hand, but a longer left foot? So
that would be 0.22 times 0.25 times 100. Well, we already calculated what that
would be. That would be 5.5. And then to figure out the expected number that would
have a longer right hand, but both feet would be the same length, we could multiply
22 out of 100 times 50 out of 100 times 100, which is going to be half of 22, which is
equal to 11. And we can keep going. This value right over here would be 0.25 times
0.25 times 100, 25 times 25 is 625, so that would be 6.25. This value right over here
would be 0.25 times 0.25 times 100, which is again, 6.25. And then this value right
over here, a couple of ways we can get it. We can multiply 0.25 times 50 times 100,
which would get us to 12.5, or we could have said this plus this plus this has to equal
25, so this would be 12.5. And on this expected value, we can figure out because 5.5
plus 6.25 plus this is going to equal 25. So let's see, 5.5 plus 6.25 is 11.75. 11.75 plus
13.25 is equal to 25. Same thing over here. This would be 13.25, 'cause this is 11.75
plus 13.25 is equal to 25. If we add these two together, we get 26.5. 26.5 plus what is
equal to 53? Well, it'd be equal to another 26.5. Now once you figure out all of your
expected values, that's a good time to test your conditions. The first condition is that
you took a random sample. So let's assume we had done that. The second condition
is that your expected value for any of the data points has to be at least equal to five.
And we can see that all of our expected values are at least equal to five. The actual
data points we got do not have to be equal to five. So it's okay that we got a two
here, because the expected value here is five or larger. And then the last condition is

the independence condition. That either we are sampling with replacement or that
we have to feel comfortable that our sample size is no more than 10% of the
population. So let's assume that that happened as well. So assuming we met all of
those conditions, we are ready to calculate our chi-squared statistic. And so what
we're going to do, is for every data point, we're going to find the difference between
the data point, 11 minus the expected, minus 5.5, squared over the expected, so I did
that one. Now I'll do this one. So plus three minus 5.5 squared over 5.5 plus, now I'll
do this one, eight minus 11 squared over 11, then I'll do this one, two minus 6.25
squared over 6.25. And I'll keep doing it. I'm going to do it for all nine of these data
points. And I actually calculated this ahead of time to save some time. And so if you
do this for all nine of the data points, you're going to get a chi-squared statistic of
11.942. Now before we calculate the P-value, we're going to have to think about
what are our degrees of freedom? Now we have a three-by-three table here, so one
way to think about it, it's the number of rows minus one, times the number of
columns minus one, and this is two times two, which is equal to four. Another way to
think about it is if you know four of these data points and you know the totals, then
you can figure out the other five data points. And so now we are ready to calculate a
P-value. And you can do that using a calculator, and you can do that using a chi-
squared table, but let's say we did it using a calculator, and we get a P-value of 0.018.
And just to remind ourselves what this is, this is the probability of getting a chi-
squared statistic at least this large or larger. And so next, we do what we always do
with hypothesis testing. We compare this to our significance level. And we actually
should have set our significance level from the beginning. So let's just assume that
when we set up our hypotheses here, we also said that we want a significance level
of 0.05. You really should do this before you calculate all of this. But then you
compare your P-value to your significance level, and we see that this P-value is a
good bit less than our significance level. And so one way to think about it is, we got
all these expected values assuming that the null hypothesis was true. But the
probability of getting a result this extreme or more extreme is less than 2%, which is
lower than our significance level. And so this will lead us to reject our null
hypothesis and it suggests to us that there is an association between hand length
and foot length.; https://www.khanacademy.org/math/ap-statistics/chi-square-
tests/chi-square-tests-two-way-tables/v/chi-square-test-association-
independence; September 08, 2018
