TUTORIAL 1 CHI SQUARE TEST
Introduction
Inheritance obeys the same rules of probability that apply to tossing coins and
rolling dice. Mendel's great achievement was recognizing this from his
experimental results.
A simple case: given a pair of alleles of a gene, one dominant and one recessive,
their recombination at fertilization is like flipping two coins at the same time.
A coin has two sides, head and tail (like two alleles of a gene). If you flip two coins
and examine the paired outcomes, you will observe three possible combinations:
HH, HT, and TT. Since each coin has a 1/2 chance of coming up heads,
the probability of a homozygous outcome, HH (and likewise TT), is 1/2 x 1/2 = 1/4. The
outcome HT can arise in two ways (HT or TH), so its probability is 1/4 + 1/4 = 2/4. The
outcome of many such trials of HT x HT (flipping 2 coins and examining which
pair of faces is up) will be 1/4 HH + 2/4 HT + 1/4 TT. The more times you flip the
coins together, the closer you will come to these ideal ratios.
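The convergence toward 1/4 : 2/4 : 1/4 can be checked with a short simulation. The following is a minimal sketch in Python and is not part of the original tutorial; the function name flip_pairs, the fixed seed, and the trial counts are illustrative assumptions.

```python
import random
from collections import Counter

def flip_pairs(n_trials, seed=0):
    """Flip two fair coins n_trials times; return the proportions of HH, HT, TT."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    counts = Counter()
    for _ in range(n_trials):
        pair = rng.choice("HT") + rng.choice("HT")
        # HT and TH are the same unordered outcome, so sort the two faces.
        counts["".join(sorted(pair))] += 1
    return {k: counts[k] / n_trials for k in ("HH", "HT", "TT")}

# Proportions should drift toward 0.25, 0.50, 0.25 as the number of flips grows.
for n in (10, 100, 10_000):
    print(n, flip_pairs(n))
```

With only 10 flips the proportions can be far from the ideal ratios; with thousands of flips they are typically close.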
When you examine the results of a genetic cross you may ask if the numbers you
observe are in agreement with the hypothetical outcome of the cross. For
example, among the progeny of a monohybrid cross Rr x Rr, you expect that 3/4
will have phenotype R_ and 1/4 will have phenotype rr. The phenotypes you observe and count
probably won't match these ratios exactly because chance plays a role in
biological phenomena, in this case, fertilization events. Chance enters again with
the corn cross, since you will not be counting all kernels on every ear, and some
kernels are missing due to handling or consumption by mice.
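The 3/4 : 1/4 expectation for Rr x Rr follows from listing every pairing of one allele from each parent, which is what a Punnett square does. The sketch below illustrates that enumeration; the function name monohybrid_phenotypes and the "R_" labelling convention are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def monohybrid_phenotypes(parent1, parent2, dominant="R"):
    """Enumerate the Punnett square for two single-gene genotypes such as 'Rr'."""
    counts = Counter()
    for a, b in product(parent1, parent2):  # one allele from each parent
        # Any offspring carrying the dominant allele shows the dominant phenotype.
        phenotype = dominant + "_" if dominant in (a, b) else a + b
        counts[phenotype] += 1
    total = sum(counts.values())
    return {p: Fraction(n, total) for p, n in counts.items()}

ratios = monohybrid_phenotypes("Rr", "Rr")
print(ratios)  # R_ in 3 of the 4 squares, rr in 1 of the 4
```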
Is the difference between your observation and the expected result small enough
that it could have been produced by chance alone? The null hypothesis is
that there is no real difference between the observed data and the predicted data.
Example
Suppose you counted 79 R_ and 33 rr. The total number of individuals you
counted, N, is 112. You expect 3/4 to be R_ (84) and 1/4 to be rr (28). Are your
results close enough to these ratios for you to accept the null hypothesis that
there is no real difference? The chi-square test is one tool for making this
decision.
Phenotypes   Observed (O)   Expected (E)        D = O - E   D²    D²/E
R_           79             (3/4) x 112 = 84    -5          25    0.30
rr           33             (1/4) x 112 = 28     5          25    0.89
Total        112            112                  0                1.19
χ² = Σ (Observed - Expected)² / Expected
This means: add up the values in the last column of the table.
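The same arithmetic can be written as a few lines of Python. This is a minimal sketch using the counts from the example above; the function name chi_square is an assumption made for illustration.

```python
def chi_square(observed, expected_ratios):
    """Return (expected counts, chi-square sum) for observed counts and expected ratios."""
    n = sum(observed)                              # N, the total counted
    expected = [r * n for r in expected_ratios]    # E for each phenotype
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

# 79 R_ and 33 rr observed; 3/4 and 1/4 expected.
expected, chi2 = chi_square([79, 33], [3 / 4, 1 / 4])
print(expected)        # [84.0, 28.0]
print(round(chi2, 2))  # 1.19
```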
You can compare the chi-square sum, 1.19, with the numbers in a table of
critical values to decide whether to accept the null hypothesis, that is, whether
the observed results are close enough to the expected results that the difference
can be attributed to chance. If so, our original hypothesis (the expected ratio)
is accepted.
Table 1. Selected percentile values of the χ² distribution

              Probabilities
df*    .99       .95      .50     .10      .05      .01
1      .000157   .00393   .455    2.706    3.841    6.635
2      .0201     .103     1.386   4.605    5.991    9.210
3      .115      .352     2.366   6.251    7.815    11.345
4      .297      .711     3.357   7.779    9.488    13.277
5      .554      1.145    4.351   9.236    11.070   15.086
6      .872      1.635    5.348   10.645   12.592   16.812

*df is degrees of freedom.
Degrees of freedom: In a two-phenotype system, when you know the number for
one phenotype, the result for the other (the rest of the population) is
automatically determined. In this kind of genetic data, the number of degrees of
freedom is one less than the number of different phenotypes observed.
As a rule, if the probability of obtaining a χ² value this large or larger is
greater than 5 in 100 (P > 0.05), then the difference between expected and observed
is not considered statistically significant, and the null hypothesis is accepted.
Since we observed 2 different phenotypes in the monohybrid cross, there is only 1
degree of freedom; numbers in only the first row of the above table are relevant.
The value 1.19 falls between probabilities of .50 and .10. This is interpreted to
mean that in 10 to 50 out of 100 observed samples (10 to 50 percent of the time),
we could expect χ² values this big or bigger due to chance. That's reasonable; the
observed deviation can simply be a chance or sampling error. Note that the test
does not prove that the hypothesis is true; it indicates that the observations
provide no statistically compelling argument against it.
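The table lookup described above can also be sketched in Python. The dictionary below copies the df = 1 row of Table 1; the names DF1_CRITICAL, probability_range, and exact_p_df1 are assumptions made for this example. As a cross-check that is not part of the tutorial's method, the exact probability for 1 degree of freedom can be computed from the complementary error function, because a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math

# Critical values for 1 degree of freedom, copied from Table 1 (probability -> chi-square).
DF1_CRITICAL = {0.99: 0.000157, 0.95: 0.00393, 0.50: 0.455,
                0.10: 2.706, 0.05: 3.841, 0.01: 6.635}

def probability_range(chi2, critical=DF1_CRITICAL):
    """Return (low, high): the tabulated probabilities that bracket chi2."""
    low, high = 0.0, 1.0
    for p in sorted(critical, reverse=True):  # .99 first, so critical values ascend
        if critical[p] <= chi2:
            high = p      # chi2 is at or beyond this column
        else:
            low = p       # first column whose critical value exceeds chi2
            break
    return low, high

def exact_p_df1(chi2):
    """Exact P(chi-square >= chi2) for 1 degree of freedom only."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = 25 / 84 + 25 / 28            # the 1.19 from the worked example
low, high = probability_range(chi2)
print(low, high)                    # 0.1 0.5
print("accept" if low >= 0.05 else "reject or inspect further")
```

Here the lower bound of the range (.10) is already above .05, so the null hypothesis is accepted, in agreement with the reading of the table above.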
Mendelian Data
Question 1 - Monohybrid cross
In corn genetics, P (purple) is the dominant allele, and p (yellow) is the recessive
allele. Wild-type plants have yellow-pigmented grains, while mutant individuals have
purple grains. A monohybrid cross of two parent plants (Pp x pp)
yielded 39 purple kernels from a total of 110 kernels counted. Complete the table and
answer the questions below, with reference to a cross-breeding diagram (branch diagram
or Punnett square).
Phenotypes   Observed (O)   Expected (E)   D = O - E   D²   D²/E


Total                                      0

χ²                                 : .
Degrees of freedom                 : .
Range of probability               : .
Accept or reject null hypothesis   : .
Question 2 - Dihybrid cross
In chickens, colored feathers (F) are dominant over white feathers (f) and a large
comb (B) is dominant over a small comb (b):
a. A hen is Ffbb and a rooster is ffBb. Give their phenotypes.
b. Do the Punnett square for this cross.
c. Give the proportions of each of the four possibilities assuming independent
assortment.
Over several years 84 chicks are born to this hen and rooster. There are 16 with
white feathers and large combs, 22 with white feathers and small combs, 28 with
colored feathers and large combs and 9 with colored feathers and small combs.
Do the chi-square test to see whether the deviation from the expected proportions
is significant. What is the null hypothesis? Does the cross in Question 2 support
the null hypothesis or not?
CHI SQUARE TABLES
df\area   0.995     0.990     0.975     0.950     0.900     0.750     0.500     0.250     0.100     0.050     0.025     0.010     0.005
1         0.00004   0.00016   0.00098   0.00393   0.01579   0.10153   0.45494   1.32330   2.70554   3.84146   5.02389   6.63490   7.87944
2         0.01003   0.02010   0.05064   0.10259   0.21072   0.57536   1.38629   2.77259   4.60517   5.99146   7.37776   9.21034   10.59663
3         0.07172   0.11483   0.21580   0.35185   0.58437   1.21253   2.36597   4.10834   6.25139   7.81473   9.34840   11.34487  12.83816
4         0.20699   0.29711   0.48442   0.71072   1.06362   1.92256   3.35669   5.38527   7.77944   9.48773   11.14329  13.27670  14.86026
5         0.41174   0.55430   0.83121   1.14548   1.61031   2.67460   4.35146   6.62568   9.23636   11.07050  12.83250  15.08627  16.74960

Chi squared values by p value (rows) and degrees of freedom, df (columns)

p value  df=1    2       3       4       5       6       7       8       9       10
.99      0.00    0.02    0.11    0.30    0.55    0.87    1.24    1.65    2.09    2.56
.90      0.02    0.21    0.58    1.06    1.61    2.20    2.83    3.49    4.17    4.87
.80      0.06    0.45    1.01    1.65    2.34    3.07    3.82    4.59    5.38    6.18
.70      0.15    0.71    1.42    2.19    3.00    3.83    4.67    5.53    6.39    7.27
.60      0.27    1.02    1.87    2.75    3.66    4.57    5.49    6.42    7.36    8.30
.50      0.45    1.39    2.37    3.36    4.35    5.35    6.35    7.34    8.34    9.34
.40      0.71    1.83    2.95    4.04    5.13    6.21    7.28    8.35    9.41    10.47
.30      1.07    2.41    3.66    4.88    6.06    7.23    8.38    9.52    10.66   11.78
.20      1.64    3.22    4.64    5.99    7.29    8.56    9.80    11.03   12.24   13.44
.15      2.07    3.79    5.32    6.74    8.12    9.45    10.75   12.03   13.29   14.53
.10      2.71    4.61    6.25    7.78    9.24    10.64   12.02   13.36   14.68   15.99
.09      2.87    4.82    6.49    8.04    9.52    10.95   12.34   13.70   15.03   16.35
.08      3.06    5.05    6.76    8.34    9.84    11.28   12.69   14.07   15.42   16.75
.07      3.28    5.32    7.06    8.67    10.19   11.66   13.09   14.48   15.85   17.20
.06      3.54    5.63    7.41    9.04    10.60   12.09   13.54   14.96   16.35   17.71
.05      3.84    5.99    7.81    9.49    11.07   12.59   14.07   15.51   16.92   18.31
.04      4.22    6.44    8.31    10.03   11.64   13.20   14.70   16.17   17.61   19.02
.03      4.71    7.01    8.95    10.71   12.37   13.97   15.51   17.01   18.48   19.92
.02      5.41    7.82    9.84    11.67   13.39   15.03   16.62   18.17   19.68   21.16
.01      6.63    9.21    11.34   13.28   15.09   16.81   18.48   20.09   21.67   23.21
.001     10.83   13.82   16.27   18.47   20.52   22.46   24.32   26.12   27.88   29.59