
Genome-wide association analysis

GWAS for case-control (qualitative trait) design


Phenotype SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9
Disease AG AC GT AG CC GA TT TT AC
Disease AA AC GT AA CC GA TT TT AC
Disease AA AC GT AA CC GA TT TT AC
Disease AA AC GT AA CC GG TT TT AC
Disease AA AC GT AA CC GG TT TT AC
Healthy GG AC TT AA CC AA AA AT AC
Healthy GG AC TT AA CC AA AA AT AC
Healthy GG AC GG AA CC GA AA AT AC
Healthy GG AA GG AG CC GA AA AT AA
Healthy AG AA GG AG CC GA AA AT AA

First we have to quantify the effect of single SNPs on the phenotype. A contingency table can be
used for this purpose.

For SNP1, the contingency table to estimate the genotypic effect on the phenotype can be generated
as follows.

AA AG GG Total
Disease 4 1 0 5
Healthy 0 1 4 5
Total 4 2 4 10
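The counting step can be sketched in Python. This is only an illustration: the lists below are copied from the SNP1 column of the data table, and the function name contingency_table is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Phenotype and SNP1 genotype of the ten individuals, in table order.
phenotypes = ["Disease"] * 5 + ["Healthy"] * 5
snp1 = ["AG", "AA", "AA", "AA", "AA", "GG", "GG", "GG", "GG", "AG"]

def contingency_table(phenotypes, genotypes):
    """Count individuals for every (phenotype, genotype) combination."""
    counts = Counter(zip(phenotypes, genotypes))
    rows = sorted(set(phenotypes))
    cols = sorted(set(genotypes))
    # A Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(phenotypes, snp1)
for status, row in table.items():
    print(status, row, "Total:", sum(row.values()))
```

Each row of the printed result corresponds to one row of the contingency table above.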

Now we have to test if the number of disease and healthy individuals is statistically different
between the three genotypes or in other words if the genotypes at SNP1 have a significant effect on
the disease status of individuals. For this purpose we have to perform hypothesis testing as follows.

Null hypothesis: Effect of SNP1 on phenotype = 0


Alternate hypothesis: Effect of SNP1 on phenotype > 0

To test the hypothesis, a chi-square test can be used.

The chi-square formula is: χ² = ∑ (Oi – Ei)² / Ei, summed over all cells of the contingency table, where Oi = observed value (actual value) and Ei = expected value.

The expected values for each of the genotypes can be calculated as follows.
Expected value = (column total x row total) / grand total

         AA               AG               GG               Total
Disease  (4 x 5)/10 = 2   (2 x 5)/10 = 1   (4 x 5)/10 = 2   5
Healthy  (4 x 5)/10 = 2   (2 x 5)/10 = 1   (4 x 5)/10 = 2   5
Total    4                2                4                10

χ² = (4 – 2)²/2 + (1 – 1)²/1 + (0 – 2)²/2 + (0 – 2)²/2 + (1 – 1)²/1 + (4 – 2)²/2

χ² = 2 + 0 + 2 + 2 + 0 + 2 = 8

This is the calculated χ² value. We have to compare it with the tabulated χ² value at an alpha level of 0.05 and 2 degrees of freedom, where degrees of freedom = (number of rows – 1) x (number of columns – 1) = (2 – 1) x (3 – 1) = 2. The tabulated χ² value is 5.991.
As our calculated χ² (8) is larger than the tabulated χ² (5.991), we reject the null hypothesis and accept the alternate hypothesis that the effect of SNP1 on phenotype > 0, which means that SNP1 is associated with the phenotype.
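The calculation for SNP1 can be reproduced with a short Python sketch. The function name chi_square is a hypothetical helper written for this example, and the critical value 5.991 is simply copied from the text rather than computed.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected value = (row total x column total) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

snp1_observed = [[4, 1, 0],   # Disease: AA, AG, GG
                 [0, 1, 4]]   # Healthy: AA, AG, GG
CRITICAL_DF2 = 5.991  # tabulated value at alpha = 0.05 and 2 degrees of freedom

stat, dof = chi_square(snp1_observed)
print("chi-square:", stat, "degrees of freedom:", dof)
print("associated" if stat > CRITICAL_DF2 else "not associated")
```

The function works for any table without empty rows or columns, so it can be reused for SNPs with two genotype classes as well.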

SNP2-phenotype association analysis:


For SNP2, the contingency table to estimate the genotypic effect on the phenotype can be generated
as follows.

AA AC Total
Disease 0 5 5
Healthy 2 3 5
Total 2 8 10

Now we have to test if the number of disease and healthy individuals is statistically different between the two genotypes observed at SNP2 (AA and AC), or in other words if the genotypes at SNP2 have a significant effect on the disease status of individuals. For this purpose we have to perform hypothesis testing as follows.

Null hypothesis: Effect of SNP2 on phenotype = 0


Alternate hypothesis: Effect of SNP2 on phenotype > 0

To test the hypothesis, a chi-square test can be used.

The chi-square formula is: χ² = ∑ (Oi – Ei)² / Ei, summed over all cells of the contingency table, where Oi = observed value (actual value) and Ei = expected value.

The expected values for each of the genotypes can be calculated as follows.
Expected value = (column total x row total) / grand total

         AA               AC               Total
Disease  (2 x 5)/10 = 1   (8 x 5)/10 = 4   5
Healthy  (2 x 5)/10 = 1   (8 x 5)/10 = 4   5
Total    2                8                10

χ² = (0 – 1)²/1 + (5 – 4)²/4 + (2 – 1)²/1 + (3 – 4)²/4

χ² = 1 + 0.25 + 1 + 0.25 = 2.5

This is the calculated χ² value. We have to compare it with the tabulated χ² value at an alpha level of 0.05 and 1 degree of freedom, where degrees of freedom = (number of rows – 1) x (number of columns – 1) = (2 – 1) x (2 – 1) = 1. The tabulated χ² value is 3.841.
As our calculated χ² (2.5) is smaller than the tabulated χ² (3.841), we fail to reject the null hypothesis that the effect of SNP2 on phenotype = 0, which means that we have no evidence that SNP2 is associated with the phenotype.
The same procedure can be applied to the other SNPs to test their association.
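Repeating the procedure for every SNP can be sketched as a loop. This is an illustrative sketch only: the data block is copied from the case-control table above, the critical values are the two tabulated values quoted in the text, and chi_square_for_snp is a hypothetical helper. A SNP with a single observed genotype (such as SNP5) has no variation to test, so the sketch skips it.

```python
from collections import Counter

# Case-control data from the table above: phenotype, then SNP1..SNP9.
DATA = """\
Disease AG AC GT AG CC GA TT TT AC
Disease AA AC GT AA CC GA TT TT AC
Disease AA AC GT AA CC GA TT TT AC
Disease AA AC GT AA CC GG TT TT AC
Disease AA AC GT AA CC GG TT TT AC
Healthy GG AC TT AA CC AA AA AT AC
Healthy GG AC TT AA CC AA AA AT AC
Healthy GG AC GG AA CC GA AA AT AC
Healthy GG AA GG AG CC GA AA AT AA
Healthy AG AA GG AG CC GA AA AT AA
"""

# Tabulated chi-square values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL = {1: 3.841, 2: 5.991}

def chi_square_for_snp(phenotypes, genotypes):
    """Return (statistic, dof), or None if only one genotype is observed."""
    rows = sorted(set(phenotypes))
    cols = sorted(set(genotypes))
    if len(cols) < 2:
        return None  # no genotype variation, so no test is possible
    counts = Counter(zip(phenotypes, genotypes))
    row_tot = Counter(phenotypes)
    col_tot = Counter(genotypes)
    n = len(phenotypes)
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)

records = [line.split() for line in DATA.splitlines()]
phenotypes = [rec[0] for rec in records]
results = {}
for k in range(1, 10):
    genotypes = [rec[k] for rec in records]
    results[f"SNP{k}"] = chi_square_for_snp(phenotypes, genotypes)

for name, res in results.items():
    if res is None:
        print(name, "only one genotype observed, skipped")
    else:
        stat, dof = res
        verdict = "associated" if stat > CRITICAL[dof] else "not associated"
        print(name, round(stat, 2), "df =", dof, verdict)
```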

GWAS for quantitative traits

Phenotype SNP1 SNP2 SNP3 SNP4 SNP5 SNP6 SNP7 SNP8 SNP9
12 AG AC GT AG CC GA TT TT AC
10 AA AC GT AA CC GA TT TT AC
13 AA AC GT AA CC GA TT TT AC
9 AA AC GT AA CC GG TT TT AC
8 AA AC GT AA CC GG TT TT AC
5 GG AC TT AA CC AA AA AT AC
13 GG AC TT AA CC AA AA AT AC
14 GG AC GG AA CC GA AA AT AC
17 GG AA GG AG CC GA AA AT AA
12 AG AA GG AG CC GA AA AT AA

First we have to quantify the effect that a SNP has on the phenotype. To calculate the effect of SNP1 on the phenotype, we can perform a regression of the phenotype on SNP1. The regression coefficient thus obtained gives the effect size of the association between SNP1 and the phenotype. But as the simplest form of regression (linear regression) can only be performed when both the dependent and independent variables are quantitative, we have to convert the SNP genotypes into quantitative variables. This means converting the SNP coding to additive coding (0, 1, 2). For each SNP, this is done by choosing one of its two alleles as the reference allele and counting the copies of that allele in each individual. For example, if we take G as the reference allele for SNP1, then genotype GG = 2, AG = 1 and AA = 0.
The choice is arbitrary and either of the two alleles can be used as the reference, but the convention is to use the allele that is less frequent in the population. The same recoding process can be repeated for the other SNPs.
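The recoding can be sketched in a few lines of Python. The helper names additive_code and allele_counts are hypothetical, and the genotype lists are copied from the SNP1 and SNP3 columns of the table above.

```python
from collections import Counter

def additive_code(genotype, counted_allele):
    """Number of copies (0, 1 or 2) of counted_allele in a two-letter genotype."""
    return genotype.count(counted_allele)

def allele_counts(genotypes):
    """Count how often each allele occurs across all individuals."""
    return Counter("".join(genotypes))

snp1 = ["AG", "AA", "AA", "AA", "AA", "GG", "GG", "GG", "GG", "AG"]
snp3 = ["GT", "GT", "GT", "GT", "GT", "TT", "TT", "GG", "GG", "GG"]

# SNP1 with G as the counted (reference) allele, as in the example above.
snp1_g = [additive_code(g, "G") for g in snp1]
print(snp1_g)  # [1, 0, 0, 0, 0, 2, 2, 2, 2, 1]

# For SNP3, T is the less frequent allele, so by convention T is counted.
print(allele_counts(snp3))
snp3_t = [additive_code(g, "T") for g in snp3]
print(snp3_t)  # [1, 1, 1, 1, 1, 2, 2, 0, 0, 0]
```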

Phenotype SNP1_G SNP2_C SNP3_T SNP4_G SNP5_C SNP6_A SNP7_T SNP8_A SNP9_C
12 1 1 1 1 2 1 2 0 1
10 0 1 1 0 2 1 2 0 1
13 0 1 1 0 2 1 2 0 1
9 0 1 1 0 2 0 2 0 1
8 0 1 1 0 2 0 2 0 1
5 2 1 2 0 2 2 0 1 1
13 2 1 2 0 2 2 0 1 1
14 2 1 0 0 2 1 0 1 1
17 2 0 0 1 2 1 0 1 0
12 1 0 0 1 2 1 0 1 0

Here we can use the following formula to find the regression coefficient, which tells us the effect size of SNP1 on the phenotype:
b = Cov(SNP genotype, Phenotype) / Var(SNP genotype)
or, writing X for the SNP genotype and Y for the phenotype,
b = Cov(X, Y) / Var(X)
The covariance between genotype and phenotype can be calculated as follows.

Y X Y-Ymean X-Xmean (Y-Ymean)(X-Xmean) (X-Xmean)²
12 1 0.7 0 0 0
10 0 -1.3 -1 1.3 1
13 0 1.7 -1 -1.7 1
9 0 -2.3 -1 2.3 1
8 0 -3.3 -1 3.3 1
5 2 -6.3 1 -6.3 1
13 2 1.7 1 1.7 1
14 2 2.7 1 2.7 1
17 2 5.7 1 5.7 1
12 1 0.7 0 0 0

Ymean = 11.3
Xmean = 1

Cov(X, Y) = sum{(Y – Ymean)(X – Xmean)} / (n – 1) = 9 / (10 – 1) = 1

Var(X) = sum{(X – Xmean)²} / (n – 1) = 8 / (10 – 1) = 0.89

b = Cov(X, Y) / Var(X) = 1 / (8/9) = 9/8 = 1.125, or about 1.13

The analysis suggests that a change of one unit in the SNP1 genotype (the addition of one G allele) is associated with an increase of about 1.13 units in the phenotype.
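The same calculation can be written as a small Python sketch. The function name regression_slope is a hypothetical helper. The two lists are the phenotype column and the SNP1_G column from the tables above.

```python
phenotype = [12, 10, 13, 9, 8, 5, 13, 14, 17, 12]   # Y
snp1_g = [1, 0, 0, 0, 0, 2, 2, 2, 2, 1]             # X, number of G alleles

def regression_slope(x, y):
    """Regression coefficient b = Cov(X, Y) / Var(X), using n - 1 in both."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    cov_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_mean) ** 2 for xi in x) / (n - 1)
    return cov_xy / var_x

b = regression_slope(snp1_g, phenotype)
print(round(b, 3))
```

Because the same divisor (n – 1) appears in both the covariance and the variance, it cancels, and b equals the sum of products divided by the sum of squares (9/8).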

The next step is to test whether this effect is large enough to be considered statistically significant. That requires a statistical test, which we will not cover in this course.
