R: chi square test

chi square test - using R
Wald confidence interval of risk ratio

Note: the following code assumes the epitools library has been installed. For epitools help on this function, enter ?riskratio

library(epitools)  # provides riskratio()
# Rows: placebo, treated; columns: no disease, disease (matrix is filled column-wise)
tabrisk=matrix(c(384,374,34,35),ncol=2)
riskratio(tabrisk, method="wald")

Gives something like this:

> riskratio(tabrisk, method="wald")
$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      384       34   418
  Exposed2      374       35   409
  Total         758       69   827

$measure
          risk ratio with 95% C.I.
Predictor  estimate     lower upper
  Exposed1 1.000000        NA    NA
  Exposed2 1.052064 0.6695936 1.653

$p.value
          two-sided
Predictor  midp.exact fisher.exact chi.square
  Exposed1         NA           NA         NA
  Exposed2  0.8269603     0.900074  0.8257292

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

If you like our critical approach to analysis you will really like our hyperbook: Avoiding and Detecting Statistical Malpractice (Design & Analysis for Biologists, with R).

Except where otherwise specified, all text and images on this page are copyright InfluentialPoints, all rights reserved. For images that are not copyright InfluentialPoints, their sources are credited on web-pages attached via hypertext links to those images.

http://influentialpoints.com/notes/n9rme1.htm[23/12/2018 06:06:54]

How to: Confidence interval risk ratio odds ratio rate ratio

InfluentialPoints.com

Confidence intervals of ratios: Risk ratio, odds ratio, and rate ratio

Worked example I

Our first example uses results from a randomized trial on the effect of vitamin E supplementation on
the incidence of macular degeneration.

Effect of Vitamin E on the incidence of macular degeneration

Type    Treatment   No. positive   No. negative   % positive   Risk ratio (RR)
Early   Treated     35             374            8.56         1.05
Early   Placebo     34             384            8.13         1.00

The risk ratio is an appropriate summary measure to use to assess the association between treatment and outcome. Since sample sizes are reasonably large, we attach a Wald normal approximation interval, which requires the standard error of ln(RR):

$$\mathrm{SE}(\ln RR) = \sqrt{\frac{1}{35} - \frac{1}{409} + \frac{1}{34} - \frac{1}{418}} = \sqrt{0.05315} = 0.23054$$

$$95\%\ \mathrm{CI}(\ln RR) = \ln(1.05) \pm (1.96 \times 0.23054) = -0.40307 \text{ to } 0.50065$$

$$95\%\ \mathrm{CI}(RR) = e^{-0.40307} \text{ to } e^{0.50065} = 0.67 \text{ to } 1.65$$

The statistic and confidence interval calculated above agree with those given by the riskratio function of the epitools package for R for the normal approximation (Wald) confidence interval: risk ratio = 1.052 (0.670 to 1.653).

The interval widely overlaps 1.0 suggesting that vitamin E has no significant effect on the incidence
of macular degeneration. This conclusion is supported by the non-significant P-value from a
Pearson's chi square test (0.826).
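
The hand calculation above can be cross-checked in base R, without epitools. The following is a minimal sketch: the variable names are illustrative assumptions, and it uses the unrounded risk ratio and qnorm(0.975) rather than 1.05 and 1.96, so its limits should sit closer to the riskratio output than to the rounded hand figures.

```r
# Counts from the vitamin E table (variable names are illustrative)
pos_treated <- 35; n_treated <- 35 + 374   # 409 treated
pos_placebo <- 34; n_placebo <- 34 + 384   # 418 placebo

# Risk ratio of treated relative to placebo
rr <- (pos_treated / n_treated) / (pos_placebo / n_placebo)

# Standard error of ln(RR)
se_ln_rr <- sqrt(1 / pos_treated - 1 / n_treated +
                 1 / pos_placebo - 1 / n_placebo)

# 95% Wald limits, computed on the log scale and back-transformed
z <- qnorm(0.975)
ci_rr <- exp(log(rr) + c(-1, 1) * z * se_ln_rr)

# Pearson's chi square test without continuity correction
tab <- matrix(c(384, 374, 34, 35), ncol = 2)
p_chisq <- chisq.test(tab, correct = FALSE)$p.value

print(round(c(RR = rr, lower = ci_rr[1], upper = ci_rr[2], P = p_chisq), 3))
```

Working on the log scale and back-transforming is what makes the interval asymmetric about the risk ratio.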

Worked example II

Our second example uses a result from a cross-sectional survey on the prevalence of dystocia in cats. We previously looked at this work in relation to the confidence intervals attached to the prevalence estimates. We will calculate both the odds ratio (as used by the authors) and the risk ratio with their accompanying intervals.

Prevalence of dystocia in cats in relation to breed

Breed    No. positive   No. negative   % positive   Odds ratio (OR)   Risk ratio (RR)
Manx     1              17             5.55         15.686            14.87
Colony   3              800            0.37         (1.00)            (1.00)

In this case one of the sample sizes is small and one of the proportions is small. Hence the Wald
interval calculated below may be unreliable, so we would do better to also calculate a conditional
exact interval using the epitools oddsratio function for R:

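Using the standard error of ln(OR):

$$\mathrm{SE}(\ln OR) = \sqrt{\frac{1}{1} + \frac{1}{17} + \frac{1}{3} + \frac{1}{800}} = \sqrt{1.393407} = 1.180427$$

$$95\%\ \mathrm{CI}(\ln OR) = \ln(15.686) \pm (1.96 \times 1.180427) = 0.439133 \text{ to } 5.066405$$

$$95\%\ \mathrm{CI}(OR) = 1.551 \text{ to } 158.60$$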
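
The standard error and Wald limits for the odds ratio can be reproduced in base R. This is a minimal sketch with illustrative variable names (they are assumptions, not part of epitools); it uses qnorm(0.975) in place of 1.96.

```r
# Cell counts from the dystocia table (variable names are illustrative)
manx_pos   <- 1; manx_neg   <- 17
colony_pos <- 3; colony_neg <- 800

# Odds ratio of Manx relative to Colony (cross-product ratio)
odds_ratio <- (manx_pos * colony_neg) / (manx_neg * colony_pos)

# Standard error of ln(OR): square root of the summed reciprocal cell counts
se_ln_or <- sqrt(1 / manx_pos + 1 / manx_neg + 1 / colony_pos + 1 / colony_neg)

# 95% Wald limits, computed on the log scale and back-transformed
ci_or <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_ln_or)

print(round(c(OR = odds_ratio, lower = ci_or[1], upper = ci_or[2]), 3))
```

The single positive Manx cat contributes 1/1 to the variance, which is why the standard error is so large and the interval so wide.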

For the odds ratio, R gives the same Wald interval (OR = 15.69, 95% CI 1.55 to 158.60), but the conditional exact interval includes 1 (OR = 15.48, 95% CI 0.28 to 204.67), as does the (more reliable) mid-P interval (OR = 16.77, 95% CI 0.56 to 153.09). Hence it is now highly questionable whether we have actually demonstrated any difference between breeds.
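
As a cross-check that does not need epitools, base R's fisher.test also reports a conditional maximum likelihood estimate of the odds ratio with an exact interval. The sketch below assumes (it is not confirmed here) that this is the same kind of interval as the conditional exact one quoted above, so the figures should agree closely.

```r
# Rows: Colony, Manx; columns: no dystocia, dystocia (filled column-wise)
tabodds <- matrix(c(800, 17, 3, 1), ncol = 2)

ft <- fisher.test(tabodds)
exact_or <- unname(ft$estimate)       # conditional maximum likelihood estimate
exact_ci <- as.numeric(ft$conf.int)   # 95% conditional exact limits

# Does the exact interval include an odds ratio of 1?
includes_one <- exact_ci[1] < 1 && exact_ci[2] > 1

print(round(c(OR = exact_or, lower = exact_ci[1], upper = exact_ci[2]), 2))
print(includes_one)
```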

For the risk ratio we obtained 14.87 with a Wald interval of 1.62 to 136.2, the same as those given by the epitools package riskratio function for the normal approximation (Wald) confidence interval. Using the same R function with the small sample adjustment, the Wald normal approximation gave a risk ratio of 11.17 with an interval of 1.22 to 102.25. The exact mid-P value, however, was 0.0876, somewhat above the conventional 0.05 level.
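
To show where a mid-P value of this size comes from, here is a by-hand sketch using the hypergeometric distribution in base R. It uses one common definition of the two-sided mid-P value (twice the smaller tail, counting only half the probability of the observed table). Whether epitools computes it in exactly this way is an assumption, and the variable names are illustrative.

```r
# Margins of the dystocia table
n_pos  <- 4     # cats with dystocia (1 Manx + 3 Colony)
n_neg  <- 817   # cats without dystocia (17 + 800)
n_manx <- 18    # Manx cats
x_obs  <- 1     # Manx cats with dystocia

# Hypergeometric probabilities for the number of Manx cats with dystocia
p_equal   <- dhyper(x_obs, n_pos, n_neg, n_manx)
p_greater <- phyper(x_obs, n_pos, n_neg, n_manx, lower.tail = FALSE)
p_less    <- phyper(x_obs - 1, n_pos, n_neg, n_manx)

# Mid-P: count only half the probability of the observed table in each tail
upper_tail <- p_greater + 0.5 * p_equal
lower_tail <- p_less + 0.5 * p_equal
midp_two_sided <- 2 * min(upper_tail, lower_tail)

print(round(midp_two_sided, 4))
```

Under these assumptions the result is close to the 0.0876 quoted above.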

http://influentialpoints.com/Training/confidence_intervals_of_risk_ratio_odds_ratio_and_rate_ratio.htm[23/12/2018 06:07:15]
chi square test - using R
Confidence intervals of odds ratio

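library(epitools)  # provides the oddsratio functions
# Rows: Colony, Manx; columns: no dystocia, dystocia (matrix is filled column-wise)
tabodds=matrix(c(800,17,3,1),ncol=2)
oddsratio.wald(tabodds)    # Wald (normal approximation) interval
oddsratio.fisher(tabodds)  # conditional exact (Fisher) interval
oddsratio.midp(tabodds)    # mid-P exact interval
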
Gives something like this (results edited for clarity):

$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1      800        3   803
  Exposed2       17        1    18
  Total         817        4   821

oddsratio.wald(tabodds)
$measure
          odds ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1  1.00000       NA       NA
  Exposed2 15.68627 1.551454 158.5991

oddsratio.fisher(tabodds)
$measure
          odds ratio with 95% C.I.
Predictor  estimate     lower    upper
  Exposed1  1.00000        NA       NA
  Exposed2 15.48370 0.2823527 204.6715

oddsratio.midp(tabodds)
$measure
          odds ratio with 95% C.I.
Predictor  estimate     lower   upper
  Exposed1  1.00000        NA      NA
  Exposed2 16.77320 0.5638613 153.095

http://influentialpoints.com/notes/n9rme2.htm[23/12/2018 06:07:20]
> mytable=matrix(c(12,8,10,14),byrow=TRUE,ncol=2)
> colnames(mytable)=c("M+","M-")
> rownames(mytable)=c("E+","E-")
> mytable
   M+ M-
E+ 12  8
E- 10 14
> oddsratio(mytable)
$`data`
      M+ M- Total
E+    12  8    20
E-    10 14    24
Total 22 22    44

$measure
                        NA
odds ratio with 95% C.I. estimate     lower    upper
                      E+ 1.000000        NA       NA
                      E- 2.053199 0.6110623 7.244228

$p.value
         NA
two-sided midp.exact fisher.exact chi.square
       E+         NA           NA         NA
       E-  0.2466261    0.3640443  0.2258724

$correction
[1] FALSE

attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"
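
The fisher.exact and chi.square columns of $p.value can be checked against base R. This sketch assumes they correspond to fisher.test and to chisq.test without continuity correction (consistent with $correction being FALSE). That correspondence is inferred from the output, not taken from the epitools documentation.

```r
# Same 2 x 2 table as in the session above
mytable <- matrix(c(12, 8, 10, 14), byrow = TRUE, ncol = 2,
                  dimnames = list(c("E+", "E-"), c("M+", "M-")))

# Pearson chi square test without Yates' continuity correction
p_chi <- chisq.test(mytable, correct = FALSE)$p.value

# Fisher's exact test, two-sided
p_fisher <- fisher.test(mytable)$p.value

print(c(chi.square = p_chi, fisher.exact = p_fisher))
```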

> mytable=matrix(c(12,8,10,14,7,13),byrow=TRUE,ncol=2)
> colnames(mytable)=c("M+","M-")
> rownames(mytable)=c("E+","E-","E")
> mytable
   M+ M-
E+ 12  8
E- 10 14
E   7 13
> oddsratio(mytable)
$`data`
      M+ M- Total
E+    12  8    20
E-    10 14    24
E      7 13    20
Total 29 35    64

$measure
                        NA
odds ratio with 95% C.I. estimate     lower     upper
                      E+ 1.000000        NA        NA
                      E- 2.053199 0.6110623  7.244228
                      E  2.690924 0.7494862 10.405104

$p.value
         NA
two-sided midp.exact fisher.exact chi.square
       E+         NA           NA         NA
       E-  0.2466261    0.3640443  0.2258724
       E   0.1305440    0.2049272  0.1133944

$correction
[1] FALSE
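
With three exposure levels, the output suggests that each later row is compared with the first (reference) row in its own 2 x 2 table: the E- line is identical to the two-row run. The sketch below reproduces the chi.square column on that reading, which is an inference from the output and not a documented statement about epitools.

```r
# Same 3 x 2 table as in the session above
mytable <- matrix(c(12, 8, 10, 14, 7, 13), byrow = TRUE, ncol = 2,
                  dimnames = list(c("E+", "E-", "E"), c("M+", "M-")))

# Compare each later row with the reference row "E+", one 2 x 2 table at a time
p_chi <- sapply(rownames(mytable)[-1], function(level) {
  sub_table <- mytable[c("E+", level), ]
  chisq.test(sub_table, correct = FALSE)$p.value
})

print(p_chi)
```

No adjustment for multiple comparisons is made in this sketch.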
