Part I: Intro to Biostatistics and Foundational Concepts

 Explanatory vs. response variables
 Categorical vs. discrete numerical vs. continuous numerical variables
 Interpreting quantiles from a cumulative frequency distribution
 Mosaic plots
 Location (mean, median, mode) vs. dispersion (range, variance, coefficient of
variation)
 Quantile vs. quartile
 Interpreting quartiles from a box plot
 Define degrees of freedom
 Distribution of the data vs. sampling distribution of the mean
 Standard Error
 Confidence intervals and interpretation
 Definition of…
o Test statistic
o Critical value of the test statistic
o Significance level (alpha): the critical probability below which we reject H0
o P-value
 Understand the tradeoff between false positives and false negatives, and the
confusion matrix.
o How does changing alpha affect the Type I and Type II error rates?
 Binomial distribution
 Binomial test
 95% confidence interval for a proportion
 T-distribution
 One-sample t-test
 How to calculate a more accurate confidence interval for the mean from a small
sample
 Chi-squared goodness-of-fit test
o Including chi-squared GOF with a Poisson distribution
 Chi-squared contingency test
 Fisher's exact test
 Odds vs. odds ratio vs. relative risk 
 Z-distribution
 Central limit theorem
 Why we cannot use the z-distribution for hypothesis testing with small sample
sizes
 Fallacy of indirect comparisons
 Comparing means
o Paired t-test
o Two-sample t-test
o Welch’s t-test
 Comparing variances
o F-distribution
o F-test calculation
o When to use Levene’s test
 Test for normality
o QQ plot
o Shapiro-Wilk test
 Log transformation followed by a one- or two-sample test
 Sign test for one sample or paired design (by hand)
 Mann-Whitney U test
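
As a small worked illustration of the "Odds vs. odds ratio vs. relative risk" item above, the sketch below computes all three from a 2x2 table. The counts are invented for illustration and the variable names are assumptions, not course material.

```python
# Hypothetical 2x2 table (illustrative counts, not real data):
#              ill   healthy
# treatment     10        90
# control       20        80
treat_ill, treat_healthy = 10, 90
ctrl_ill, ctrl_healthy = 20, 80

# Risk: probability of the outcome within a group
risk_treat = treat_ill / (treat_ill + treat_healthy)
risk_ctrl = ctrl_ill / (ctrl_ill + ctrl_healthy)

# Odds: probability of the outcome divided by probability of no outcome
odds_treat = treat_ill / treat_healthy
odds_ctrl = ctrl_ill / ctrl_healthy

# Both ratios compare the treatment group to the control group
relative_risk = risk_treat / risk_ctrl
odds_ratio = odds_treat / odds_ctrl

print(f"risk:  treatment {risk_treat:.3f}, control {risk_ctrl:.3f}")
print(f"odds:  treatment {odds_treat:.3f}, control {odds_ctrl:.3f}")
print(f"relative risk = {relative_risk:.3f}, odds ratio = {odds_ratio:.3f}")
```

With these counts the odds ratio sits further from 1 than the relative risk; the two are close only when the outcome is rare in both groups.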

For every test you should be able to:

 Know the type of problem to apply it to (“which test”)
 List the assumptions of the test
 Know how to formulate the null and alternative hypotheses
 Set up the problem and do the calculation of the test statistic
 Calculate the degrees of freedom
 Look up the critical value of the test statistic on a table
 Interpret significance based on the test statistic (compared to the critical
value) or the p-value (compared to alpha)
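
The checklist above can be walked through for one test. The sketch below uses a two-sided one-sample t-test as the example; the measurements and the null mean `mu0` are hypothetical, and the critical value is typed in from a standard t table (df = 9, two-tailed alpha = 0.05) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 measurements (made-up numbers)
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.4, 5.9, 5.5]

# Hypotheses: H0: mu = mu0, HA: mu != mu0 (two-sided)
mu0 = 5.0
alpha = 0.05

# Test statistic: t = (sample mean - mu0) / standard error of the mean
n = len(sample)
standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
t_stat = (mean(sample) - mu0) / standard_error

# Degrees of freedom for a one-sample t-test
df = n - 1

# Critical value looked up in a t table for df = 9, alpha(2) = 0.05
t_critical = 2.262

# Decision: reject H0 if |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, df = {df}, critical value = {t_critical}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The same outline (hypotheses, statistic, degrees of freedom, table lookup, decision) carries over to the other tests; only the statistic and the df formula change.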

Part II: Experimental Design and Application of Theory

 Eliminating bias
o Controls (and regression toward the mean)
o Random assignment to treatments
o Blinding
 Increasing precision
o Extreme treatments
o Replication
o Balance
o Blocking
 Pseudoreplication
 Correcting for multiple tests with Bonferroni correction
 Pearson’s correlation and Spearman’s rank correlation
o State null and alternative hypotheses
o Assumptions
 Caveats associated with correlation
o Correlation does not mean causation
o Attenuation (effect of measurement error in X or Y)
o Range effects
o Spurious correlation
 General Linear Models
o Structural Assumptions
 Y is a single numerical response variable
 Response variable is linearly related to explanatory variables
 Check with residual plot
o Data Assumptions
 Residuals are normally distributed
 Check with QQ plot
 Residuals are independent of each other
 Check autocorrelation of residuals (ACF)
 Homoscedasticity
 Check with residual plot
o Additional Data Assumptions (w/more than 1 explanatory variable)
 Non-collinearity
 Simple Regression
o Hypotheses and assumptions
o Coefficient of determination (r-squared value)
o Confidence intervals vs. prediction bands
o Effect of measurement error in X and Y
 Multiple Regression
o Hypotheses and assumptions
 Individual coefficients vs entire model (omnibus F test)
o R-squared vs. adjusted R-squared (penalizes additional predictors)
o Diagnostics for multicollinearity: variance inflation factor (VIF)
 One-way ANOVA
o Non-parametric alternative to ANOVA: Kruskal-Wallis test
o Post-hoc comparisons with Tukey HSD
o Proportion of variation
 2-way and multi-way ANOVA
o Proportion of variation
o Interactions
o Synergistic vs antagonistic treatment effects
o Post-hoc comparisons with Tukey HSD
 ANCOVA
o Hypotheses and assumptions
o Proportion of variation
o Post-hoc comparisons with Tukey HSD
o Interactions
o 3 steps for ANCOVA
 Test for an interaction between the covariate and the factor (unequal slopes)
 Test for the overall slope, assuming no interaction
 Test for differences in adjusted means among levels of the factor
 Monte Carlo Simulations
o Bootstrap
o Permutation
o Jackknife
 Likelihood and Model Selection
o Likelihood
o F-test
o AIC
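
To make the "Permutation" item under Monte Carlo simulations concrete, here is a minimal sketch of a two-sided permutation test for a difference in means. The function name `permutation_test`, its parameters, and the two groups are hypothetical choices for illustration; this is one common way to set it up, not necessarily the procedure used in the course.

```python
import random


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Approximate two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n_a = len(group_a)

    def mean_diff(values):
        # Absolute difference in means between the first n_a values and the rest
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    pooled = list(group_a) + list(group_b)
    observed = mean_diff(pooled)

    # Shuffle the group labels many times and count how often the shuffled
    # difference is at least as large as the observed one
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            extreme += 1
    return extreme / n_perm


# Hypothetical, clearly separated groups (made-up integers)
p_value = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print(f"permutation p-value = {p_value:.4f}")
```

The null distribution here comes from reshuffling the data themselves, so no normality assumption is needed. Some texts report (extreme + 1) / (n_perm + 1) instead, which avoids a p-value of exactly zero.
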
Textbook used in course: Whitlock, M.C., and D. Schluter. The Analysis of Biological
Data. 2nd Edition.

Note: we did not cover ElasticNet for variable selection, but any resources you might
recommend to help me work up to that would be greatly appreciated!
