Professional Documents
Culture Documents
Burt Gerstman
Summary Points and Objectives
Chapter 1: Measurement
Biostatistics is more than a compilation of computational techniques!
Identify the main types of measurement scales: quantitative, ordinal, and categorical.
Understand the layout of a data table (observations, variables, values)
Appreciate the essential nature of data quality (GIGO principle).
1
Sample standard deviation: s = s 2 ; direct formula s =
n −1 ∑(x − x)
i
2
Recognize paired samples and adapt the one-sample t procedures to paired samples
Evaluate the Normality assumption in small, medium, and large samples
Conduct sample size and power analyses:
2
⎛ σ⎞
o to limit margin of error m when estimating μ, use n = ⎜ z1− α ⎟
2 m
⎝ ⎠
Δ2
⎛ | Δ | n ⎞⎟
o to determine the power of a test to detect Δ , 1 − β = Φ⎜ − z1− α +
⎜ 2 σ ⎟⎠
⎝
Optional: Be aware and understand the historical relevance of equal variance (“pooled”) t procedures
2 ⎛1 1 ⎞ df ⋅ s 2 + df 2 ⋅ s22
where SE = s pooled ⎜⎜ + ⎟⎟ where spooled
2
= 1 1 and df = (n1 − 1) + (n2− 1)
⎝ n1 n2 ⎠ df1 + df 2
Power and sample size
2σ 2 z12− α
To estimate μ1 − μ2 with margin of error m, use n = 2
in each group
m2
in each group
2
Δ2
If it is not possible to study groups of equal size, then determine n by the above formulas, fix the
nn1
size of n1, and have n2 = .
2n1 − n
MSB
Fstat = with dfB and dfW
MSW
⎛1 1 ⎞⎟
SE xi − x j = MSW ⎜ + and df = N – k (C.) P-value and interpretation
⎜ ni n j ⎟
⎝ ⎠
Recognize the problem of multiple comparisons and use Bonferroni method to keep the the
family-wise error rate in check (when appropriate): PBonf = PLSD × c where c represents the number
of post hoc comparisons made.
Assess the equal variance assumption graphically, by comparing group standard deviations, and
with Levene’s test of H0: σ21 = σ22 = … = σ2k.
Use robust non-parametric ANOVA (i.e., the Kruskal-Wallis test) when necessary.
r −ϖ r +ϖ t df2 ,1− α
Confidence interval for ρ: LCL = and UCL = where ϖ = 2 2
1 − rϖ 1 + rϖ t df ,1− α + df
2
s
Least squares regression model: yˆ = a + bx where b = r Y and a = y − bx .
sX
Slope estimate b is the key statistic in all this, representing the predicted change in Y per unit X.
Inference about population slope β:
1
Standard error of the regression sY |x =
n−2 ∑ residuals 2
with df = n – 2
sY | x
(1 −α)100% confidence interval for β = b ± (tn-2,1-α/2)(SEb) where SEb =
n −1 ⋅ sX
b
To test H0: β = 0, use tstat =
SEb
Optional: An ANOVA procedure can be used to test H0: β = 0 using an Fstat (pp. 321–324)
ai
In naturalistic and cohort samples, report incidence (or prevalences) in each group: pˆ i = .
ni
Characteristics of chi-square probability distributions (e.g., start at 0, asymmetrical, become
increasingly symmetrical as the df increases)
Hypothesis test for association (large samples)
(A.) H0: no association in population (homogeneity of proportions)
Case E+ Case E−
Control E+ a b
Control E− c d
ln OR ± z α ⋅SEln Oˆ R ˆ
c 1 1
Oˆ R = ; (1 – α)100% confidence interval for the OR = e
1−
2
where SEln Oˆ R = +
b c b
( c − b) 2
Hypothesis test: (A.) H0: OR = 1 (B.) z stat = (C.) P-value and interpretation; use exact
c+b
binomial procedure when there are 5 or less discordant pairs