BIODEMIC – IU Academic Club of School of Biotechnology

HANDOUT TA BIOSTATISTICS

A – CALCULATION:

I. Descriptive statistics
- Measures of central tendency or location:
+ Median: middle value (=MEDIAN(table))
+ Mode: most frequently occurring value
+ Mean: average value (=AVERAGE(table))
- Measures of variability or dispersion:
+ Range = max value - min value
+ IQR (Interquartile range) = 3rd quartile - 1st quartile
+ Variance (=VAR.S(table))
+ SD (Standard deviation) (=STDEV.S(table))

II. Probability:
- Intersection probability: $P(A \cap B) = \frac{n(A \cap B)}{n(S)}$
- Union probability: $P(A \cup B) = \frac{n(A \cup B)}{n(S)} = P(A) + P(B) - P(A \cap B)$
- Combinations: ${}_{n}C_{r} = \frac{n!}{r!\,(n-r)!}$ (=COMBIN(n, r))
- Permutations: ${}_{n}P_{r} = \frac{n!}{(n-r)!}$ (=PERMUT(n, r))
- Conditional probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, $P(B) \neq 0$
=> $P(A \cap B) = P(A|B) \times P(B) = P(B|A) \times P(A)$
- 2 events A and D are statistically independent:
+ Conditions: $P(A|D) = P(A)$ and $P(D|A) = P(D)$
+ Consequence: $P(A \cap D) = P(A) \times P(D)$

III. Confidence intervals:
1/ CI for population mean (μ):
a) When σ is known:
(1-α)100% CI for μ: $\bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$
b) When σ is unknown:
- n is small: (1-α)100% CI for μ: $\bar{X} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$
- n is large: (1-α)100% CI for μ: $\bar{X} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}$
2/ CI for the population proportion (p):
- Applies to large samples: both n × p and n × q > 5.
- Population proportion: $p$; sample proportion: $\hat{p}$.
- (1-α)100% CI for p: $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$
3/ CI for the population variance (σ²):
(1-α)100% CI for σ²: $\left[ \frac{(n-1) s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1) s^2}{\chi^2_{1-\alpha/2}} \right]$
(subscripts are right-tail areas, df = n - 1: $\chi^2_{\alpha/2}$ = CHISQ.INV.RT(α/2, n-1), $\chi^2_{1-\alpha/2}$ = CHISQ.INV.RT(1-α/2, n-1))

B – HYPOTHESIS TESTING:

I. Normality test: Chi-square goodness of fit tests:
- Hypotheses:
+ H0: The sample data are not significantly different from a normal population.
+ H1: The sample data are significantly different from a normal population.
- Chi-square statistic ($\chi^2_{score}$): $\chi^2_{k-a-1} = \sum \frac{(f_0 - f_e)^2}{f_e}$ ($f_0$: observed frequency, $f_e$: expected frequency)
- Critical values with df = k - a - 1 (usually = 3):
$\chi^2_{crits}$: CHISQ.INV(α/2, k - a - 1) (left) and CHISQ.INV.RT(α/2, k - a - 1) (right)
- Compare $\chi^2_{score}$ with $\chi^2_{crits}$ → conclude about H0.

II. Parametric tests:
1/ Hypothesis testing for ONE-SAMPLE inference:
a) Testing population means (μ):
- z test: σ is known, population is normal (sample size ≥ 30).
→ Test statistic: $z_{score} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
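As a rough illustration of the one-sample z test above, the sketch below computes the z score and compares it with a two-sided critical value. The numbers (sample mean 52, hypothesized mean 50, σ = 6, n = 36, α = 0.05) are made up for the example, and the function names are hypothetical, not part of the handout.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z score = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def two_sided_z_crit(alpha):
    """Upper critical value z_(alpha/2) of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Illustrative (made-up) inputs.
z = one_sample_z(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
z_crit = two_sided_z_crit(0.05)
reject_h0 = abs(z) > z_crit  # two-sided decision rule
print(f"z = {z:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

In Excel the same critical value would come from =NORM.S.INV(1-α/2).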
- t test: σ is unknown, s is known, population is normal.
→ Test statistic: $t_{score} = \frac{\bar{X} - \mu}{s / \sqrt{n}}$ (df = n - 1)
b) Testing population proportion (p):
- n ≤ 500: can use binomial distribution.
- n > 500: use normal approximation.
Test statistic: $z = \frac{\hat{p} - p}{\sqrt{pq / n}}$
c) Testing population variances (σ²):
Test statistic: $\chi^2 = \frac{(n-1) \times s^2}{\sigma^2}$
2/ Hypothesis testing for TWO-SAMPLE inference:
a) Test for difference in population means (μ):
- Hypotheses:
+ H0: μ1 = μ2 → H0: μ1 - μ2 = 0
+ H1: μ1 ≠ μ2 → H1: μ1 - μ2 ≠ 0
* z test: $\sigma_1^2$ & $\sigma_2^2$ are known, $\sigma_1^2 = \sigma_2^2$, n1 & n2 > 30.
+ $z_{score} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ → compare with $z_{crits}$.
+ CI: $\mu_1 - \mu_2 \in \left[ \bar{x}_1 - \bar{x}_2 \pm z_{crit} \times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right]$
* t test: $\sigma_1^2$ & $\sigma_2^2$ are unknown, $\sigma_1^2 = \sigma_2^2$, n1 or n2 < 30, independent samples.
+ $t_{score} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
→ Compare with $t_{crits}$ (df = n1 + n2 - 2)
+ CI for μ1 - μ2: $\left[ \bar{x}_1 - \bar{x}_2 \pm t_{crit} \times \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right]$
* Paired t test: n1 = n2, dependent samples.
+ Hypotheses: (D: mean population difference)
⸰ H0: D = 0
⸰ H1: D ≠ 0
+ Calculation: (n: number of pairs, d: sample difference in pairs).
⸰ $\bar{d}$ (mean of sample difference) =AVERAGE(all d)
⸰ $s_d$ (SD of sample difference) =STDEV.S(all d)
+ $t_{score} = \frac{\bar{d} - D}{s_d / \sqrt{n}}$ (df = n - 1)
+ CI: $\bar{d} - t_{crit} \times \frac{s_d}{\sqrt{n}} \le D \le \bar{d} + t_{crit} \times \frac{s_d}{\sqrt{n}}$
b) Test for differences in 2 population proportions (p):
- Hypotheses:
+ H0: p1 = p2
+ H1: p1 ≠ p2
- Calculation: $\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$; $\bar{q} = 1 - \bar{p}$
- $z_{score} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
- CI: $p_1 - p_2 \in (\hat{p}_1 - \hat{p}_2) \pm z_{crit} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$
Or: $p_1 - p_2 \in (\hat{p}_1 - \hat{p}_2) \pm z_{crit} \sqrt{\bar{p}\bar{q} \times \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
c) Test for difference in 2 population variances (σ²):
- Hypotheses:
+ H0: $\sigma_1^2 = \sigma_2^2$
+ H1: $\sigma_1^2 \neq \sigma_2^2$
- $F_{score} = \frac{s_1^2}{s_2^2}$ (df1 = n1 - 1; df2 = n2 - 1)
- Compare $F_{score}$ with $F_{crits}$:
+ Left: F.INV(α/2, df1, df2)
+ Right: F.INV.RT(α/2, df1, df2)
3/ ANOVA: analysis of variance
a) 1-way ANOVA: r samples, n observations
- Hypotheses:
+ H0: μ1 = μ2 = … = μr
+ H1: Not all μi are equal.
- Use “ANOVA: single factor” in Data analysis (Excel).
- $F_{score} = \frac{MSTR \text{ (or MS between groups)}}{MSE \text{ (or MS within groups)}}$
- $F_{crit}$ = F.INV(1-α, r-1, n-r).
- Compare: If $F_{score} > F_{crit}$ ⇒ Reject H0 ⇒ Not all means are the same. ⇒ Conduct Tukey pairwise comparisons test to find the different sample.
+ Test each pair of samples (3 samples ⇒ 3 pairs).
+ Test statistic 1 = ABS(Average 1 - Average 2)
Test statistic 2 = ABS(Average 1 - Average 3)
Test statistic 3 = ABS(Average 2 - Average 3)
+ Critical point: $T = q_{\alpha} \sqrt{\frac{MSE}{\text{smallest } n \text{ from } r \text{ samples}}}$
+ Compare each test statistic with the critical point (example outcome for 3 samples):
Test statistic 1 > Critical point ⇒ Sample 1 is different from sample 2.
Test statistic 2 < Critical point ⇒ Sample 1 is the same as sample 3.
Test statistic 3 > Critical point ⇒ Sample 3 is different from sample 2.
+ Conclusion: Sample 2 is different from the others.
b) 2-way ANOVA:
Use “ANOVA: 2-factor with replication” in Data analysis (Excel).

III. Nonparametric tests:
1/ Mann – Whitney U test:
- Compare the means of 2 independent populations (assume at least ordinal data).
- Hypotheses:
+ H0: μ1 = μ2
+ H1: μ1 ≠ μ2
a) Small sample procedure: n1 & n2 ≤ 20
- Rank the data:
+ Rank all the data of all groups at the same time (=RANK.AVG).
+ Sum the ranking for each group: W1 and W2.
- Test statistic:
$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - W_1$
$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - W_2$
=> $U_{score} = \min(U_1, U_2)$
- Find the value of $U_{crit}$.
- Compare: If $U_{score} > U_{crit}$ → Do not reject H0.
b) Large sample procedure: n1 or n2 > 20
- z score: $\mu_v = \frac{n_1 n_2}{2}$
$\sigma_v^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
=> $z_{score} = \frac{U_{score} - \mu_v}{\sqrt{\sigma_v^2}}$
- Compare $z_{score}$ and $z_{crit}$ ($z_{\alpha/2}$ or $z_{1-\alpha/2}$).
2/ Wilcoxon matched-pairs signed-rank test:
- Compare the means of 2 dependent populations (n matched pairs).
- Hypotheses:
+ H0: μ1 = μ2
+ H1: μ1 ≠ μ2
a) Small sample procedure: n ≤ 15
- Find the difference of each pair => Divide results into 2 groups: Gr(+) and Gr(-).
- Find the absolute difference of each pair.
- Rank the absolute differences:
+ Rank all the data of all groups at the same time (=RANK.AVG).
+ Sum the ranking:
⸰ T(+) = Sum rank of Gr(+).
⸰ T(-) = Sum rank of Gr(-).
- Test statistic: $T_{score} = \min(T(+), T(-))$.
- Find the value of $T_{crit}$.
- Compare: If $T_{score} > T_{crit}$ → Do not reject H0.
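The small-sample Wilcoxon steps above can be sketched in Python as follows. This is a minimal illustration, not the handout's own method: the data are made up, the function names are hypothetical, ranks are assigned in ascending order with ties averaged (similar in spirit to RANK.AVG with ascending order), and pairs with a zero difference are dropped, which is a common convention that the handout does not state.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_t(before, after):
    """Return (T_plus, T_minus, T_score) for matched pairs."""
    # Assumption: zero differences are discarded before ranking.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)


# Illustrative (made-up) matched pairs.
before = [10, 12, 9, 14, 11, 13]
after = [12, 11, 13, 14, 16, 10]
t_plus, t_minus, t_score = wilcoxon_t(before, after)
print(t_plus, t_minus, t_score)
```

The resulting T score would then be compared with a tabulated critical value for the number of non-zero pairs, as in the last step of the procedure.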
b) Large sample procedure: n > 15
- z score: $\mu_r = \frac{n(n+1)}{4}$
$\sigma_r^2 = \frac{n(n+1)(2n+1)}{24}$
=> $z_{score} = \frac{T_{score} - \mu_r}{\sqrt{\sigma_r^2}}$
- Compare $z_{score}$ and $z_{crit}$ ($z_{\alpha/2}$ or $z_{1-\alpha/2}$).
3/ Kruskal-Wallis test:
- Compare the means of more than 2 independent populations (C populations).
- Hypotheses:
+ H0: C populations are identical.
+ H1: At least 1 of C populations is different.
- Rank the data:
+ Rank all the data of all groups at the same time (=RANK.AVG).
+ Sum the ranking for each group ($T_i$).
+ Calculate $T_i^2 / n_i$ for each group → $\sum T_i^2 / n_i$.
- Test statistic: $K = \frac{12}{n(n+1)} \sum \frac{T_i^2}{n_i} - 3(n+1)$
- $\chi^2_{crit}$ = CHISQ.INV.RT(α, C - 1)
- Compare K and $\chi^2_{crit}$: If K > $\chi^2_{crit}$ → Reject H0.
4/ Friedman test:
- Assumptions:
+ Blocks are independent.
+ There is no interaction between B blocks and C treatments.
+ Observations within each block can be ranked.
- Hypotheses:
+ H0: The treatment populations are equal.
+ H1: At least 1 treatment population yields larger values than at least 1 other treatment population.
- Rank the data:
+ Rank the data of each block.
+ Sum the ranking for a particular treatment level (R).
- Test statistic: $\chi^2_r = \frac{12}{BC(C+1)} \sum R^2 - 3B(C+1)$
- $\chi^2_{crit}$ = CHISQ.INV.RT(α, C - 1)
- Compare $\chi^2_r$ and $\chi^2_{crit}$: If $\chi^2_r > \chi^2_{crit}$ → Reject H0.

C – REGRESSION:
- Bivariate linear regression:
+ y: dependent variable
+ x: independent variable
- Square of the correlation coefficient (R²):
+ 0 ≤ R² < 0.3: no relationship ⇒ no need to find the regression line.
+ 0.3 ≤ R² < 0.5: weak relationship ⇒ Find the regression line.
+ 0.5 ≤ R² < 0.8: moderate relationship ⇒ Find the regression line.
+ 0.8 ≤ R² ≤ 1: strong relationship ⇒ Find the regression line.
- Use “Regression” in Data analysis (Excel).
- Equation: $\hat{y} = b_0 + b_1 x$
+ $\hat{y}$: predicted value of y
+ $b_0$: sample intercept
+ $b_1$: sample slope
- Prediction interval to estimate y for a given value of x ($x_0$):
+ $SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
+ $\bar{x}$ = AVERAGE(Table of x)
+ t = ABS(T.INV(α/2, n-2)) = T.INV.2T(α, n-2)
+ $s_e$ = Standard Error (see the Summary output)
+ $\hat{y}$ = (Slope coefficient) × $x_0$ + (Intercept coefficient)
⇒ Interval for y: $\left[ \hat{y} \pm t \times s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}} \right]$
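To make the prediction-interval recipe concrete, here is a minimal Python sketch that follows the same quantities (SSxx, mean of x, standard error, predicted y). The data set is made up, the function name is hypothetical, and the critical t value is passed in by hand (for example, read from a t table or from Excel) because the Python standard library has no inverse t distribution.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Return (lower, y_hat, upper) for a new observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b1 = ss_xy / ss_xx        # sample slope
    b0 = y_bar - b1 * x_bar   # sample intercept
    # Standard error of the regression: sqrt(SSE / (n - 2)).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat, y_hat + half_width


# Illustrative (made-up) data; 3.182446 is the two-sided 5% critical t for df = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
lower, y_hat, upper = prediction_interval(xs, ys, x0=3, t_crit=3.182446)
print(f"y_hat = {y_hat:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

Passing the critical value in as a parameter keeps the sketch dependency-free; with SciPy available, it could instead be computed from the t distribution.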
