
QM3-IB (EBS2001), 2016-2017: Statistical Formulas

The first page of this formula sheet contains:

- some basic, non-inferential formulas;
- some fundamental principles of inferential statistics.

The second page lists all the inferential tools which have been discussed in either QM1 or QM2.
The third page reports, for all these tools and in the same order, the required assumptions and the associated conditions.

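Basic non-inferential formulas (chapter in brackets)

$\bar{y} = \frac{\sum y}{n}$   [Ch. 3]

$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$   [Ch. 3]

$z = \frac{y - \mu}{\sigma}$ (model based),   $z = \frac{y - \bar{y}}{s}$ (data based)   [Ch. 3]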

$r = \frac{\sum z_x z_y}{n - 1}$   [Ch. 4]
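
The descriptive formulas above can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names and the small data set are illustrative assumptions, not part of the course material.

```python
import math

def mean(values):
    """y-bar = (sum of y) / n."""
    return sum(values) / len(values)

def sample_sd(values):
    """s = sqrt( sum (y - y-bar)^2 / (n - 1) )."""
    y_bar = mean(values)
    return math.sqrt(sum((y - y_bar) ** 2 for y in values) / (len(values) - 1))

def z_scores(values):
    """Data-based z-scores: z = (y - y-bar) / s."""
    y_bar, s = mean(values), sample_sd(values)
    return [(y - y_bar) / s for y in values]

def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1)."""
    products = (zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys)))
    return sum(products) / (len(xs) - 1)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(mean(y), sample_sd(y), correlation(x, y))
```
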

$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$   [Ch. 5]

Events A and B are independent whenever $P(B \mid A) = P(B)$   [Ch. 5]
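
As a small worked illustration of the conditional-probability rule and the independence check, here is a sketch using exact fractions. The die example and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

def are_independent(p_a_and_b, p_a, p_b):
    """A and B are independent whenever P(B | A) = P(B)."""
    return conditional_probability(p_a_and_b, p_a) == p_b

# Hypothetical example: one roll of a fair die.
# A = "even" {2, 4, 6}, B = "at most 4" {1, 2, 3, 4}, A and B = {2, 4}.
p_a, p_b, p_a_and_b = Fraction(3, 6), Fraction(4, 6), Fraction(2, 6)
print(conditional_probability(p_a_and_b, p_a), are_independent(p_a_and_b, p_a, p_b))
```

Exact fractions avoid floating-point rounding in the equality test; with decimal probabilities a small tolerance would be needed instead.
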
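
$y = \mu_y + \varepsilon = \beta_0 + \beta_1 x + \varepsilon$   [Ch. 15]

$\hat{y} = b_0 + b_1 x$ where $b_1 = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$   [Ch. 4, 15]

$e = y - \hat{y}$   [Ch. 4, 16]

$y = \mu_y + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$   [Ch. 17]

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$   [Ch. 17]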
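
For the simple regression line, the slope and intercept follow directly from r, the two standard deviations, and the two means. The sketch below assumes hypothetical data and function names; it only illustrates the formulas for b1, b0, and the residuals e.

```python
import statistics

def fit_line(xs, ys):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y-bar - b1 * x-bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    # r = sum(z_x * z_y) / (n - 1)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """e = y - y-hat for every observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
print(b0, b1, residuals(x, y, b0, b1))
```
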

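$s_e = \sqrt{\frac{SSE}{n - k - 1}}$   [Ch. 17]

$SST = SSR + SSE$ with $SST = \sum (y - \bar{y})^2$, $SSR = \sum (\hat{y} - \bar{y})^2$, $SSE = \sum (y - \hat{y})^2 = \sum e^2$   [Ch. 17]

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$   [Ch. 17]

$R^2_{adj} = 1 - \frac{SSE/(n - k - 1)}{SST/(n - 1)}$   [Ch. 17]

$VIF_j = \frac{1}{1 - R_j^2}$   [Ch. 18]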
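
The fit measures above can be computed from the observed values, the fitted values, and the number of predictors k. This is a sketch under the assumption that the fitted values come from a least-squares fit with an intercept (only then does SST = SSR + SSE hold); the names and numbers are hypothetical.

```python
import math

def fit_summary(y, y_hat, k):
    """Sums of squares, R^2, adjusted R^2 and s_e for a model with k predictors."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return {
        "SST": sst,
        "SSR": ssr,
        "SSE": sse,
        "R2": 1 - sse / sst,
        "R2_adj": 1 - (sse / (n - k - 1)) / (sst / (n - 1)),
        "s_e": math.sqrt(sse / (n - k - 1)),
    }

def vif(r2_j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors."""
    return 1 / (1 - r2_j)

# Hypothetical observed and fitted values from a least-squares line (k = 1).
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(fit_summary(y, y_hat, k=1))
print(vif(0.8))
```
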

Some fundamental principles of inferential statistics

Sampling distribution of $\bar{y}$:
(CLT) As n grows, the sampling distribution approaches the Normal model with $\mu(\bar{y}) = \mu$ and $SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$

confidence interval = point estimate ± critical value × standard error

test statistic = (point estimate - hypothesized value) / standard error   (only for t and z, not for F and χ²)
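
The two generic recipes above can be sketched as follows. The example assumes a z-based procedure with a known σ purely to keep it short; the function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(point_estimate, critical_value, standard_error):
    """point estimate +/- critical value * standard error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin

def standardized_statistic(point_estimate, hypothesized_value, standard_error):
    """(point estimate - hypothesized value) / standard error; for t and z only."""
    return (point_estimate - hypothesized_value) / standard_error

# Hypothetical numbers: n = 100, sample mean 52, sigma = 10, hypothesized mean 50.
n, y_bar, sigma, mu_0 = 100, 52.0, 10.0, 50.0
sd_y_bar = sigma / math.sqrt(n)           # SD(y-bar) = sigma / sqrt(n)
z_star = NormalDist().inv_cdf(0.975)      # critical value for 95% confidence
print(confidence_interval(y_bar, z_star, sd_y_bar))
print(standardized_statistic(y_bar, mu_0, sd_y_bar))
```

For a t-based procedure the critical value would come from the t model with the appropriate degrees of freedom instead of from `NormalDist`.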

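Quick Guide to Inference

Plan, Do, Report. Each entry lists: inference about what, one group or more, procedure, model, parameter, estimate, SE, and chapter (in brackets).

Proportions
● One group: 1-proportion z-test [Ch. 10]
   Model: z. Parameter: $p$. Estimate: $\hat{p}$. SE: $\sqrt{\frac{p_0 q_0}{n}}$
● One group: 1-proportion z-interval [Ch. 9]
   Model: z. Parameter: $p$. Estimate: $\hat{p}$. SE: $\sqrt{\frac{\hat{p}\hat{q}}{n}}$
● Two independent groups: 2-proportion z-interval [Ch. 14.5]
   Model: z. Parameter: $p_1 - p_2$. Estimate: $\hat{p}_1 - \hat{p}_2$. SE: $\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$

Means
● One group: 1-sample t-test, 1-sample t-interval [Ch. 11]
   Model: t, df = n - 1. Parameter: $\mu$. Estimate: $\bar{y}$. SE: $\frac{s}{\sqrt{n}}$
● Two independent groups: 2-sample t-test, 2-sample t-interval [Ch. 13]
   Model: t, df from technology. Parameter: $\mu_1 - \mu_2$. Estimate: $\bar{y}_1 - \bar{y}_2$. SE: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
● Two independent groups: Pooled t-test, Pooled t-interval [Ch. 13]
   Model: t, df = n1 + n2 - 2. Parameter: $\mu_1 - \mu_2$. Estimate: $\bar{y}_1 - \bar{y}_2$. SE: $s_{pooled} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ with $s^2_{pooled} = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
● Matched pairs: Paired t-test, Paired t-interval
   Model: t, df = n - 1. Parameter: $\mu_d$. Estimate: $\bar{d}$. SE: $\frac{s_d}{\sqrt{n}}$

Distributions (one categorical variable)
● One group: Goodness-of-fit χ²-test [Ch. 14]
   Model: χ², df = # cells - 1. Test statistic: $\sum \frac{(Obs - Exp)^2}{Exp}$
● Several independent groups: Homogeneity χ²-test [Ch. 14]
   Model: χ², df = (r - 1)(c - 1). Test statistic: $\sum \frac{(Obs - Exp)^2}{Exp}$

Independence (two categorical variables)
● One group: Independence χ²-test [Ch. 14]
   Model: χ², df = (r - 1)(c - 1). Test statistic: $\sum \frac{(Obs - Exp)^2}{Exp}$

Simple regression (one quantitative variable modeled by another)
● One group: t-test or confidence interval for β1 [Ch. 15]
   Model: t, df = n - 2. Parameter: $\beta_1$. Estimate: $b_1$. SE: $\frac{s_e}{s_x \sqrt{n - 1}}$ (compute with technology)
● One group: Confidence interval for μv [Ch. 15]
   Model: t, df = n - 2. Parameter: $\mu_v$. Estimate: $\hat{\mu}_v$. SE: $\sqrt{SE^2(b_1) \times (x_v - \bar{x})^2 + \frac{s_e^2}{n}}$
● One group: Prediction interval for yv [Ch. 15]
   Model: t, df = n - 2. Parameter: $y_v$. Estimate: $\hat{y}_v$. SE: $\sqrt{SE^2(b_1) \times (x_v - \bar{x})^2 + \frac{s_e^2}{n} + s_e^2}$

Multiple regression (one quantitative variable modeled by k quantitative variables)
● One group: t-test or confidence interval for each βj [Ch. 17]
   Model: t, df = n - k - 1. Parameter: $\beta_j$. Estimate: $b_j$. SE: (from technology)
● One group: Overall F-test [Ch. 17]
   Model: F, df = k and n - k - 1. Test statistic: $F = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE}$
● One group: Partial F-test for comparing nested regression models [Extra text week 6]
   Model: F, df = k - g and n - k - 1. Test statistic: $F_{partial} = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/(n - k - 1)}$

Means
● Several (k) independent groups: One-way ANOVA [Ch. 20]
   Model: F, df = k - 1 and N - k. Test statistic: $F = \frac{SSTr/(k - 1)}{SSE/(N - k)} = \frac{MSTr}{MSE}$ with $SSTr = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$ and $SSE = \sum_{i=1}^{k} (n_i - 1) s_i^2$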
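
Several of the standard errors and test statistics in the Quick Guide can be computed from summary statistics alone. The sketch below covers the two-group standard errors, the χ² statistic, and the one-way ANOVA F; function names and inputs are hypothetical, and the overall mean in SSTr is assumed to be the size-weighted mean of the group means.

```python
import math

def se_two_proportions(p1, n1, p2, n2):
    """SE of p1-hat - p2-hat: sqrt(p1*q1/n1 + p2*q2/n2)."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_two_sample(s1, n1, s2, n2):
    """SE of y1-bar - y2-bar without pooling: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_pooled(s1, n1, s2, n2):
    """SE of y1-bar - y2-bar with pooled variance, df = n1 + n2 - 2."""
    s2_pooled = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(s2_pooled) * math.sqrt(1 / n1 + 1 / n2)

def chi_square_statistic(observed, expected):
    """Sum of (Obs - Exp)^2 / Exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def anova_f(ns, means, sds):
    """One-way ANOVA F = MSTr / MSE from group sizes, means and standard deviations."""
    k, big_n = len(ns), sum(ns)
    overall_mean = sum(n * m for n, m in zip(ns, means)) / big_n  # assumed weighted mean
    sstr = sum(n * (m - overall_mean) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    return (sstr / (k - 1)) / (sse / (big_n - k))

# Hypothetical summary statistics, for illustration only.
print(se_two_sample(3, 9, 4, 16), se_pooled(3, 9, 4, 16))
print(chi_square_statistic([10, 20, 30], [20, 20, 20]))
print(anova_f([5, 5, 5], [10, 12, 14], [2, 2, 2]))
```

The resulting SE or statistic is then combined with the model listed in the guide (z, t, χ², or F with the stated df) to obtain a critical value or P-value.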

Assumptions for Inference    And the Conditions That Support or Override Them
(SRS = simple random sample)

Proportions
● One group: 1-proportion z
1. Individuals are independent. 1. SRS and n < 10% of the population.
2. Sample is sufficiently large. 2. Successes and failures each ≥ 10.
● Two independent groups: 2-proportion z
1. Groups are independent. 1. Think about the design.
2. Data in each group are independent. 2. SRSs and n < 10% of the population OR random allocation.
3. Samples are sufficiently large. 3. Successes and failures each ≥ 10 in both samples.

Means
● One group: 1-sample t
1. Individuals are independent. 1. SRS and n < 10% of the population.
2. Population has a Normal model. 2. Histogram is approximately unimodal and symmetric.*
● Two independent groups: 2-sample t or Pooled t
1. Groups are independent. 1. Think about the design.
2. Data in each group are independent. 2. SRSs and n < 10% of the population OR random allocation.
3. Both populations are Normal. 3. Both histograms are approximately unimodal and symmetric.*
4. Equal variance (only for Pooled t). 4. Sample variance in both groups is similar; histograms and boxplots per group show similar spreads.
● Matched pairs: Paired t
1. Data are matched. 1. Think about the design.
2. Individuals are independent. 2. SRS and n < 10% of the population OR random allocation.
3. Population of differences is Normal. 3. Histogram of differences is unimodal and symmetric.*

Distributions / Independence
● One group: Goodness of fit χ²
1. Data are counts. 1. Are they?
2. Individuals are independent. 2. SRS and n < 10% of the population.
3. Sample is sufficiently large. 3. All expected counts ≥ 5.
● Several independent groups: Homogeneity χ²
1. Data are counts. 1. Are they?
2. Individuals in groups are independent. 2. SRSs and n < 10% of the population OR random allocation.
3. Groups are sufficiently large. 3. All expected counts ≥ 5.
● One group: Independence χ²
1. Data are counts. 1. Are they?
2. Individuals are independent. 2. SRS and n < 10% of the population.
3. Sample is sufficiently large. 3. All expected counts ≥ 5.

Regression with k predictors
1. Form of relationship is linear. 1. Scatterplots of y against each x are straight enough. Scatterplot of residuals against predicted values shows no special structure.
2. Errors are independent. 2. Cross-section: Think about the design. Time series: Check for autocorrelation.
3. Variability of errors is constant. 3. Plots of residuals against predicted values and each x have constant spread, don’t “thicken”.
4. Errors follow a Normal model. 4. Histogram of residuals is approximately unimodal and symmetric.*

Means
● Several independent groups: One-way ANOVA
1. Individuals are independent. 1. Randomized experiment or other suitable randomization.
2. Equal variance across treatment levels. 2. Sample variance per level/group is similar; histograms and boxplots per level/group show similar spreads.
3. The response follows a Normal model. 3. Histograms per level/group are approximately unimodal and symmetric.

(* Less critical as n increases)
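
The purely numerical conditions listed above (the 10% condition, the success/failure condition, and the expected-count condition) lend themselves to a quick check. This is only a sketch with hypothetical function names; the design-based and plot-based conditions still require judgment.

```python
def ten_percent_condition(n, population_size):
    """Sample size n is less than 10% of the population."""
    return n < 0.10 * population_size

def success_failure_condition(successes, failures, minimum=10):
    """Successes and failures are each at least 10."""
    return successes >= minimum and failures >= minimum

def expected_counts_condition(expected_counts, minimum=5):
    """All expected counts are at least 5."""
    return all(count >= minimum for count in expected_counts)

# Hypothetical sample: 120 respondents from a population of 5000, 45 successes.
print(ten_percent_condition(120, 5000))
print(success_failure_condition(45, 120 - 45))
print(expected_counts_condition([12.5, 7.0, 4.2]))
```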
