Basic Statistics Formula Sheet
Steven W. Nydick
May 25, 2012
This document is only intended to review basic concepts and formulas from an introductory statistics course. Only mean-based
procedures are reviewed, and emphasis is placed on a simple understanding of when to use each method. After reviewing
and understanding this document, one should then learn about more complex procedures and methods in statistics. However, keep
in mind the assumptions behind each procedure, and know that statistical procedures are sometimes robust to data that do not
strictly match the assumptions.
Descriptive Statistics
Elementary Descriptives (Univariate & Bivariate)
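| Name | Population Symbol | Sample Symbol | Sample Calculation | Main Problems | Alternatives |
|---|---|---|---|---|---|
| Mean | $\mu_x$ | $\bar{x}$ | $\bar{x} = \frac{\sum x}{N}$ | Sensitive to outliers | Median, Mode |
| Variance | $\sigma_x^2$ | $s_x^2$ | $s_x^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$ | Sensitive to outliers | MAD, IQR |
| Standard Dev | $\sigma_x$ | $s_x$ | $s_x = \sqrt{s_x^2}$ | Biased | MAD |
| Covariance | $\sigma_{xy}$ | $s_{xy}$ | $s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}$ | | Correlation |
| Correlation | $\rho_{xy}$ | $r_{xy}$ | $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum (z_x z_y)}{N - 1}$ | Nonlinearity | |
| z-score | $z_x$ | $z_x$ | $z_x = \frac{x - \bar{x}}{s_x}$; $\bar{z} = 0$; $s_z^2 = 1$ | | |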
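The sample calculations for the elementary descriptives can be sketched in plain Python. This is a minimal illustration, not part of the original sheet: the function names and the data in `x` and `y` are hypothetical, and every variance-type quantity uses the $N - 1$ denominator.

```python
import math

def mean(xs):
    """Sample mean: sum of the scores divided by N."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with the N - 1 denominator."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def std_dev(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(variance(xs))

def covariance(xs, ys):
    """Sample covariance of paired scores, N - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def z_scores(xs):
    """Standardize each score: (x - mean) / standard deviation."""
    m, s = mean(xs), std_dev(xs)
    return [(x - m) / s for x in xs]

def correlation(xs, ys):
    """Correlation as covariance divided by the product of the SDs."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

def correlation_from_z(xs, ys):
    """Same correlation, computed as sum(z_x * z_y) / (N - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data, chosen only so the arithmetic is easy to follow.
x = [2.0, 4.0, 4.0, 6.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 9.0]

print(mean(x), variance(x), covariance(x, y))
print(correlation(x, y), correlation_from_z(x, y))
```

Both correlation functions should agree up to floating-point rounding, and the z-scores of any variable should have mean 0 and variance 1.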
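Simple Linear Regression

| Name | Population Symbol | Sample Symbol | Sample Calculation | Meaning |
|---|---|---|---|---|
| Equation | $y_i = \alpha + \beta x_i + \epsilon_i$ | $y_i = a + b x_i + e_i$ | $\hat{y}_i = a + b x_i$ | Predict $y$ from $x$ |
| Slope | $\beta$ | $b$ | $b = \frac{s_{xy}}{s_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ | Predicted change in $y$ for a one-unit change in $x$ |
| Intercept | $\alpha$ | $a$ | $a = \bar{y} - b\bar{x}$ | Predicted $y$ for $x = 0$ |
| Standardized Equation | $z_{y_i} = \rho_{xy} z_{x_i} + \epsilon_i$ | $z_{y_i} = r_{xy} z_{x_i} + e_i$ | $\hat{z}_{y_i} = r_{xy} z_{x_i}$ | Predict $z_y$ from $z_x$ |
| Slope (standardized) | $\rho_{xy}$ | $r_{xy}$ | $r_{xy} = \frac{s_{xy}}{s_x s_y} = b \frac{s_x}{s_y}$ | Predicted change in $z_y$ for a one-unit change in $z_x$ |
| Intercept (standardized) | None | None | | Predicted $z_y$ for $z_x = 0$ is 0 |
| Effect Size | $\mathrm{P}^2$ | $R^2$ | $r_{\hat{y}y}^2 = r_{xy}^2$ | Proportion of variance in $y$ accounted for by the regression line |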
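The slope, intercept, and effect size formulas for simple linear regression can be sketched as follows. The function names and data are hypothetical. `r_squared` computes 1 minus the ratio of residual to total sums of squares, which for a one-predictor regression with an intercept equals $r_{xy}^2$.

```python
import math

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx          # slope: s_xy / s_x^2 (the N - 1 terms cancel)
    a = mean_y - b * mean_x      # intercept: y-bar minus b times x-bar
    return a, b

def r_squared(xs, ys, a, b):
    """Proportion of variance in y accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total

# Hypothetical data for illustration only.
x = [2.0, 4.0, 4.0, 6.0, 9.0]
y = [1.0, 3.0, 2.0, 5.0, 9.0]

a, b = fit_line(x, y)
r2 = r_squared(x, y, a, b)

# Standardized slope: b * s_x / s_y, which should equal r_xy.
n = len(x)
s_x = math.sqrt(sum((v - sum(x) / n) ** 2 for v in x) / (n - 1))
s_y = math.sqrt(sum((v - sum(y) / n) ** 2 for v in y) / (n - 1))
standardized_slope = b * s_x / s_y

print(a, b, r2, standardized_slope)
```

Squaring `standardized_slope` should reproduce `r2` up to rounding, which is the identity $r_{\hat{y}y}^2 = r_{xy}^2$.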
Inferential Statistics
t-tests (Categorical IV (1 or 2 Groups); Quantitative DV)
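| Test | Statistic | Parameter | Standard Deviation | Standard Error | df | t-obt |
|---|---|---|---|---|---|---|
| One Sample | $\bar{x}$ | $\mu$ | $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$ | $\frac{s_x}{\sqrt{N}}$ | $N - 1$ | $t_{obt} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{N}}$ |
| Paired Samples | $\bar{D}$ | $\mu_D$ | $s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N_D - 1}}$ | $\frac{s_D}{\sqrt{N_D}}$ | $N_D - 1$ | $t_{obt} = \frac{\bar{D} - \mu_{D0}}{s_D / \sqrt{N_D}}$ |
| Independent Samples | $\bar{x}_1 - \bar{x}_2$ | $\mu_1 - \mu_2$ | $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ | $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ | $n_1 + n_2 - 2$ | $t_{obt} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ |
| Correlation | $r$ | $\rho = 0$ | NA | NA | $N - 2$ | $t_{obt} = \frac{r}{\sqrt{\frac{1 - r^2}{N - 2}}}$ |
| Regression (FYI) | $a$ & $b$ | $\alpha$ & $\beta$ | $\hat{\sigma}_e = \sqrt{\frac{\sum (y - \hat{y})^2}{N - 2}}$ | $s_a$ & $s_b$ | $N - 2$ | $t_{obt} = \frac{a - \alpha_0}{s_a}$ & $t_{obt} = \frac{b - \beta_0}{s_b}$ |

t-tests Hypotheses/Rejection

| Question | One Sample | Paired Sample | Independent Sample | When to Reject |
|---|---|---|---|---|
| Greater Than? | $H_0: \mu \le \#$; $H_1: \mu > \#$ | $H_0: \mu_D \le \#$; $H_1: \mu_D > \#$ | $H_0: \mu_1 - \mu_2 \le \#$; $H_1: \mu_1 - \mu_2 > \#$ | $t_{obt} > t_{crit}$ (one-tailed) |
| Less Than? | $H_0: \mu \ge \#$; $H_1: \mu < \#$ | $H_0: \mu_D \ge \#$; $H_1: \mu_D < \#$ | $H_0: \mu_1 - \mu_2 \ge \#$; $H_1: \mu_1 - \mu_2 < \#$ | $t_{obt} < -t_{crit}$ (one-tailed) |
| Not Equal To? | $H_0: \mu = \#$; $H_1: \mu \ne \#$ | $H_0: \mu_D = \#$; $H_1: \mu_D \ne \#$ | $H_0: \mu_1 - \mu_2 = \#$; $H_1: \mu_1 - \mu_2 \ne \#$ | $\lvert t_{obt} \rvert > t_{crit}$ (two-tailed) |

t-tests Miscellaneous

| Test | Confidence Interval at level $(1 - \alpha)\%$ | Unstandardized Effect Size | Standardized Effect Size |
|---|---|---|---|
| One Sample | $\bar{x} \pm t_{N - 1;\ crit(2\text{-tailed})} \times \frac{s_x}{\sqrt{N}}$ | $\bar{x} - \mu_0$ | $d = \frac{\bar{x} - \mu_0}{s_x}$ |
| Paired Samples | $\bar{D} \pm t_{N_D - 1;\ crit(2\text{-tailed})} \times \frac{s_D}{\sqrt{N_D}}$ | $\bar{D}$ | $d = \frac{\bar{D}}{s_D}$ |
| Independent Samples | $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2;\ crit(2\text{-tailed})} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ | $\bar{x}_1 - \bar{x}_2$ | $d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$ |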
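The three mean-based t-tests can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the function names and the two small data sets are hypothetical, each function returns the obtained t, its degrees of freedom, and the standardized effect size d, and the critical value still has to be looked up in a t table (the Python standard library has no t distribution).

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _sd(xs):
    """Sample standard deviation with the N - 1 denominator."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def one_sample_t(xs, mu0):
    """t_obt = (x-bar - mu0) / (s_x / sqrt(N)), df = N - 1."""
    n = len(xs)
    s = _sd(xs)
    t = (_mean(xs) - mu0) / (s / math.sqrt(n))
    d = (_mean(xs) - mu0) / s
    return t, n - 1, d

def paired_t(first, second, mu_d0=0.0):
    """t-test on the difference scores D = second - first, df = N_D - 1."""
    diffs = [b - a for a, b in zip(first, second)]
    n_d = len(diffs)
    s_d = _sd(diffs)
    t = (_mean(diffs) - mu_d0) / (s_d / math.sqrt(n_d))
    d = _mean(diffs) / s_d
    return t, n_d - 1, d

def independent_t(x1, x2, diff0=0.0):
    """Pooled-variance t-test, df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * _sd(x1) ** 2 + (n2 - 1) * _sd(x2) ** 2) / (n1 + n2 - 2)
    s_p = math.sqrt(pooled_var)
    se = s_p * math.sqrt(1 / n1 + 1 / n2)
    t = ((_mean(x1) - _mean(x2)) - diff0) / se
    d = (_mean(x1) - _mean(x2)) / s_p
    return t, n1 + n2 - 2, d

# Hypothetical scores for illustration only.
group_a = [5.0, 7.0, 8.0, 6.0, 9.0]
group_b = [3.0, 4.0, 6.0, 5.0, 2.0]

print(one_sample_t(group_a, mu0=5.0))
print(paired_t(group_b, group_a))       # treats the lists as paired scores
print(independent_t(group_a, group_b))  # treats the lists as separate groups
```

Running the same two lists through `paired_t` and `independent_t` shows how the choice of design changes both the standard error and the degrees of freedom.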
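One-Way ANOVA

| Source | Sums of Sq. | df | Mean Sq. | F-stat | Effect Size |
|---|---|---|---|---|---|
| Between | $SSB = \sum_{j=1}^{g} n_j (\bar{x}_j - \bar{x}_G)^2$ | $g - 1$ | $MSB = SSB / df_B$ | $F = MSB / MSW$ | $\eta^2 = \frac{SSB}{SST}$ |
| Within | $SSW = \sum_{j=1}^{g} (n_j - 1) s_j^2$ | $N - g$ | $MSW = SSW / df_W$ | | |
| Total | $SST = \sum_{i,j} (x_{ij} - \bar{x}_G)^2$ | $N - 1$ | | | |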
1. We perform ANOVA, rather than several separate t-tests, because of family-wise error: the probability of rejecting at least one true $H_0$ during multiple tests.
2. $\bar{x}_G$ is the grand mean, the average of all scores ignoring group membership.
3. $\bar{x}_j$ is the mean of group $j$; $n_j$ is the number of people in group $j$; $g$ is the number of groups; $N$ is the total number of people.
| Hypotheses | When to Reject |
|---|---|
| $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ | $F_{obt} > F_{crit}$ |

Remember Post-Hoc Tests: LSD, Bonferroni, Tukey (what are the rank orderings of the means?)
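The one-way ANOVA quantities (sums of squares, degrees of freedom, mean squares, F, and $\eta^2$) can be computed directly from grouped scores. The sketch below is illustrative only: `one_way_anova` and the three small groups are hypothetical, and the within-group sum of squares is computed as the summed squared deviations from each group mean, which equals $\sum (n_j - 1) s_j^2$.

```python
def one_way_anova(groups):
    """Return the ANOVA table quantities for a list of groups of scores."""
    all_scores = [x for grp in groups for x in grp]
    n_total = len(all_scores)            # N: total number of people
    g = len(groups)                      # g: number of groups
    grand_mean = sum(all_scores) / n_total

    group_means = [sum(grp) / len(grp) for grp in groups]

    # Between: each group's size times its squared distance from the grand mean.
    ss_between = sum(
        len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, group_means)
    )
    # Within: squared deviations of scores from their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in grp) for grp, m in zip(groups, group_means)
    )
    # Total: squared deviations of every score from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

    df_between = g - 1
    df_within = n_total - g
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "SSB": ss_between,
        "SSW": ss_within,
        "SST": ss_total,
        "df_B": df_between,
        "df_W": df_within,
        "F": ms_between / ms_within,
        "eta_sq": ss_between / ss_total,
    }

# Hypothetical data: three groups of three scores each.
result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(result)
```

A useful sanity check on any ANOVA table is that `SSB + SSW` reproduces `SST`; the obtained F is then compared with the critical F for `df_B` and `df_W` from a table.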
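Chi Square ($\chi^2$)

| Test | Hypotheses | Observed | Expected | df | $\chi^2$ Stat | When to Reject |
|---|---|---|---|---|---|---|
| Independence | $H_0$: Vars are Independent | From Table | $N p_j p_k$ | (Cols - 1)(Rows - 1) | $\sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(f_{O_{ij}} - f_{E_{ij}})^2}{f_{E_{ij}}}$ | $\chi^2_{obt} > \chi^2_{crit}$ |
| Goodness of Fit | $H_0$: Model Fits | From Table | $N p_i$ | Cells - 1 | $\sum_{i=1}^{C} \frac{(f_{O_i} - f_{E_i})^2}{f_{E_i}}$ | $\chi^2_{obt} > \chi^2_{crit}$ |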
Remember: the sum is over the number of cells/columns/rows (not the number of people).
For Test of Independence: $p_j$ and $p_k$ are the marginal proportions of variable $j$ and variable $k$ respectively.
For Goodness of Fit: $p_i$ is the expected proportion in cell $i$ if the data fit the model.
$N$ is the total number of people.
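Both chi-square statistics follow the same pattern: build the expected frequencies, then sum (observed - expected)^2 / expected over the cells. The sketch below is a hypothetical illustration; the function names, the 2 x 2 table, and the equal-proportions model are assumptions made for the example.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a table given as a list of rows.

    Expected frequency for a cell is N * p_row * p_col, where p_row and
    p_col are the marginal proportions.
    """
    n = sum(sum(row) for row in table)
    row_props = [sum(row) / n for row in table]
    col_props = [sum(row[j] for row in table) / n for j in range(len(table[0]))]

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = n * row_props[i] * col_props[j]
            chi_sq += (observed - expected) ** 2 / expected

    df = (len(table[0]) - 1) * (len(table) - 1)   # (Cols - 1)(Rows - 1)
    return chi_sq, df

def chi_square_goodness_of_fit(observed, model_props):
    """Chi-square goodness of fit; expected frequency for cell i is N * p_i."""
    n = sum(observed)
    chi_sq = sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, model_props)
    )
    return chi_sq, len(observed) - 1               # Cells - 1

# Hypothetical 2 x 2 table of counts (each person is in exactly one cell).
print(chi_square_independence([[10, 20], [20, 10]]))

# Hypothetical counts tested against a model of three equally likely cells.
print(chi_square_goodness_of_fit([18, 22, 20], [1 / 3, 1 / 3, 1 / 3]))
```

Note that the loops run over cells, not over people, and the obtained value is compared with the critical chi-square for the returned df.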
Regression
1. Relationship is linear
2. Bivariate normality

Correlation
2. Estimating: No outliers

Independent Samples t-test
2. Homogeneity of variance (both groups have the same variance in the population)

One-Way ANOVA
2. Homogeneity of variance

Chi Square ($\chi^2$)
2. Independence of observations: each individual is only in ONE cell of the table
Possible Decisions/Outcomes

| | $H_0$ True | $H_0$ False |
|---|---|---|
| Rejecting $H_0$ | Type I Error ($\alpha$) | Correct Decision ($1 - \beta$; Power) |
| Not Rejecting $H_0$ | Correct Decision ($1 - \alpha$) | Type II Error ($\beta$) |

Power Increases If: $N \uparrow$, $\alpha \uparrow$, $\sigma \downarrow$, Mean Difference $\uparrow$, or One-Tailed Test

Given a population distribution with a mean $\mu$ and a variance $\sigma^2$, the sampling distribution of the mean using sample size $N$ (or, to put it another way, the distribution of sample means) will have a mean of $\mu_{\bar{x}} = \mu$ and a variance equal to $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}$, which implies that $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$. Furthermore, the distribution will approach the normal distribution as $N$, the sample size, increases.
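The sampling-distribution result in this section (sample means centered on the population mean, with variance equal to the population variance divided by N) can be checked with a small simulation. The sketch below is an assumption-laden illustration: it draws from an Exponential(1) population, which is strongly skewed and has mean 1 and variance 1, and the function name, sample size, replication count, and seed are all arbitrary choices.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` samples of size `n` from an Exponential(1) population
    and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

N = 30
means = sample_means(n=N, reps=20000)

# The mean of the sample means should be close to the population mean (1).
print("mean of sample means:    ", statistics.fmean(means))

# The variance of the sample means should be close to sigma^2 / N = 1 / 30.
print("variance of sample means:", statistics.variance(means))
print("sigma^2 / N:             ", 1.0 / N)
```

Rerunning with a larger `N` should shrink the variance of the sample means in proportion to 1 / N, and a histogram of `means` should look increasingly normal even though the population itself is skewed.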