Statistics formula sheet

Summarising data
Sample mean:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i. \]
Sample variance:
\[ s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right\}. \]
Sample covariance:
\[ g = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} \right\}. \]
Sample correlation:
\[ r = \frac{g}{s_x s_y}. \]
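As a quick illustration, the four summary statistics can be computed with the shortcut forms of the formulas in plain Python. This is only a sketch: the data are made up and the function name `summarise` is a label chosen here, not part of the sheet.

```python
import math

def summarise(xs, ys):
    """Return (x_bar, y_bar, s2_x, s2_y, g, r) for paired samples xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample variances: (sum of squares - n * mean^2) / (n - 1)
    s2_x = (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)
    s2_y = (sum(y * y for y in ys) - n * y_bar ** 2) / (n - 1)
    # Sample covariance: (sum of products - n * x_bar * y_bar) / (n - 1)
    g = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)
    # Sample correlation: covariance over the product of standard deviations
    r = g / math.sqrt(s2_x * s2_y)
    return x_bar, y_bar, s2_x, s2_y, g, r

xs = [1.0, 2.0, 3.0, 4.0]  # made-up data
ys = [2.0, 4.0, 5.0, 9.0]
x_bar, y_bar, s2_x, s2_y, g, r = summarise(xs, ys)
print(x_bar, y_bar, s2_x, s2_y, g, r)
```

The shortcut forms need only one pass over the data, but they can lose precision when the mean is large relative to the spread; the deviation forms are safer in that case.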
Probability
Addition law:
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \]
Multiplication law:
\[ P(A \cap B) = P(A)P(B|A) = P(B)P(A|B). \]
Partition law: For a partition $B_1, B_2, \ldots, B_k$,
\[ P(A) = \sum_{i=1}^{k} P(A \cap B_i) = \sum_{i=1}^{k} P(A|B_i)P(B_i). \]
Bayes formula:
\[ P(B_i|A) = \frac{P(A|B_i)P(B_i)}{P(A)} = \frac{P(A|B_i)P(B_i)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}. \]
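The partition law and Bayes formula can be sketched together in a few lines of Python. The probabilities below are invented for illustration (think of three machines $B_1, B_2, B_3$ and a defect event $A$), and the function names are assumptions made here.

```python
def total_probability(priors, likelihoods):
    """Partition law: P(A) = sum over i of P(A|B_i) * P(B_i)."""
    return sum(l * p for l, p in zip(likelihoods, priors))

def posterior(priors, likelihoods):
    """Bayes formula: P(B_i|A) for every B_i in the partition."""
    p_a = total_probability(priors, likelihoods)
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

priors = [0.5, 0.3, 0.2]         # P(B_i), must sum to 1 (made-up values)
likelihoods = [0.01, 0.02, 0.05]  # P(A|B_i) (made-up values)

p_a = total_probability(priors, likelihoods)
post = posterior(priors, likelihoods)
print(p_a, post)
```

The posterior probabilities always sum to 1, which is a convenient check on hand calculations.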
Discrete distributions
Mean value:
\[ E(X) = \mu = \sum_{x_i \in S} x_i \, p(x_i). \]
Variance:
\[ \mathrm{Var}(X) = \sum_{x_i \in S} (x_i - \mu)^2 p(x_i) = \sum_{x_i \in S} x_i^2 \, p(x_i) - \mu^2. \]
The binomial distribution:
\[ p(x) = \binom{n}{x} \theta^x (1-\theta)^{n-x} \quad \text{for } x = 0, 1, \ldots, n. \]
This has mean $n\theta$ and variance $n\theta(1-\theta)$.
The Poisson distribution:
\[ p(x) = \frac{\lambda^x \exp(-\lambda)}{x!} \quad \text{for } x = 0, 1, 2, \ldots. \]
This has mean $\lambda$ and variance $\lambda$.
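The mean and variance formulas for discrete distributions can be checked numerically against the stated binomial and Poisson results. The sketch below assumes a success probability called `theta` and a Poisson rate called `lam`; the Poisson sum is truncated at 60 terms, which is an approximation chosen here (the neglected tail is negligible for a small rate).

```python
import math

def binomial_pmf(x, n, theta):
    """P(X = x) for a binomial with n trials and success probability theta."""
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with rate lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def mean_and_variance(support, pmf):
    """E(X) = sum x p(x); Var(X) = sum x^2 p(x) - mu^2."""
    mu = sum(x * pmf(x) for x in support)
    var = sum(x * x * pmf(x) for x in support) - mu ** 2
    return mu, var

n, theta = 10, 0.3
mu_b, var_b = mean_and_variance(range(n + 1), lambda x: binomial_pmf(x, n, theta))

lam = 2.0
# Truncated support: an approximation to the infinite sum.
mu_p, var_p = mean_and_variance(range(60), lambda x: poisson_pmf(x, lam))

print(mu_b, var_b)  # close to n*theta and n*theta*(1 - theta)
print(mu_p, var_p)  # both close to lam
```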
Continuous distributions
Distribution function:
\[ F(y) = P(X \le y) = \int_{-\infty}^{y} f(x) \, dx. \]
Density function:
\[ f(x) = \frac{d}{dx} F(x). \]
Evaluating probabilities:
\[ P(a < X \le b) = \int_{a}^{b} f(x) \, dx = F(b) - F(a). \]
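Expected value:
\[ E(X) = \mu = \int_{-\infty}^{\infty} x f(x) \, dx. \]
Variance:
\[ \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx = \int_{-\infty}^{\infty} x^2 f(x) \, dx - \mu^2. \]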
Hazard function:
\[ h(t) = \frac{f(t)}{1 - F(t)}. \]
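Normal density with mean $\mu$ and variance $\sigma^2$:
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\} \quad \text{for } x \in (-\infty, \infty). \]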
Weibull density:
\[ f(t) = \lambda\kappa t^{\kappa - 1} \exp(-\lambda t^{\kappa}) \quad \text{for } t \ge 0. \]
Exponential density:
\[ f(t) = \lambda \exp(-\lambda t) \quad \text{for } t \ge 0. \]
This has mean $\lambda^{-1}$ and variance $\lambda^{-2}$.
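To see how the density, distribution function and hazard function fit together, the sketch below uses the exponential distribution with an assumed rate `lam`. It checks that integrating the density over an interval agrees with the difference of distribution function values, and that the exponential hazard is constant. The midpoint-rule integrator is a simple helper written for this example, not a recommended numerical method.

```python
import math

def exp_density(t, lam):
    """Exponential density with rate lam, for t >= 0."""
    return lam * math.exp(-lam * t)

def exp_cdf(t, lam):
    """Exponential distribution function F(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)

def hazard(t, lam):
    """h(t) = f(t) / (1 - F(t))."""
    return exp_density(t, lam) / (1.0 - exp_cdf(t, lam))

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

lam = 1.5          # assumed rate, chosen for illustration
a, b = 0.2, 1.0

by_integral = integrate(lambda t: exp_density(t, lam), a, b)
by_cdf = exp_cdf(b, lam) - exp_cdf(a, lam)   # P(a < X <= b) = F(b) - F(a)
hazards = [hazard(t, lam) for t in (0.1, 1.0, 3.0)]

print(by_integral, by_cdf)
print(hazards)  # each value is approximately lam
```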
Test for population mean
Data: Single sample of measurements $x_1, \ldots, x_n$.
Hypothesis: $H: \mu = \mu_0$.
Method:
Calculate $\bar{x}$, $s^2$, and $t = |\bar{x} - \mu_0| \sqrt{n} / s$.
Obtain critical value from t-tables, df $= n - 1$.
Reject $H$ at the $100p\%$ level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column $p$.
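A minimal sketch of the one-sample procedure in Python follows. The data are made up, and the critical value 2.365 is the usual two-sided 5% t-table entry for 7 degrees of freedom; it is typed in by hand here, just as it would be read from tables.

```python
import math
import statistics

def one_sample_t(xs, mu0):
    """Return (t, df) with t = |x_bar - mu0| * sqrt(n) / s."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    t = abs(x_bar - mu0) * math.sqrt(n) / s
    return t, n - 1

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements
t, df = one_sample_t(xs, mu0=3.0)

c = 2.365        # assumed table value: two-sided 5% level, df = 7
reject = t > c   # reject H if |t| > c (t is already non-negative here)
print(t, df, reject)
```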
Paired sample t-test
Data: Single sample of $n$ measurements $x_1, \ldots, x_n$ which are the pairwise differences between the two original sets of measurements.
Hypothesis: $H: \mu = 0$.
Method:
Calculate $\bar{x}$, $s^2$ and $t = \bar{x} \sqrt{n} / s$.
Obtain critical value from t-tables, df $= n - 1$.
Reject $H$ at the $100p\%$ level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column $p$.
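The paired test reduces to a one-sample calculation on the differences, which the following sketch makes explicit. The before/after readings are invented, and `paired_t` is a name chosen for this example.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for the paired test, using differences second - first."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)  # divisor n - 1
    t = d_bar * math.sqrt(n) / s
    return t, n - 1

before = [10.0, 12.0, 9.0, 11.0]   # made-up paired measurements
after = [12.0, 15.0, 10.0, 14.0]

t, df = paired_t(before, after)
print(t, df)  # compare |t| with the t-table value for df degrees of freedom
```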
Two sample t-test
Data: Two separate samples of measurements $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Hypothesis: $H: \mu_x = \mu_y$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, and $s_y^2$.
Calculate
\[ s^2 = \left\{ (n-1)s_x^2 + (m-1)s_y^2 \right\} / (n + m - 2). \]
Calculate
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2 \left( \frac{1}{n} + \frac{1}{m} \right)}}. \]
Obtain critical value from t-tables, df $= n + m - 2$.
Reject $H$ at the $100p\%$ level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column $p$.
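The pooled-variance calculation is where most slips happen by hand, so a short Python sketch may help. The two samples are made up and the function names are assumptions of this example.

```python
import math
import statistics

def pooled_variance(xs, ys):
    """s^2 = {(n-1) s_x^2 + (m-1) s_y^2} / (n + m - 2)."""
    n, m = len(xs), len(ys)
    return ((n - 1) * statistics.variance(xs)
            + (m - 1) * statistics.variance(ys)) / (n + m - 2)

def two_sample_t(xs, ys):
    """Return (t, df) for the two-sample t-test with pooled variance."""
    n, m = len(xs), len(ys)
    s2 = pooled_variance(xs, ys)
    t = (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(s2 * (1 / n + 1 / m))
    return t, n + m - 2

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample 1
ys = [2.0, 4.0, 6.0, 8.0]       # made-up sample 2

s2 = pooled_variance(xs, ys)
t, df = two_sample_t(xs, ys)
print(s2, t, df)  # compare |t| with the t-table value for df degrees of freedom
```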
CI for population mean
Data: Sample of measurements $x_1, \ldots, x_n$.
Method:
Calculate $\bar{x}$, $s_x^2$.
Look in t-tables, df $= n - 1$, column $p$. Let the tabulated value be $c$ say.
$100(1-p)\%$ confidence interval for $\mu$ is $\bar{x} \pm c \, s_x / \sqrt{n}$.
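A small sketch of the interval calculation, with the tabulated value passed in as an argument since it comes from tables rather than from the data. The sample is made up, and 2.365 is the usual t-table entry for a 95% interval with 7 degrees of freedom.

```python
import math
import statistics

def mean_ci(xs, c):
    """Return (lower, upper) = x_bar -/+ c * s_x / sqrt(n)."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    half_width = c * statistics.stdev(xs) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements
c = 2.365   # assumed table value: 95% interval, df = n - 1 = 7
lower, upper = mean_ci(xs, c)
print(lower, upper)
```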
CI for difference in population means
Data: Separate samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, $s_y^2$.
Calculate
\[ s^2 = \left\{ (n-1)s_x^2 + (m-1)s_y^2 \right\} / (n + m - 2). \]
Look in t-tables, df $= n + m - 2$, column $p$. Let the tabulated value be $c$ say.
$100(1-p)\%$ confidence interval for the difference in population means, i.e. $\mu_x - \mu_y$, is
\[ (\bar{x} - \bar{y}) \pm c \sqrt{ s^2 \left( \frac{1}{n} + \frac{1}{m} \right) }. \]
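The same pooled variance feeds the interval for the difference in means. The sketch below uses invented samples and takes the tabulated value as an argument; 2.365 is the usual t-table entry for a 95% interval with 7 degrees of freedom.

```python
import math
import statistics

def difference_ci(xs, ys, c):
    """Return (lower, upper) for mu_x - mu_y using the pooled variance."""
    n, m = len(xs), len(ys)
    s2 = ((n - 1) * statistics.variance(xs)
          + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    centre = statistics.mean(xs) - statistics.mean(ys)
    half_width = c * math.sqrt(s2 * (1 / n + 1 / m))
    return centre - half_width, centre + half_width

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sample 1
ys = [2.0, 4.0, 6.0, 8.0]       # made-up sample 2
c = 2.365   # assumed table value: 95% interval, df = n + m - 2 = 7
lower, upper = difference_ci(xs, ys, c)
print(lower, upper)
```

If the interval contains 0, the two-sample t-test at the matching level would not reject equality of the means.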
Regression and correlation
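The linear regression model:
\[ y_i = \alpha + \beta x_i + z_i. \]
Least squares estimates of $\alpha$ and $\beta$:
\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)s_x^2}, \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}. \]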
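Confidence interval for $\beta$
Calculate $\hat{\beta}$ as given previously.
Calculate $s_{\varepsilon}^2 = s_y^2 - \hat{\beta}^2 s_x^2$.
Calculate
\[ \mathrm{SE}(\hat{\beta}) = \sqrt{ \frac{s_{\varepsilon}^2}{(n-2)s_x^2} }. \]
Look in t-tables, df $= n - 2$, column $p$. Let the tabulated value be $c$.
$100(1-p)\%$ confidence interval for $\beta$ is $\hat{\beta} \pm c \, \mathrm{SE}(\hat{\beta})$.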
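The regression estimates and the interval for the slope can be worked through in one pass. This is a sketch on made-up data; the names `fit_line` and `s2_eps` are chosen here, and 4.303 is the usual t-table entry for a 95% interval with 2 degrees of freedom.

```python
import math
import statistics

def fit_line(xs, ys):
    """Return (alpha_hat, beta_hat, se_beta) from the formula-sheet recipe."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s2_x, s2_y = statistics.variance(xs), statistics.variance(ys)
    # Slope: (sum x_i y_i - n x_bar y_bar) / ((n - 1) s_x^2)
    beta_hat = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ((n - 1) * s2_x)
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual quantity: s_y^2 - beta_hat^2 * s_x^2
    s2_eps = s2_y - beta_hat ** 2 * s2_x
    se_beta = math.sqrt(s2_eps / ((n - 2) * s2_x))
    return alpha_hat, beta_hat, se_beta

xs = [1.0, 2.0, 3.0, 4.0]  # made-up data
ys = [2.0, 4.0, 5.0, 9.0]

alpha_hat, beta_hat, se_beta = fit_line(xs, ys)
c = 4.303   # assumed table value: 95% interval, df = n - 2 = 2
lower, upper = beta_hat - c * se_beta, beta_hat + c * se_beta
print(alpha_hat, beta_hat, se_beta, (lower, upper))
```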
Test for $\rho = 0$
Hypothesis: $H: \rho = 0$.
Calculate
\[ t = r \left( \frac{n - 2}{1 - r^2} \right)^{1/2}. \]
Obtain critical value from t-tables, df $= n - 2$.
Reject $H$ at the $100p\%$ level of significance if $|t| > c$, where $c$ is the tabulated value corresponding to column $p$.
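A short sketch of the correlation test statistic, taking the sample correlation `r` and sample size `n` as inputs. The values are invented, and 2.060 is the usual two-sided 5% t-table entry for 25 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """Return (t, df) with t = r * sqrt((n - 2) / (1 - r^2))."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2

r, n = 0.6, 27   # made-up sample correlation and sample size
t, df = correlation_t(r, n)

c = 2.060        # assumed table value: two-sided 5% level, df = 25
reject = abs(t) > c
print(t, df, reject)
```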
Approximate CI for proportion
\[ p \pm 1.96 \sqrt{ \frac{p(1-p)}{n-1} } \]
where $p$ is the observed proportion in the sample.
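For instance, the approximate 95% interval can be evaluated as follows. The observed proportion and sample size are made up, and the function name is an assumption of this sketch; note that it follows the sheet in dividing by $n - 1$.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (lower, upper) = p -/+ z * sqrt(p (1 - p) / (n - 1))."""
    half_width = z * math.sqrt(p * (1 - p) / (n - 1))
    return p - half_width, p + half_width

p, n = 0.4, 101   # made-up observed proportion and sample size
lower, upper = proportion_ci(p, n)
print(lower, upper)
```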
Test for a proportion
Hypothesis: $H: \theta = \theta_0$.
Test statistic
\[ z = \frac{p - \theta_0}{\sqrt{ \frac{\theta_0 (1 - \theta_0)}{n} }}. \]
Obtain critical value from normal tables.
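A minimal sketch of the single-proportion test, assuming the hypothesised proportion is called `theta0`. The numbers are invented, and 1.96 is the familiar two-sided 5% value from normal tables.

```python
import math

def proportion_z(p, theta0, n):
    """z = (p - theta0) / sqrt(theta0 (1 - theta0) / n)."""
    return (p - theta0) / math.sqrt(theta0 * (1 - theta0) / n)

p, theta0, n = 0.6, 0.5, 100   # made-up observed proportion, null value, sample size
z = proportion_z(p, theta0, n)

reject = abs(z) > 1.96   # two-sided test at the 5% level
print(z, reject)
```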
Comparison of proportions
Hypothesis: $H: \theta_1 = \theta_2$.
Calculate
\[ p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}. \]
Calculate
\[ z = \frac{p_1 - p_2}{\sqrt{ p(1-p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }}. \]
Obtain appropriate critical value from normal tables.
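The two-step calculation (pooled proportion, then test statistic) can be sketched as below. The sample sizes and observed proportions are invented, and `compare_proportions` is a name chosen for this example.

```python
import math

def compare_proportions(p1, n1, p2, n2):
    """Return (pooled p, z) for testing equality of two proportions."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion
    z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return p, z

p1, n1 = 0.5, 100   # made-up sample 1
p2, n2 = 0.4, 100   # made-up sample 2
p, z = compare_proportions(p1, n1, p2, n2)

reject = abs(z) > 1.96   # two-sided test at the 5% level, value from normal tables
print(p, z, reject)
```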
Goodness of fit
Test statistic
\[ \chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i} \]
where $m$ is the number of categories.
Hypothesis $H: F = F_0$.
Calculate the expected class frequencies under $F_0$.
Calculate the $\chi^2$ test statistic given above.
Determine the degrees of freedom, $\nu$ say.
Obtain critical value from $\chi^2$ tables, df $= \nu$.
Reject $H: F = F_0$ at the $100p\%$ level of significance if $\chi^2 > c$ where $c$ is the tabulated critical value.
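As an illustration, the sketch below tests whether made-up counts from 60 rolls of a die are consistent with a fair die. It assumes the degrees of freedom are $m - 1$, which holds when no parameters of $F_0$ are estimated from the data; 11.07 is the usual 5% entry in chi-squared tables for 5 degrees of freedom.

```python
def chi_squared(observed, expected):
    """Sum over categories of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]           # made-up counts from 60 rolls
expected = [sum(observed) / 6] * 6           # fair die: equal expected frequencies

chi2 = chi_squared(observed, expected)
df = len(observed) - 1   # assumption: no parameters estimated under F_0

c = 11.07                # assumed table value: 5% level, df = 5
reject = chi2 > c
print(chi2, df, reject)
```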
