Statistical Formulas

Statistics formula sheet
Summarising data
Sample mean:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
Sample variance:
$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right\}$.
Sample covariance:
$g = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} \right\}$.
Sample correlation:
$r = \frac{g}{s_x s_y}$.
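The summary statistics above can be sketched in plain Python. The function names and the small data set are illustrative assumptions, not part of the sheet; the shortcut forms are used for the variance and covariance, with the definitional form of the variance kept as a cross-check.

```python
from math import sqrt

def summarise(xs, ys):
    """Sample means, variances, covariance and correlation for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Shortcut forms: sums of squares and products, corrected by the means.
    s2_x = (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)
    s2_y = (sum(y * y for y in ys) - n * y_bar ** 2) / (n - 1)
    g = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (n - 1)
    r = g / sqrt(s2_x * s2_y)
    return x_bar, y_bar, s2_x, s2_y, g, r

def variance_by_definition(xs):
    """Definitional form: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

# Illustrative data (made up for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(summarise(xs, ys))
print(variance_by_definition(xs))
```

Both forms of the variance agree up to rounding; the shortcut form can lose precision when the mean is large relative to the spread, so the definitional form is often the safer choice in floating point.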
Probability
Addition law:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Multiplication law:
$P(A \cap B) = P(A)P(B|A) = P(B)P(A|B)$.
Partition law: For a partition $B_1, B_2, \ldots, B_k$,
$P(A) = \sum_{i=1}^{k} P(A \cap B_i) = \sum_{i=1}^{k} P(A|B_i)P(B_i)$.
Bayes formula:
$P(B_i|A) = \frac{P(A|B_i)P(B_i)}{P(A)} = \frac{P(A|B_i)P(B_i)}{\sum_{i=1}^{k} P(A|B_i)P(B_i)}$.
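As a small worked illustration of the partition law and Bayes formula, the sketch below uses made-up numbers: three events B_1, B_2, B_3 with prior probabilities 0.5, 0.3, 0.2 and conditional probabilities P(A|B_i) of 0.01, 0.02, 0.03. The function names are assumptions chosen for this example.

```python
def total_probability(priors, likelihoods):
    """Partition law: P(A) = sum over i of P(A|B_i) P(B_i)."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def bayes_posteriors(priors, likelihoods):
    """Bayes formula: P(B_i|A) for every event B_i in the partition."""
    p_a = total_probability(priors, likelihoods)
    return [p * l / p_a for p, l in zip(priors, likelihoods)]

# Illustrative numbers (made up): P(B_i) and P(A|B_i).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

print(total_probability(priors, likelihoods))
print(bayes_posteriors(priors, likelihoods))
```

Here P(A) = 0.017 and the posterior probabilities are 5/17, 6/17 and 6/17, which sum to 1 as they must for a partition.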
Discrete distributions
Mean value:
$E(X) = \mu = \sum_{x_i \in S} x_i \, p(x_i)$.
Variance:
$\mathrm{Var}(X) = \sum_{x_i \in S} (x_i - \mu)^2 p(x_i) = \sum_{x_i \in S} x_i^2 \, p(x_i) - \mu^2$.
The binomial distribution:
$p(x) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}$ for $x = 0, 1, \ldots, n$.
This has mean $n\theta$ and variance $n\theta(1 - \theta)$.
The Poisson distribution:
$p(x) = \frac{\lambda^x \exp(-\lambda)}{x!}$ for $x = 0, 1, 2, \ldots$.
This has mean $\lambda$ and variance $\lambda$.
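The stated means and variances can be checked numerically from the general discrete formulas. In this sketch the names theta (binomial success probability) and lam (Poisson mean) are labels chosen here, and the Poisson support is truncated at 60 terms, an assumption that is harmless for small means.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, theta):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability of observing the count x."""
    return lam ** x * exp(-lam) / factorial(x)

def mean_and_variance(support, pmf):
    """E(X) and Var(X) from the discrete formulas, summing over the support."""
    mu = sum(x * pmf(x) for x in support)
    var = sum(x * x * pmf(x) for x in support) - mu ** 2
    return mu, var

# Binomial with n = 10, theta = 0.3: expect mean 3 and variance 2.1.
print(mean_and_variance(range(11), lambda x: binomial_pmf(x, 10, 0.3)))

# Poisson with lam = 2, support truncated at 0..59: expect mean 2, variance 2.
print(mean_and_variance(range(60), lambda x: poisson_pmf(x, 2.0)))
```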
Continuous distributions
Distribution function:
$F(y) = P(X \le y) = \int_{-\infty}^{y} f(x) \, dx$.
Density function:
$f(x) = \frac{d}{dx} F(x)$.
Evaluating probabilities:
$P(a < X \le b) = \int_{a}^{b} f(x) \, dx = F(b) - F(a)$.
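Expected value:
$E(X) = \mu = \int_{-\infty}^{\infty} x f(x) \, dx$.
Variance:
$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx = \int_{-\infty}^{\infty} x^2 f(x) \, dx - \mu^2$.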
Hazard function:
$h(t) = \frac{f(t)}{1 - F(t)}$.
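Normal density with mean $\mu$ and variance $\sigma^2$:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right\}$ for $x \in (-\infty, \infty)$.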
Weibull density:
$f(t) = \lambda \kappa t^{\kappa - 1} \exp(-\lambda t^{\kappa})$ for $t \ge 0$.
Exponential density:
$f(t) = \lambda \exp(-\lambda t)$ for $t \ge 0$.
This has mean $\lambda^{-1}$ and variance $\lambda^{-2}$.
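To illustrate the hazard function, the sketch below assumes the Weibull parametrisation f(t) = lam * kappa * t^(kappa - 1) * exp(-lam * t^kappa), whose distribution function is F(t) = 1 - exp(-lam * t^kappa). The parameter names lam and kappa are labels chosen here. Under that assumption the hazard works out to lam * kappa * t^(kappa - 1), and the exponential case (kappa = 1) has constant hazard lam.

```python
from math import exp

def weibull_pdf(t, lam, kappa):
    """Weibull density under the parametrisation assumed in this sketch."""
    return lam * kappa * t ** (kappa - 1) * exp(-lam * t ** kappa)

def weibull_cdf(t, lam, kappa):
    """Matching distribution function F(t) = 1 - exp(-lam * t^kappa)."""
    return 1 - exp(-lam * t ** kappa)

def hazard(t, lam, kappa):
    """Hazard h(t) = f(t) / (1 - F(t))."""
    return weibull_pdf(t, lam, kappa) / (1 - weibull_cdf(t, lam, kappa))

# Exponential case (kappa = 1): the hazard stays at lam for every t.
for t in (0.5, 1.0, 2.0):
    print(t, hazard(t, 0.5, 1.0))

# Weibull with kappa = 2: the hazard grows linearly in t.
print(hazard(2.0, 0.5, 2.0))
```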
Test for population mean
Data: Single sample of measurements $x_1, \ldots, x_n$.
Hypothesis: $H: \mu = \mu_0$.
Method:
Calculate $\bar{x}$, $s^2$, and $t = |\bar{x} - \mu_0| \sqrt{n} / s$.
Obtain critical value from t-tables, df = $n - 1$.
Reject H at the 100p% level of significance if $|t| > c$, where c is the tabulated value corresponding to column p.
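The method above can be sketched as follows. The data are made up, and the critical value 2.262 is the standard two-sided 5% point of the t distribution with 9 degrees of freedom; it is assumed here that this is what the relevant column of the t-tables gives. The function name is hypothetical.

```python
from math import sqrt

def one_sample_t(xs, mu0):
    """Return the t statistic |x_bar - mu0| * sqrt(n) / s and its df."""
    n = len(xs)
    x_bar = sum(xs) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    t = abs(x_bar - mu0) * sqrt(n) / sqrt(s2)
    return t, n - 1

# Illustrative data (made up): ten measurements, testing H: mu = 5.
xs = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7, 5.2, 5.9]
t, df = one_sample_t(xs, 5.0)

c = 2.262  # assumed table value: two-sided 5% point, df = 9
print(t, df, "reject H" if t > c else "do not reject H")
```

For these numbers the sample mean is 5.5 and t is about 4.33, so H would be rejected at the 5% level under the stated assumption about the table.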
Paired sample t-test
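Data: Single sample of n measurements $x_1, \ldots, x_n$ which are the pairwise differences between the two original sets of measurements.
Hypothesis: $H: \mu = 0$.
Method:
Calculate $\bar{x}$, $s^2$ and $t = \bar{x} \sqrt{n} / s$.
Obtain critical value from t-tables, df = $n - 1$.
Reject H at the 100p% level of significance if $|t| > c$, where c is the tabulated value corresponding to column p.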
Two sample t-test
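Data: Two separate samples of measurements $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Hypothesis: $H: \mu_x = \mu_y$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, and $s_y^2$.
Calculate $s^2 = \left\{ (n-1)s_x^2 + (m-1)s_y^2 \right\} / (n + m - 2)$.
Calculate $t = \frac{\bar{x} - \bar{y}}{\sqrt{s^2 \left( \frac{1}{n} + \frac{1}{m} \right)}}$.
Obtain critical value from t-tables, df = $n + m - 2$.
Reject H at the 100p% level of significance if $|t| > c$, where c is the tabulated value corresponding to column p.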
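A minimal sketch of the pooled two sample t-test, using made-up data. The helper names are hypothetical, and the critical value 2.262 (two-sided 5% point, 9 degrees of freedom) is assumed to be the value read from the t-tables.

```python
from math import sqrt

def sample_mean_var(xs):
    """Sample mean and variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

def two_sample_t(xs, ys):
    """Pooled-variance t statistic and its degrees of freedom."""
    n, m = len(xs), len(ys)
    x_bar, s2_x = sample_mean_var(xs)
    y_bar, s2_y = sample_mean_var(ys)
    # Pooled variance: weighted average of the two sample variances.
    s2 = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)
    t = (x_bar - y_bar) / sqrt(s2 * (1 / n + 1 / m))
    return t, n + m - 2

# Illustrative data (made up).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10, 12]
t, df = two_sample_t(xs, ys)

c = 2.262  # assumed table value: two-sided 5% point, df = 9
print(t, df, "reject H" if abs(t) > c else "do not reject H")
```

With these numbers |t| is about 2.22, just below 2.262, so H would not be rejected at the 5% level.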
CI for population mean
Data: Sample of measurements $x_1, \ldots, x_n$.
Method:
Calculate $\bar{x}$, $s_x^2$.
Look in t-tables, df = $n - 1$, column p. Let the tabulated value be c say.
100(1 - p)% confidence interval for $\mu$ is $\bar{x} \pm c \, s_x / \sqrt{n}$.
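The interval can be computed as in the sketch below. The data are made up, the function name is hypothetical, and c = 2.262 is assumed to be the table value for a 95% interval with 9 degrees of freedom.

```python
from math import sqrt

def mean_confidence_interval(xs, c):
    """Interval x_bar +/- c * s_x / sqrt(n) for a table value c."""
    n = len(xs)
    x_bar = sum(xs) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    half_width = c * s_x / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Illustrative data (made up); c assumed from t-tables, df = 9.
xs = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7, 5.2, 5.9]
print(mean_confidence_interval(xs, 2.262))
```

For these data the interval is roughly (5.24, 5.76).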
CI for difference in population means
Data: Separate samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$.
Method:
Calculate $\bar{x}$, $s_x^2$, $\bar{y}$, $s_y^2$.
Calculate $s^2 = \left\{ (n-1)s_x^2 + (m-1)s_y^2 \right\} / (n + m - 2)$.
Look in t-tables, df = $n + m - 2$, column p. Let the tabulated value be c say.
100(1 - p)% confidence interval for the difference in population means, i.e. $\mu_x - \mu_y$, is
$(\bar{x} - \bar{y}) \pm c \sqrt{ s^2 \left( \frac{1}{n} + \frac{1}{m} \right) }$.
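A sketch of the interval for the difference in means, with made-up data and hypothetical function names. The table value c = 2.262 (95% interval, 9 degrees of freedom) is an assumption about what the t-tables give.

```python
from math import sqrt

def sample_mean_var(xs):
    """Sample mean and variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

def difference_confidence_interval(xs, ys, c):
    """Interval (x_bar - y_bar) +/- c * sqrt(s2 * (1/n + 1/m))."""
    n, m = len(xs), len(ys)
    x_bar, s2_x = sample_mean_var(xs)
    y_bar, s2_y = sample_mean_var(ys)
    s2 = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)  # pooled variance
    half_width = c * sqrt(s2 * (1 / n + 1 / m))
    diff = x_bar - y_bar
    return diff - half_width, diff + half_width

# Illustrative data (made up); c assumed from t-tables, df = 5 + 6 - 2 = 9.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10, 12]
print(difference_confidence_interval(xs, ys, 2.262))
```

Here the interval is roughly (-8.08, 0.08). It contains zero, which matches a two sample t-test on the same data not rejecting equal means at the 5% level.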
Regression and correlation
The linear regression model:
$y_i = \alpha + \beta x_i + z_i$.
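Least squares estimates of $\alpha$ and $\beta$:
$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{(n-1)s_x^2}$, and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.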
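The least squares estimates can be computed directly from the sums, as in this sketch. The function name and data are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (alpha_hat, beta_hat) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    # Slope: corrected sum of products over (n - 1) * s_x^2.
    beta_hat = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ((n - 1) * s2_x)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Illustrative data (made up).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(least_squares(xs, ys))
```

For these data the slope is 0.6 and the intercept is 2.2.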
Confidence interval for $\beta$
Calculate $\hat{\beta}$ as given previously.
Calculate $s^2 = s_y^2 - \hat{\beta}^2 s_x^2$.
Calculate $SE(\hat{\beta}) = \sqrt{ \frac{s^2}{(n-2)s_x^2} }$.
Look in t-tables, df = $n - 2$, column p. Let the tabulated value be c.
100(1 - p)% confidence interval for $\beta$ is $\hat{\beta} \pm c \, SE(\hat{\beta})$.
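The steps for the slope interval can be sketched as below. The data are made up, the function name is hypothetical, and c = 3.182 is the standard two-sided 5% point of the t distribution with 3 degrees of freedom, assumed to be the value read from the tables.

```python
from math import sqrt

def slope_confidence_interval(xs, ys, c):
    """Return (beta_hat, se, lower, upper) for the regression slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s2_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s2_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    beta_hat = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ((n - 1) * s2_x)
    # Variance left unexplained by the fitted line.
    s2 = s2_y - beta_hat ** 2 * s2_x
    se = sqrt(s2 / ((n - 2) * s2_x))
    return beta_hat, se, beta_hat - c * se, beta_hat + c * se

# Illustrative data (made up); c assumed from t-tables, df = 5 - 2 = 3.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(slope_confidence_interval(xs, ys, 3.182))
```

Here the slope estimate is 0.6 with standard error about 0.283, giving an interval of roughly (-0.30, 1.50).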
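Test for $\rho = 0$
Hypothesis: $H: \rho = 0$.
Calculate
$t = r \left\{ \frac{n - 2}{1 - r^2} \right\}^{1/2}$.
Reject H at the 100p% level of significance if $|t| > c$, where c is the tabulated value corresponding to column p of the t-tables with df = $n - 2$.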
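A sketch of this test statistic on made-up paired data, with a hypothetical function name. The critical value 3.182 (two-sided 5% point, 3 degrees of freedom) is assumed to be what the t-tables give.

```python
from math import sqrt

def correlation_t(xs, ys):
    """Return (r, t, df) with t = r * sqrt((n - 2) / (1 - r^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # The (n - 1) divisors cancel in r = g / (s_x * s_y).
    r = s_xy / sqrt(s_xx * s_yy)
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return r, t, n - 2

# Illustrative data (made up).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, t, df = correlation_t(xs, ys)

c = 3.182  # assumed table value: two-sided 5% point, df = 3
print(r, t, df, "reject H" if abs(t) > c else "do not reject H")
```

For these data r is about 0.775 and t is about 2.12, so H is not rejected at the 5% level. The same value of t arises as the least squares slope divided by its standard error on the same data.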
Approximate CI for proportion
$p \pm 1.96 \sqrt{ \frac{p(1 - p)}{n - 1} }$,
where p is the observed proportion in the sample.
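A short sketch of the approximate interval, using the n - 1 divisor as written on the sheet. The counts are made up and the function name is hypothetical; 1.96 is the usual normal value for an approximate 95% interval.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Approximate interval p +/- z * sqrt(p * (1 - p) / (n - 1))."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / (n - 1))
    return p - half_width, p + half_width

# Illustrative counts (made up): 13 successes in 26 trials.
print(proportion_ci(13, 26))
```

With 13 successes out of 26 the observed proportion is 0.5 and the interval is about (0.304, 0.696).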
Test for a proportion
Hypothesis: $H: \theta = \theta_0$.
Test statistic $z = \frac{p - \theta_0}{\sqrt{ \frac{\theta_0 (1 - \theta_0)}{n} }}$.
Obtain critical value from normal tables.
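The test statistic can be sketched as follows. The name theta0 for the hypothesised proportion is a label chosen here, the counts are made up, and 1.96 is the standard two-sided 5% critical value from normal tables.

```python
from math import sqrt

def proportion_z(successes, n, theta0):
    """z = (p - theta0) / sqrt(theta0 * (1 - theta0) / n)."""
    p = successes / n
    return (p - theta0) / sqrt(theta0 * (1 - theta0) / n)

# Illustrative counts (made up): 60 successes in 100 trials, H: theta = 0.5.
z = proportion_z(60, 100, 0.5)
print(z, "reject H" if abs(z) > 1.96 else "do not reject H")
```

Here z is 2 up to rounding, which narrowly exceeds 1.96.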
Comparison of proportions
Hypothesis: $H: \theta_1 = \theta_2$.
Calculate
$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$.
Calculate
$z = \frac{p_1 - p_2}{\sqrt{ p(1 - p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }}$.
Obtain appropriate critical value from normal tables.
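A minimal sketch of the comparison, with made-up counts and a hypothetical function name. Note that n_1 p_1 + n_2 p_2 is simply the total number of successes, so the pooled proportion is total successes over total trials.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for equal proportions, using the pooled proportion."""
    p1 = x1 / n1
    p2 = x2 / n2
    p = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Illustrative counts (made up): 60/100 against 50/100.
z = two_proportion_z(60, 100, 50, 100)
print(z, "reject H" if abs(z) > 1.96 else "do not reject H")
```

For these counts the pooled proportion is 0.55 and z is about 1.42, below the two-sided 5% critical value of 1.96.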
Goodness of fit
Test statistic
$\chi^2 = \sum_{i=1}^{m} \frac{(o_i - e_i)^2}{e_i}$,
where m is the number of categories.
Hypothesis: $H: F = F_0$.
Calculate the expected class frequencies under $F_0$.
Calculate the $\chi^2$ test statistic given above.
Determine the degrees of freedom, $\nu$ say.
Obtain critical value from $\chi^2$ tables, df = $\nu$.
Reject $H: F = F_0$ at the 100p% level of significance if $\chi^2 > c$, where c is the tabulated critical value.
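As an illustration, the sketch below tests whether a die is fair from 60 made-up rolls, taking the o_i as observed and the e_i as expected class frequencies. It assumes the degrees of freedom are m - 1 = 5, since no parameters are estimated, and uses 11.07 as the standard 5% critical value of the chi-squared distribution with 5 degrees of freedom. The function name is hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over the m categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative data (made up): 60 rolls, fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

chi2 = chi_squared_statistic(observed, expected)
c = 11.07  # assumed table value: 5% point, df = 5
print(chi2, "reject H" if chi2 > c else "do not reject H")
```

The statistic here is 4.2, well below 11.07, so the hypothesis of a fair die is not rejected.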
