MATH 466: Theory of Statistics


http://math.arizona.edu/~yueniu/yueniu/Math466_Fall12.html
Yue Selena Niu
yueniu@math.arizona.edu
Office: 520 Math Building
Course Outline:
1. Sampling Distributions and the Central Limit Theorem
2. Estimation and Confidence Intervals
3. Properties of Point Estimators and Estimation Methods
4. Hypothesis Testing
5. Linear Regression
7 Statistics, Sampling Distributions, and the Central Limit Theorem
7.1 Sampling Distributions
Definition: If n random variables (rvs) are independent and identically distributed, we refer to them as a random sample and denote them as i.i.d. rvs.
Example 7.1.1 Toss a coin n times. The results $X_1, \ldots, X_n$ are a random sample from Bin(1, p). Here $X_i = 1$ for Heads and $X_i = 0$ for Tails, and $p = \Pr(\text{Head})$.
Example 7.1.2 Take n = 100 measurements of the outdoor temperature at noon. The readings $X_1, \ldots, X_{100}$ can be regarded as a random sample from $N(\mu, \sigma^2)$, where $\mu$ is the true temperature and the measurement error is assumed to follow a normal distribution.
Definition: A statistic is a function of the observable random variables in a sample.
Examples of statistics:
- sample total $\sum_{i=1}^n X_i$
- sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$
- sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$
- order statistics $X_{(1)}, \ldots, X_{(n)}$
In 7.1.1, a commonly used statistic is the sample total $\sum_{i=1}^n X_i$.
In 7.1.2, a commonly used statistic is the sample mean $\bar{X} = n^{-1}\sum_{i=1}^n X_i$.
Note that $\bar{X} - \mu$ is not a statistic because it involves the unknown parameter $\mu$.
One of the goals of statistical theory is to estimate the unknown population parameters by statistics; e.g., $\mu$ can be estimated by $\bar{X}$.
Examples:
In 7.1.1, we want to use the data to estimate p.
In 7.1.2, we want to use the data to estimate $\mu$.
Definition: The probability distribution of a statistic (such as $\bar{X}$) is called the sampling distribution of that statistic.
Examples:
In 7.1.1, the sampling distribution of $\sum_{i=1}^n X_i$ is Bin(n, p).
In 7.1.2, the sampling distribution of $\bar{X}$ is $N(\mu, \sigma^2/100)$. (Details shown later.)
Sampling distributions can be used to express the uncertainty of the estimators.
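The 7.1.2 claim can be checked by simulation. The sketch below uses made-up values $\mu = 20$ and $\sigma = 3$ (the example does not specify them) and confirms that sample means of n = 100 readings scatter around $\mu$ with standard deviation $\sigma/\sqrt{100}$:

```python
# Simulation sketch: sampling distribution of the mean of n = 100 normal
# readings (mu = 20, sigma = 3 are illustrative, made-up values).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 20.0, 3.0, 100, 20000

# Draw `reps` independent samples of size n and record each sample mean.
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.mean())       # close to mu = 20
print(means.std(ddof=1))  # close to sigma / sqrt(n) = 0.3
```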
7.2 Sampling Distributions Related to the Normal Distribution
Definition: The expression $U = a_1 X_1 + \cdots + a_n X_n$ is called a linear combination of the rvs $X_1, \ldots, X_n$.
It is important to understand the properties (e.g. mean, variance, sampling distribution) of linear combinations because many statistics are linear combinations, or functions of linear combinations.
Examples of linear combinations:
$\sum_{i=1}^n X_i = X_1 + \cdots + X_n$, $\quad \bar{X} = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n$.
Sampling Dist. of Linear Combinations of Normal rvs
Recall Theorem 6.3: Let $X_1, \ldots, X_n$ be independent rvs with $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, n$, and let $a_1, \ldots, a_n$ be known constants. Then $U = a_1 X_1 + \cdots + a_n X_n \sim N(\mu_U, \sigma_U^2)$, where
$\mu_U = a_1\mu_1 + \cdots + a_n\mu_n, \quad \sigma_U^2 = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2$.
Proof: Use the method of moment generating functions:
$m_{X_i}(t) = \exp\left(\mu_i t + \frac{\sigma_i^2 t^2}{2}\right)$,
and the fact that
$m_U(t) = m_{X_1}(a_1 t)\, m_{X_2}(a_2 t) \cdots m_{X_n}(a_n t)$.
Sampling Distribution of the Sample Mean of Normals
Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Apply Theorem 6.3 to determine the distribution of the sample mean $\bar{X} = (X_1 + \cdots + X_n)/n$.
Answer: $\bar{X} \sim N(\mu, \sigma^2/n)$.
Example 7.2.1 SAT scores of entering students at UofA follow the normal distribution with a mean of 575 and a standard deviation of 40. A random sample of 25 students is selected. Find the probability that the sample mean of the 25 SAT scores is less than 585.
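By the result above, $\bar{X} \sim N(575, 40^2/25)$, so Example 7.2.1 reduces to standardizing: $P(\bar{X} < 585) = \Phi(1.25)$. A sketch of the computation using scipy:

```python
# Example 7.2.1: X-bar ~ N(575, 40^2/25); standardize and use the normal cdf.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 575, 40, 25
z = (585 - mu) / (sigma / sqrt(n))   # (585 - 575) / 8 = 1.25
p = norm.cdf(z)
print(round(z, 2), round(p, 4))      # 1.25 0.8944
```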
Example 7.2.2 In Example 7.2.1, how many observations should be included in the sample if we wish the sample mean to differ from the population mean by no more than 10 with probability 0.95?
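Example 7.2.2 can be solved by requiring $z_{0.025}\,\sigma/\sqrt{n} \le 10$ and rounding up to the next integer; a sketch:

```python
# Example 7.2.2: need P(|X-bar - mu| <= 10) = 0.95 with sigma = 40,
# i.e. n >= (z_{0.025} * sigma / 10)^2.
from math import ceil
from scipy.stats import norm

sigma, margin, conf = 40, 10, 0.95
z = norm.ppf(1 - (1 - conf) / 2)     # z_{0.025}, about 1.96
n = ceil((z * sigma / margin) ** 2)  # smallest sample size that works
print(n)                             # 62
```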
Sampling Distribution of Other Linear Combinations of Normals
Note: Linear combinations of normal random variables are still normal.
Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$U_1 = (X_1 + \cdots + X_n)/n \sim N(\mu, \sigma^2/n)$
$U_2 = X_1 + \cdots + X_n \sim N(n\mu, n\sigma^2)$
$U_3 = X_1 - X_2 \sim N(0, 2\sigma^2)$
$U_4 = \frac{X_1 - X_2}{\sqrt{2}\,\sigma} \sim N(0, 1)$
$U_5 = X_1 + X_2 + X_3 - 3X_4 \sim N(0, 12\sigma^2)$
Example 7.2.3 Suppose that random variables $Y_1, Y_2$ and $Y_3$ are a random sample from the normal distribution with $\mu = 0$ and $\sigma^2 = 1$. State the distribution with associated parameter values of each of the following functions of $Y_1, \ldots, Y_3$.
1. $U_1 = (Y_1 + 2Y_2)/3 - Y_3$
2. $U_2 = Y_1 + Y_2 + Y_3$
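Theorem 6.3 gives $U_1 \sim N(0, 1/9 + 4/9 + 1) = N(0, 14/9)$ and $U_2 \sim N(0, 3)$ for Example 7.2.3; a simulation sketch checking those variances:

```python
# Example 7.2.3 check: Y_i i.i.d. N(0,1); empirical variances of U1 and U2
# should match 14/9 and 3 from Theorem 6.3.
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((100000, 3))
U1 = (Y[:, 0] + 2 * Y[:, 1]) / 3 - Y[:, 2]
U2 = Y.sum(axis=1)
print(U1.var())   # close to 14/9, about 1.556
print(U2.var())   # close to 3
```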
Sampling Distribution of the Chi-square Statistic
The sum of squares of v independent standard normal rvs is a chi-square with v degrees of freedom. That is, if $Z_1, \ldots, Z_v$ are i.i.d. N(0, 1) random variables, then
$U = Z_1^2 + Z_2^2 + \cdots + Z_v^2$
follows $\chi^2_v$, the chi-square distribution with v degrees of freedom.
Proof: Use the method of moment generating functions:
$m_{Z_i^2}(t) = (1 - 2t)^{-1/2}$,
and the fact that
$m_U(t) = m_{Z_1^2}(t)\, m_{Z_2^2}(t) \cdots m_{Z_v^2}(t) = (1 - 2t)^{-v/2}$.
Recall: $\chi^2_v$ is the same as the Gamma distribution with $\alpha = v/2$ and $\beta = 2$:
$\chi^2_v = \text{Gamma}(\alpha = v/2, \beta = 2)$.
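The Gamma identity is easy to verify numerically; scipy parameterizes the Gamma by shape $\alpha$ and scale $\beta$, so the two densities should agree pointwise:

```python
# Check the identity chi^2_v = Gamma(alpha = v/2, beta = 2) numerically.
import numpy as np
from scipy.stats import chi2, gamma

v = 5
x = np.linspace(0.1, 20, 200)
same = np.allclose(chi2.pdf(x, df=v), gamma.pdf(x, a=v / 2, scale=2))
print(same)   # True
```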
Sampling Distribution of the Sample Variance
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ rvs, and define the sample mean
$\bar{X} = \frac{1}{n}(X_1 + \cdots + X_n)$
and the sample variance
$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$.
Then
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$,
i.e. it follows the chi-square distribution with $v = n - 1$ degrees of freedom. That is,
$\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}$.
Proof (for n = 2):
First show that
$\frac{\sum_{i=1}^2 (X_i - \bar{X})^2}{\sigma^2} = \left(\frac{X_1 - X_2}{\sqrt{2}\,\sigma}\right)^2$.
What's the distribution of $\frac{X_1 - X_2}{\sqrt{2}\,\sigma}$?
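A simulation sketch of the n = 2 case (with made-up values $\mu = 10$, $\sigma = 2$): since $(X_1 - X_2)/(\sqrt{2}\,\sigma)$ is N(0, 1), the quantity $S^2/\sigma^2$ should behave like a $\chi^2_1$ rv:

```python
# Simulate (n-1)S^2/sigma^2 for n = 2 and compare with chi^2_1.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
mu, sigma = 10.0, 2.0                       # illustrative, made-up values
X = rng.normal(mu, sigma, size=(100000, 2))
W = X.var(axis=1, ddof=1) / sigma**2        # (n-1)S^2/sigma^2 with n = 2
print(W.mean())                             # close to E(chi^2_1) = 1
print(np.mean(W <= 1), chi2.cdf(1, df=1))   # empirical vs theoretical cdf
```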
Properties of the chi-square distribution
The percentage points for the chi-square distribution are tabulated in Table 6 of Appendix 3. Suppose $\chi^2$ is a rv following the chi-square distribution with v degrees of freedom. For a given $\alpha$, the table gives the value $\chi^2_{v,\alpha}$ that solves
$P(\chi^2 \ge \chi^2_{v,\alpha}) = \alpha$.
The value $\chi^2_{v,\alpha}$ is the $\alpha$th percentage point, and the $(1-\alpha)$th quantile, of the chi-square distribution with v degrees of freedom.
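In software, a Table 6 lookup corresponds to the upper-tail quantile function; for example, the familiar table entry $\chi^2_{10,\,0.05} \approx 18.307$:

```python
# chi^2_{v,alpha} solves P(chi^2 >= c) = alpha, so it is the (1-alpha)
# quantile: chi2.ppf(1 - alpha, v), or equivalently chi2.isf(alpha, v).
from scipy.stats import chi2

v, alpha = 10, 0.05
c = chi2.isf(alpha, df=v)   # upper 5% point of chi^2_10
print(round(c, 3))          # about 18.307, matching the table
print(chi2.sf(c, df=v))     # recovers alpha = 0.05
```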
Example 7.2.4 SAT scores of entering students at UofA follow the normal distribution with a mean of 575 and a standard deviation of 40. A random sample of 25 students is selected. Find the probability that the sample variance of the 25 SAT scores is less than 2200.
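Example 7.2.4 reduces to a chi-square probability via $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$: $P(S^2 < 2200) = P(\chi^2_{24} < 24 \cdot 2200/40^2)$. A sketch:

```python
# Example 7.2.4: convert the event {S^2 < 2200} into a chi^2_24 event.
from scipy.stats import chi2

n, sigma2 = 25, 40**2
c = (n - 1) * 2200 / sigma2     # 24 * 2200 / 1600 = 33.0
p = chi2.cdf(c, df=n - 1)       # P(chi^2_24 < 33)
print(round(c, 1), round(p, 3))
```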
If $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ rvs, then $\bar{X}$ and $S^2$ are independent random variables.
Justification for the independence of $\bar{X}$ and $S^2$ (n = 2):
- $U_1 = X_1 + X_2$ and $U_2 = X_1 - X_2$ can be shown to be independent (see Example 6.13 in the WMS text).
- $\bar{X}$ is a function of $U_1$ only, and $S^2$ is a function of $U_2$ only, so $\bar{X}$ and $S^2$ are also independent.
Properties of $\bar{X}$ and $S^2$ for a $N(\mu, \sigma^2)$ random sample:
1. $\bar{X} \sim N(\mu, \sigma^2/n)$
2. $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$
3. $\bar{X}$ and $S^2$ are independent
For a formal proof, refer to Section 4.8 of Introduction to Mathematical Statistics by Hogg and Craig.
Example 7.2.5 Suppose that random variables $Y_1, Y_2$ and $Y_3$ are a random sample from the normal distribution with $\mu = 1$ and $\sigma^2 = 4$. Find the distribution of $U = \sum_{i=1}^3 (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{3}\sum_{i=1}^3 Y_i$.
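Since $U/\sigma^2 = (n-1)S^2/\sigma^2 \sim \chi^2_2$ here, U is $\sigma^2 = 4$ times a $\chi^2_2$ rv, so in particular $E(U) = 4 \cdot 2 = 8$; a simulation sketch:

```python
# Example 7.2.5 check: U = sum (Y_i - Y-bar)^2 for Y_i i.i.d. N(1, 4)
# should have mean sigma^2 * E(chi^2_2) = 4 * 2 = 8.
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(1.0, 2.0, size=(100000, 3))
U = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
print(U.mean())   # close to 8
```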
Student's t-statistic
Definition: If $Z \sim N(0, 1)$ and $\chi^2_v$ is a chi-square rv with v degrees of freedom, and if Z and $\chi^2_v$ are independent, then the statistic
$t = t_v = \frac{Z}{\sqrt{\chi^2_v / v}}$
is a Student's t-statistic with v degrees of freedom, and it has the pdf
$f(t) = K\left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}, \quad -\infty < t < +\infty,$
where
$K = \frac{\Gamma((v+1)/2)}{\sqrt{\pi v}\,\Gamma(v/2)}$.
See Exercise 7.98 for the derivation of the pdf.
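The pdf formula, including the normalizing constant K, can be checked against scipy's t density:

```python
# Compare the stated t pdf, f(t) = K (1 + t^2/v)^{-(v+1)/2}, with scipy.
import numpy as np
from scipy.special import gamma as G
from scipy.stats import t as tdist

v = 5
K = G((v + 1) / 2) / (np.sqrt(np.pi * v) * G(v / 2))
x = np.linspace(-4, 4, 100)
f = K * (1 + x**2 / v) ** (-(v + 1) / 2)
print(np.allclose(f, tdist.pdf(x, df=v)))  # True
```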
Properties of the t-distribution
- Like the standard normal, it is symmetric about 0.
- Centered at 0, with a shape similar to that of the normal.
- Has fatter tails and larger variation than the standard normal.
- As $v \to \infty$, $t_v \to N(0, 1)$.
The mean and variance of the t-distribution are:
$E(t_v) = 0$ if $v > 1$; $\quad V(t_v) = \frac{v}{v-2}$ if $v > 2$.
$t_1$ is the Cauchy(0, 1) distribution, which has no mean, variance, or higher moments defined.
A comparison of histograms of N(0, 1), $t_3$ and $t_{10}$.
Percentage points of the t-distribution
The percentage points for the t-distribution are tabulated in Table 5 of Appendix 3. For a given $\alpha$, the table gives the value $t_{v,\alpha}$ that solves
$P(t_v \ge t_{v,\alpha}) = \alpha$.
The value $t_{v,\alpha}$ is the $\alpha$th percentage point, and the $(1-\alpha)$th quantile, of the t-distribution with v degrees of freedom.
The t-statistic in Normal Samples
If $\bar{X}$ and $S^2$ are the sample mean and variance of a random sample from $N(\mu, \sigma^2)$, then
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \quad (1)$
$W = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \quad (2)$
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \quad (3)$
Verify (3)!
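One way to "verify (3)" empirically (with made-up values $\mu = 5$, $\sigma = 3$, $n = 6$): simulate many normal samples and compare a tail probability of the studentized mean with the $t_{n-1}$ tail:

```python
# Simulation check of (3): (X-bar - mu)/(S/sqrt(n)) should follow t_{n-1}
# regardless of mu and sigma.
import numpy as np
from scipy.stats import t as tdist

rng = np.random.default_rng(4)
mu, sigma, n = 5.0, 3.0, 6          # illustrative, made-up values
X = rng.normal(mu, sigma, size=(100000, n))
T = (X.mean(axis=1) - mu) / (X.std(axis=1, ddof=1) / np.sqrt(n))
# Empirical P(T > 2) versus the t_5 tail probability.
print(np.mean(T > 2), tdist.sf(2, df=n - 1))
```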
William Gosset, aka "Student"
William Gosset, who worked at the Guinness brewery, published an article in 1908 in Biometrika describing the t-statistic and its distribution. He published under the pseudonym "Student".
See: http://en.wikipedia.org/wiki/William_Gosset
Example 7.2.6 Suppose $T \sim t_5$.
1. Calculate P(T > 2)
2. Find c such that P(|T| < c) = 0.90
3. Find c such that P(T > c) = 0.25
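Table 5 gives the answers for Example 7.2.6; the same numbers come from scipy, where `sf` is the upper-tail probability and `isf` its inverse:

```python
# Example 7.2.6 with T ~ t_5.
from scipy.stats import t as tdist

p1 = tdist.sf(2, df=5)        # 1. P(T > 2)
c2 = tdist.ppf(0.95, df=5)    # 2. P(|T| < c) = 0.90 puts 5% in each tail
c3 = tdist.isf(0.25, df=5)    # 3. P(T > c) = 0.25
print(round(p1, 4), round(c2, 3), round(c3, 3))
```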
F Distribution: Definition
Let $W_1$ and $W_2$ be independent chi-square random variables with $v_1$ and $v_2$ degrees of freedom, respectively. Then the statistic
$F = \frac{W_1/v_1}{W_2/v_2}$
is said to follow the F distribution with $v_1$ and $v_2$ df. Here $v_1$ is called the numerator df and $v_2$ the denominator df. We denote $F \sim F(v_1, v_2)$.
The pdf of the F Distribution
The pdf for $F(v_1, v_2)$ can be derived:
$f_F(x) = \frac{\Gamma\left(\frac{v_1+v_2}{2}\right)}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \left(\frac{v_1}{v_2}\right)^{v_1/2} x^{v_1/2 - 1} \left(1 + \frac{v_1}{v_2}x\right)^{-(v_1+v_2)/2}$
for $0 < x < +\infty$, where $v_1$ and $v_2$ are positive integers.
F Distribution: Percentage Points
The percentage points for the F distribution are tabulated in Table 7 of Appendix 3. For given values of $v_1$, $v_2$ and $\alpha$, the table gives the value $F_{v_1,v_2,\alpha}$ that solves
$P(F_{v_1,v_2} \ge F_{v_1,v_2,\alpha}) = \alpha$.
Note that $F_{v_1,v_2}$ represents an F rv with $v_1$ and $v_2$ df, and $F_{v_1,v_2,\alpha}$ denotes the $\alpha$th percentage point of the F distribution with $v_1, v_2$ df.
Histogram of the F distribution, and the percentage point.
F Distribution
For $v_2 > 2$,
$E(F_{v_1,v_2}) = \frac{v_2}{v_2 - 2}$.
One useful conclusion: if $F \sim F(v_1, v_2)$, then $1/F \sim F(v_2, v_1)$. Why?
Percentage point relationship (extends the tables):
$F_{v_1,v_2,1-\alpha} = \frac{1}{F_{v_2,v_1,\alpha}}$
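The reciprocal relationship is what lets the tables, which list only small values of $\alpha$, cover lower-tail percentage points as well; it is easy to confirm numerically:

```python
# Check F_{v1,v2,1-alpha} = 1 / F_{v2,v1,alpha} with scipy.
import numpy as np
from scipy.stats import f

v1, v2, alpha = 5, 10, 0.05
left = f.isf(1 - alpha, dfn=v1, dfd=v2)    # F_{v1,v2,1-alpha}
right = 1 / f.isf(alpha, dfn=v2, dfd=v1)   # 1 / F_{v2,v1,alpha}
print(np.isclose(left, right))             # True
```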
F Distribution: Connection to the Normal
Notice that if we have two independent samples of sizes $n_1$ and $n_2$, respectively, from normal distributions with a common variance, then $S_1^2/S_2^2$ has an F distribution with $(n_1 - 1)$ numerator df and $(n_2 - 1)$ denominator df.
More formally, suppose $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ are independent random samples from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively. Let
$S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2, \quad i = 1, 2.$
Then
$W_i = \frac{(n_i - 1)S_i^2}{\sigma^2} \sim \chi^2_{n_i - 1}, \quad i = 1, 2,$
and $W_1$ and $W_2$ are independent. From these we can form
$F = \frac{\chi^2_{n_1-1}/(n_1 - 1)}{\chi^2_{n_2-1}/(n_2 - 1)} = \frac{W_1/(n_1 - 1)}{W_2/(n_2 - 1)} = \frac{(n_1 - 1)S_1^2/[\sigma^2(n_1 - 1)]}{(n_2 - 1)S_2^2/[\sigma^2(n_2 - 1)]} = \frac{S_1^2}{S_2^2}.$
Therefore, under the stated conditions, $S_1^2/S_2^2$ follows the F distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ df.
Example 7.2.7 Compute $P(S_1 \ge cS_2)$ for $c^2 = 2$, $n_1 = 11$, $n_2 = 18$, using Appendix 3.
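Under a common variance, $S_1^2/S_2^2 \sim F$ with $(n_1 - 1, n_2 - 1) = (10, 17)$ df, so $P(S_1 \ge cS_2) = P(S_1^2/S_2^2 \ge c^2) = P(F_{10,17} \ge 2)$; a sketch of the computation:

```python
# Example 7.2.7: reduce P(S_1 >= c S_2) to an upper-tail F probability.
from scipy.stats import f

c2, n1, n2 = 2, 11, 18
p = f.sf(c2, dfn=n1 - 1, dfd=n2 - 1)   # P(F_{10,17} >= 2)
print(round(p, 3))
```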
Sir Ronald Fisher (1890-1962)
The F stands for Fisher. Sir Ronald Fisher was a British geneticist and
statistician who is often referred to as the father of modern statistics.
http://en.wikipedia.org/wiki/Ronald_Fisher
The F statistic is used to test important hypotheses in the analysis of
variance and in regression analysis.
Review for normal samples
Suppose $X_1, \ldots, X_n$ i.i.d. $N(\mu_x, \sigma_x^2)$ and $Y_1, \ldots, Y_m$ i.i.d. $N(\mu_y, \sigma_y^2)$ are independent. Then we have:
- $\bar{X} \sim N(\mu_x, \sigma_x^2/n)$
- $(n-1)S_x^2/\sigma_x^2 \sim \chi^2_{n-1}$, where $S_x^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2$
- $\bar{X}$ and $S_x^2$ are independent
- $T = \frac{\bar{X} - \mu_x}{S_x/\sqrt{n}} \sim t_{n-1}$
- $F = \frac{S_x^2\,\sigma_y^2}{S_y^2\,\sigma_x^2} \sim F_{n-1,\,m-1}$
Example 7.2.8 Suppose that $Y_1, Y_2, Y_3$ are independent standard normal random variables. State the distribution with associated parameter values of each of the following functions of $Y_1, Y_2$ and $Y_3$.
1. $U_1 = \bar{Y}$
2. $U_2 = Y_1^2 + Y_2^2 + Y_3^2$
3. $U_3 = (Y_1 + Y_2)/\sqrt{Y_3^2}$
4. $U_4 = Y_1/\sqrt{0.5(Y_2^2 + Y_3^2)}$
5. $U_5 = \frac{2Y_1^2}{Y_2^2 + Y_3^2}$
7.3 Central Limit Theorem
Central Limit Theorem: Assume that $X_1, \ldots, X_n$ are i.i.d. rvs with finite mean $\mu$ and variance $\sigma^2$, with $\sigma^2 < +\infty$. Then as $n \to +\infty$, the distribution of
$U_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
approaches that of the standard normal N(0, 1).
Thus, for large n we may use the approximation
$P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le t\right) \approx P(Z \le t),$
where Z is a standard normal rv. The approximation improves as n increases.
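The CLT is easy to watch happen in a simulation; the sketch below uses Exponential(1), a conveniently skewed distribution whose mean and standard deviation are both 1, and compares an empirical probability for the standardized mean with the normal limit as n grows:

```python
# CLT demonstration: standardized means of Exponential(1) samples
# approach the standard normal as n grows.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
mu = sigma = 1.0                    # Exponential(1): mean = sd = 1
for n in (5, 30, 200):
    X = rng.exponential(1.0, size=(20000, n))
    U = (X.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    # Empirical P(U_n <= 1) versus the limiting P(Z <= 1).
    print(n, round(np.mean(U <= 1.0), 3), round(norm.cdf(1.0), 3))
```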
When the CLT holds, we say $\bar{X}$ is asymptotically normal (AN) with mean $\mu$ and variance $\frac{\sigma^2}{n}$, i.e.
$\bar{X} = AN\left(\mu, \frac{\sigma^2}{n}\right).$
Similarly, when the CLT holds, we say $\sum_{i=1}^n X_i = n\bar{X}$ is asymptotically normal (AN) with mean $n\mu$ and variance $n\sigma^2$, i.e.
$\sum_{i=1}^n X_i = AN(n\mu, n\sigma^2).$
Figure 1: In this simulation experiment, random samples of size n (= 1, 10, 20, 30, 40, 100) were simulated from a Uniform(0, 1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Normality is achieved with n = 10!
Figure 2: In this simulation experiment, random samples of size n (= 1, 10, 20, 30, 40, 100) were simulated from an Exponential(1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Normality is achieved with n = 20!
Figure 3: In this simulation experiment, random samples of size n (= 1, 10, 20, 30, 40, 100) were simulated from a $t_1$ = Cauchy(0, 1) distribution and the sample mean $\bar{x}$ was calculated. The histograms are based on the 1000 simulated sample means. Clearly the CLT fails here!! Why?
Example 7.3.1 Let $X_1, \ldots, X_n$ be a random sample of size n of inter-arrival times between calls to a switchboard. It is known that the $X_i$'s are exponentially distributed with a mean inter-arrival time of 2 seconds. Find the probability that the sample mean of 36 observations will be less than 2.1.
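For Example 7.3.1, the exponential with mean 2 also has $\sigma = 2$, so by the CLT $\bar{X}$ is approximately $N(2, 4/36)$ and $P(\bar{X} < 2.1) \approx \Phi(0.3)$; a sketch:

```python
# Example 7.3.1 via the CLT: exponential mean = sd = 2, n = 36.
from math import sqrt
from scipy.stats import norm

mu = sigma = 2.0
n = 36
z = (2.1 - mu) / (sigma / sqrt(n))          # 0.1 / (2/6) = 0.3
print(round(z, 2), round(norm.cdf(z), 4))   # 0.3 0.6179
```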
Continuity Correction: Approximate Distribution of a Discrete rv
Suppose that $X = AN(\mu, \sigma^2)$ but is discrete, and suppose X is measured to the nearest whole unit. For example, suppose X = weight of female patients measured to the nearest lb, and suppose $X = AN(\mu = 125, \sigma^2 = 25)$. Then
$P(X \le 130) = P(X < 131) \approx P\left(Z \le \frac{130.5 - 125}{5}\right) = P(Z \le 1.1) = 0.8643,$
where 0.5 is added to 130 as a correction for continuity. A continuity correction can also be applied when other discrete distributions supported on the integers are approximated by the normal distribution.
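The weight example above, computed directly, with the 0.5 continuity correction applied before standardizing:

```python
# Continuity correction: X ~ AN(125, 25) recorded to whole pounds, so
# P(X <= 130) is approximated by Phi((130.5 - 125)/5) = Phi(1.1).
from scipy.stats import norm

mu, sigma = 125, 5
p = norm.cdf((130 + 0.5 - mu) / sigma)   # add 0.5 for continuity
print(round(p, 4))                       # 0.8643
```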
Approximating the Binomial with the Normal
Suppose $Y \sim \text{Binomial}(n, p)$. Let
$X_i = 1$ if trial i is a success, and $X_i = 0$ otherwise.
Then $Y = X_1 + \cdots + X_n$. Furthermore, $X_1, \ldots, X_n$ are independent and $X_i \sim \text{Binomial}(1, p)$, $i = 1, \ldots, n$. Thus, $E(X_i) = p$ and $V(X_i) = p(1-p) \le 0.25 < +\infty$. By the CLT,
$\bar{X} = AN(p, p(1-p)/n)$
and
$Y = n\bar{X} = AN(np, np(1-p)).$
Therefore, if n is large, with the continuity correction,
$P(Y \le y) \approx P\left(Z \le \frac{(y + 0.5) - np}{\sqrt{np(1-p)}}\right).$
Criteria for Approximating Binomial(n, p) with N(np, np(1-p))
The approximation is acceptable when
$0 < p - 3\sqrt{p(1-p)/n}$ and $p + 3\sqrt{p(1-p)/n} < 1$
both hold. These hold when n is moderately large and p is not near 0 or 1. In some other texts, the following criteria are used: $np \ge 10$ and $n(1-p) \ge 10$.
Example 7.3.2 Suppose that $Y \sim \text{Binomial}(50, 0.25)$; calculate $P(Y \le 10)$ with the normal approximation.
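For Example 7.3.2, $np = 12.5$ and $np(1-p) = 9.375$, so with the continuity correction $P(Y \le 10) \approx \Phi((10.5 - 12.5)/\sqrt{9.375})$; a sketch that also compares against the exact binomial probability:

```python
# Example 7.3.2: normal approximation (with continuity correction)
# versus the exact binomial cdf.
from math import sqrt
from scipy.stats import binom, norm

n, p, y = 50, 0.25, 10
approx = norm.cdf((y + 0.5 - n * p) / sqrt(n * p * (1 - p)))
exact = binom.cdf(y, n, p)
print(round(approx, 4), round(exact, 4))   # approximation vs exact
```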