Foundations of Probability and Mathematical Statistics, by Gus W. Haggstrom
Negative exponential (λ)    λ > 0            f(x) = λ e^(-λx),  x > 0

Gamma (λ, r)                λ > 0, r > 0     f(x) = λ^r x^(r-1) e^(-λx) / Γ(r),  x > 0

Chi-square (n)              n = 1, 2, ...    f(x) = x^(n/2 - 1) e^(-x/2) / (2^(n/2) Γ(n/2)),  x > 0

Cauchy (α, β)               β > 0            f(x) = β / (π[β² + (x - α)²])

Laplace (λ)                 λ > 0            f(x) = (λ/2) e^(-λ|x|)

Pareto (θ)                  θ > 0, c > 0     f(x) = θ c^θ / x^(θ+1),  x > c
Table 2

A COMPARISON OF HYPERGEOMETRIC, BINOMIAL, AND POISSON PROBABILITIES

Sample               Hypergeometric (population size N)
size n    p    k     N=5    N=10   N=20   N=50   N=100   Binomial  Poisson

   5     0.4   0      -     .024   .051   .067   .073      .078     .135
               1      -     .238   .255   .259   .259      .259     .271
               2     1.0    .476   .397   .364   .354      .346     .271
               3      -     .238   .238   .234   .232      .230     .180
               4      -     .024   .054   .069   .073      .077     .090
               5      -      -     .004   .007   .009      .010     .036

  10     0.2   0      -      -     .043   .083   .095      .107     .135
               1      -      -     .248   .266   .268      .268     .271
               2      -     1.0    .418   .337   .318      .302     .271
               3      -      -     .248   .218   .209      .201     .180
               4      -      -     .043   .078   .084      .088     .090
               5      -      -      -     .016   .022      .026     .036
               6      -      -      -     .002   .004      .006     .012
               7      -      -      -     .000   .000      .001     .003
               8      -      -      -     .000   .000      .000     .001
               9      -      -      -     .000   .000      .000     .000
              10      -      -      -     .000   .000      .000     .000

  20     0.1   0      -      -      -     .067   .095      .122     .135
               1      -      -      -     .259   .268      .270     .271
               2      -      -     1.0    .364   .318      .285     .271
               3      -      -      -     .234   .209      .190     .180
               4      -      -      -     .069   .084      .090     .090
               5      -      -      -     .007   .022      .032     .036
               6      -      -      -      -     .004      .009     .012
               7      -      -      -      -     .000      .002     .003
               8      -      -      -      -     .000      .000     .001
            9-20      -      -      -      -     .000      .000     .000
If the population size N is large relative to the sample size, then the hypergeometric probabilities P(X = k) differ little from the corresponding binomial probabilities, and as N → ∞ the hypergeometric probabilities tend to the binomial probabilities. Table 2 compares the two sets of probabilities for population sizes N up to 100 and for three sample sizes n = 5, 10, and 20.
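This limiting behavior is easy to check numerically. The following sketch (an illustration added here; the helper names are our own) evaluates both probability functions with Python's math.comb for the first block of Table 2 (n = 5, p = 0.4) and shows the hypergeometric column approaching the binomial one as N grows:

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): n draws without replacement from a population of N
    items of which M are special (comb returns 0 when k > M)."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 5, 0.4                      # first block of Table 2
for k in range(n + 1):
    row = " ".join(f"{hypergeom_pmf(k, N, round(p * N), n):.3f}"
                   for N in (10, 20, 50, 100))
    print(f"k={k}:  hypergeometric {row}   binomial {binom_pmf(k, n, p):.3f}")
```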
Poisson. Suppose that events of a certain type (such as traffic accidents, arrivals at a checkout counter, emissions of α-particles from a radioactive source, vacancies in the Supreme Court during a year, etc.) are occurring randomly over time in such a way that certain assumptions are satisfied (e.g., the events occur singly, and the numbers of occurrences in disjoint time intervals are "independent"). Then the number of occurrences X in a unit time interval can be assumed to have a Poisson distribution with parameter λ, where λ is the mean number of occurrences in an interval of length one.* The number of occurrences in a time interval of length t has a Poisson distribution with parameter λt.

The Poisson distribution also arises as a limit of binomial distributions as n → ∞ and p → 0 in such a way that np → λ. Table 2 gives the Poisson probabilities for λ = 2. Compare these probabilities with the binomial probabilities for (a) n = 5, p = 0.4; (b) n = 10, p = 0.2; and (c) n = 20, p = 0.1. In all three cases, np = 2. Note that as n increases, the differences between the binomial and Poisson probabilities become smaller.
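A similar sketch (again illustrative, with np = 2 held fixed) lets n grow and reports how far the binomial probabilities sit from the Poisson values:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
for n in (5, 10, 20, 100, 1000):    # np = 2 in every case
    p = lam / n
    gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
    print(f"n = {n:5d}: largest difference over k = 0..10 is {gap:.5f}")
```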
Geometric and Negative Binomial. These distributions occur in considering the number of Bernoulli trials required until a certain number of successes occur. If X is the number of trials required until r successes occur, then X has a negative binomial distribution with parameters r and p, where p is the probability of a success on each trial. If X is the waiting time for the first success (i.e., the special case where r = 1), then X has a geometric distribution. For example, if two fair dice are tossed again and again until a total of seven occurs for the first time, then the number of the trial on which seven occurs has a geometric distribution with parameter p = 1/6, and the expected number of trials is 6.
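A short simulation (illustrative; the function name and seed are our own) confirms the expected waiting time of 6 tosses:

```python
import random

def tosses_until_seven(rng):
    """Count tosses of two fair dice until the total is seven."""
    count = 0
    while True:
        count += 1
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            return count

rng = random.Random(0)
trials = 100_000
mean_wait = sum(tosses_until_seven(rng) for _ in range(trials)) / trials
print(f"average waiting time: {mean_wait:.2f}  (theory: 1/p = 6)")
```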
*Meyer, op. cit., pp. 166-168.
Uniform. A random variable U has a uniform distribution on an interval (a,b) if the probability that U takes on values in any subinterval (c,d) of (a,b) is proportional to the length of the subinterval, and the probability that U takes on values outside the interval (a,b) is zero. Equivalently, U has density function f(u) = 1/(b - a) for a < u < b and f(u) = 0 elsewhere. For example, the random variable in the spinner problem has a uniform distribution on (0,1).
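As a minimal sketch of this property (assuming the standard library generator is an adequate stand-in for the spinner), one can simulate U on (0,1) and check that the relative frequency of a subinterval matches its length:

```python
import random

rng = random.Random(0)
c, d = 0.25, 0.60                    # any subinterval of (0, 1)
n = 200_000
hits = sum(c < rng.random() < d for _ in range(n))
print(f"empirical P({c} < U < {d}) = {hits / n:.4f}  (length d - c = {d - c})")
```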
Normal. This distribution is the most frequently used of all distributions in statistical applications for two reasons: (a) many statistical calculations are greatly simplified if the random variables involved are assumed to have normal distributions; (b) the normal distribution provides a reasonable approximation for distributions of repeated measurements of many physical phenomena: cranial lengths, ballistic measurements (coordinates of deviations from the target), logarithms of incomes, heights, IQ scores, sums or averages of several test scores, etc. The normal distribution is also the limiting distribution of many distributions (binomial, hypergeometric, Poisson, negative binomial, and distributions of sums and averages of random variables that satisfy certain properties).
A random variable Z is said to have a standard normal distribution if Z has density function φ(z) = (2π)^(-1/2) e^(-z²/2) for -∞ < z < ∞. This bell-shaped density function is symmetric about zero. It is easily verified that E(Z) = 0 and Var(Z) = 1. The distribution function of Z, commonly denoted by Φ in the statistical literature, is tabulated in Table 3. For example, P(Z ≤ 2) = Φ(2) = 0.9772. The values of Φ(z) for negative values of z can be computed using the formula Φ(z) = 1 - Φ(-z), which follows from the symmetry of the distribution about zero. For example, P(Z ≤ -2) = 1 - Φ(2) = 0.0228.
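Although Φ has no closed form, it can be evaluated from the error function, since Φ(z) = [1 + erf(z/√2)]/2. A minimal sketch reproducing the tabulated values:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function via the error function:
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"Phi(2)  = {Phi(2):.4f}")     # 0.9772, as in Table 3
print(f"Phi(-2) = {Phi(-2):.4f}")    # 0.0228 = 1 - Phi(2), by symmetry
```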
P(|θ̂ - g(θ)| > c), the probability that θ̂ misestimates the parameter g(θ) by more than c units. Each of these measures of closeness is an instance of E_θ L(θ̂, g(θ)), where L is a "loss function" that specifies the loss suffered if the estimated value is θ̂ and the parameter value is g(θ). Although this more general approach would seem to apply in more situations, in actual practice loss functions can rarely be specified precisely, and we shall not pursue this approach. For the purposes of this presentation, we shall concentrate most of our attention on the first of the three measures of closeness above. It is the easiest to work with, since the mean squared error of an estimator bears a simple relationship to its bias and its variance.
Theorem 9-1. If θ̂ is an estimator of g(θ) with bias B(θ), the mean squared error of θ̂ satisfies

E_θ[θ̂ - g(θ)]² = Var_θ(θ̂) + [B(θ)]².

Proof: This theorem is merely a restatement of the easily verified fact that, for any random variable Y having mean μ and finite variance σ²,

E(Y - c)² = σ² + (μ - c)².

The verification is left as an exercise.
It follows from the theorem that, if θ̂ is unbiased for g(θ), then the mean squared error of θ̂ is just the variance of θ̂. In many cases, there is a unique unbiased estimator of a parameter that has minimum variance for every possible parameter value θ.

Definition. An unbiased estimator θ̂* is said to be the uniformly minimum variance unbiased (UMVU) estimator of a parameter g(θ) if θ̂* has minimum variance for all values of θ. In this case the efficiency of any other estimator θ̂ relative to θ̂* is defined to be the ratio Var_θ(θ̂*)/Var_θ(θ̂).
For example, suppose X_1, X_2, ..., X_n are i.i.d., each N(μ,σ²). Of the nine estimators of μ that were listed earlier, the first six are all unbiased. Estimators (3)-(6) are all instances of weighted averages Σ w_k X_(k) of the order statistics X_(1), X_(2), ..., X_(n), where Σ w_k = 1 and w_k = w_(n+1-k) for k = 1, 2, ..., n. That is, X_(1) and X_(n) receive the same weight, X_(2) and X_(n-1) receive the same weight, etc. If n = 20, the variances of the first six estimators of μ are

(1) Var(X̄) = σ²/20 = 0.050σ².

(2) Var(X_1) = σ².

(3) Var([X_(1) + X_(20)]/2) = 0.143σ².

(4) Var(mdn(X)) = 0.073σ².

(5) Var([X_(6) + X_(15)]/2) = 0.061σ².

(6) Var([X_(2) + X_(3) + ... + X_(19)]/18) = 0.051σ².

(See W. J. Dixon and Frank J. Massey, Jr., Introduction to Statistical Analysis, Second Edition, McGraw-Hill, New York, p. 406.) Thus, among these unbiased estimators, X̄ has smallest variance.
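These variances can be checked by simulation. The sketch below (illustrative; the replication count and seed are arbitrary choices) draws repeated standard normal samples of size 20 and estimates the variances of the mean, midrange, median, and trimmed mean; the results should come out near the 0.050σ², 0.143σ², 0.073σ², and 0.051σ² figures quoted above:

```python
import random
import statistics

rng = random.Random(0)
n, reps = 20, 50_000
ests = {"mean": [], "midrange": [], "median": [], "trimmed": []}

for _ in range(reps):
    x = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    ests["mean"].append(sum(x) / n)
    ests["midrange"].append((x[0] + x[-1]) / 2)
    ests["median"].append((x[9] + x[10]) / 2)       # average of X(10), X(11)
    ests["trimmed"].append(sum(x[1:-1]) / (n - 2))  # drop smallest and largest

for name, values in ests.items():
    print(f"{name:9s} variance = {statistics.pvariance(values):.3f} * sigma^2")
```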
It will be shown later in this section that X̄ is the UMVU estimator of μ in this case. The efficiency of the median relative to X̄ is 0.050/0.073 = 0.68. It can be shown that for large values of n the variance of the sample median is approximately (π/2)σ²/n, so that for large n the efficiency of the median relative to X̄ is approximately 2/π = 0.64. The implication of this is that X̄ achieves approximately the same precision as the sample median using only 64 percent as many observations.
We note in passing that the "trimmed mean" estimator (6) above has efficiency 0.98. This estimator is almost as efficient as X̄, and it affords some protection against gross recording errors and "wild shots" in the data by eliminating the largest and smallest observations from the calculation of the estimate.
The other estimators (7)-(9) are biased estimators of μ, but each of them has smaller mean squared error than X̄ for certain values of μ. The mean squared error of the estimator θ̂ ≡ c, which estimates μ to be equal to c for all values of the observations, is E(c - μ)² = (c - μ)², which is less than the mean squared error of X̄, namely σ²/n, for values of μ close to c.
The estimator θ̂ = pc + (1 - p)X̄ has bias

E(θ̂) - μ = pc + (1 - p)μ - μ = p(c - μ),

and its variance is

Var(θ̂) = (1 - p)² Var(X̄) = (1 - p)²σ²/n.

Therefore, by Theorem 9-1, the mean squared error of θ̂ is

E(θ̂ - μ)² = (1 - p)²σ²/n + p²(c - μ)².

Note that this estimator has smaller variance than X̄, so that if the bias of θ̂ is not too large (i.e., if c is close to μ), then θ̂ has smaller mean squared error than X̄.
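A quick numerical illustration of this trade-off (with the illustrative choices p = 0.3, c = 0, n = 20, σ² = 1) evaluates the two mean squared errors at several values of μ and shows the crossover:

```python
n, sigma2 = 20, 1.0
p, c = 0.3, 0.0                 # shrink 30 percent of the way toward c = 0

mse_xbar = sigma2 / n           # MSE of the unbiased estimator Xbar
for mu in (0.0, 0.2, 0.5, 1.0):
    # Theorem 9-1: MSE = variance + (bias)^2
    mse_shrunk = (1 - p) ** 2 * sigma2 / n + p ** 2 * (c - mu) ** 2
    better = "smaller" if mse_shrunk < mse_xbar else "larger"
    print(f"mu = {mu:.1f}: MSE of pc + (1-p)Xbar = {mse_shrunk:.4f} "
          f"({better} than sigma^2/n = {mse_xbar:.4f})")
```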
The biased estimator θ̂ = max(X̄, 0) has smaller mean squared error than X̄ for all positive values of μ, since

(θ̂ - μ)² ≤ (X̄ - μ)²

for all possible sample values, with strict inequality holding whenever X̄ < 0. It follows that

E(θ̂ - μ)² < E(X̄ - μ)² = σ²/n

whenever μ > 0.