Introduction
The main purpose of these notes is to quickly summarize facts from the Probability Theory (STAT 4351 or 5351) and Mathematical Statistics (STAT 4352 or 5352) courses. The notes also provide further generalizations of the basic notions that will be used extensively throughout this course. Detailed descriptions can be found in the first chapter of your textbook.
where µ = µX = E[X] = (n + 1)/2. Indeed,

$$\sigma_X^2 = \operatorname{Var}[X] = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n+1}{12}\left[2(2n+1) - 3(n+1)\right] = \frac{(n+1)(n-1)}{12},$$

which can be simplified to

$$\sigma_X^2 = \operatorname{Var}[X] = \frac{n^2 - 1}{12}. \tag{3}$$
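For a quick numerical check of (3), one can simulate a discrete uniform variable on {1, . . . , n}; this is a minimal Python sketch (names and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # support {1, ..., n}
sample = rng.integers(1, n + 1, size=100_000)

print(sample.var())                      # empirical variance
print((n**2 - 1) / 12)                   # formula (3): 8.25 for n = 10
```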
follows the distribution denoted as Bin (n, p). It is known from the Probability Theory course that

$$P[Y = k] = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k} \quad \text{for } k = 0, 1, \ldots, n. \tag{6}$$
If n = 1, this distribution is the Bernoulli one, with the success rate equal to p. That is why the Bernoulli distribution can be denoted as Bin (1, p). It was shown in Probability Theory that

$$E[Y] = np \quad \text{and} \quad \operatorname{Var}[Y] = np(1-p).$$
It is important to realize that a single Bernoulli variable is usually identified with the result of one game,
while Y (or the sum of n independent Bernoulli variables) measures the number of victories after a set of
n games.
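The "sum of n Bernoulli games" construction is easy to verify by simulation; the following minimal sketch (parameter values chosen arbitrarily) compares the empirical mean and variance with np and np(1 − p):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 0.3

# Each row: n independent Bernoulli(p) "games"; the row sum counts victories.
games = rng.random((50_000, n)) < p
victories = games.sum(axis=1)            # Bin(n, p) by construction

print(victories.mean(), n * p)           # E[Y] = np
print(victories.var(), n * p * (1 - p))  # Var[Y] = np(1 - p)
```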
1.4 Geometric and Negative Binomial Distributions
A random variable X has a Geometric distribution with parameter p (also referred to as a success rate) if it takes integer values from {0, 1, 2, . . .}, with probabilities

$$P[X = k] = p\,(1-p)^k \quad \text{for } k = 0, 1, 2, \ldots.$$
Under this definition, X is interpreted as the number of losses (or failures) prior to the first success. If you consider the shifted variable W = X + 1, then W indicates the time when the first success occurs. Equivalently, W = m if and only if the first (m − 1) trials result in failures and the mth trial is a success.
Expected value and variance of W can be found as follows:

$$E[W] = \frac{1}{p} \quad \text{and} \quad \operatorname{Var}[W] = \frac{1-p}{p^2}. \tag{9}$$
Using the relationship, X = W − 1, we obtain:

$$E[X] = \frac{1}{p} - 1 = \frac{1-p}{p} \quad \text{and} \quad \operatorname{Var}[X] = \frac{1-p}{p^2}. \tag{10}$$
Notice that Var [X] = Var [W], since shifting a variable by a constant does not change its variance.
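A short simulation illustrates the failures-versus-trials bookkeeping; note that numpy's geometric generator counts trials until the first success, i.e. it returns W rather than X (p chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.25

w = rng.geometric(p, size=100_000)       # W = trial of first success
x = w - 1                                # X = failures before first success

print(x.mean(), (1 - p) / p)             # formula (10)
print(x.var(), (1 - p) / p**2)           # same variance as W
```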
The negative binomial distribution relates to the geometric one in a way similar to how the binomial relates to the Bernoulli variable. If {Wj : j = 1, 2, . . . , r} are independent random variables, each having the same geometric distribution Geom [p], then their sum, $Y = \sum_{j=1}^{r} W_j$, is said to have the negative binomial distribution, denoted as NB [r, p]. The event [Y = k] states that the rth success occurs at trial number k. The formula for negative binomial probabilities is as follows:

$$P[Y = k] = \frac{(k-1)!}{(r-1)!\,(k-r)!}\; p^r (1-p)^{k-r}, \quad \text{for } k = r, r+1, \ldots. \tag{11}$$
The variable T = Y − r, measuring the total number of failures before the rth success occurs, has a shifted distribution,

$$P[T = m] = \frac{(m+r-1)!}{(r-1)!\,m!}\; p^r (1-p)^m, \quad \text{for } m = 0, 1, \ldots. \tag{12}$$
The expected value and variance for either version of the negative binomial distribution can be found similarly to what you have seen in the Probability Theory course. Using the same convention, that is, Y is the time of the rth success and T is the count of failures that preceded this success, you can obtain:

$$E[Y] = \frac{r}{p} \quad \text{and} \quad E[T] = \frac{r(1-p)}{p}, \tag{13}$$

while

$$\operatorname{Var}[Y] = \operatorname{Var}[T] = r \cdot \frac{1-p}{p^2}. \tag{14}$$
Remark: Both geometric and negative binomial distributions have a countably infinite support set, unlike the previously considered examples.
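The sum-of-geometrics construction of NB [r, p] can be checked directly against formulas (13) and (14); a minimal sketch (r and p chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
r, p = 5, 0.4

# Y = sum of r independent Geom(p) waiting times (trial of the r-th success).
w = rng.geometric(p, size=(100_000, r))  # each entry is a W_j >= 1
y = w.sum(axis=1)                        # NB(r, p) in the "trial number" form

print(y.mean(), r / p)                   # formula (13)
print(y.var(), r * (1 - p) / p**2)       # formula (14)
```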
1.5 Poisson Distributions
This is another type of discrete distribution, also with a countably infinite set of distinct values. A random variable Y follows a Poisson distribution with intensity parameter λ if

$$P[Y = k] = \frac{\lambda^k}{k!}\, e^{-\lambda} \quad \text{for } k = 0, 1, 2, \ldots. \tag{15}$$
The interpretation of Poisson distributions is simple: when Yn has the binomial distribution Bin [n, p] with n → ∞ and p = pn → 0, in such a way that npn → λ > 0 (or, informally, npn ≈ λ > 0), then the probabilities converge:

$$\lim_{n\to\infty} P[Y_n = k] = \frac{\lambda^k}{k!}\, e^{-\lambda} = P[Y = k] \quad \text{for } k = 0, 1, 2, \ldots.$$
The expectation and variance of the Poisson distribution were derived in the Probability Theory course as

$$E[Y] = \lambda \quad \text{and} \quad \operatorname{Var}[Y] = \lambda.$$
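The convergence of binomial probabilities to the Poisson limit (15) is easy to observe numerically; this sketch uses only the standard library, with λ and k chosen arbitrarily:

```python
from math import comb, exp, factorial

lam, k = 3.0, 2
poisson = lam**k / factorial(k) * exp(-lam)

for n in (10, 100, 1000, 10_000):
    p = lam / n                          # so that np = lambda
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson)             # binomial pmf approaches (15)
```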
2 Continuous Distributions
Commonly used continuous distributions will be summarized in the same way as their discrete analogs. To help you refresh your memories from the Probability course, the density function is introduced first; after that, the first and second moments are outlined.
2.1 Uniform Distributions
A real-valued random variable U follows the uniform distribution over the unit interval (0, 1) if its density is

$$f(x) = 1 \ \text{ for } 0 < x < 1 \quad \text{and} \quad f(x) = 0 \ \text{ elsewhere (or otherwise).} \tag{19}$$
Often we need to extend the uniform distribution to an arbitrary interval (a, b), where a < b. The general uniform distribution has the density function

$$f(x) = f(x \mid a, b) = \frac{1}{b-a} \ \text{ for } a < x < b \quad \text{and} \quad f(x) = 0 \ \text{ elsewhere.} \tag{20}$$
It is easy to see the equivalence of the following two statements:
• The variable
$$U = \frac{X - a}{b - a}$$
has the standard uniform distribution on the interval (0, 1).
• The variable X can be represented as
$$X = a + (b - a)\, U, \tag{21}$$
where U is standard uniform.
Recall that the moments of a continuous random variable, say X, are defined as integrals. If T = T(X) is a transformed random variable, such as a power of X or a polynomial of X, then

$$E[T] = \int_{-\infty}^{\infty} T(x)\, f(x)\, dx, \tag{22}$$

provided this (generally, improper) integral converges to a finite value. Fortunately, the uniform distribution does not create such trouble as a divergent integral. The expected value and variance are:
$$E[X] = \frac{a+b}{2} \quad \text{and} \quad \operatorname{Var}[X] = \frac{(b-a)^2}{12}. \tag{23}$$

In particular, if U is uniformly distributed over the unit interval (0, 1), we obtain:

$$E[U] = \frac{1}{2} \quad \text{and} \quad \operatorname{Var}[U] = \frac{1}{12}.$$
It is also helpful to realize that

$$E[U^r] = \int_0^1 x^r\, dx = \frac{1}{r+1} \quad \text{for any } r > -1,$$

not only for moments of integer order. If r ≤ −1, the integral above diverges.
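A brief numerical illustration ties these facts together: generate Uniform(a, b) values via the representation (21) and compare the empirical moments with formula (23) and with E[U^r] = 1/(r + 1) (all parameter values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = -2.0, 3.0

u = rng.random(200_000)                  # standard uniform on (0, 1)
x = a + (b - a) * u                      # representation (21)

print(x.mean(), (a + b) / 2)             # formula (23): mean
print(x.var(), (b - a)**2 / 12)          # formula (23): variance
r = 0.5
print((u**r).mean(), 1 / (r + 1))        # fractional moment E[U^r]
```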
2.2 Gamma Distributions
Recall the definition of the Γ function, introduced by Leonhard Euler, for argument a > 0:

$$\Gamma(a) = \int_0^{\infty} x^{a-1} e^{-x}\, dx. \tag{24}$$

It satisfies the recursion

$$\Gamma(a+1) = a \cdot \Gamma(a),$$

and, for a positive integer n,

$$\Gamma(n) = (n-1)!.$$
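Both identities can be checked in one line each with Python's math.gamma (the value a = 2.5 below is just an example):

```python
from math import gamma, factorial

a = 2.5
print(gamma(a + 1), a * gamma(a))        # recursion: Gamma(a+1) = a * Gamma(a)
print(gamma(6), factorial(5))            # Gamma(n) = (n-1)!: both equal 120
```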
A standardized Gamma random variable X with shape parameter a > 0 has the density function

$$f(x) = f(x \mid a) = \frac{1}{\Gamma(a)}\, x^{a-1} e^{-x} \quad \text{for } 0 < x < \infty. \tag{25}$$

Notice that integration is actually performed over the interval (0, ∞). The first two moments of this standardized Gamma random variable are:

$$E[X] = \Gamma(a+1) \div \Gamma(a) = a \quad \text{and} \quad E\left[X^2\right] = \Gamma(a+2) \div \Gamma(a) = a(a+1). \tag{26}$$
A general Gamma distribution has two parameters: a, viewed as the shape, and b, which indicates the scale. Its density function is:

$$f(x) = f(x \mid a, b) = \frac{1}{b^a\, \Gamma(a)}\, x^{a-1} e^{-x/b} \quad \text{for } 0 < x < \infty. \tag{28}$$
It is easy to notice that if Y has the density (28), then X = Y/b follows the distribution (25). Therefore, the first two moments of a general Gamma distributed random variable Y can be found as

$$E[Y] = ab \quad \text{and} \quad E\left[Y^2\right] = a(a+1)\, b^2, \quad \text{so that} \quad \operatorname{Var}[Y] = a b^2. \tag{29}$$
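A quick empirical check of these moments (shape and scale values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 3.0, 2.0                          # shape and scale, as in (28)

y = rng.gamma(shape=a, scale=b, size=200_000)

print(y.mean(), a * b)                   # E[Y] = ab
print(y.var(), a * b**2)                 # Var[Y] = ab^2
```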
2.2.1 Special Case: Exponential Distributions
The exponential distribution is a particular case of Gamma, with a = 1 and b > 0. Frequently, the density function defined by (28) is presented in the form:

$$f(x) = f(x \mid b) = \frac{1}{b} \exp\left(-\frac{x}{b}\right) \quad \text{for } 0 < x < \infty, \tag{30}$$

with the scale parameter b replaced by its reciprocal, λ = 1/b, so equation (30) is replaced by

$$f(x) = f(x \mid \lambda) = \lambda\, e^{-\lambda x} \quad \text{for } 0 < x < \infty. \tag{31}$$
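One standard way to sample from (30), not spelled out in these notes, is the inverse-CDF transform: if U is standard uniform, then −b ln U is exponential with scale b. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
b = 2.0                                  # scale; lambda = 1/b

# Inverse-CDF sampling: -b * log(U) has the density (30).
x = -b * np.log(rng.random(200_000))

print(x.mean(), b)                       # E[X] = b = 1/lambda
print(x.var(), b**2)                     # Var[X] = b^2
```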
3 Transformed Continuous Variables
A common setup includes a real-valued continuous variable X with density function f(x) = f^X(x), and a transformed variable Y = T(X). Several facts related to the CDF and density of Y, covered in STAT 4351, are recalled here.
3.2 Reflection
Assume that Y = −X.
1. CDF for Y is:
$$F^Y(y) = P[-X \le y] = P[X \ge -y] = 1 - F^X(-y)$$
2. Density function for Y is:
$$f^Y(y) = f^X(-y)$$
3.4 Scale and Shift Transformation
Assume that b > 0 and a is an arbitrary real number. Consider Y = a + b · X. Combining the two previously considered situations, we obtain results for Y as follows.
1. CDF for Y is:
$$F^Y(y) = F^X\!\left(\frac{y-a}{b}\right)$$
2. Density function for Y is:
$$f^Y(y) = \frac{1}{b} \cdot f^X\!\left(\frac{y-a}{b}\right)$$
3. Expectation and variance for Y are:
$$E[Y] = a + b \cdot E[X] \quad \text{and} \quad \operatorname{Var}[Y] = b^2 \cdot \operatorname{Var}[X]$$
If the scale parameter b is negative, you can derive all characteristics of Y = a + bX by combining the above formulas with reflection.
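The moment rules in item 3 hold for any b, including b < 0; a quick empirical check (a standard normal X is used purely as an example, and the negative b exercises the reflection remark):

```python
import numpy as np

rng = np.random.default_rng(7)
a, b = 5.0, -3.0                         # b < 0: scale-shift plus reflection

x = rng.standard_normal(200_000)
y = a + b * x

print(y.mean(), a + b * x.mean())        # E[Y] = a + b * E[X]
print(y.var(), b**2 * x.var())           # Var[Y] = b^2 * Var[X]
```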
3.5 Reciprocal
Assume, for the sake of simplicity, that X ≥ 0 with probability one, and consider Y = X^{-1}. Moments of Y may fail to exist, so we shall limit our attention to the CDF and density only.
1. CDF for Y is (for y > 0):
$$F^Y(y) = P[1/X \le y] = P[X \ge 1/y] = 1 - F^X\!\left(\frac{1}{y}\right)$$
2. Density of Y is:
$$f^Y(y) = \frac{1}{y^2} \cdot f^X\!\left(\frac{1}{y}\right)$$
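To check the CDF formula empirically, take X exponential with scale 1 (an arbitrary choice satisfying X ≥ 0), for which 1 − F^X(1/y) = e^{−1/y}:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=1.0, size=200_000)   # X >= 0 a.s., as assumed
y = 1.0 / x

t = 2.0
print((y <= t).mean())                   # empirical CDF of Y at t
print(np.exp(-1 / t))                    # 1 - F_X(1/t) = exp(-1/t) for Exp(1)
```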
4 Bivariate Distributions
Assume that the observables are pairs (X, Y ), where X and Y are random variables, discrete or continuous. Even a mixed case is covered here, as far as variances and covariances are concerned.
Suppose that all necessary moments are finite.
4.2 Covariance and Correlation
Covariance between X and Y is defined as follows:

$$\operatorname{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X] \cdot E[Y].$$
Using variances and their square roots (standard deviations), the correlation can be introduced as follows:

$$\operatorname{Corr}[X, Y] = \frac{\operatorname{Cov}[X, Y]}{\sigma_X \cdot \sigma_Y},$$

where $\sigma_X = \sqrt{\operatorname{Var}[X]}$ and $\sigma_Y = \sqrt{\operatorname{Var}[Y]}$.
Combining this formula with other properties of the variance, conclude that

$$\operatorname{Var}[X \pm Y] = \operatorname{Var}[X] + \operatorname{Var}[Y] \pm 2\operatorname{Cov}[X, Y].$$
If X and Y are independent, then their covariance is zero, and therefore their correlation also vanishes. The converse is generally not true. For uncorrelated variables X and Y, the variances of their sum and difference coincide:

$$\operatorname{Var}[X \pm Y] = \operatorname{Var}[X] + \operatorname{Var}[Y]$$
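A short simulation (independent standard normals, chosen just for illustration) confirms the vanishing covariance and correlation and the equality of the two variances:

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_normal(200_000)
y = rng.standard_normal(200_000)         # independent of x, hence uncorrelated

print(np.cov(x, y)[0, 1])                # Cov[X, Y]: near 0
print(np.corrcoef(x, y)[0, 1])           # Corr[X, Y]: near 0
print((x + y).var(), x.var() + y.var())  # Var[X + Y] = Var[X] + Var[Y]
print((x - y).var(), x.var() + y.var())  # Var[X - Y]: same value
```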