
IN5340 / IN9340 Lecture 1
Random variables, vectors and sequences
Roy Edgar Hansen, January 2022

Outline

1 Introduction
2 Random numbers
  Probability
  Ensemble averages
  Moments
  Useful random variables
3 Random vectors
  Joint distribution
  Sum of random variables
  Central limit theorem
  Joint moments
4 Summary

What do we learn

Introduction to probability, random variables and random vectors
Important building block in statistical signal processing
Recommended literature:
  R. M. Gray and L. D. Davisson. An Introduction to Statistical Signal Processing. Cambridge University Press, 2004. URL https://ee.stanford.edu/~gray/sp.pdf
  M. H. Hayes. Statistical Digital Signal Processing and Modeling. John Wiley & Sons, 1996.
  Any book on statistical signal processing or on probability and random variables
  A lot of relevant information on Wikipedia
  Barry Van Veen and John Buck on YouTube

Deterministic and Random numbers

Deterministic number:
  Can be reproduced repeatedly
Random number:
  Cannot be reproduced in a predictable manner
Probability

Probability
Probability is a measure or estimation of the likelihood of occurrence of an event. Probabilities are given a value between 0 and 1. The higher the probability, the more likely the event is to happen.

Example: coin flipping
  Equally likely to result in Heads and Tails
  Two experimental outcomes H = {Heads} and T = {Tails}
  Flipping the coin Nt times would result in nH Heads and nT Tails
  We should find nH / Nt ≈ 0.5 and nT / Nt ≈ 0.5

Probability cont.

We define Pr {x} as the probability that the experiment (coin flipping) will result in a specific value x
Probability assignment: Pr {H} = 0.5 and Pr {T} = 0.5
The set of all experimental outcomes is called the sample space Ω, where Pr {Ω} = 1
The coin flipping experiment: Ω = {H, T} and Pr {H, T} = 1
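As a quick numerical illustration (an added sketch, not from the slides), the relative frequencies of Heads and Tails in a simulated fair-coin experiment approach 0.5 as Nt grows:

  % Relative frequency of Heads and Tails over Nt simulated fair coin flips
  Nt = 1e5;
  flips = rand(1, Nt) < 0.5;              % logical 1 = Heads, 0 = Tails
  nH = nnz(flips);  nT = Nt - nH;
  fprintf('nH/Nt = %.3f, nT/Nt = %.3f\n', nH/Nt, nT/Nt);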

Probability distribution function

The probability distribution function is defined as the probability that a random variable x is less than or equal to a specific value

  Fx (α) = Pr {x ≤ α}

Also called cumulative distribution function (CDF) or probability distribution

Example: Coin flipping
  Pr {x = −1} = 0.5 (Tails)
  Pr {x = 1} = 0.5 (Heads)
  CDF:
    Fx (α) = 0 for α < −1
    Fx (α) = 0.5 for −1 ≤ α < 1
    Fx (α) = 1 for 1 ≤ α

Probability density function

The probability density function (PDF) is defined as

  fx (α) = dFx (α) / dα

This gives

  Fx (α) = ∫_{−∞}^{α} fx (u) du

Probability density function
The relative likelihood for a random variable to take on a given value. The probability of the random variable falling within a particular range of values is given by the integral of the variable's density over that range.
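To connect the two definitions numerically (an added sketch, not part of the original slides), the CDF of the ±1 coin variable can be estimated from samples and compared with the stair function above:

  % Empirical CDF of the coin variable x in {-1, +1}, Pr = 0.5 each
  N = 1e5;
  x = 2 * (rand(1, N) < 0.5) - 1;            % maps to -1 (Tails) or +1 (Heads)
  alpha = -2:0.01:2;
  F = arrayfun(@(a) nnz(x <= a) / N, alpha); % Fx(alpha) = Pr{x <= alpha}
  plot(alpha, F);                            % steps to 0.5 at -1 and to 1 at +1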
Expectation

Expectation:
  The expected value of x
  The mathematical expectation of x
  The statistical average of x
  The mean value of x

Expected value of a discrete random variable

  E {x} = Σ_k αk Pr {x = αk}

In terms of the probability density function

  E {x} = ∫_{−∞}^{∞} α fx (α) dα

Example: dice experiment on blackboard

Expected value of a function of a random variable

Assume a random variable x with a known PDF fx
The expectation of any function g (x) becomes

  E {g (x)} = ∫_{−∞}^{∞} g (α) fx (α) dα

Sometimes called The law of the unconscious statistician (LOTUS)
Excellent description on wikipedia.org
Used as a rule later on
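As a small worked sketch of both formulas (added here; the fair-die numbers are standard values rather than the blackboard example), a fair die has E {x} = 3.5, and LOTUS with g (x) = x² gives E {x²} ≈ 15.17:

  % Expected value of a fair die from the definition and from simulation
  faces = 1:6;  p = ones(1, 6) / 6;
  Ex  = sum(faces .* p);                 % E{x}   = 3.5
  Ex2 = sum(faces.^2 .* p);              % E{x^2} = 15.1667 via LOTUS with g(x) = x^2
  rolls = randi(6, 1, 1e5);              % simulated dice experiment
  fprintf('E{x} = %.2f, sample mean = %.2f\n', Ex, mean(rolls));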

Moments

Assume the function g (x) = x^n
The nth moment mn is defined

  mn = E {x^n} = ∫_{−∞}^{∞} α^n fx (α) dα

m1 is the mean value, also denoted x̄ or mx.
m2 is also an important statistical average, referred to as the mean squared value

Central Moments

Assume the function g (x) = (x − x̄)^n
The nth central moment µn is defined

  µn = E {(x − x̄)^n} = ∫_{−∞}^{∞} (α − x̄)^n fx (α) dα

The second central moment µ2 is a very important statistical average referred to as the variance
The square root of the variance is called the standard deviation of the random variable

  σx = √µ2
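A quick numerical sketch (added, not from the slides) of how these averages are estimated from samples:

  % Sample estimates of the first moments of a random variable
  x   = 2 + randn(1, 1e5);                % example: Gaussian with mean 2, variance 1
  m1  = mean(x);                          % first moment (mean value)
  m2  = mean(x.^2);                       % second moment (mean squared value)
  mu2 = mean((x - m1).^2);                % second central moment (variance)
  fprintf('m1 = %.3f  m2 = %.3f  mu2 = %.3f  sigma = %.3f\n', m1, m2, mu2, sqrt(mu2));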
Central Moments cont.

The third central moment is a measure of the asymmetry of the probability density function fx (α), called the skew

  µ3 = E {(x − x̄)^3}

The normalized third central moment is known as the skewness of the density function

  γ1 = µ3 / σx^3

The fourth central moment is a measure of the heaviness of the tail of the distribution

  µ4 = E {(x − x̄)^4}

Its normalized form µ4 / σx^4 is known as the kurtosis
See wikipedia or other sources

Uniform distribution

Equal probability of all values within bounds
Matlab function rand
Probability density function

  fx (α) = 0 for α < a
  fx (α) = 1/(b − a) for a ≤ α ≤ b
  fx (α) = 0 for α > b

Example: a = −1, b = 1
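An added sketch (not from the slides) comparing sample averages of uniform variates with the theoretical mean (a + b)/2, variance (b − a)²/12 and zero skewness:

  % Uniform distribution on [a, b]: sample averages vs theory
  a = -1;  b = 1;
  x  = a + (b - a) * rand(1, 1e6);        % rand gives U(0,1); scale to [a, b]
  mu = mean(x);                           % theory: (a + b)/2 = 0
  va = var(x);                            % theory: (b - a)^2 / 12 = 1/3
  sk = mean((x - mu).^3) / std(x)^3;      % skewness, theory: 0 (symmetric PDF)
  fprintf('mean %.3f  var %.3f  skewness %.3f\n', mu, va, sk);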

Normal random variable

Matlab function randn
Probability density function

  fx (α) = 1 / √(2π σx^2) · exp( −(α − mx)^2 / (2 σx^2) )

Also called Gaussian random variable
Often denoted as x (ζ) ∼ N (mx, σx^2)
Example: mx = 0.5, σx^2 = 1

Recap probability functions

Probability Pr {x}
Cumulative distribution function (CDF) Fx (α)
Probability density function (PDF) fx (α)

Several properties of the density functions may be stated:
1. fx (α) ≥ 0 for all α
2. ∫_{−∞}^{∞} fx (α) dα = 1
3. Pr {x1 < x ≤ x2} = ∫_{x1}^{x2} fx (α) dα
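A short added sketch (using the slide's example values mx = 0.5 and σx² = 1) that shifts and scales randn samples and overlays the analytic PDF:

  % Gaussian samples with mean mx and variance sx2, compared to the analytic PDF
  mx = 0.5;  sx2 = 1;
  x = mx + sqrt(sx2) * randn(1, 1e5);     % randn gives N(0,1); shift and scale
  histogram(x, 60, 'Normalization', 'pdf');  hold on
  alpha = linspace(-4, 5, 200);
  plot(alpha, exp(-(alpha - mx).^2 / (2*sx2)) / sqrt(2*pi*sx2), 'LineWidth', 2);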
Random vectors

Assume two random variables X and Y defined on a sample space S, with the specific values x and y
Any ordered pair (x, y) may be considered a random point in the xy plane
or, a vector random variable, or a random vector

Note
Although a random vector can contain several random variables, we first consider the two-element case.

Joint probability distribution function

Consider the events A = {X ≤ α} and B = {Y ≤ β}
The probability distribution functions

  Fx (α) = Pr {X ≤ α}
  Fy (β) = Pr {Y ≤ β}

New concept: the probability of the joint event {X ≤ α, Y ≤ β} is described by a joint probability distribution function

  Fx,y (α, β) = Pr {X ≤ α, Y ≤ β}

In Norwegian: simultan

Example conditional probability: coin flipping with uneven coins

Consider three different coins:
1. a coin with equal probability of H and T
2. a coin which is more likely to produce H
3. a coin which is more likely to produce T

Flip the first coin (random variable X)
If the first coin results in an H, then flip coin 2 (random variable Y)
If the first coin results in a T, then flip coin 3 (random variable Y)
The random variable Y is then statistically dependent on the random variable X
We specifically see that if X is H, it is more likely that Y also is H. And vice versa.
A small simulation of this experiment is sketched below.

Joint probability density function

The joint probability density function (PDF)

  fx,y (α, β) = ∂²Fx,y (α, β) / (∂α ∂β)

and the joint cumulative distribution function (CDF)

  Fx,y (α, β) = ∫_{−∞}^{α} ∫_{−∞}^{β} fx,y (u, v) du dv
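Returning to the uneven-coin example, a minimal simulation sketch (the bias values 0.8 and 0.2 are assumptions, not from the slides) showing that Y depends on X:

  % Uneven-coin experiment: Y is statistically dependent on X
  N = 1e5;
  X = rand(1, N) < 0.5;                     % coin 1: fair, logical 1 = H
  pH = 0.8 * X + 0.2 * (~X);                % coin 2 (Pr{H}=0.8) if X=H, coin 3 (Pr{H}=0.2) if X=T
  Y = rand(1, N) < pH;
  fprintf('Pr{Y=H | X=H} ~ %.2f, Pr{Y=H | X=T} ~ %.2f\n', ...
          nnz(Y & X) / nnz(X), nnz(Y & ~X) / nnz(~X));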
Properties of the joint density

  fx,y (α, β) ≥ 0

  ∫_{−∞}^{∞} ∫_{−∞}^{∞} fx,y (α, β) dα dβ = 1

  fx (α) = ∫_{−∞}^{∞} fx,y (α, v) dv

  fy (β) = ∫_{−∞}^{∞} fx,y (u, β) du

Statistical independence

Two random variables X and Y are said to be statistically independent if (and only if)

  Fx,y (α, β) = Fx (α) Fy (β)

From the definition of the density function, this gives

  fx,y (α, β) = fx (α) fy (β)

Statistical independence
The occurrence of one random event does not affect the probability of occurrence of the other random event.

Sums of independent random variables 1

Consider two random variables X and Y and their joint PDF fx,y (α, β)
Let Z be the sum of the two random variables

  Z = X + Y

What is the PDF of Z, assuming that we know the joint PDF?
Strategy: Find the CDF Fz (ζ) first and then take the derivative

  fz (ζ) = dFz (ζ) / dζ

Finding the CDF is often a well defined problem, while finding the PDF might be more difficult

Material inspired by John Buck from UMass Dartmouth on YouTube and Introduction to Probability by Grinstead and Snell

Sums of independent random variables 2

The CDF is related to the probability through

  Fz (ζ) = Pr {z ≤ ζ} = Pr {x + y ≤ ζ}

A fundamental property of the joint PDF is that one can integrate over a region of interest to obtain the probability

  Fz (ζ) = ∫∫_{α+β ≤ ζ} fx,y (α, β) dα dβ

Now using Y = Z − X, or β = ζ − α

  Fz (ζ) = ∫_{α=−∞}^{∞} [ ∫_{β=−∞}^{ζ−α} fx,y (α, β) dβ ] dα
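As a quick Monte Carlo sanity check (an added sketch; the uniform example and the value ζ = 0.5 are assumptions, not from the slides), Pr {x + y ≤ ζ} for two independent U(0,1) variables equals ζ²/2 for 0 ≤ ζ ≤ 1:

  % Monte Carlo check of Fz(zeta) = Pr{x + y <= zeta} for two independent U(0,1)
  N = 1e6;
  z = rand(1, N) + rand(1, N);              % samples of Z = X + Y
  zeta = 0.5;
  Fz_mc = nnz(z <= zeta) / N;               % empirical CDF at zeta
  Fz_th = zeta^2 / 2;                       % analytic value for 0 <= zeta <= 1
  fprintf('Fz(%.1f): Monte Carlo %.4f, analytic %.4f\n', zeta, Fz_mc, Fz_th);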
Sums of independent random variables 3

The PDF is the derivative

  fz (ζ) = dFz (ζ) / dζ = ∫_{α=−∞}^{∞} d/dζ [ ∫_{β=−∞}^{ζ−α} fx,y (α, β) dβ ] dα

In order to solve this, we turn to calculus and find Leibniz' rule:

  d/dx ∫_{a(x)}^{b(x)} f (x, t) dt = f (x, b(x)) · d/dx b(x) − f (x, a(x)) · d/dx a(x) + ∫_{a(x)}^{b(x)} ∂/∂x f (x, t) dt

See wikipedia or a suitable book on calculus

Sums of independent random variables 4

Insert into Leibniz' rule and consider each term.

  d/dζ a(ζ) = d/dζ (−∞) = 0
  d/dζ b(ζ) = d/dζ (ζ − α) = 1
  ∂/∂ζ fx,y (α, β) = 0

Hence, only the upper limit contributes. From Leibniz' rule:

  f (x, b(x)) · d/dx b(x) = fx,y (α, ζ − α) · 1

Which gives

  d/dζ ∫_{β=−∞}^{ζ−α} fx,y (α, β) dβ = fx,y (α, ζ − α)
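A tiny added sanity check of Leibniz' rule (the integrand e^{−t} and limits 0 to x² are assumed for illustration, not from the slides): for g (x) = ∫_{0}^{x²} e^{−t} dt only the upper-limit term is nonzero, giving dg/dx = 2x e^{−x²}, which a numeric derivative confirms:

  % Numeric check of Leibniz' rule for g(x) = integral of exp(-t) from 0 to x^2
  x  = 1.3;  h = 1e-5;
  g  = @(x) 1 - exp(-x.^2);                 % closed form of the integral
  dg_numeric = (g(x + h) - g(x - h)) / (2*h);
  dg_leibniz = exp(-x^2) * 2*x;             % f(x, b(x)) * db/dx, the only nonzero term
  fprintf('numeric %.6f  Leibniz %.6f\n', dg_numeric, dg_leibniz);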

Sums of independent random variables 5

The PDF of Z then becomes

  fz (ζ) = ∫_{α=−∞}^{∞} fx,y (α, ζ − α) dα

If X and Y are statistically independent, the joint PDF is the product of the marginals

  fx,y (α, β) = fx (α) fy (β)

  fz (ζ) = ∫_{α=−∞}^{∞} fx (α) fy (ζ − α) dα

This is recognized as a convolution: fz (ζ) = fx (ζ) ∗ fy (ζ)

PDF of the sum of two random variables
The probability density function of a sum of two statistically independent random variables is the convolution of the individual PDFs

The sum again using the characteristic function

Let W be a random variable equal to the sum of two statistically independent random variables X and Y

  W = X + Y

Consider the characteristic function defined as

  Φx (ω) = E {e^{jωx}} = ∫_{−∞}^{∞} fx (u) e^{jωu} du

Recap the expected value of a function of a random variable

  E {g (x)} = ∫_{−∞}^{∞} g (α) fx (α) dα

This is simply the Fourier transform of the PDF with the sign reversed
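An added numerical sketch (assuming two independent U(0,1) variables): convolving the two uniform PDFs gives the triangular PDF of the sum, which matches a histogram of sampled sums:

  % PDF of Z = X + Y for independent U(0,1): numerical convolution vs histogram
  d = 0.01;  alpha = 0:d:1;
  fx = ones(size(alpha));                   % uniform PDF on [0, 1]
  fz = conv(fx, fx) * d;                    % discrete approximation of the convolution
  zax = 0:d:2;                              % support of the sum
  histogram(rand(1, 1e5) + rand(1, 1e5), 'Normalization', 'pdf');  hold on
  plot(zax, fz, 'LineWidth', 2);            % triangular PDF peaking at z = 1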
The sum again using the characteristic function 2

The characteristic function of W becomes

  Φw (ω) = E {e^{jωw}} = E {e^{jω(x+y)}}
         = E {e^{jωx}} E {e^{jωy}}
         = Φx (ω) Φy (ω)

From the convolution property of the Fourier transform

  fw (α) = fx (α) ∗ fy (α) = ∫_{−∞}^{∞} fx (u) fy (α − u) du

See any suitable textbook or wikipedia for the properties of the Fourier transform

Central limit theorem

Loosely stated, the Central limit theorem says the following:

Central limit theorem
The probability distribution function of a sum of a large number of statistically independent random variables approaches the normal probability distribution function

Equivalent for probability density functions
Variants and a proof of the classical CLT can be found on Wikipedia (using the characteristic function)
Test on the computer/blackboard
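A minimal sketch of such a computer test (added here; summing N = 12 uniform variables is an assumed choice): the standardized sum of independent U(0,1) variables is close to N(0,1):

  % CLT demo: standardized sum of N independent U(0,1) variables vs N(0,1)
  N = 12;  M = 1e5;
  s = sum(rand(N, M), 1);                   % each column holds the sum of N uniforms
  s = (s - N/2) / sqrt(N/12);               % remove mean N/2, divide by std sqrt(N/12)
  histogram(s, 60, 'Normalization', 'pdf');  hold on
  t = linspace(-4, 4, 200);
  plot(t, exp(-t.^2 / 2) / sqrt(2*pi), 'LineWidth', 2);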

Central limit theorem example: Uniform distribution

(Figure slide: sums of uniform random variables; see the sketch above.)

Joint moments

When more than a single random variable is involved, expectation must be taken with respect to all variables

  E {g (x, y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g (α, β) fx,y (α, β) dα dβ

Joint moments:

  mnk = E {x^n y^k} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α^n β^k fx,y (α, β) dα dβ

We see that mn0 = E {x^n} and m0k = E {y^k}
The sum n + k is called the order of the moment
Correlation

The second order moment m11 is called the correlation

  Rxy = E {xy} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} α β fx,y (α, β) dα dβ

If the correlation can be written

  Rxy = E {x} E {y}

where x̄ = E {x} and ȳ = E {y}, then x and y are said to be uncorrelated.

Statistically independent random variables are uncorrelated
  Easy to prove by inserting fx,y (α, β) = fx (α) fy (β)
  The converse does not hold in general (see the sketch below)
If the correlation is Rxy = 0, the two random variables are said to be orthogonal.

Joint central moments

The joint central moments are defined

  µnk = E {(x − x̄)^n (y − ȳ)^k} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (α − x̄)^n (β − ȳ)^k fx,y (α, β) dα dβ

The second order moments

  µ20 = E {(x − x̄)^2} = σx^2
  µ02 = E {(y − ȳ)^2} = σy^2

are simply the variances of the random variables x and y
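An added counterexample sketch (the choice y = x² with x uniform on [−1, 1] is an assumption, not from the slides) showing variables that are uncorrelated but clearly dependent:

  % Uncorrelated does not imply independent: y = x^2 with x ~ U(-1, 1)
  x = 2 * rand(1, 1e6) - 1;
  y = x.^2;                                 % fully determined by x, hence dependent
  Rxy = mean(x .* y);                       % E{xy} = E{x^3} = 0
  fprintf('Rxy = %.4f, E{x}E{y} = %.4f  (equal: uncorrelated)\n', Rxy, mean(x)*mean(y));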

Covariance

The second order central moment µ11 is called the covariance

  Cxy = E {(x − x̄)(y − ȳ)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (α − x̄)(β − ȳ) fx,y (α, β) dα dβ

By direct expansion, we get

  Cxy = Rxy − x̄ ȳ = Rxy − E {x} E {y}

If the random variables x and y are independent or uncorrelated: Rxy = E {x} E {y} and Cxy = 0

Correlation coefficient

The correlation coefficient is the normalised covariance

  ρxy = E {(x − x̄)(y − ȳ)} / (σx σy) = (Rxy − x̄ ȳ) / (σx σy)

Due to the normalisation, the correlation coefficient is bounded

  |ρxy| ≤ 1
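A short added sketch (the construction y = 0.6x + 0.8w with independent unit-variance Gaussians is an assumed example giving ρxy = 0.6) estimating the covariance and correlation coefficient from samples:

  % Sample covariance and correlation coefficient of two correlated variables
  N = 1e5;
  x = randn(1, N);
  y = 0.6 * x + 0.8 * randn(1, N);          % constructed so that rho_xy = 0.6
  Cxy   = mean((x - mean(x)) .* (y - mean(y)));
  rhoxy = Cxy / (std(x) * std(y));
  fprintf('Cxy = %.3f, rho_xy = %.3f\n', Cxy, rhoxy);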
Summary of lecture I

random variable                  random vector
probability                      joint probability
probability distribution         statistical dependence
probability density function     characteristic function
expectation                      central limit theorem
moments                          joint moments
central moments                  correlation and covariance
mean and variance                uncorrelated and orthogonal
standard deviation               correlation coefficient
normal/uniform distribution

In Norwegian

English              Norwegian
probability          sannsynlighet
distribution         fordeling
density              tetthet
expectation          forventning
standard deviation   standardavvik
variance             varians
joint                simultan
correlation          korrelasjon
covariance           kovarians

https://folk.ntnu.no/bakke/ordliste.pdf
