
Random Processes –
Theory for the Practician

by

Maciej Niedźwiecki

Gdańsk University of Technology

Department of Automatic Control

Gdańsk, Poland
Does God play dice ?
Czy Bóg gra w kości ?
“Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the ‘old one.’ I, at any rate, am convinced that He does not throw dice.”
Albert Einstein

Yes, He does !
to get the modern view watch Richard Feynman’s
The Douglas Robb Memorial Lectures 1979
http://vega.org.uk/video/subseries/8

Feynman's theory states that the probability of an event is determined by summing together all the possible histories of that event. For example, for a particle moving from point A to B we imagine the particle traveling every possible path: curved paths, oscillating paths, squiggly paths, even backward-in-time and forward-in-time paths. Each path has an amplitude, and when summed the vast majority of all these amplitudes add up to zero, and all that remains is the comparably few histories that abide by the laws and forces of nature. Sum over histories indicates the direction of our ordinary clock time; it is simply a path in space which is more probable than the more exotic directions time might have taken otherwise.
Scalar random variables
Skalarne zmienne losowe
random variable / zmienna losowa

A random variable is a function X(ξ) that associates a number with each outcome ξ of a certain experiment. For each outcome ξ ∈ Ξ the probability P (ξ) is defined.

X : Ξ −→ ΩX ⊆ R, {X(ξ), ξ ∈ Ξ}

where:
Ξ - sample space / zbiór zdarzeń elementarnych
ξ - outcome / zdarzenie elementarne
ΩX - observation space / zbiór obserwacji

realization of a random variable


realizacja zmiennej losowej
x = X(ξ)

Ξ is the set of all different possibilities that could happen


ξ is the one of these possibilities that did happen

discrete random variable – when ΩX contains a finite or countably infinite number of values

continuous random variable – when ΩX contains an uncountably infinite number of values
Cumulative distribution function (CDF)
Dystrybuanta

FX (x) = P (X ≤ x)

P (·), 0≤P ≤1
probability of an event / prawdopodobieństwo zdarzenia

Properties:

• FX (−∞) = 0, FX (∞) = 1

• 0 ≤ FX (x) ≤ 1

• if x1 < x2 then

P (x1 < X ≤ x2) = FX (x2) − FX (x1) ≥ 0


Probability density function (PDF)
Gȩstość rozkladu prawdopodobieństwa

pX (x) = lim_{ε→0} P (x ≤ X < x + ε)/ε = dFX (x)/dx

Properties:

• pX (x) ≥ 0
• FX (x) = ∫_{−∞}^{x} pX (z)dz

• ∫_{−∞}^{∞} pX (x)dx = 1

• P (x1 < X ≤ x2) = ∫_{x1}^{x2} pX (z)dz

Is it possible that

pX (x) > 1 ?

P (X = x) ≠ 0 ?
Characteristics of random variables
Charakterystyki zmiennych losowych

mean (expected) value / wartość oczekiwana (średnia)


mX = E[X] = ∫_{−∞}^{∞} x dFX (x) = ∫_{−∞}^{∞} x pX (x)dx

median / mediana

αX = med[X] : P (X ≤ αX ) = P (X ≥ αX ), i.e., ∫_{−∞}^{αX} pX (x)dx = 1/2

variance / wariancja

σX^2 = var[X] = E[(X − mX )^2] = ∫_{−∞}^{∞} (x − mX )^2 dFX (x) = ∫_{−∞}^{∞} (x − mX )^2 pX (x)dx

standard deviation / odchylenie standardowe

σX = √(var[X])
Characteristics of random variables

n-th central moment / moment centralny n-tego rzȩdu


µnX = E[(X − mX )^n] = ∫_{−∞}^{∞} (x − mX )^n pX (x)dx

coefficient of skewness / wspólczynnik asymetrii


γX = E[(X − mX )^3] / σX^3 = µ3X / σX^3

for a symmetric distribution γX = 0

coefficient of kurtosis (excess kurtosis) / kurtoza

miara splaszczenia rozkladu (od gr. kurtos = rozdȩty)


κX = E[(X − mX )^4] / σX^4 − 3 = µ4X / σX^4 − 3

for a Gaussian distribution κX = 0


κX > 0 – leptokurtic distribution / rozklad leptokurtyczny
κX = 0 – mesokurtic distribution / rozklad mezokurtyczny
κX < 0 – platykurtic distribution / rozklad platykurtyczny
Characteristics of random variables

Probability density functions with negative skew, γX < 0 (left), and positive skew, γX > 0 (right).

Super-Gaussian (leptokurtic, κX = 3) [left], Gaussian (κX = 0) [middle] and sub-Gaussian (platykurtic, κX = −1.5) [right] probability density functions.
Important random variables
Ważne zmienne losowe

uniform random variable / zmienna losowa o rozkladzie


równomiernym (jednostajnym)

X ∼ U (a, b),   a < b

pX (x) = { 1/(b−a)   a ≤ x < b
         { 0          elsewhere

FX (x) = { 0              x < a
         { (x−a)/(b−a)    a ≤ x < b
         { 1              x ≥ b

mX = (a+b)/2 ,   σX^2 = (b−a)^2/12
MATLAB : U (0, 1) = rand

Probability density function (left) and cumulative distribution function (right) of a uniform random variable.
Important random variables

Gaussian (normal) random variable / zmienna losowa o


rozkladzie gaussowskim (normalnym)

X ∼ N (mX , σX^2),   σX^2 > 0

pX (x) = 1/√(2πσX^2) exp{ −(x − mX )^2 / (2σX^2) }

FX (x) = 1/√(2πσX^2) ∫_{−∞}^{x} exp{ −(z − mX )^2 / (2σX^2) } dz = Φ( (x − mX )/σX )

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp{−z^2/2} dz
MATLAB : N (0, 1) = randn

Probability density function (left) and cumulative distribution function (right) of a Gaussian random variable.
Central limit theorem (CLT)
Centralne twierdzenie graniczne

THEOREM

Let Xi, i = 1, 2, . . ., be a sequence of independent random variables with identical (any!) distribution, mean mX and variance σX^2 < ∞. Define a new random variable Zn:

Zn = (1/√n) Σ_{i=1}^{n} (Xi − mX )/σX

In the limit as n approaches infinity, the random variable Zn converges in distribution to a standard normal random variable N (0, 1):

lim_{n→∞} FZn (z) = Φ(z), i.e., Zn → N (0, 1) in distribution

The CLT explains why the Gaussian distribution is of such great importance, and why it occurs so frequently.
The central limit theorem (CLT)

PDF of the sum of independent uniform random variables: n = 2, 3, 4, 5.
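A quick Monte Carlo illustration (hedged Python/NumPy sketch; sample sizes and the choice of uniform summands mirror the figure above) of how the standardized sum Zn approaches N (0, 1):

```python
import numpy as np

# Standardized sum of n i.i.d. U(0,1) variables; by the CLT its distribution
# approaches N(0,1) as n grows. P(Z_n <= 1) should approach Phi(1) ~ 0.8413.
rng = np.random.default_rng(0)
m_x, s_x = 0.5, np.sqrt(1.0 / 12.0)            # mean and std of U(0,1)

for n in (2, 3, 4, 5, 30):
    x = rng.random((200_000, n))               # 200k realizations of (X1,...,Xn)
    z = (x - m_x).sum(axis=1) / (np.sqrt(n) * s_x)
    print(n, np.mean(z <= 1.0))
```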
Important random variables

Laplace random variable / zmienna losowa o rozkladzie


Laplace’a (dwustronnym wykladniczym)

X ∼ L(a, b),   b > 0

a – location parameter, b – scale parameter

pX (x) = (1/2b) exp{ −|x − a|/b }

FX (x) = { (1/2) exp{ (x − a)/b }        x < a
         { 1 − (1/2) exp{ −(x − a)/b }   x ≥ a

mX = a ,   σX^2 = 2b^2

Y ∼ U (−1/2, 1/2) ⇒ X = sgn[Y ] ln(1 − 2|Y |) ∼ L(0, 1)

Probability density function (left) and cumulative distribution function (right) of a Laplace random variable.
Important random variables

Cauchy random variable / zmienna losowa o rozkladzie


Cauchy’ego

X ∼ C(a, b),   b > 0

a – location parameter, b – scale parameter

pX (x) = (b/π) / ( b^2 + (x − a)^2 )

FX (x) = 1/2 + (1/π) tan^{−1}( (x − a)/b )

mX = ? ,   σX^2 = ?

Y ∼ N (0, 1), Z ∼ N (0, 1), independent ⇒ X = Y /Z ∼ C(0, 1)

Probability density function (left) and cumulative distribution function (right) of a Cauchy random variable.
Cauchy random variables
The Cauchy distribution is an example of a distribution
which has no mean, variance or higher moments defined.
Its median is well defined and equal to a.

standard Cauchy distribution / standardowy rozklad


Cauchy’ego

1
X ∼ C(0, 1) : pX (x) =
π(1 + x2)
mean value:
Z ∞ Z ∞
x pX (x)dx = x pX (x)dx
−∞ 0
Z 0
− |x| pX (x)dx = ∞ − ∞
−∞

note that due to symmetry of pX (x)


Z a Z a
lim x pX (x)dx 6= lim x pX (x)dx
a→∞ −a a→∞ −2a

variance:
Z ∞ 2 Z ∞
1 x 1
E[X 2] = 2
dx = dx − 1 = ∞
π −∞ 1 + x π −∞
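A numerical illustration (Python/NumPy sketch using the Y /Z construction above; sample size arbitrary) of the practical consequence: the running sample mean of Cauchy data never settles down, while the sample median does.

```python
import numpy as np

# The running sample mean of C(0,1) draws does not converge, because the
# Cauchy distribution has no mean; the median stays near the location a = 0.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n) / rng.standard_normal(n)   # Y/Z, Y and Z ~ N(0,1) independent

running_mean = np.cumsum(x) / np.arange(1, n + 1)
for k in (10**2, 10**3, 10**4, 10**5, 10**6 - 1):
    print(k, running_mean[k])          # keeps jumping as n grows
print("sample median:", np.median(x))  # close to a = 0
```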
Pairs of random variables
Pary zmiennych losowych
X(ξ) ←→ Y (ξ)

joint cumulative distribution function / la̧czna dystrybuanta

FXY (x, y) = P (X ≤ x, Y ≤ y)
Properties:

• FXY (−∞, −∞) = 0

• FXY (−∞, y) = FXY (x, −∞) = 0

• FXY (∞, ∞) = 1

• 0 ≤ FXY (x, y) ≤ 1

• FXY (x, ∞) = FX (x)

• FXY (∞, y) = FY (y)

• if x1 < x2 and y1 < y2 then

P (x1 < X ≤ x2, y1 < Y ≤ y2)


= FXY (x2, y2) − FXY (x1, y2)
− FXY (x2, y1) + FXY (x1, y1) ≥ 0
Pairs of random variables
joint probability density function / funkcja la̧cznej gȩstości
prawdopodobieństwa

pXY (x, y) = lim_{εx→0, εy→0} P (x ≤ X < x + εx, y ≤ Y < y + εy ) / (εx εy ) = ∂^2 FXY (x, y) / ∂x∂y

Joint probability density function of two random variables.


Properties:
• pXY (x, y) ≥ 0
• FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} pXY (u, v)du dv

• ∫_{−∞}^{∞} ∫_{−∞}^{∞} pXY (x, y)dxdy = 1

• if x1 < x2 and y1 < y2 then

P (x1 < X ≤ x2, y1 < Y ≤ y2) = ∫_{x1}^{x2} ∫_{y1}^{y2} pXY (x, y)dxdy

marginal distributions / rozklady brzegowe


∫_{−∞}^{∞} pXY (x, y)dy = pX (x)

∫_{−∞}^{∞} pXY (x, y)dx = pY (y)

Joint distribution of two random variables and their marginal distributions.

note: there exist many different joint distributions that produce the same marginal distributions
Pairs of random variables

conditional probability density function / warunkowa


gȩstość rozkladu prawdopodobieństwa

X|Y (ξ) = y

pX|Y (x|y) = pXY (x, y) / pY (y)

Joint distribution of random variables X and Y (top) and


two conditional distributions: pY |X (y|x = 1.27) (bottom
left) and pX|Y (x|y = −0.37) (bottom right).
correlation between random variables / wspólczynnik
korelacji zmiennych losowych
rXY = E[XY ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy pXY (x, y)dxdy

covariance between random variables / wspólczynnik


kowariancji zmiennych losowych

cXY = E[(X − mX )(Y − mY )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − mX )(y − mY ) pXY (x, y)dxdy = rXY − mX mY

correlation coefficient of two random variables /


unormowany wspólczynnik kowariancji dwóch zmiennych
losowych

normalized random variables / unormowane zmienne


losowe

X̃ = (X − mX )/σX ,   Ỹ = (Y − mY )/σY

mX̃ = mỸ = 0 ,   σX̃^2 = σỸ^2 = 1

ρXY = E[X̃ Ỹ ] = cXY / (σX σY )

−1 ≤ ρXY ≤ 1
Pairs of random variables

THEOREM

The correlation coefficient is at most 1 in magnitude.

PROOF

Consider taking the second moment of X + aY , where a is


a real constant:

E[(X + aY )2] = E[X 2] + 2aE[XY ] + a2E[Y 2] ≥ 0.

Since this is true for any a, we can tighten the bound by


choosing the value of a that minimizes the left-hand side.
This value turns out to be a = −E[XY ]/E[Y 2]. Plugging
in this value gives

E[X^2] + (E[XY ])^2/E[Y^2] − 2(E[XY ])^2/E[Y^2] = E[X^2] − (E[XY ])^2/E[Y^2] ≥ 0.

After replacing X with X − mX and Y with Y − mY , one


arrives at
ρ2XY ≤ 1

which is the desired result.


Pairs of random variables
orthogonal random variables / ortogonalne zmienne losowe
rXY = 0
uncorrelated random variables / nieskorelowane zmienne
losowe
cXY = 0

independent random variables / niezależne zmienne losowe


FXY (x, y) = FX (x)FY (y), pXY (x, y) = pX (x)pY (y)

note: independent random variables are always


uncorrelated but the converse is not true.
EXAMPLE (independence versus uncorrelatedness)
Case 1
Consider random variables X and Y that are uniformly
distributed on the square defined by 0 ≤ x, y ≤ 1, i.e.,

pXY (x, y) = { 1   0 ≤ x, y ≤ 1
            { 0   elsewhere

The marginal probability density functions of X and Y turn out to be

pX (x) = { 1   0 ≤ x ≤ 1          pY (y) = { 1   0 ≤ y ≤ 1
         { 0   elsewhere                   { 0   elsewhere

These random variables are statistically independent since


pXY (x, y) = pX (x)pY (y).
Pairs of random variables
Case 2
Consider random variables X and Y that are uniformly
distributed over the unit circle, i.e.,
pXY (x, y) = { 1/π   x^2 + y^2 ≤ 1
            { 0      elsewhere

The marginal probability density function of X can be found as follows

pX (x) = ∫_{−∞}^{∞} pXY (x, y)dy = (1/π) ∫_{−√(1−x^2)}^{√(1−x^2)} dy = (2/π)√(1 − x^2),   −1 ≤ x ≤ 1

By symmetry, the marginal PDF of Y must take on the same functional form. Hence

pX (x)pY (y) = (4/π^2)√( (1 − x^2)(1 − y^2) ) ≠ pXY (x, y)

The correlation between X and Y is

E[XY ] = ∫∫_{x^2+y^2≤1} (xy/π) dxdy = (1/π) ∫_{−1}^{1} x [ ∫_{−√(1−x^2)}^{√(1−x^2)} y dy ] dx = 0

Hence X and Y are uncorrelated but not independent !
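A Monte Carlo check of this example (Python/NumPy sketch; rejection sampling and the chosen conditioning thresholds are illustrative):

```python
import numpy as np

# Sample (X, Y) uniformly on the unit disk by rejection; the correlation is ~0,
# yet the variables are dependent: a large |X| forces |Y| to be small.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(400_000, 2))
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]            # keep points inside the circle
x, y = pts[:, 0], pts[:, 1]

print("E[XY] ~", np.mean(x * y))                                        # ~0
print("P(|Y| > 0.5)             ~", np.mean(np.abs(y) > 0.5))
print("P(|Y| > 0.5 | |X| > 0.9) ~", np.mean(np.abs(y[np.abs(x) > 0.9]) > 0.5))  # 0
```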


Jointly Gaussian random variables
Zmienne losowe o la̧cznym rozkladzie
gaussowskim

pXY (x, y) = 1 / ( 2πσX σY √(1 − ρXY^2) ) ×

exp{ − [ ((x−mX)/σX)^2 − 2ρXY ((x−mX)/σX)((y−mY)/σY) + ((y−mY)/σY)^2 ] / ( 2(1 − ρXY^2) ) }

marginal distributions of a bivariate Gaussian distribution


are also Gaussian
pX (x) = ∫_{−∞}^{∞} pXY (x, y)dy = 1/√(2πσX^2) exp{ −(x − mX )^2 / (2σX^2) }

pY (y) = ∫_{−∞}^{∞} pXY (x, y)dx = 1/√(2πσY^2) exp{ −(y − mY )^2 / (2σY^2) }

Probability density function of two jointly Gaussian


random variables.
Jointly Gaussian random variables
equidensity contours / izolinie rozkladu
((x − mX )/σX)^2 − 2ρXY ((x − mX )/σX)((y − mY )/σY) + ((y − mY )/σY)^2 = 1 − ρXY^2

covariance ellipse / elipsa kowariancji

Contours of the bivariate Gaussian PDF corresponding to a positive (left) and negative (right) correlation coefficient ρXY.

note: for mX = mY = 0 and ρXY = 0, one obtains

x^2/σX^2 + y^2/σY^2 = 1
Jointly Gaussian random variables

The covariance ellipse is a counterpart of the interval [mX − σX , mX + σX ] for a one-dimensional Gaussian distribution

if X ∼ N (mX , σX^2) then P (X ∈ [mX − σX , mX + σX ]) ≈ 0.68

if (X, Y ) ∼ N (mX , mY ; σX^2, σY^2, ρXY ) then P [(X, Y ) ∈ DXY ] = 1 − e^{−1/2} ≈ 0.39

where DXY denotes the two-dimensional region bounded by the covariance ellipse

conditional distribution / rozklad warunkowy

[X|Y (ξ) = y] ∼ N (mX|Y , σX|Y^2)

mX|Y = mX + ρXY (σX /σY )(y − mY )

σX|Y^2 = (1 − ρXY^2) σX^2
Jointly Gaussian random variables
note: two uncorrelated jointly Gaussian variables are
independent

if ρXY = 0 then

pXY (x, y) = 1/(2πσX σY ) exp{ − [ ((x−mX)/σX)^2 + ((y−mY)/σY)^2 ] / 2 }

           = 1/√(2πσX^2) exp{ −(x − mX )^2/(2σX^2) } × 1/√(2πσY^2) exp{ −(y − mY )^2/(2σY^2) }

           = pX (x)pY (y)

note: two normally distributed random variables need not


be jointly normal – normally distributed and uncorrelated
does not imply independent

EXAMPLE
Let X ∼ N (0, 1). Define Y in the form

Y = { −X   if |X| < c
    {  X   if |X| ≥ c      ,   c > 0
Jointly Gaussian random variables

Note that

1. Y is Gaussian – Y ∼ N (0, 1) – because

P (Y ≤ x) =
P ({|X| < c and − X < x} or {|X| ≥ c and X < x})
= P (|X| < c and − X < x) + P (|X| ≥ c and X < x)
= P (|X| < c and X < x) + P (|X| ≥ c and X < x)
= P (X < x)

This follows from the symmetry of the distribution of X


and the symmetry of the condition that |X| < c.

2. Since X completely determines Y , X and Y are not


independent.

3. There exists such c > 0 for which X and Y are


uncorrelated. If c is very small, then the correlation
cXY is close to 1. If c is very large, then cXY is close
to -1. Since the correlation is a continuous function of
c, the intermediate value theorem implies there is some
particular value of c that makes the correlation 0. That
value is approximately 1.54.
Estimation of characteristics of
a random variable
Estymacja charakterystyk
zmiennych losowych

Consider a sequence of independent random variables


X1, . . . , Xn and their realizations x1, . . . , xn. Suppose that
all random variables have the same distribution FX (x, θ),
where θ is an unknown parameter (e.g. mean value,
variance, kurtosis etc. - assuming that the corresponding
characteristic is well-defined).

How can we estimate θ based on Xn = {x1, . . . , xn} ?

θ̂(n) = θ̂(Xn)

In statistics, an estimator (estymator) is a rule (przepis) θ̂(·) for calculating an estimate (estymata) θ̂(Xn) of a given quantity θ based on observed data Xn.

Since the available data set Xn consists of realizations of


random variables, the estimator itself is also a random
variable.
Desired properties of an estimator
Poża̧dane wlasności estymatora

Unbiasedness / Brak obcia̧żenia

The estimator θ̂(n) is called unbiased (estymator nieobcia̧żony) if its expected value is equal to the true value

EXn [ θ̂(n) ] = θ

The quantity

B[θ̂(n)] = EXn [ θ̂(n) ] − θ

is called the estimation bias (obcia̧żenie estymacji).

When it holds that lim_{n→∞} B[θ̂(n)] = 0, the estimator is called asymptotically unbiased (estymator asymptotycznie nieobcia̧żony).

Consistency / Zgodność
The estimator θ̂(n) is called consistent (estymator zgodny) if it converges, in a stochastic sense (e.g. with probability one), to the true parameter value as the number of data points increases to infinity

lim_{n→∞} θ̂(n) = θ   w.p.1
n→∞
Stochastic convergence modes
Typy zbieżności stochastycznej

Convergence in probability
Zbieżność wedlug prawdopodobieństwa

lim_{n→∞} P ( |θ̂(n) − θ| > ε ) = 0

Convergence in the mean square sense
Zbieżność średniokwadratowa

lim_{n→∞} EXn [ (θ̂(n) − θ)^2 ] = 0

Convergence with probability 1 (almost sure)
Zbieżność z prawdopodobieństwem 1 (prawie na pewno)

P ( lim_{n→∞} θ̂(n) = θ ) = 1

Interpretation: the probability of finding a realization Xn for which θ̂(n) does not converge to θ is zero.

Mean square convergence and almost sure convergence imply convergence in probability. The converse is not true. Mean square convergence and almost sure convergence cannot be directly compared: neither implies the other in general.
Estimation of the mean and variance
Estymacja wartości oczekiwanej i wariancji
Mean / Wartość oczekiwana
m̂X (n) = (1/n) Σ_{i=1}^{n} xi

Note that

EXn [ m̂X (n) ] = E[ (1/n) Σ_{i=1}^{n} Xi ] = (1/n) Σ_{i=1}^{n} E[Xi] = mX

which means that the estimator m̂X (n) is unbiased.

Variance / Wariancja
 1
Pn 2
 n i=1(xi − mX ) if mX is known
2
bX
σ (n) =
 1 Pn 2
n−1 i=1 [xi − b
m X (n)] otherwise

Note that
 2 
EXn σbX (n)
" n # n
1X 1 X
=E (Xi − mX )2 = E[(Xi − mX )2] = σX
2
n i=1 n i=1

which means that in the case where the mean value mX


2
bX
is known the estimator σ (n) is unbiased. Show that the
same holds true when the value of mX is unknown.
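A quick empirical check of (approximate) unbiasedness (Python/NumPy sketch; Gaussian data, n and the true moments are illustrative), averaging each estimator over many independent data sets:

```python
import numpy as np

# Average the mean and variance estimators over many data sets of size n and
# compare with the true values (1/n form when m_X is known, 1/(n-1) otherwise).
rng = np.random.default_rng(0)
n, reps = 10, 200_000
m_x, var_x = 1.0, 4.0
data = m_x + np.sqrt(var_x) * rng.standard_normal((reps, n))

m_hat   = data.mean(axis=1)
v_known = ((data - m_x) ** 2).mean(axis=1)     # 1/n, m_X known
v_unkn  = data.var(axis=1, ddof=1)             # 1/(n-1), m_X estimated

print("E[m_hat]          ~", m_hat.mean(),   "(true:", m_x, ")")
print("E[var, m known]   ~", v_known.mean(), "(true:", var_x, ")")
print("E[var, m unknown] ~", v_unkn.mean(),  "(true:", var_x, ")")
```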
Estimation of cumulative
distribution function
Estymacja dystrybuanty

FX (x) = P (X ≤ x)


Yi(x) = { 1   if Xi ≤ x
        { 0   if Xi > x

F̂X (x) = (1/n) Σ_{i=1}^{n} Yi(x)

Estimate of the CDF of a uniform random variable obtained from n IID random variables Xi ∼ U (0, 1), i = 1, . . . , n.
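A minimal sketch of this indicator-based estimator (Python/NumPy; n and the evaluation points are arbitrary):

```python
import numpy as np

# Empirical CDF of U(0,1) data: F_hat(x) = (1/n) * sum_i 1{X_i <= x}.
rng = np.random.default_rng(0)
n = 500
xi = rng.random(n)                      # X_i ~ U(0,1), i.i.d.

for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    f_hat = np.mean(xi <= x)            # average of the indicators Y_i(x)
    print(f"x={x:4.2f}  F_hat={f_hat:5.3f}  F={x:4.2f}")
```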
Independent component analysis (ICA)
Analiza skladowych niezależnych

Consider two independent and non-Gaussian random


variables X1 and X2. Suppose we can observe realizations
of two other random variables that are mixtures of X1 and
X2 :
Y1 = a11X1 + a12X2
Y2 = a21X1 + a22X2

but we do not know the values of the mixing coefficients


a11, a12, a21, a22.

Given y1 = Y1(ξ) and y2 = Y2(ξ) can we find out


x1 = X1(ξ) and x2 = X2(ξ)?

Real-world example: If two people speak at the same time


in a room containing two microphones, the output of each
microphone is a mixture of two voice signals. Given these
two signal mixtures can we recover the two original voices
or source signals?

blind source separation / ślepa separacja sygnalów

[ x1 ]   [ a11  a12 ]^{−1} [ y1 ]
[ x2 ] = [ a21  a22 ]      [ y2 ]

But how would we know a11, a12, a21, a22 ?


What results can ICA provide ?

input signals (mixtures)

output signals (sources)


Limitations:

• We cannot determine the variances (energies) of the


independent components.

• We cannot determine the order of the independent


components.
What makes it possible ?
Jak to możliwe ?

sources:
X1 ∼ U (−0.5, 0.5), X2 ∼ U (−0.5, 0.5)
mX1 = mX2 = 0, σX1^2 = σX2^2 = 1/12, cX1X2 = 0

mixtures:
Y1 = (1/√2) X1 + (1/√2) X2 ,   Y2 = (1/√2) X1 − (1/√2) X2
mY1 = mY2 = 0, σY1^2 = σY2^2 = 1/12, cY1Y2 = 0
One of the possible approaches to ICA
Consider n > 1 sources and n mixtures

Yi = Σ_{j=1}^{n} aij Xj ,   i = 1, . . . , n

Two-step procedure:

• STEP 1: Prewhiten the available measurement data - the new random variables Ỹ1, . . . , Ỹn should be (approximately) mutually uncorrelated

ỹi = Σ_{j=1}^{n} bij (yj − mYj ),   var[Ỹi] = 1,   i = 1, . . . , n

cỸiỸj = 0,   i ≠ j

• STEP 2: Recover source signals using the relationship

x̂i = Σ_{j=1}^{n} ĉij ỹj ,   i = 1, . . . , n

where {ĉi1, . . . , ĉin} are the coefficients chosen (iteratively) so as to maximize non-Gaussianity of X̂i = Σ_{j=1}^{n} ĉij Ỹj :

{ĉi1, . . . , ĉin} = arg max_{ci1,...,cin} κX̂i ( Σ_{j=1}^{n} cij ỹj )
FastICA algorithm
Picture guide to ICA

joint distribution of input signals (mixtures)

joint distribution of whitened mixtures

joint distribution of output signals (sources)


Picture guide to ICA

joint distribution of a mixture of Laplace variables

joint distribution of independent Laplace variables


Example of ICA application
- fetal heart monitoring

The magnetic field generated by electrical activity in the heart is measured at 37 different locations using fetal magnetocardiography (FMCG). Each measured signal is a different mixture of fetal and maternal cardiac signals.
Example of ICA application
- fetal heart monitoring

Detail of 7 out of 37 measured signals

Separated fetal and maternal cardiac signals


Vector random variables
Wektorowe zmienne losowe

 
X = [ X1 . . . Xn ]^T_{n×1} ,   x = X(ξ)

mean (expected) value / wartość oczekiwana (średnia)

mX = E[X] = [ mX1 . . . mXn ]^T_{n×1}

covariance matrix / macierz kowariancji

ΣX = cov[X] = E[ (X − mX )(X − mX )^T ]

     [ σ1^2  σ12   . . .  σ1n  ]
   = [ σ21   σ2^2  . . .  σ2n  ]
     [  ..    ..   . . .   ..  ]
     [ σn1   σn2   . . .  σn^2 ]_{n×n}

where σij = E[(Xi − mXi )(Xj − mXj )] = cXiXj and σi^2 = var[Xi] = cXiXi

note: the covariance matrix is nonnegative definite

ΣX ≥ 0 ⇐⇒ y^T ΣX y ≥ 0, ∀y
Vector random variables

covariance ellipsoid / elipsoida kowariancji

(x − mX )^T ΣX^{−1} (x − mX ) = 1

• ellipsoid centered at mX

• the directions of the principal axes are given by the eigenvectors of the covariance matrix ΣX

• the squared lengths of the semi-principal axes are given by the corresponding eigenvalues

Covariance ellipse (n = 2).

partial ordering of covariance matrices

ΣX < ΣY ?,   ΣX ≤ ΣY ?,   ΣX > ΣY ?,   ΣX ≥ ΣY ?

DX = { x : x^T ΣX^{−1} x ≤ 1 }
DY = { y : y^T ΣY^{−1} y ≤ 1 }

ΣX ≤ ΣY ⇐⇒ ΣY − ΣX ≥ 0 ⇐⇒ DX ⊆ DY

the covariance ellipsoid of ΣX is located inside (possibly touching) the covariance ellipsoid of ΣY

singular distributions / rozklady osobliwe


a singular distribution is a probability distribution concentrated on an uncountable set ΩX ⊂ R^n of Lebesgue measure 0

• distribution defined on a line in a


two-dimensional space

• distribution defined on a circle in a


two-dimensional space

• distribution defined on a plane in a


three-dimensional space

• distribution defined on a sphere in a


three-dimensional space
Vector random variables

multivariate Gaussian (normal) distribution /


wielowymiarowy rozklad gaussowski (normalny)

X ∼ N (mX , ΣX ),   ΣX > 0

pX (x) = 1/√( (2π)^n |ΣX | ) exp{ −(1/2)(x − mX )^T ΣX^{−1} (x − mX ) }

|ΣX | = 0: singular Gaussian distribution

note: the equidensity contours of a nonsingular multivariate Gaussian distribution are ellipsoids centered at the mean

P [ (X − mX )^T ΣX^{−1} (X − mX ) ≤ 1 ] = P (χn^2 ≤ 1)   (≈ 0.68 for n = 1, ≈ 0.39 for n = 2)

conditional distribution / rozklad warunkowy


 
Z = [ X ]  ∼ N (mZ , ΣZ ),   X : n×1, Y : m×1
    [ Y ]

mZ = [ mX ]       ΣZ = [ ΣX    ΣXY ]  > 0
     [ mY ] ,          [ ΣY X  ΣY  ]
Vector random variables

ΣXY = E[ (X − mX )(Y − mY )^T ] = ΣY X^T

cross-covariance matrix between random variables X and Y

[X|Y(ξ) = y] ∼ N (mX|Y , ΣX|Y )

mX|Y = mX + ΣXY ΣY^{−1} (y − mY )

ΣX|Y = ΣX − ΣXY ΣY^{−1} ΣY X

higher-order moments / momenty wyższych rzȩdów


higher-order moments of jointly Gaussian variables can be
expressed in terms of the first order and second order
moments

EXAMPLE
Consider 4 zero-mean and jointly normally distributed
random variables X1, . . . , X4. It holds that

E[X1X2X3X4] = E[X1X2]E[X3X4] + E[X1X3]E[X2X4]


+ E[X1X4]E[X2X3] = σ12σ34 + σ13σ24 + σ14σ23

Suppose that W is a symmetric matrix. Show that


E[XXTWXXT] = ΣX tr{WΣX } + 2ΣX WΣX .
Linear transformations of vector random
variables
Liniowe przeksztalcenia wektorowych
zmiennych losowych

Xn×1, Ym×1

x = X(ξ), y = Y(ξ)

Suppose that

Y = AX + b, Am×n, bm×1
y = Ax + b

then

E[Y] = AE[X] + b
=⇒ mY = AmX + b
Y − mY = A(X − mX )

and

cov[Y] = E[A(X − mX )(X − mX )TAT]


=⇒ ΣY = AΣX AT

note: if X is Gaussian then Y is also Gaussian


Project 1

Record and mix signals obtained from two independent


speech sources (30 second long recordings, 2 linear mixtures
with different mixing coefficients).
Perform blind source separation using the FastICA
algorithm:
1. Center the data to make its mean zero.
2. Whiten the data to obtain uncorrelated mixture signals.
3. Initialize w0 = [w1,0, w2,0]T, ||w0|| = 1 (e.g. randomly)
4. Perform an iteration of a one-source extraction algorithm
w̃i = avg{ z(t) ( wi−1^T z(t) )^3 } − 3 wi−1

where z(t) = [z1(t), z2(t)]^T denotes the vector of whitened mixture signals and avg(·) denotes time averaging

avg{x(t)} = (1/N) Σ_{t=1}^{N} x(t)

5. Normalize w̃i by dividing it by its norm

wi = w̃i / ||w̃i||
6. Determine a unit-norm vector vi orthogonal to wi.

wiTvi = 0, ||vi|| = 1

7. Display and listen to the current results of source


separation
    
[ x̂1(t) ]   [ wi^T ] [ z1(t) ]
[ x̂2(t) ] = [ vi^T ] [ z2(t) ] ,   t = 1, . . . , N

8. If wiTwi−1 is not close enough to 1, go back to step 4.


Otherwise stop.

Literature: A. Hyvärinen, E. Oja. “A fast fixed-point


algorithm for independent component analysis”, Neural
Computation, vol. 9, pp. 1483-1492, 1997.

To obtain the highest grade (5), a procedure that extracts


n > 2 source signals from n mixtures must be implemented.
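A minimal starting-point sketch of the described iteration (Python/NumPy; synthetic Laplace sources are used here in place of recorded speech, and the mixing matrix, sample size and convergence test with |wi^T wi−1| are illustrative assumptions, not part of the project specification):

```python
import numpy as np

# Sketch of the Project 1 procedure on synthetic data: two independent
# non-Gaussian (Laplace) sources, two linear mixtures.
rng = np.random.default_rng(0)
N = 50_000
s = rng.laplace(size=(2, N))                       # source signals
mix_A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])                     # "unknown" mixing matrix
y = mix_A @ s                                      # observed mixtures

# Steps 1-2: center and whiten (B = W Lambda^{-1/2} W^T, see Fact 4 below)
y = y - y.mean(axis=1, keepdims=True)
lam, W = np.linalg.eigh((y @ y.T) / N)
z = (W @ np.diag(lam ** -0.5) @ W.T) @ y           # whitened mixtures, cov ~ I

# Steps 3-5, 8: fixed-point iteration for one unit-norm direction w
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = np.mean(z * (w @ z) ** 3, axis=1) - 3.0 * w   # avg{z (w^T z)^3} - 3w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1 - 1e-10         # |w_i^T w_{i-1}| close to 1
    w = w_new
    if converged:
        break

# Steps 6-7: orthogonal unit vector v and the extracted signals
v = np.array([-w[1], w[0]])
x_hat = np.vstack([w @ z, v @ z])
print("corr(x_hat[0], s1) =", np.corrcoef(x_hat[0], s[0])[0, 1])
print("corr(x_hat[0], s2) =", np.corrcoef(x_hat[0], s[1])[0, 1])
```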
Facts about matrices

Fact 1

Given a square matrix An×n, an eigenvalue λ and its


associated eigenvector v are a pair obeying the relation

Av = λv

The eigenvalues λ1, . . . , λn of A are the roots of the


characteristic polynomial of A

det(A − λI)

For a positive definite matrix A it holds that λ1, . . . , λn > 0,


i.e., all eigenvalues are positive real.

Fact 2

Let A be a matrix with linearly independent eigenvectors


v1, . . . , vn. Then A can be factorized as follows

A = VΛV−1

where Λn×n = diag{λ1, . . . , λn} and Vn×n = [v1| . . . |vn].


Facts about matrices

Fact 3

Let A = AT be a real symmetric matrix. Such matrix has


n linearly independent real eigenvectors. Moreover, these
eigenvectors can be chosen such that they are orthogonal
to each other and have norm one

wiTwj = 0, ∀i 6= j
wiTwi = ||wi||2 = 1, ∀i

A real symmetric matrix A can be decomposed as

A = WΛWT

where W is an orthogonal matrix: Wn×n = [w1| . . . |wn],


WWT = WTW = I.

The eigenvectors wi can be obtained by normalizing the


eigenvectors vi

vi
wi = , i = 1, . . . , n
||vi||
Facts about matrices

Fact 4
Consider a zero-mean vector random variable X with
covariance matrix

ΣX = E[XXT] = WΛWT

where Λ is a diagonal matrix made up of eigenvalues of ΣX


and W is an orthogonal matrix.
Let
Y = BX = WΛ−1/2WTX

Then it holds that


ΣY = I

i.e., Y is the vector random variable made up of uncorrelated


components.

E[YY T] = E[BXXTBT] = BE[XXT]BT


= WΛ−1/2WTE[XXT]WΛ−1/2WT
= WΛ−1/2WTWΛWTWΛ−1/2WT
= WΛ−1/2ΛΛ−1/2WT = WWT = I
How to estimate the covariance matrix ΣX
Jak estymować macierz kowariancji ΣX

Denote by x1, . . . , xn realizations of a zero-mean vector random variable X. The estimate of ΣX can be obtained from

Σ̂X (n) = (1/n) Σ_{i=1}^{n} xi xi^T
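A short numerical check of this estimator together with the whitening transformation of Fact 4 (Python/NumPy sketch; the example covariance is arbitrary):

```python
import numpy as np

# Estimate Sigma_X from realizations and whiten with B = W Lambda^{-1/2} W^T;
# the sample covariance of Y = B X should be close to the identity matrix.
rng = np.random.default_rng(0)
n = 100_000
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                      # example "square root", Sigma_X = A A^T
x = A @ rng.standard_normal((2, n))             # zero-mean realizations

sigma_hat = (x @ x.T) / n                       # (1/n) * sum_i x_i x_i^T
lam, W = np.linalg.eigh(sigma_hat)              # Sigma_hat = W Lambda W^T
B = W @ np.diag(lam ** -0.5) @ W.T
y = B @ x

print("Sigma_X estimate:\n", sigma_hat)
print("covariance of Y (should be ~I):\n", (y @ y.T) / n)
```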
Discrete-time random (stochastic) processes
Procesy losowe (stochastyczne) z czasem
dyskretnym
{X(t, ξ), ξ ∈ Ξ, t ∈ T }, T = {. . . , −1, 0, 1, . . .}
X(t, ξ) = Xc(tc = tTs, ξ)

where t denotes the normalized (dimensionless) discrete


time – the number of sampling intervals Ts – and Xc(tc, ξ)
denotes a continuous-time random process

x(t) = X(t, ξ)

realization of a random process / realizacja procesu


losowego

Two interpretations of a random process

• a sequence of random variables

• an ensemble (collection, family) of realizations,


i.e., functions of time
Examples of random processes
Przyklady procesów losowych
Example 1: first-order autoregressive process

x(t) = ax(t − 1) + n(t)

where {n(t)} denotes a sequence of zero-mean uncorrelated


random variables with variance σn2 .
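A simulation sketch of this process (Python/NumPy; the values a = 0.9 and σn = 1 are illustrative). For |a| < 1 the stationary variance is σn^2/(1 − a^2), a standard AR(1) result quoted here for reference:

```python
import numpy as np

# Simulate x(t) = a*x(t-1) + n(t) with zero-mean uncorrelated noise n(t).
rng = np.random.default_rng(0)
a, sigma_n, N = 0.9, 1.0, 200_000
noise = sigma_n * rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = a * x[t - 1] + noise[t]

# Stationary variance for |a| < 1: sigma_n^2 / (1 - a^2)
print("sample variance:", x[1000:].var(), "theory:", sigma_n ** 2 / (1 - a ** 2))
```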
Example 2: random sinusoidal signal

x(t) = A sin(ωt + ϕ)

where ϕ, denoting initial phase, is a random variable with


uniform distribution over [0, 2π): ϕ ∼ U (0, 2π).
note: this is an example of a strictly deterministic
(singular) random process, perfectly predictable
from its past
Example 3: pseudo-random binary signal (PRBS)

x(t) ∈ {−A, A}, S1 : −A, S2 : A

transition probabilities:
P11 = 0.8, P12 = 0.2, P22 = 0.6, P21 = 0.4
Electroencephalographic (EEG) signals
Sygnaly elektroencefalograficzne (EEG)

H. Berger, “Über das Elektroenzephalogramm des Menschen”, Arch. Psychiatr. Nervenkrankheiten, vol. 87, pp. 527–570, 1929.

Localization of electrodes on the patient’s skull

EEG signals carry information about the state of the brain


and its reaction to internal and external stimuli. Despite
advances made recently in the area of computer-aided
tomography (CAT), EEG remains a practically useful tool for
studying the functional states of the brain (e.g. sleep-stage
analysis) and for diagnosing functional brain disturbances
(e.g. epilepsy).
Electroencephalographic (EEG) signals

Example of normal (upper plots) and pathological


(lower plots) EEG recordings
According to the traditional clinical nomenclature
introduced in a pioneering work of Berger, EEG signals
are described and classified in terms of several rhythmic
activities called alpha (8–13 Hz), beta (14–30 Hz), delta
(1–3 Hz) and theta (4–7 Hz). The normal EEG of an adult
in a state called relaxed wakefulness is usually dominated by
rhythmic alpha activity and a less pronounced beta activity.
The contribution of delta and theta activities is arrhythmic
and much smaller.
Frequency analysis of EEG signals
Analiza czȩstotliwościowa sygnalów EEG

Spectral density function of a typical


EEG recording
Speech signals
Sygnaly mowy

Vocabulary Template

hello

yes

no

bye

A simple dictionary of speech templates for speech


recognition

Variation in speech templates for different speakers


Description of random processes
Opis procesów losowych

To fully describe a random process, one should specify


all finite-dimensional (n ∈ N ) joint distributions of
random variables X(t1), . . . , X(tn) for all possible values
of t1, . . . , tn ∈ T .

n-dimensional cumulative distribution function

F (x1, . . . , xn; t1, . . . , tn)


= P (X(t1) < x1, . . . , X(tn) < xn)

n-dimensional probability density function

p(x1, . . . , xn; t1, . . . , tn) = ∂^n F (x1, . . . , xn; t1, . . . , tn) / ( ∂x1 · · · ∂xn )

(assuming that all derivatives exist)


First-order and second-order characteristics
Charakterystyki pierwszego i drugiego rzȩdu

first-order characteristics: X(t)


first-order cumulative distribution and probability density
functions (characterize the “amplitude” distribution)

F (x; t), p(x; t)

mean value / wartość oczekiwana


mX (t) = E[X(t)] = ∫_{−∞}^{∞} x p(x; t)dx

second-order characteristics: X(t1), X(t2)


second-order cumulative distribution and probability
density functions (characterize the joint “amplitude”
distribution)

F (x1, x2; t1, t2), p(x1, x2; t1, t2)

autocorrelation function / funkcja autokorelacji

RX (t1, t2) = E[X(t1)X(t2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 p(x1, x2; t1, t2)dx1dx2
First-order and second-order characteristics

autocovariance function / funkcja autokowariancji

CX (t1, t2) = E{[X(t1) − mX (t1)][X(t2) − mX (t2)]}


= RX (t1, t2) − mX (t1)mX (t2)

variance / wariancja

σX^2 (t) = var[X(t)] = E{[X(t) − mX (t)]^2} = ∫_{−∞}^{∞} [x − mX (t)]^2 p(x; t)dx = CX (t, t)

normalized autocovariance function / unormowana funkcja


autokowariancji

ρX (t1, t2) = CX (t1, t2) / ( σX (t1)σX (t2) ) ,   −1 ≤ ρX (t1, t2) ≤ 1
Stationary random processes
Stacjonarne procesy losowe

Qualitative (but not precise) description: a random process


is said to be stationary (time stationary) if its density
functions are invariant under a translation of time
strict sense stationarity / stacjonarność w ścislym sensie
A discrete-time random process X(t) is strict sense
stationary if the joint distribution of random variables
X(t1), . . . , X(tn) is identical with the joint distribution of
random variables X(t1 + ∆t), . . . , X(tn + ∆t) for any
values of n ∈ N , ∆t ∈ T and t1, . . . , tn ∈ T :

p(x1, . . . , xn; t1, . . . , tn)


= p(x1, . . . , xn; t1 + ∆t, . . . , tn + ∆t)

consequences of invariance to a time shift

p(x; t) = p(x; t + ∆t)


=⇒ p(x; t) = p(x)

p(x1, x2; t1, t2) = p(x1, x2; t1 + ∆t, t2 + ∆t)


=⇒ p(x1, x2 ; t1, t2) = p(x1, x2; τ )

where τ = t2 − t1.
Stationary random processes

using the first property one obtains

mX (t) = mX

using the second property one obtains

RX (t1, t2) = RX (τ )
CX (t1, t2) = CX (τ )
σX^2 (t) = CX (0) = σX^2

wide sense stationarity / stacjonarność w szerokim sensie

A discrete-time random process X(t) is wide sense


stationary if its mean function is constant and its
autocorrelation function is a function only of τ

mX (t) = mX , RX (t1, t2) = RX (τ )

note: all strict sense stationary random processes are also


wide sense stationary, provided that their mean and
autocorrelation functions exist. The converse is not true.
Ergodic processes
Procesy ergodyczne
Qualitative (but not precise) description: a random process
is said to be ergodic if its statistical properties (such as its
mean and autocorrelation function) can be deduced from a
single, sufficiently long sample (realization) of the process.
One can discuss the ergodicity of various properties of a
random process.
mean-square ergodicity in the first-order moment /
ergodyczność, w sensie średniokwadratowym, wzglȩdem
wartości oczkiwanej

mX = E[X(t)] = lim_{N→∞} (1/N) Σ_{i=1}^{N} X(t, ξi)
ensemble average

m̂X (T, ξ) = (1/T) Σ_{t=1}^{T} X(t, ξ)
time average

What are the conditions under which

lim_{T→∞} E[ ( m̂X (T, ξ) − mX )^2 ] = 0 ?
Ergodic processes
THEOREM
The necessary and sufficient condition for mean-square ergodicity in the first moment of a wide sense stationary process X(t) has the form

lim_{T→∞} (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t2 − t1) = 0

PROOF

E[ ( m̂X (T ) − mX )^2 ] = E[ ( (1/T) Σ_{t=1}^{T} X(t) − mX )^2 ]

= E[ (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} [X(t1) − mX ][X(t2) − mX ] ]

= (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} E{ [X(t1) − mX ][X(t2) − mX ] }

= (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t1, t2) = (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t2 − t1)

sufficient condition

lim_{τ→∞} |CX (τ )| = 0
Ergodic processes
mean-square ergodicity in the second-order moments /
ergodyczność, w sensie średniokwadratowym, wzglȩdem
momentów drugiego rzȩdu

RX (τ ) = E[X(t)X(t + τ )] = lim_{N→∞} (1/N) Σ_{i=1}^{N} X(t; ξi)X(t + τ ; ξi)

R̂X (τ, T, ξ) = (1/T) Σ_{t=1}^{T} X(t, ξ)X(t + τ, ξ)

What are the conditions under which

lim_{T→∞} E{ [ R̂X (τ, T, ξ) − RX (τ ) ]^2 } = 0 ?

for Gaussian processes the sufficient condition is

lim_{τ→∞} |CX (τ )| = 0

note: a process which is ergodic in the first and second


moments is sometimes called ergodic in the wide sense
example of a process that is not ergodic:
X(t, ξ) = A(ξ), A ∼ U [0, 1]
Properties of the autocorrelation function
Wlasności funkcji autokorelacji

The autocorrelation function of a wide sense stationary


random process X(t)

• is an even function, that is, RX (τ ) = RX (−τ ), ∀τ

• is maximum at the origin, that is, |RX (τ )| ≤ RX (0), ∀τ

• is a nonnegative definite function, that is, for any n ∈ N, any selection of t1, . . . , tn ∈ T , and any selection of z1, . . . , zn ∈ R, it holds that

Σ_{i=1}^{n} Σ_{j=1}^{n} zi zj RX (tj − ti) ≥ 0

which is equivalent to

F[RX (τ )] ≥ 0

where F[·] denotes the discrete-time Fourier transform

note: the nonnegative definiteness condition is equivalent to R = [RX (tj − ti)]_{n×n} ≥ 0
RX (τ ) = E[X(t)X(t + τ )] = E[X(t + τ )X(t)] = RX (−τ )

E[ (X(t) − X(t + τ ))^2 ] = E[X^2(t)] + E[X^2(t + τ )] − 2E[X(t)X(t + τ )] = 2RX (0) − 2RX (τ ) ≥ 0

X = [ X(t1) . . . X(tn) ]^T ,   z = [ z1 . . . zn ]^T

R = E[XX^T]

    [ RX (t1 − t1)  RX (t2 − t1)  . . .  RX (tn − t1) ]
  = [ RX (t1 − t2)       ..       . . .       ..      ]
    [      ..            ..       . . .       ..      ]
    [ RX (t1 − tn)       ..       . . .  RX (tn − tn) ]

RX (ti, tj ) = E[X(ti)X(tj )] = RX (tj − ti)

Σ_{i=1}^{n} Σ_{j=1}^{n} zi zj RX (tj − ti) = z^T R z = z^T E[XX^T] z = E[z^T XX^T z] = E[(z^T X)^2] ≥ 0
Introduction to spectral analysis
Wprowadzenie do analizy widmowej

Consider a continuous-time band-limited periodic signal

xc(tc) = sin(2πfc tc) = sin(ωc tc),   fc ≤ F0

where tc ∈ R denotes continuous time measured in seconds [sec] and
fc - frequency of a continuous-time signal expressed in cycles per second [Hz]
ωc - angular frequency of a continuous-time signal expressed in radians per second [rad/sec]

Suppose that the signal xc(tc) is sampled with the frequency Fs = 1/Ts ≥ 2F0

x(t) = xc(tc = tTs) = sin(2πf t) = sin(ωt)

where t ∈ T denotes normalized discrete time measured in samples [sa] and
f - normalized frequency / unormowana czȩstotliwość, expressed in cycles per sample [c/sa]
ω - normalized angular frequency / unormowana pulsacja, expressed in radians per sample [rad/sa]

f = fc/Fs ∈ [−1/2, 1/2],   ω = ωc/Fs ∈ [−π, π]
Introduction to spectral analysis

energy spectrum of a deterministic signal /


widmo energii sygnalu deterministycznego

Let {x(t)} be a finite energy deterministic signal

Ex = Σ_{t=−∞}^{∞} x^2(t) < ∞

Then {x(t)} possesses a discrete-time Fourier transform (DTFT)

X(ω) = Σ_{t=−∞}^{∞} x(t)e^{−jωt}

φX (ω) = |X(ω)|^2

energy spectral density / gȩstość widmowa energii

describes distribution of the signal energy over different frequency bands

Such interpretation stems from the fact that, according to Parseval's theorem

(1/2π) ∫_{−π}^{π} φX (ω)dω = Σ_{t=−∞}^{∞} x^2(t) = Ex
Introduction to spectral analysis

naı̈ve extension to random processes /


naiwne rozszerzenie na procesy losowe

Consider a zero-mean wide sense stationary and ergodic random process X(t, ξ). It holds that

lim_{N→∞} (1/N) Σ_{t=1}^{N} X^2(t, ξ) = σX^2

power (mean energy) / moc (średnia energia)

Can we define power spectral density in the form analogous to the energy spectral density

SX (ω) = lim_{N→∞} PX (ω, ξ; N ) ?

where

PX (ω, ξ; N ) = (1/N) | Σ_{t=1}^{N} X(t, ξ)e^{−jωt} |^2

periodogram / periodogram

No, in the general case the limit above does not exist and/or it is realization-dependent, i.e., it is a random variable !
Power spectral density
Widmowa gȩstość mocy
power spectral density shows distribution of the signal
power over different frequency bands
Definition 1

SX (ω) = lim_{N→∞} E[PX (ω, ξ; N )]

mean periodogram

Definition 2 (Wiener - Khintchine)

SX (ω) = F[RX (τ )] = Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ}

Fourier transform of the autocorrelation function

THEOREM
Both definitions given above are equivalent provided that the autocorrelation function of {X(t)} decays sufficiently rapidly with growing lag, namely

lim_{N→∞} (1/N) Σ_{τ=−N}^{N} |τ | RX (τ ) = 0
Power spectral density

PROOF

lim_{N→∞} E[PX (ω, ξ; N )]

= lim_{N→∞} (1/N) E[ | Σ_{t=1}^{N} X(t)e^{−jωt} |^2 ]

= lim_{N→∞} (1/N) Σ_{t1=1}^{N} Σ_{t2=1}^{N} E[X(t1)X(t2)] e^{−jω(t1−t2)}

= lim_{N→∞} (1/N) Σ_{τ=−(N−1)}^{N−1} (N − |τ |) RX (τ )e^{−jωτ}

= Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ} − lim_{N→∞} (1/N) Σ_{τ=−(N−1)}^{N−1} |τ | RX (τ )e^{−jωτ}

= Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ} = SX (ω)

note: the power spectral density function carries the same information about a random process as its autocorrelation function because

RX (τ ) = F^{−1}[SX (ω)] = (1/2π) ∫_{−π}^{π} SX (ω)e^{jωτ} dω
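A numerical illustration of Definition 1 (Python/NumPy sketch): the periodogram averaged over many realizations of a first-order autoregressive process is compared with SX (ω) = σn^2/|1 − a e^{−jω}|^2, the standard closed-form spectrum of that example (the parameters a, σn, frame length and number of realizations are illustrative).

```python
import numpy as np

# Mean periodogram of an AR(1) process versus its theoretical spectrum.
rng = np.random.default_rng(0)
a, sigma_n, N, reps = 0.8, 1.0, 1024, 200
freqs = 2 * np.pi * np.arange(N) / N

mean_per = np.zeros(N)
for _ in range(reps):
    e = sigma_n * rng.standard_normal(N + 200)
    x = np.zeros_like(e)
    for t in range(1, len(e)):
        x[t] = a * x[t - 1] + e[t]
    x = x[200:]                                   # drop the start-up transient
    mean_per += np.abs(np.fft.fft(x)) ** 2 / N    # periodogram P_X(w_i; N)
mean_per /= reps

s_theory = sigma_n ** 2 / np.abs(1.0 - a * np.exp(-1j * freqs)) ** 2
for i in (0, N // 8, N // 4, N // 2):
    print(f"w = {freqs[i]:.3f}   mean periodogram = {mean_per[i]:7.3f}   S_X = {s_theory[i]:7.3f}")
```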
Properties of power spectral density function
Wlasności funkcji widmowej gȩstości mocy

σX^2 = (1/2π) ∫_{−π}^{π} SX (ω)dω

The power spectral density function of a wide sense stationary random process X(t)

• is an even function, that is, SX (ω) = SX (−ω), ∀ω

• is a nonnegative function, that is, SX (ω) ≥ 0, ∀ω

• is a periodic function, namely SX (ω + 2kπ) = SX (ω), for every integer k

interpretation of different frequency bands:


ω ∈ [0, π] - region of physical significance
ω ∈ [−π, 0) - symmetric fold
|ω| > π - periodic extension
DFT-free assessment of spectral density
function

woman, age 19, healthy

Doctor’s evaluation: persistent, regular alpha


activity with a dominant 9 Hz component

DFT-free approaches:

• Count cycles per second; take into account the relative


amplitudes of different rhythmic components [qualitative
assessment].

• Pass the analyzed signal through a bank of narrowband


bandpass filters with different center frequencies ωi ∈
[0, π], i = 1, . . . , k. Evaluate SX (ωi) as the mean power
of the signal observed at the output of the i-th filter
[quantitative assessment].
Power cepstrum of a signal
Cepstrum mocy sygnalu
A cepstrum is the result of taking the Fourier transform
of the log signal spectrum. Effectively we are treating the
signal spectrum as another signal and looking for periodicity
in the spectrum itself. The term cepstrum was derived by
reversing the first four letters of “spectrum”. The cepstrum
is so-called because it turns the spectrum inside-out. The
x-axis of the cepstrum has units of quefrency, and peaks in
the cepstrum (which relate to periodicities in the spectrum)
are called rahmonics.
KX (q) = F^{−1}[ log |X(ω)| ] = (1/2π) ∫_{−π}^{π} log |X(ω)| e^{jωq} dω

real cepstrum / cepstrum rzeczywiste


Relationships between random processes
Zwia̧zki pomiȩdzy procesami losowymi

X(t, ξ) ←→ Y (t, ξ)

cross-correlation function / funkcja korelacji wzajemnej

RXY (t1, t2) = E{X(t1)Y (t2)}

cross-covariance function / funkcja kowariancji wzajemnej

CXY (t1, t2) = E{[X(t1) − mX (t1)][Y (t2) − mY (t2)]} = RXY (t1, t2) − mX (t1)mY (t2)

normalized cross-covariance function / unormowana funkcja kowariancji wzajemnej

ρXY (t1, t2) = CXY (t1, t2) / ( σX (t1)σY (t2) ) ∈ [−1, 1]

uncorrelated processes / procesy nieskorelowane

CXY (t1, t2) = 0, ∀t1, t2

orthogonal processes / procesy ortogonalne

RXY (t1, t2) = 0, ∀t1, t2


Jointly wide sense stationary
random processes
Procesy losowe stacjonarne la̧cznie w
szerokim sensie

Processes X(t) and Y (t) are called jointly wide sense


stationary if each of them is wide sense stationary and

RXY (t1, t2) = RXY (τ )

where τ = t2 − t1.

For jointly wide sense stationary processes X(t) and Y (t)


one can define

cross spectral density function / wzajemna widmowa


gȩstość mocy


SXY (ω) = F[RXY (τ )] = Σ_{τ=−∞}^{∞} RXY (τ ) e^{−jωτ}

note: since in general RXY (τ ) 6= RY X (τ ), the cross


spectral density function SXY (ω) is complex-valued
Independent random processes
Procesy losowe niezależne

Two discrete-time random processes X(t) and Y (t) are


statistically independent if the group of random variables
X(t1), . . . , X(tn) is statistically independent of random
variables Y (s1), . . . , Y (sm) for any values of n, m ∈ N ,
t1, . . . , tn ∈ T and s1, . . . , sm ∈ T :

p(x1, . . . , xn, y1 , . . . , ym; t1, . . . , tn, s1, . . . , sm)


= p(x1, . . . , xn; t1, . . . , tn)
× p(y1, . . . , ym; s1, . . . , sm)

note: two jointly Gaussian and mutually uncorrelated


random processes are independent
Relationships for the sum of two
random processes
Zależności dla sumy dwóch
procesów losowych

Suppose that X(t) and Y (t) are jointly wide sense stationary
random processes and

Z(t) = X(t) + Y (t)

Then

RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RY X (τ )
SZ (ω) = SX (ω) + SY (ω) + SXY (ω) + SY X (ω)

When the processes X(t) and Y (t) are also orthogonal, it holds that

RZ (τ ) = RX (τ ) + RY (τ )
SZ (ω) = SX (ω) + SY (ω)
Linear transformations of random processes
Liniowe przeksztalcenia procesów losowych

Y (t, ξ) = L[X(t, ξ)]

mX (t), RX (t1, t2) −→ mY (t), RY (t1, t2), RXY (t1, t2)

condition of linearity:
L[k1x1 (t) + k2x2(t)] = k1L[x1 (t)] + k2L[x2(t)]
∀ t, k1, k2, {x1(·)}, {x2(·)}

y(t) = Σ_{i=0}^{t} k(i)x(t − i)

{k(i), i = 0, 1, 2, . . .} – impulse response of a dynamic system

mY (t) = E[Y (t, ξ)] = E[ Σ_{i=0}^{t} k(i)X(t − i, ξ) ] = Σ_{i=0}^{t} k(i)E[X(t − i, ξ)] = Σ_{i=0}^{t} k(i)mX (t − i)
Linear transformations of random processes
In an analogous way one can show that:

RXY (t1, t2) = E[X(t1, ξ)Y (t2, ξ)] = Σ_{i=0}^{t2} k(i)RX (t1, t2 − i)

RY (t1, t2) = E[Y (t1, ξ)Y (t2, ξ)] = Σ_{i=0}^{t1} Σ_{j=0}^{t2} k(i)k(j)RX (t1 − i, t2 − j)

steady state relationships for a wide sense stationary process X(t): mX (t) = mX , RX (t1, t2) = RX (τ )

y(t) = Σ_{i=0}^{∞} k(i)x(t − i)

mY = mX Σ_{i=0}^{∞} k(i)

RXY (τ ) = Σ_{i=0}^{∞} k(i)RX (τ − i)

RY (τ ) = Σ_{i=0}^{∞} Σ_{j=0}^{∞} k(i)k(j)RX (τ + i − j)

note: the output process Y (t) is also wide sense stationary


Linear time-invariant systems
Uklady liniowe niezmiennicze w czasie
conditions of preservation of stationarity
at the output of a linear system

THEOREM
Suppose that a linear system L[·] is excited by a strictly
(or wide sense) stationary random process X(t). When the
system is time-invariant, i.e., the relationship y(t) = L[x(t)]
entails

y(t + ∆t) = L[x(t + ∆t)], ∀t, ∆t, {x(·)}

then the output random signal Y (t) is also strictly (or wide
sense) stationary.

The system governed by y(t) = L[x(t)] = Σ_{i=0}^{t} k(i)x(t − i) is not time-invariant because

y(t + ∆t) = Σ_{i=0}^{t+∆t} k(i)x(t + ∆t − i) ≠ L[x(t + ∆t)] = Σ_{i=0}^{t} k(i)x(t + ∆t − i)

Show that the system governed by y(t) = L[x(t)] = Σ_{i=0}^{∞} k(i)x(t − i) is time-invariant.
Relationships between spectral
density functions
Zwia̧zki pomiȩdzy funkcjami widmowej
gȩstości mocy
Suppose that X(t) is a wide sense stationary random process with a spectral density function SX (ω) and Y (t) is the process observed at the output of a linear time-invariant system

y(t) = Σ_{i=0}^{∞} k(i)x(t − i)

Derive expressions for SXY (ω) and SY (ω).

SX (ω) = Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ}

SXY (ω) = Σ_{τ=−∞}^{∞} RXY (τ )e^{−jωτ}

        = Σ_{τ=−∞}^{∞} e^{−jωτ} Σ_{s=0}^{∞} k(s)RX (τ − s)

        = Σ_{s=0}^{∞} e^{−jωs} k(s) [ Σ_{τ=−∞}^{∞} e^{−jω(τ−s)} RX (τ − s) ]

        = Σ_{s=0}^{∞} e^{−jωs} k(s) SX (ω) = K(e^{−jω}) SX (ω)
Relationships between spectral
density functions
where

K(e^{−jω}) = Σ_{s=0}^{∞} e^{−jωs} k(s)

denotes the transfer function of the excited system.

SY (ω) = Σ_{τ=−∞}^{∞} RY (τ )e^{−jωτ}

       = Σ_{τ=−∞}^{∞} e^{−jωτ} Σ_{s1=0}^{∞} Σ_{s2=0}^{∞} k(s1)k(s2)RX (τ − s2 + s1)

       = Σ_{s1=0}^{∞} e^{jωs1} k(s1) Σ_{s2=0}^{∞} e^{−jωs2} k(s2) [ Σ_{τ=−∞}^{∞} e^{−jω(τ−s2+s1)} RX (τ − s2 + s1) ]

       = K(e^{jω}) K(e^{−jω}) SX (ω) = |K(e^{−jω})|^2 SX (ω)

summary:

SXY (ω) = K(e^{−jω}) SX (ω)

SY (ω) = |K(e^{−jω})|^2 SX (ω)
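A quick numerical check of the second relationship (Python/NumPy sketch): white noise with SX (ω) = σ^2 passed through a short FIR filter, so that K(e^{−jω}) is a finite sum (the filter coefficients and lengths are illustrative).

```python
import numpy as np

# Averaged periodogram of FIR-filtered white noise versus |K(e^{-jw})|^2 * sigma^2.
rng = np.random.default_rng(0)
k = np.array([1.0, 0.5, 0.25])
sigma2, N, reps = 1.0, 1024, 200
freqs = 2 * np.pi * np.arange(N) / N
K = np.array([np.sum(k * np.exp(-1j * w * np.arange(len(k)))) for w in freqs])

mean_per = np.zeros(N)
for _ in range(reps):
    x = np.sqrt(sigma2) * rng.standard_normal(N + len(k))
    y = np.convolve(x, k, mode="full")[len(k):len(k) + N]   # steady-state output samples
    mean_per += np.abs(np.fft.fft(y)) ** 2 / N
mean_per /= reps

for i in (0, N // 4, N // 2):
    print(f"w = {freqs[i]:.3f}   S_Y estimate = {mean_per[i]:6.3f}   "
          f"|K|^2 * sigma^2 = {np.abs(K[i]) ** 2 * sigma2:6.3f}")
```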
Spectral subtraction
Metoda odejmowania widmowego

Goal:

Recover signal x(t) based on its noisy measurements y(t)

y(t) = x(t) + z(t)

Assumptions:

1. X(t) (recovered signal) and Z(t) (noise) are locally


wide-sense stationary random processes.

2. Processes X(t) and Z(t) are orthogonal.

3. Spectral density function of noise SZ (ω) is known or


can be estimated based on those fragments of Y (t) that
contain noise only.

S.F. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, pp. 113-120, 1979.
Spectral subtraction

note: our perception of sounds depends primarily on their


spectral content – two sounds with the same spectral
density functions are perceived as (almost) identical

Under the assumptions made it holds that (locally)

SY (ω) = SX (ω) + SZ (ω)

Can one design a linear filter K(z^{−1}) that would transform a process Y (t) with spectral density function SY (ω) into the process X(t) with spectral density function SX (ω)?

SX (ω) = |K(e^{−jω})|^2 SY (ω)

Therefore the amplitude characteristic of the transforming filter can be obtained from

A(ω) = |K(e^{−jω})| = √( SX (ω)/SY (ω) ) = √( ( SY (ω) − SZ (ω) ) / SY (ω) )

power spectral subtraction


Discrete Fourier transform (DFT)
Dyskretna transformata Fouriera

X(ωi) = Σ_{t=0}^{N−1} x(t)e^{−jωi t},   ωi = 2πi/N,   i = 0, . . . , N − 1

{X(ω0), . . . , X(ωN−1)} = DFT{x(0), . . . , x(N − 1)}

Inverse Discrete Fourier Transform (IDFT)

x(t) = (1/N) Σ_{i=0}^{N−1} X(ωi)e^{jωi t},   t = 0, . . . , N − 1

{x(0), . . . , x(N − 1)} = IDFT{X(ω0), . . . , X(ωN−1)}

(Inverse) Fast Fourier Transform

N = 2^k −→ FFT, IFFT
Power spectral subtraction
The signal is divided into frames (segments) of identical width N (N should be short enough to guarantee local signal stationarity, e.g., in case of speech signals it should cover no more than 10-20 ms of speech). Each frame is “denoised” separately and the results are combined for the entire recording. Consider any frame (to simplify notation the number of the frame was suppressed)

{y(0), . . . , y(N − 1)}

Step 1
Use a fragment of the signal that contains noise only [y(t) = z(t)] to estimate the power spectral density of the noise

ŜZ (ωi) = (1/N) |Z(ωi)|^2 ,   i = 0, . . . , N/2

If many noise frames are available, apply averaging.

Step 2
Estimate the power spectral density function of the noisy signal

ŜY (ωi) = (1/N) |Y (ωi)|^2 ,   i = 0, . . . , N/2

Step 3
Estimate the power spectral density function of the noiseless signal

ŜX (ωi) = { ŜY (ωi) − ŜZ (ωi)   when nonnegative
          { 0                    otherwise

i = 0, . . . , N/2

Step 4
Design the denoising filter

Â(ωi) = √( ŜX (ωi) / ŜY (ωi) ),   i = 0, . . . , N/2

Â(ωi) = Â(ωN−i),   i = N/2 + 1, . . . , N − 1

Step 5
Evaluate the denoised signal

X̂(ωi) = Â(ωi)Y (ωi),   i = 0, . . . , N − 1

{x̂(0), . . . , x̂(N − 1)} = IDFT{X̂(ω0), . . . , X̂(ωN−1)}

Read next frame and return to Step 2.
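A minimal single-frame sketch of Steps 1-5 (Python/NumPy; a synthetic sinusoid buried in white noise stands in for a recorded frame, and the frame length, signal frequency and noise level are illustrative — a real recording would be processed frame by frame as described above):

```python
import numpy as np

# Single-frame power spectral subtraction on synthetic data.
rng = np.random.default_rng(0)
N = 512
t = np.arange(N)
noise_only = 0.5 * rng.standard_normal(N)                            # noise-only fragment
y = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.standard_normal(N)      # noisy frame

S_Z = np.abs(np.fft.fft(noise_only)) ** 2 / N        # Step 1: noise PSD estimate
Y = np.fft.fft(y)
S_Y = np.abs(Y) ** 2 / N                             # Step 2: noisy-signal PSD estimate
S_X = np.maximum(S_Y - S_Z, 0.0)                     # Step 3: subtract, clip at zero
A = np.sqrt(S_X / np.maximum(S_Y, 1e-12))            # Step 4: denoising filter gain
x_hat = np.real(np.fft.ifft(A * Y))                  # Step 5: denoised frame

print("noisy frame power:   ", np.mean(y ** 2))
print("denoised frame power:", np.mean(x_hat ** 2))
```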


Remarks and extensions

1. When using spectral subtraction one modifies the


amplitude distribution of noisy speech in different
frequency bands, but preserves its phase distribution.

2. When intensity of wideband noise corrupting speech is


large, the results of denoising may suffer from the so-
called musical noise (tin-like sound consisting of many
short-lived narrowband components appearing at random
frequencies). Musical noise can be reduced by eliminating
tones (i.e., periodogram spectral lines) that appear only
in isolated segments.

3. To eliminate signal discontinuities that may occur at the


beginning/end of each denoised speech segment, one
may use the overlap-add technique, e.g. for segments of
length N = 2n + 1 and 50% overlap one can use the
following data windows

w(t) = 1 − |t − n|/n ,   t = 0, . . . , 2n

or

w(t) = (1/2) [ 1 − cos(πt/n) ] ,   t = 0, . . . , 2n

w(t) + w(t − n) = 1,   t = n, . . . , 2n
Overlap-add synthesis

x̂w (t) = { x̂(t)w(t)   t = 0, . . . , 2n
         { 0           elsewhere

x̂w^k (t) – the windowed output of the k-th frame, occupying samples t = kn, . . . , (k + 2)n

x̂(t) = . . . + x̂w^{k−1}(t) + x̂w^k (t) + x̂w^{k+1}(t) + . . .

4. Generalized spectral subtraction

X̂(ωi) = { [ |Y (ωi)|^β − α · avg(|Z(ωi)|^β) ]_+ / |Y (ωi)|^β }^{1/β} · Y (ωi)

where β ≥ 1, α > 0 is a user-dependent coefficient, avg(·) denotes time averaging, and [·]_+ denotes the “round to nonnegative” operation.

β = 1 −→ magnitude spectral subtraction

β = 2 −→ power spectral subtraction

NOTE: in practice one often takes α > 1.


Project 2

Create a noisy audio recording (add artificially generated


noise to a clean music or speech signal). The first second
of the recording should contain noise only. Then denoise
recording using the method of spectral subtraction.
THE END
The longest run of heads and tails

Mark F. Schilling, “The Longest Run of Heads,”


Mathematical Association of America

http://shiny.stat.calpoly.edu/LongRun/
