
Random Processes –
Theory for the Practician

by

Maciej Niedźwiecki

Gdańsk University of Technology

Department of Automatic Control

Gdańsk, Poland
Does God play dice ?
Czy Bóg gra w kości ?
“Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the ‘old one.’ I, at any rate, am convinced that He does not throw dice.”
Albert Einstein

Yes, He does !
to get the modern view watch Richard Feynman’s
The Douglas Robb Memorial Lectures 1979
http://vega.org.uk/video/subseries/8

Feynman's theory states that the probability of an event is determined by summing together all the possible histories of that event. For example, for a particle moving from point A to B we imagine the particle traveling every possible path: curved paths, oscillating paths, squiggly paths, even backward-in-time and forward-in-time paths. Each path has an amplitude, and when summed the vast majority of all these amplitudes add up to zero, and all that remains is the comparably few histories that abide by the laws and forces of nature. Sum over histories indicates the direction of our ordinary clock time; it is simply a path in space which is more probable than the more exotic directions time might have taken otherwise.
Scalar random variables
Skalarne zmienne losowe
random variable / zmienna losowa

A random variable is a function X(ξ) that associates a number with each outcome ξ of a certain experiment. For each outcome ξ ∈ Ξ the probability P (ξ) is defined.

X : Ξ −→ ΩX ⊆ R, {X(ξ), ξ ∈ Ξ}

where:
Ξ - sample space / zbiór zdarzeń elementarnych
ξ - outcome / zdarzenie elementarne
ΩX - observation space / zbiór obserwacji

realization of a random variable


realizacja zmiennej losowej
x = X(ξ)

Ξ is the set of all different possibilities that could happen


ξ is the one of these possibilities that did happen

discrete random variable – when ΩX contains a finite or countably infinite number of values

continuous random variable – when ΩX contains an uncountably infinite number of values
Cumulative distribution function (CDF)
Dystrybuanta

FX (x) = P (X ≤ x)

P (·), 0≤P ≤1
probability of an event / prawdopodobieństwo zdarzenia

Properties:

• FX (−∞) = 0, FX (∞) = 1

• 0 ≤ FX (x) ≤ 1

• if x1 < x2 then

P (x1 < X ≤ x2) = FX (x2) − FX (x1) ≥ 0


Probability density function (PDF)
Gȩstość rozkladu prawdopodobieństwa

pX (x) = lim_{ε→0} P (x ≤ X < x + ε)/ε = dFX (x)/dx

Properties:

• pX (x) ≥ 0
• FX (x) = ∫_{−∞}^{x} pX (z)dz

• ∫_{−∞}^{∞} pX (x)dx = 1

• P (x1 < X ≤ x2) = ∫_{x1}^{x2} pX (z)dz

Is it possible that

pX (x) > 1 ?

P (X = x) ≠ 0 ?
Characteristics of random variables
Charakterystyki zmiennych losowych

mean (expected) value / wartość oczekiwana (średnia)


mX = E[X] = ∫_{−∞}^{∞} x dFX (x) = ∫_{−∞}^{∞} x pX (x)dx

median / mediana

αX = med[X] : P (X ≤ αX ) = P (X ≥ αX ), i.e., ∫_{−∞}^{αX} pX (x)dx = 1/2

variance / wariancja

σX^2 = var[X] = E[(X − mX )^2] = ∫_{−∞}^{∞} (x − mX )^2 dFX (x) = ∫_{−∞}^{∞} (x − mX )^2 pX (x)dx

standard deviation / odchylenie standardowe

σX = √(var[X])
Characteristics of random variables

n-th central moment / moment centralny n-tego rzȩdu


µnX = E[(X − mX )^n] = ∫_{−∞}^{∞} (x − mX )^n pX (x)dx

coefficient of skewness / wspólczynnik asymetrii


γX = E[(X − mX )^3] / σX^3 = µ3X / σX^3

for a symmetric distribution γX = 0

coefficient of kurtosis (excess kurtosis) / kurtoza

miara splaszczenia rozkladu (od gr. kurtos = rozdȩty)


κX = E[(X − mX )^4] / σX^4 − 3 = µ4X / σX^4 − 3

for a Gaussian distribution κX = 0


κX > 0 – leptokurtic distribution / rozklad leptokurtyczny
κX = 0 – mesokurtic distribution / rozklad mezokurtyczny
κX < 0 – platykurtic distribution / rozklad platykurtyczny
Characteristics of random variables

Probability density functions with negative skew, γX < 0 (left), and positive skew, γX > 0 (right).

Super-Gaussian (leptokurtic, κX = 3) [left], Gaussian (κX = 0) [middle] and sub-Gaussian (platykurtic, κX = −1.5) [right] probability density functions.
Important random variables
Ważne zmienne losowe

uniform random variable / zmienna losowa o rozkladzie


równomiernym (jednostajnym)

X ∼ U (a, b),   a < b

pX (x) = { 1/(b−a)   a ≤ x < b
         { 0          elsewhere

FX (x) = { 0              x < a
         { (x−a)/(b−a)    a ≤ x < b
         { 1              x ≥ b

mX = (a+b)/2 ,   σX^2 = (b−a)^2/12
MATLAB : U (0, 1) = rand

Probability density function (left) and cumulative distribution function (right) of a uniform random variable.
Important random variables

Gaussian (normal) random variable / zmienna losowa o


rozkladzie gaussowskim (normalnym)

X ∼ N (mX , σX^2),   σX^2 > 0

pX (x) = 1/√(2πσX^2) exp{ −(x − mX )^2 / (2σX^2) }

FX (x) = 1/√(2πσX^2) ∫_{−∞}^{x} exp{ −(z − mX )^2 / (2σX^2) } dz = Φ( (x − mX )/σX )

where Φ(x) = (1/√(2π)) ∫_{−∞}^{x} exp{−z^2/2} dz
MATLAB : N (0, 1) = randn

Probability density function (left) and cumulative distribution function (right) of a Gaussian random variable.
Central limit theorem (CLT)
Centralne twierdzenie graniczne

THEOREM

Let Xi, i = 1, 2, . . ., be a sequence of independent random variables with identical (any!) distribution, mean mX and variance σX^2 < ∞. Define a new random variable Zn:

Zn = (1/√n) Σ_{i=1}^{n} (Xi − mX )/σX

In the limit as n approaches infinity, the random variable Zn converges in distribution to a standard normal random variable N (0, 1):

lim_{n→∞} FZn (z) = Φ(z), i.e., Zn → N (0, 1) in distribution

The CLT explains why the Gaussian distribution is of such great importance, and why it occurs so frequently.
The central limit theorem (CLT)

PDF of the sum of independent uniform random variables: n = 2, 3, 4, 5.
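A quick Monte Carlo illustration (hedged Python/NumPy sketch; sample sizes and the choice of uniform summands mirror the figure above) of how the standardized sum Zn approaches N (0, 1):

```python
import numpy as np

# Standardized sum of n i.i.d. U(0,1) variables; by the CLT its distribution
# approaches N(0,1) as n grows. P(Z_n <= 1) should approach Phi(1) ~ 0.8413.
rng = np.random.default_rng(0)
m_x, s_x = 0.5, np.sqrt(1.0 / 12.0)            # mean and std of U(0,1)

for n in (2, 3, 4, 5, 30):
    x = rng.random((200_000, n))               # 200k realizations of (X1,...,Xn)
    z = (x - m_x).sum(axis=1) / (np.sqrt(n) * s_x)
    print(n, np.mean(z <= 1.0))
```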
Important random variables

Laplace random variable / zmienna losowa o rozkladzie


Laplace’a (dwustronnym wykladniczym)

X ∼ L(a, b),   b > 0

a – location parameter, b – scale parameter

pX (x) = (1/2b) exp{ −|x − a|/b }

FX (x) = { (1/2) exp{ (x − a)/b }        x < a
         { 1 − (1/2) exp{ −(x − a)/b }   x ≥ a

mX = a ,   σX^2 = 2b^2

Y ∼ U (−1/2, 1/2) ⇒ X = sgn[Y ] ln(1 − 2|Y |) ∼ L(0, 1)

Probability density function (left) and cumulative distribution function (right) of a Laplace random variable.
Important random variables

Cauchy random variable / zmienna losowa o rozkladzie


Cauchy’ego

X ∼ C(a, b),   b > 0

a – location parameter, b – scale parameter

pX (x) = (b/π) / ( b^2 + (x − a)^2 )

FX (x) = 1/2 + (1/π) tan^{−1}( (x − a)/b )

mX = ? ,   σX^2 = ?

Y ∼ N (0, 1), Z ∼ N (0, 1), independent ⇒ X = Y /Z ∼ C(0, 1)

Probability density function (left) and cumulative distribution function (right) of a Cauchy random variable.
Cauchy random variables
The Cauchy distribution is an example of a distribution
which has no mean, variance or higher moments defined.
Its median is well defined and equal to a.

standard Cauchy distribution / standardowy rozklad


Cauchy’ego

1
X ∼ C(0, 1) : pX (x) =
π(1 + x2)
mean value:
Z ∞ Z ∞
x pX (x)dx = x pX (x)dx
−∞ 0
Z 0
− |x| pX (x)dx = ∞ − ∞
−∞

note that due to symmetry of pX (x)


Z a Z a
lim x pX (x)dx 6= lim x pX (x)dx
a→∞ −a a→∞ −2a

variance:
Z ∞ 2 Z ∞
1 x 1
E[X 2] = 2
dx = dx − 1 = ∞
π −∞ 1 + x π −∞
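A numerical illustration (Python/NumPy sketch using the Y /Z construction above; sample size arbitrary) of the practical consequence: the running sample mean of Cauchy data never settles down, while the sample median does.

```python
import numpy as np

# The running sample mean of C(0,1) draws does not converge, because the
# Cauchy distribution has no mean; the median stays near the location a = 0.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n) / rng.standard_normal(n)   # Y/Z, Y and Z ~ N(0,1) independent

running_mean = np.cumsum(x) / np.arange(1, n + 1)
for k in (10**2, 10**3, 10**4, 10**5, 10**6 - 1):
    print(k, running_mean[k])          # keeps jumping as n grows
print("sample median:", np.median(x))  # close to a = 0
```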
Pairs of random variables
Pary zmiennych losowych
X(ξ) ←→ Y (ξ)

joint cumulative distribution function / la̧czna dystrybuanta

FXY (x, y) = P (X ≤ x, Y ≤ y)
Properties:

• FXY (−∞, −∞) = 0

• FXY (−∞, y) = FXY (x, −∞) = 0

• FXY (∞, ∞) = 1

• 0 ≤ FXY (x, y) ≤ 1

• FXY (x, ∞) = FX (x)

• FXY (∞, y) = FY (y)

• if x1 < x2 and y1 < y2 then

P (x1 < X ≤ x2, y1 < Y ≤ y2)


= FXY (x2, y2) − FXY (x1, y2)
− FXY (x2, y1) + FXY (x1, y1) ≥ 0
Pairs of random variables
joint probability density function / funkcja la̧cznej gȩstości
prawdopodobieństwa

pXY (x, y) = lim_{εx→0, εy→0} P (x ≤ X < x + εx, y ≤ Y < y + εy ) / (εx εy ) = ∂^2 FXY (x, y) / ∂x∂y

Joint probability density function of two random variables.


Properties:
• pXY (x, y) ≥ 0
• FXY (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} pXY (u, v)du dv

• ∫_{−∞}^{∞} ∫_{−∞}^{∞} pXY (x, y)dxdy = 1

• if x1 < x2 and y1 < y2 then

P (x1 < X ≤ x2, y1 < Y ≤ y2) = ∫_{x1}^{x2} ∫_{y1}^{y2} pXY (x, y)dxdy

marginal distributions / rozklady brzegowe


∫_{−∞}^{∞} pXY (x, y)dy = pX (x)

∫_{−∞}^{∞} pXY (x, y)dx = pY (y)

Joint distribution of two random variables and their marginal distributions.

note: there exist many different joint distributions that produce the same marginal distributions
Pairs of random variables

conditional probability density function / warunkowa


gȩstość rozkladu prawdopodobieństwa

X|Y (ξ) = y

pX|Y (x|y) = pXY (x, y) / pY (y)

Joint distribution of random variables X and Y (top) and


two conditional distributions: pY |X (y|x = 1.27) (bottom
left) and pX|Y (x|y = −0.37) (bottom right).
correlation between random variables / wspólczynnik
korelacji zmiennych losowych
rXY = E[XY ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy pXY (x, y)dxdy

covariance between random variables / wspólczynnik


kowariancji zmiennych losowych

cXY = E[(X − mX )(Y − mY )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − mX )(y − mY ) pXY (x, y)dxdy = rXY − mX mY

correlation coefficient of two random variables /


unormowany wspólczynnik kowariancji dwóch zmiennych
losowych

normalized random variables / unormowane zmienne


losowe

X̃ = (X − mX )/σX ,   Ỹ = (Y − mY )/σY

mX̃ = mỸ = 0 ,   σX̃^2 = σỸ^2 = 1

ρXY = E[X̃ Ỹ ] = cXY / (σX σY )

−1 ≤ ρXY ≤ 1
Pairs of random variables

THEOREM

The correlation coefficient is at most 1 in magnitude.

PROOF

Consider taking the second moment of X + aY , where a is


a real constant:

E[(X + aY )2] = E[X 2] + 2aE[XY ] + a2E[Y 2] ≥ 0.

Since this is true for any a, we can tighten the bound by


choosing the value of a that minimizes the left-hand side.
This value turns out to be a = −E[XY ]/E[Y 2]. Plugging
in this value gives

E[X^2] + (E[XY ])^2/E[Y^2] − 2(E[XY ])^2/E[Y^2] = E[X^2] − (E[XY ])^2/E[Y^2] ≥ 0.

After replacing X with X − mX and Y with Y − mY , one


arrives at
ρ2XY ≤ 1

which is the desired result.


Pairs of random variables
orthogonal random variables / ortogonalne zmienne losowe
rXY = 0
uncorrelated random variables / nieskorelowane zmienne
losowe
cXY = 0

independent random variables / niezależne zmienne losowe


FXY (x, y) = FX (x)FY (y), pXY (x, y) = pX (x)pY (y)

note: independent random variables are always


uncorrelated but the converse is not true.
EXAMPLE (independence versus uncorrelatedness)
Case 1
Consider random variables X and Y that are uniformly
distributed on the square defined by 0 ≤ x, y ≤ 1, i.e.,

pXY (x, y) = { 1   0 ≤ x, y ≤ 1
            { 0   elsewhere

The marginal probability density functions of X and Y turn out to be

pX (x) = { 1   0 ≤ x ≤ 1          pY (y) = { 1   0 ≤ y ≤ 1
         { 0   elsewhere                   { 0   elsewhere

These random variables are statistically independent since


pXY (x, y) = pX (x)pY (y).
Pairs of random variables
Case 2
Consider random variables X and Y that are uniformly
distributed over the unit circle, i.e.,
pXY (x, y) = { 1/π   x^2 + y^2 ≤ 1
            { 0      elsewhere

The marginal probability density function of X can be found as follows

pX (x) = ∫_{−∞}^{∞} pXY (x, y)dy = (1/π) ∫_{−√(1−x^2)}^{√(1−x^2)} dy = (2/π)√(1 − x^2),   −1 ≤ x ≤ 1

By symmetry, the marginal PDF of Y must take on the same functional form. Hence

pX (x)pY (y) = (4/π^2)√( (1 − x^2)(1 − y^2) ) ≠ pXY (x, y)

The correlation between X and Y is

E[XY ] = ∫∫_{x^2+y^2≤1} (xy/π) dxdy = (1/π) ∫_{−1}^{1} x [ ∫_{−√(1−x^2)}^{√(1−x^2)} y dy ] dx = 0

Hence X and Y are uncorrelated but not independent !
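A Monte Carlo check of this example (Python/NumPy sketch; rejection sampling and the chosen conditioning thresholds are illustrative):

```python
import numpy as np

# Sample (X, Y) uniformly on the unit disk by rejection; the correlation is ~0,
# yet the variables are dependent: a large |X| forces |Y| to be small.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(400_000, 2))
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]            # keep points inside the circle
x, y = pts[:, 0], pts[:, 1]

print("E[XY] ~", np.mean(x * y))                                        # ~0
print("P(|Y| > 0.5)             ~", np.mean(np.abs(y) > 0.5))
print("P(|Y| > 0.5 | |X| > 0.9) ~", np.mean(np.abs(y[np.abs(x) > 0.9]) > 0.5))  # 0
```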


Jointly Gaussian random variables
Zmienne losowe o la̧cznym rozkladzie
gaussowskim

pXY (x, y) = 1 / ( 2πσX σY √(1 − ρXY^2) ) ×

exp{ − [ ((x−mX)/σX)^2 − 2ρXY ((x−mX)/σX)((y−mY)/σY) + ((y−mY)/σY)^2 ] / ( 2(1 − ρXY^2) ) }

marginal distributions of a bivariate Gaussian distribution


are also Gaussian
pX (x) = ∫_{−∞}^{∞} pXY (x, y)dy = 1/√(2πσX^2) exp{ −(x − mX )^2 / (2σX^2) }

pY (y) = ∫_{−∞}^{∞} pXY (x, y)dx = 1/√(2πσY^2) exp{ −(y − mY )^2 / (2σY^2) }

Probability density function of two jointly Gaussian


random variables.
Jointly Gaussian random variables
equidensity contours / izolinie rozkladu
((x − mX )/σX)^2 − 2ρXY ((x − mX )/σX)((y − mY )/σY) + ((y − mY )/σY)^2 = 1 − ρXY^2

covariance ellipse / elipsa kowariancji

Contours of the bivariate Gaussian PDF corresponding to a positive (left) and negative (right) correlation coefficient ρXY.

note: for mX = mY = 0 and ρXY = 0, one obtains

x^2/σX^2 + y^2/σY^2 = 1
Jointly Gaussian random variables

The covariance ellipse is a counterpart of the interval [mX − σX , mX + σX ] for a one-dimensional Gaussian distribution

if X ∼ N (mX , σX^2) then P (X ∈ [mX − σX , mX + σX ]) ≈ 0.68

if (X, Y ) ∼ N (mX , mY ; σX^2, σY^2, ρXY ) then P [(X, Y ) ∈ DXY ] = 1 − e^{−1/2} ≈ 0.39

where DXY denotes the two-dimensional region bounded by the covariance ellipse

conditional distribution / rozklad warunkowy

[X|Y (ξ) = y] ∼ N (mX|Y , σX|Y^2)

mX|Y = mX + ρXY (σX /σY )(y − mY )

σX|Y^2 = (1 − ρXY^2) σX^2
Jointly Gaussian random variables
note: two uncorrelated jointly Gaussian variables are
independent

if ρXY = 0 then

pXY (x, y) = 1/(2πσX σY ) exp{ − [ ((x−mX)/σX)^2 + ((y−mY)/σY)^2 ] / 2 }

           = 1/√(2πσX^2) exp{ −(x − mX )^2/(2σX^2) } × 1/√(2πσY^2) exp{ −(y − mY )^2/(2σY^2) }

           = pX (x)pY (y)

note: two normally distributed random variables need not


be jointly normal – normally distributed and uncorrelated
does not imply independent

EXAMPLE
Let X ∼ N (0, 1). Define Y in the form

Y = { −X   if |X| < c
    {  X   if |X| ≥ c      ,   c > 0
Jointly Gaussian random variables

Note that

1. Y is Gaussian – Y ∼ N (0, 1) – because

P (Y ≤ x) =
P ({|X| < c and − X < x} or {|X| ≥ c and X < x})
= P (|X| < c and − X < x) + P (|X| ≥ c and X < x)
= P (|X| < c and X < x) + P (|X| ≥ c and X < x)
= P (X < x)

This follows from the symmetry of the distribution of X


and the symmetry of the condition that |X| < c.

2. Since X completely determines Y , X and Y are not


independent.

3. There exists such c > 0 for which X and Y are


uncorrelated. If c is very small, then the correlation
cXY is close to 1. If c is very large, then cXY is close
to -1. Since the correlation is a continuous function of
c, the intermediate value theorem implies there is some
particular value of c that makes the correlation 0. That
value is approximately 1.54.
Estimation of characteristics of
a random variable
Estymacja charakterystyk
zmiennych losowych

Consider a sequence of independent random variables


X1, . . . , Xn and their realizations x1, . . . , xn. Suppose that
all random variables have the same distribution FX (x, θ),
where θ is an unknown parameter (e.g. mean value,
variance, kurtosis etc. - assuming that the corresponding
characteristic is well-defined).

How can we estimate θ based on Xn = {x1, . . . , xn} ?

θ̂(n) = θ̂(Xn)

In statistics, an estimator (estymator) is a rule (przepis) θ̂(·) for calculating an estimate (estymata) θ̂(Xn) of a given quantity θ based on observed data Xn.

Since the available data set Xn consists of realizations of


random variables, the estimator itself is also a random
variable.
Desired properties of an estimator
Poża̧dane wlasności estymatora

Unbiasedness / Brak obcia̧żenia

The estimator θ̂(n) is called unbiased (estymator nieobcia̧żony) if its expected value is equal to the true value

EXn [ θ̂(n) ] = θ

The quantity

B[θ̂(n)] = EXn [ θ̂(n) ] − θ

is called the estimation bias (obcia̧żenie estymacji).

When it holds that lim_{n→∞} B[θ̂(n)] = 0, the estimator is called asymptotically unbiased (estymator asymptotycznie nieobcia̧żony).

Consistency / Zgodność
The estimator θ̂(n) is called consistent (estymator zgodny) if it converges, in a stochastic sense (e.g. with probability one), to the true parameter value as the number of data points increases to infinity

lim_{n→∞} θ̂(n) = θ   w.p.1
n→∞
Stochastic convergence modes
Typy zbieżności stochastycznej

Convergence in probability
Zbieżność wedlug prawdopodobieństwa

lim_{n→∞} P ( |θ̂(n) − θ| > ε ) = 0

Convergence in the mean square sense
Zbieżność średniokwadratowa

lim_{n→∞} EXn [ (θ̂(n) − θ)^2 ] = 0

Convergence with probability 1 (almost sure)
Zbieżność z prawdopodobieństwem 1 (prawie na pewno)

P ( lim_{n→∞} θ̂(n) = θ ) = 1

Interpretation: the probability of finding a realization Xn for which θ̂(n) does not converge to θ is zero.

Mean square convergence and almost sure convergence imply convergence in probability. The converse is not true. Mean square convergence and almost sure convergence cannot be directly compared: neither implies the other in general.
Estimation of the mean and variance
Estymacja wartości oczekiwanej i wariancji
Mean / Wartość oczekiwana
m̂X (n) = (1/n) Σ_{i=1}^{n} xi

Note that

EXn [ m̂X (n) ] = E[ (1/n) Σ_{i=1}^{n} Xi ] = (1/n) Σ_{i=1}^{n} E[Xi] = mX

which means that the estimator m̂X (n) is unbiased.

Variance / Wariancja
 1
Pn 2
 n i=1(xi − mX ) if mX is known
2
bX
σ (n) =
 1 Pn 2
n−1 i=1 [xi − b
m X (n)] otherwise

Note that
 2 
EXn σbX (n)
" n # n
1X 1 X
=E (Xi − mX )2 = E[(Xi − mX )2] = σX
2
n i=1 n i=1

which means that in the case where the mean value mX


2
bX
is known the estimator σ (n) is unbiased. Show that the
same holds true when the value of mX is unknown.
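A quick empirical check of (approximate) unbiasedness (Python/NumPy sketch; Gaussian data, n and the true moments are illustrative), averaging each estimator over many independent data sets:

```python
import numpy as np

# Average the mean and variance estimators over many data sets of size n and
# compare with the true values (1/n form when m_X is known, 1/(n-1) otherwise).
rng = np.random.default_rng(0)
n, reps = 10, 200_000
m_x, var_x = 1.0, 4.0
data = m_x + np.sqrt(var_x) * rng.standard_normal((reps, n))

m_hat   = data.mean(axis=1)
v_known = ((data - m_x) ** 2).mean(axis=1)     # 1/n, m_X known
v_unkn  = data.var(axis=1, ddof=1)             # 1/(n-1), m_X estimated

print("E[m_hat]          ~", m_hat.mean(),   "(true:", m_x, ")")
print("E[var, m known]   ~", v_known.mean(), "(true:", var_x, ")")
print("E[var, m unknown] ~", v_unkn.mean(),  "(true:", var_x, ")")
```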
Estimation of cumulative
distribution function
Estymacja dystrybuanty

FX (x) = P (X ≤ x)


Yi(x) = { 1   if Xi ≤ x
        { 0   if Xi > x

F̂X (x) = (1/n) Σ_{i=1}^{n} Yi(x)

Estimate of the CDF of a uniform random variable obtained from n IID random variables Xi ∼ U (0, 1), i = 1, . . . , n.
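A minimal sketch of this indicator-based estimator (Python/NumPy; n and the evaluation points are arbitrary):

```python
import numpy as np

# Empirical CDF of U(0,1) data: F_hat(x) = (1/n) * sum_i 1{X_i <= x}.
rng = np.random.default_rng(0)
n = 500
xi = rng.random(n)                      # X_i ~ U(0,1), i.i.d.

for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    f_hat = np.mean(xi <= x)            # average of the indicators Y_i(x)
    print(f"x={x:4.2f}  F_hat={f_hat:5.3f}  F={x:4.2f}")
```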
Independent component analysis (ICA)
Analiza skladowych niezależnych

Consider two independent and non-Gaussian random


variables X1 and X2. Suppose we can observe realizations
of two other random variables that are mixtures of X1 and
X2 :
Y1 = a11X1 + a12X2
Y2 = a21X1 + a22X2

but we do not know the values of the mixing coefficients


a11, a12, a21, a22.

Given y1 = Y1(ξ) and y2 = Y2(ξ) can we find out


x1 = X1(ξ) and x2 = X2(ξ)?

Real-world example: If two people speak at the same time


in a room containing two microphones, the output of each
microphone is a mixture of two voice signals. Given these
two signal mixtures can we recover the two original voices
or source signals?

blind source separation / ślepa separacja sygnalów

[ x1 ]   [ a11  a12 ]^{−1} [ y1 ]
[ x2 ] = [ a21  a22 ]      [ y2 ]

But how would we know a11, a12, a21, a22 ?


What results can ICA provide ?

input signals (mixtures)

output signals (sources)


Limitations:

• We cannot determine the variances (energies) of the


independent components.

• We cannot determine the order of the independent


components.
What makes it possible ?
Jak to możliwe ?

sources:
X1 ∼ U (−0.5, 0.5), X2 ∼ U (−0.5, 0.5)
mX1 = mX2 = 0, σX1^2 = σX2^2 = 1/12, cX1X2 = 0

mixtures:
Y1 = (1/√2) X1 + (1/√2) X2 ,   Y2 = (1/√2) X1 − (1/√2) X2
mY1 = mY2 = 0, σY1^2 = σY2^2 = 1/12, cY1Y2 = 0
One of the possible approaches to ICA
Consider n > 1 sources and n mixtures

Yi = Σ_{j=1}^{n} aij Xj ,   i = 1, . . . , n

Two-step procedure:

• STEP 1: Prewhiten the available measurement data - the new random variables Ỹ1, . . . , Ỹn should be (approximately) mutually uncorrelated

ỹi = Σ_{j=1}^{n} bij (yj − mYj ),   var[Ỹi] = 1,   i = 1, . . . , n

cỸiỸj = 0,   i ≠ j

• STEP 2: Recover source signals using the relationship

x̂i = Σ_{j=1}^{n} ĉij ỹj ,   i = 1, . . . , n

where {ĉi1, . . . , ĉin} are the coefficients chosen (iteratively) so as to maximize non-Gaussianity of X̂i = Σ_{j=1}^{n} ĉij Ỹj :

{ĉi1, . . . , ĉin} = arg max_{ci1,...,cin} κX̂i ( Σ_{j=1}^{n} cij ỹj )
FastICA algorithm
Picture guide to ICA

joint distribution of input signals (mixtures)

joint distribution of whitened mixtures

joint distribution of output signals (sources)


Picture guide to ICA

joint distribution of a mixture of Laplace variables

joint distribution of independent Laplace variables


Example of ICA application
- fetal heart monitoring

The magnetic field generated by electrical activity in the heart is measured at 37 different locations using fetal magnetocardiography (FMCG). Each measured signal is a different mixture of fetal and maternal cardiac signals.
Example of ICA application
- fetal heart monitoring

Detail of 7 out of 37 measured signals

Separated fetal and maternal cardiac signals


Vector random variables
Wektorowe zmienne losowe

 
X = [ X1 . . . Xn ]^T_{n×1} ,   x = X(ξ)

mean (expected) value / wartość oczekiwana (średnia)

mX = E[X] = [ mX1 . . . mXn ]^T_{n×1}

covariance matrix / macierz kowariancji

ΣX = cov[X] = E[ (X − mX )(X − mX )^T ]

     [ σ1^2  σ12   . . .  σ1n  ]
   = [ σ21   σ2^2  . . .  σ2n  ]
     [  ..    ..   . . .   ..  ]
     [ σn1   σn2   . . .  σn^2 ]_{n×n}

where σij = E[(Xi − mXi )(Xj − mXj )] = cXiXj and σi^2 = var[Xi] = cXiXi

note: the covariance matrix is nonnegative definite

ΣX ≥ 0 ⇐⇒ y^T ΣX y ≥ 0, ∀y
Vector random variables

covariance ellipsoid / elipsoida kowariancji

(x − mX )^T ΣX^{−1} (x − mX ) = 1

• ellipsoid centered at mX

• the directions of the principal axes are given by the eigenvectors of the covariance matrix ΣX

• the squared lengths of the semi-principal axes are given by the corresponding eigenvalues

Covariance ellipse (n = 2).

partial ordering of covariance matrices

ΣX < ΣY ?,   ΣX ≤ ΣY ?,   ΣX > ΣY ?,   ΣX ≥ ΣY ?

DX = { x : x^T ΣX^{−1} x ≤ 1 }
DY = { y : y^T ΣY^{−1} y ≤ 1 }

ΣX ≤ ΣY ⇐⇒ ΣY − ΣX ≥ 0 ⇐⇒ DX ⊆ DY

the covariance ellipsoid of ΣX is located inside (possibly touching) the covariance ellipsoid of ΣY

singular distributions / rozklady osobliwe


a singular distribution is a probability distribution concentrated on an uncountable set ΩX ⊂ R^n of Lebesgue measure 0

• distribution defined on a line in a


two-dimensional space

• distribution defined on a circle in a


two-dimensional space

• distribution defined on a plane in a


three-dimensional space

• distribution defined on a sphere in a


three-dimensional space
Vector random variables

multivariate Gaussian (normal) distribution /


wielowymiarowy rozklad gaussowski (normalny)

X ∼ N (mX , ΣX ),   ΣX > 0

pX (x) = 1/√( (2π)^n |ΣX | ) exp{ −(1/2)(x − mX )^T ΣX^{−1} (x − mX ) }

|ΣX | = 0: singular Gaussian distribution

note: the equidensity contours of a nonsingular multivariate Gaussian distribution are ellipsoids centered at the mean

P [ (X − mX )^T ΣX^{−1} (X − mX ) ≤ 1 ] = P (χn^2 ≤ 1)   (≈ 0.68 for n = 1, ≈ 0.39 for n = 2)

conditional distribution / rozklad warunkowy


 
Z = [ X ]  ∼ N (mZ , ΣZ ),   X : n×1, Y : m×1
    [ Y ]

mZ = [ mX ]       ΣZ = [ ΣX    ΣXY ]  > 0
     [ mY ] ,          [ ΣY X  ΣY  ]
Vector random variables

ΣXY = E[ (X − mX )(Y − mY )^T ] = ΣY X^T

cross-covariance matrix between random variables X and Y

[X|Y(ξ) = y] ∼ N (mX|Y , ΣX|Y )

mX|Y = mX + ΣXY ΣY^{−1} (y − mY )

ΣX|Y = ΣX − ΣXY ΣY^{−1} ΣY X

higher-order moments / momenty wyższych rzȩdów


higher-order moments of jointly Gaussian variables can be
expressed in terms of the first order and second order
moments

EXAMPLE
Consider 4 zero-mean and jointly normally distributed
random variables X1, . . . , X4. It holds that

E[X1X2X3X4] = E[X1X2]E[X3X4] + E[X1X3]E[X2X4]


+ E[X1X4]E[X2X3] = σ12σ34 + σ13σ24 + σ14σ23

Suppose that W is a symmetric matrix. Show that


E[XXTWXXT] = ΣX tr{WΣX } + 2ΣX WΣX .
Linear transformations of vector random
variables
Liniowe przeksztalcenia wektorowych
zmiennych losowych

Xn×1, Ym×1

x = X(ξ), y = Y(ξ)

Suppose that

Y = AX + b, Am×n, bm×1
y = Ax + b

then

E[Y] = AE[X] + b
=⇒ mY = AmX + b
Y − mY = A(X − mX )

and

cov[Y] = E[A(X − mX )(X − mX )TAT]


=⇒ ΣY = AΣX AT

note: if X is Gaussian then Y is also Gaussian


Project 1

Record and mix signals obtained from two independent


speech sources (30 second long recordings, 2 linear mixtures
with different mixing coefficients).
Perform blind source separation using the FastICA
algorithm:
1. Center the data to make its mean zero.
2. Whiten the data to obtain uncorrelated mixture signals.
3. Initialize w0 = [w1,0, w2,0]T, ||w0|| = 1 (e.g. randomly)
4. Perform an iteration of a one-source extraction algorithm
w̃i = avg{ z(t) ( wi−1^T z(t) )^3 } − 3 wi−1

where z(t) = [z1(t), z2(t)]^T denotes the vector of whitened mixture signals and avg(·) denotes time averaging

avg{x(t)} = (1/N) Σ_{t=1}^{N} x(t)

5. Normalize w̃i by dividing it by its norm

wi = w̃i / ||w̃i||
6. Determine a unit-norm vector vi orthogonal to wi.

wiTvi = 0, ||vi|| = 1

7. Display and listen to the current results of source


separation
    
[ x̂1(t) ]   [ wi^T ] [ z1(t) ]
[ x̂2(t) ] = [ vi^T ] [ z2(t) ] ,   t = 1, . . . , N

8. If wiTwi−1 is not close enough to 1, go back to step 4.


Otherwise stop.

Literature: A. Hyvärinen, E. Oja. “A fast fixed-point


algorithm for independent component analysis”, Neural
Computation, vol. 9, pp. 1483-1492, 1997.

To obtain the highest grade (5), a procedure that extracts


n > 2 source signals from n mixtures must be implemented.
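A minimal starting-point sketch of the described iteration (Python/NumPy; synthetic Laplace sources are used here in place of recorded speech, and the mixing matrix, sample size and convergence test with |wi^T wi−1| are illustrative assumptions, not part of the project specification):

```python
import numpy as np

# Sketch of the Project 1 procedure on synthetic data: two independent
# non-Gaussian (Laplace) sources, two linear mixtures.
rng = np.random.default_rng(0)
N = 50_000
s = rng.laplace(size=(2, N))                       # source signals
mix_A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])                     # "unknown" mixing matrix
y = mix_A @ s                                      # observed mixtures

# Steps 1-2: center and whiten (B = W Lambda^{-1/2} W^T, see Fact 4 below)
y = y - y.mean(axis=1, keepdims=True)
lam, W = np.linalg.eigh((y @ y.T) / N)
z = (W @ np.diag(lam ** -0.5) @ W.T) @ y           # whitened mixtures, cov ~ I

# Steps 3-5, 8: fixed-point iteration for one unit-norm direction w
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w_new = np.mean(z * (w @ z) ** 3, axis=1) - 3.0 * w   # avg{z (w^T z)^3} - 3w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1 - 1e-10         # |w_i^T w_{i-1}| close to 1
    w = w_new
    if converged:
        break

# Steps 6-7: orthogonal unit vector v and the extracted signals
v = np.array([-w[1], w[0]])
x_hat = np.vstack([w @ z, v @ z])
print("corr(x_hat[0], s1) =", np.corrcoef(x_hat[0], s[0])[0, 1])
print("corr(x_hat[0], s2) =", np.corrcoef(x_hat[0], s[1])[0, 1])
```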
Facts about matrices

Fact 1

Given a square matrix An×n, an eigenvalue λ and its


associated eigenvector v are a pair obeying the relation

Av = λv

The eigenvalues λ1, . . . , λn of A are the roots of the


characteristic polynomial of A

det(A − λI)

For a positive definite matrix A it holds that λ1, . . . , λn > 0,


i.e., all eigenvalues are positive real.

Fact 2

Let A be a matrix with linearly independent eigenvectors


v1, . . . , vn. Then A can be factorized as follows

A = VΛV−1

where Λn×n = diag{λ1, . . . , λn} and Vn×n = [v1| . . . |vn].


Facts about matrices

Fact 3

Let A = AT be a real symmetric matrix. Such matrix has


n linearly independent real eigenvectors. Moreover, these
eigenvectors can be chosen such that they are orthogonal
to each other and have norm one

wiTwj = 0, ∀i 6= j
wiTwi = ||wi||2 = 1, ∀i

A real symmetric matrix A can be decomposed as

A = WΛWT

where W is an orthogonal matrix: Wn×n = [w1| . . . |wn],


WWT = WTW = I.

The eigenvectors wi can be obtained by normalizing the


eigenvectors vi

vi
wi = , i = 1, . . . , n
||vi||
Facts about matrices

Fact 4
Consider a zero-mean vector random variable X with
covariance matrix

ΣX = E[XXT] = WΛWT

where Λ is a diagonal matrix made up of eigenvalues of ΣX


and W is an orthogonal matrix.
Let
Y = BX = WΛ−1/2WTX

Then it holds that


ΣY = I

i.e., Y is the vector random variable made up of uncorrelated


components.

E[YY T] = E[BXXTBT] = BE[XXT]BT


= WΛ−1/2WTE[XXT]WΛ−1/2WT
= WΛ−1/2WTWΛWTWΛ−1/2WT
= WΛ−1/2ΛΛ−1/2WT = WWT = I
How to estimate the covariance matrix ΣX
Jak estymować macierz kowariancji ΣX

Denote by x1, . . . , xn realizations of a zero-mean vector random variable X. The estimate of ΣX can be obtained from

Σ̂X (n) = (1/n) Σ_{i=1}^{n} xi xi^T
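A short numerical check of this estimator together with the whitening transformation of Fact 4 (Python/NumPy sketch; the example covariance is arbitrary):

```python
import numpy as np

# Estimate Sigma_X from realizations and whiten with B = W Lambda^{-1/2} W^T;
# the sample covariance of Y = B X should be close to the identity matrix.
rng = np.random.default_rng(0)
n = 100_000
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                      # example "square root", Sigma_X = A A^T
x = A @ rng.standard_normal((2, n))             # zero-mean realizations

sigma_hat = (x @ x.T) / n                       # (1/n) * sum_i x_i x_i^T
lam, W = np.linalg.eigh(sigma_hat)              # Sigma_hat = W Lambda W^T
B = W @ np.diag(lam ** -0.5) @ W.T
y = B @ x

print("Sigma_X estimate:\n", sigma_hat)
print("covariance of Y (should be ~I):\n", (y @ y.T) / n)
```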
Discrete-time random (stochastic) processes
Procesy losowe (stochastyczne) z czasem
dyskretnym
{X(t, ξ), ξ ∈ Ξ, t ∈ T }, T = {. . . , −1, 0, 1, . . .}
X(t, ξ) = Xc(tc = tTs, ξ)

where t denotes the normalized (dimensionless) discrete


time – the number of sampling intervals Ts – and Xc(tc, ξ)
denotes a continuous-time random process

x(t) = X(t, ξ)

realization of a random process / realizacja procesu


losowego

Two interpretations of a random process

• a sequence of random variables

• an ensemble (collection, family) of realizations,


i.e., functions of time
Examples of random processes
Przyklady procesów losowych
Example 1: first-order autoregressive process

x(t) = ax(t − 1) + n(t)

where {n(t)} denotes a sequence of zero-mean uncorrelated


random variables with variance σn2 .
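A simulation sketch of this process (Python/NumPy; the values a = 0.9 and σn = 1 are illustrative). For |a| < 1 the stationary variance is σn^2/(1 − a^2), a standard AR(1) result quoted here for reference:

```python
import numpy as np

# Simulate x(t) = a*x(t-1) + n(t) with zero-mean uncorrelated noise n(t).
rng = np.random.default_rng(0)
a, sigma_n, N = 0.9, 1.0, 200_000
noise = sigma_n * rng.standard_normal(N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = a * x[t - 1] + noise[t]

# Stationary variance for |a| < 1: sigma_n^2 / (1 - a^2)
print("sample variance:", x[1000:].var(), "theory:", sigma_n ** 2 / (1 - a ** 2))
```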
Example 2: random sinusoidal signal

x(t) = A sin(ωt + ϕ)

where ϕ, denoting initial phase, is a random variable with


uniform distribution over [0, 2π): ϕ ∼ U (0, 2π).
note: this is an example of a strictly deterministic
(singular) random process, perfectly predictable
from its past
Example 3: pseudo-random binary signal (PRBS)

x(t) ∈ {−A, A}, S1 : −A, S2 : A

transition probabilities:
P11 = 0.8, P12 = 0.2, P22 = 0.6, P21 = 0.4
Electroencephalographic (EEG) signals
Sygnaly elektroencefalograficzne (EEG)

H. Berger, “Über das Elektroenzephalogramm des Menschen”, Arch. Psychiatr. Nervenkrankheiten, vol. 87, pp. 527–570, 1929.

Localization of electrodes on the patient’s skull

EEG signals carry information about the state of the brain


and its reaction to internal and external stimuli. Despite
advances made recently in the area of computer-aided
tomography (CAT), EEG remains a practically useful tool for
studying the functional states of the brain (e.g. sleep-stage
analysis) and for diagnosing functional brain disturbances
(e.g. epilepsy).
Electroencephalographic (EEG) signals

Example of normal (upper plots) and pathological


(lower plots) EEG recordings
According to the traditional clinical nomenclature
introduced in a pioneering work of Berger, EEG signals
are described and classified in terms of several rhythmic
activities called alpha (8–13 Hz), beta (14–30 Hz), delta
(1–3 Hz) and theta (4–7 Hz). The normal EEG of an adult
in a state called relaxed wakefulness is usually dominated by
rhythmic alpha activity and a less pronounced beta activity.
The contribution of delta and theta activities is arrhythmic
and much smaller.
Frequency analysis of EEG signals
Analiza czȩstotliwościowa sygnalów EEG

Spectral density function of a typical


EEG recording
Speech signals
Sygnaly mowy

Vocabulary Template

hello

yes

no

bye

A simple dictionary of speech templates for speech


recognition

Variation in speech templates for different speakers


Description of random processes
Opis procesów losowych

To fully describe a random process, one should specify


all finite-dimensional (n ∈ N ) joint distributions of
random variables X(t1), . . . , X(tn) for all possible values
of t1, . . . , tn ∈ T .

n-dimensional cumulative distribution function

F (x1, . . . , xn; t1, . . . , tn)


= P (X(t1) < x1, . . . , X(tn) < xn)

n-dimensional probability density function

p(x1, . . . , xn; t1, . . . , tn) = ∂^n F (x1, . . . , xn; t1, . . . , tn) / ( ∂x1 · · · ∂xn )

(assuming that all derivatives exist)


First-order and second-order characteristics
Charakterystyki pierwszego i drugiego rzȩdu

first-order characteristics: X(t)


first-order cumulative distribution and probability density
functions (characterize the “amplitude” distribution)

F (x; t), p(x; t)

mean value / wartość oczekiwana


mX (t) = E[X(t)] = ∫_{−∞}^{∞} x p(x; t)dx

second-order characteristics: X(t1), X(t2)


second-order cumulative distribution and probability
density functions (characterize the joint “amplitude”
distribution)

F (x1, x2; t1, t2), p(x1, x2; t1, t2)

autocorrelation function / funkcja autokorelacji

RX (t1, t2) = E[X(t1)X(t2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 p(x1, x2; t1, t2)dx1dx2
First-order and second-order characteristics

autocovariance function / funkcja autokowariancji

CX (t1, t2) = E{[X(t1) − mX (t1)][X(t2) − mX (t2)]}


= RX (t1, t2) − mX (t1)mX (t2)

variance / wariancja

σX^2 (t) = var[X(t)] = E{[X(t) − mX (t)]^2} = ∫_{−∞}^{∞} [x − mX (t)]^2 p(x; t)dx = CX (t, t)

normalized autocovariance function / unormowana funkcja


autokowariancji

ρX (t1, t2) = CX (t1, t2) / ( σX (t1)σX (t2) ) ,   −1 ≤ ρX (t1, t2) ≤ 1
Stationary random processes
Stacjonarne procesy losowe

Qualitative (but not precise) description: a random process


is said to be stationary (time stationary) if its density
functions are invariant under a translation of time
strict sense stationarity / stacjonarność w ścislym sensie
A discrete-time random process X(t) is strict sense
stationary if the joint distribution of random variables
X(t1), . . . , X(tn) is identical with the joint distribution of
random variables X(t1 + ∆t), . . . , X(tn + ∆t) for any
values of n ∈ N , ∆t ∈ T and t1, . . . , tn ∈ T :

p(x1, . . . , xn; t1, . . . , tn)


= p(x1, . . . , xn; t1 + ∆t, . . . , tn + ∆t)

consequences of invariance to a time shift

p(x; t) = p(x; t + ∆t)


=⇒ p(x; t) = p(x)

p(x1, x2; t1, t2) = p(x1, x2; t1 + ∆t, t2 + ∆t)


=⇒ p(x1, x2 ; t1, t2) = p(x1, x2; τ )

where τ = t2 − t1.
Stationary random processes

using the first property one obtains

mX (t) = mX

using the second property one obtains

RX (t1, t2) = RX (τ )
CX (t1, t2) = CX (τ )
σX^2 (t) = CX (0) = σX^2

wide sense stationarity / stacjonarność w szerokim sensie

A discrete-time random process X(t) is wide sense


stationary if its mean function is constant and its
autocorrelation function is a function only of τ

mX (t) = mX , RX (t1, t2) = RX (τ )

note: all strict sense stationary random processes are also


wide sense stationary, provided that their mean and
autocorrelation functions exist. The converse is not true.
Ergodic processes
Procesy ergodyczne
Qualitative (but not precise) description: a random process
is said to be ergodic if its statistical properties (such as its
mean and autocorrelation function) can be deduced from a
single, sufficiently long sample (realization) of the process.
One can discuss the ergodicity of various properties of a
random process.
mean-square ergodicity in the first-order moment /
ergodyczność, w sensie średniokwadratowym, wzglȩdem
wartości oczkiwanej

mX = E[X(t)] = lim_{N→∞} (1/N) Σ_{i=1}^{N} X(t, ξi)
ensemble average

m̂X (T, ξ) = (1/T) Σ_{t=1}^{T} X(t, ξ)
time average

What are the conditions under which

lim_{T→∞} E[ ( m̂X (T, ξ) − mX )^2 ] = 0 ?
Ergodic processes
THEOREM
The necessary and sufficient condition for mean-square ergodicity in the first moment of a wide sense stationary process X(t) has the form

lim_{T→∞} (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t2 − t1) = 0

PROOF

E[ ( m̂X (T ) − mX )^2 ] = E[ ( (1/T) Σ_{t=1}^{T} X(t) − mX )^2 ]

= E[ (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} [X(t1) − mX ][X(t2) − mX ] ]

= (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} E{ [X(t1) − mX ][X(t2) − mX ] }

= (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t1, t2) = (1/T^2) Σ_{t1=1}^{T} Σ_{t2=1}^{T} CX (t2 − t1)

sufficient condition

lim_{τ→∞} |CX (τ )| = 0
Ergodic processes
mean-square ergodicity in the second-order moments /
ergodyczność, w sensie średniokwadratowym, wzglȩdem
momentów drugiego rzȩdu

RX (τ ) = E[X(t)X(t + τ )] = lim_{N→∞} (1/N) Σ_{i=1}^{N} X(t; ξi)X(t + τ ; ξi)

R̂X (τ, T, ξ) = (1/T) Σ_{t=1}^{T} X(t, ξ)X(t + τ, ξ)

What are the conditions under which

lim_{T→∞} E{ [ R̂X (τ, T, ξ) − RX (τ ) ]^2 } = 0 ?

for Gaussian processes the sufficient condition is

lim_{τ→∞} |CX (τ )| = 0

note: a process which is ergodic in the first and second


moments is sometimes called ergodic in the wide sense
example of a process that is not ergodic:
X(t, ξ) = A(ξ), A ∼ U [0, 1]
Properties of the autocorrelation function
Wlasności funkcji autokorelacji

The autocorrelation function of a wide sense stationary


random process X(t)

• is an even function, that is, RX (τ ) = RX (−τ ), ∀τ

• is maximum at the origin, that is, |RX (τ )| ≤ RX (0), ∀τ

• is a nonnegative definite function, that is, for any n ∈ N, any selection of t1, . . . , tn ∈ T , and any selection of z1, . . . , zn ∈ R, it holds that

Σ_{i=1}^{n} Σ_{j=1}^{n} zi zj RX (tj − ti) ≥ 0

which is equivalent to

F[RX (τ )] ≥ 0

where F[·] denotes the discrete-time Fourier transform

note: the nonnegative definiteness condition is equivalent to R = [RX (tj − ti)]_{n×n} ≥ 0
RX (τ ) = E[X(t)X(t + τ )] = E[X(t + τ )X(t)] = RX (−τ )

E[ (X(t) − X(t + τ ))^2 ] = E[X^2(t)] + E[X^2(t + τ )] − 2E[X(t)X(t + τ )] = 2RX (0) − 2RX (τ ) ≥ 0

X = [ X(t1) . . . X(tn) ]^T ,   z = [ z1 . . . zn ]^T

R = E[XX^T]

    [ RX (t1 − t1)  RX (t2 − t1)  . . .  RX (tn − t1) ]
  = [ RX (t1 − t2)       ..       . . .       ..      ]
    [      ..            ..       . . .       ..      ]
    [ RX (t1 − tn)       ..       . . .  RX (tn − tn) ]

RX (ti, tj ) = E[X(ti)X(tj )] = RX (tj − ti)

Σ_{i=1}^{n} Σ_{j=1}^{n} zi zj RX (tj − ti) = z^T R z = z^T E[XX^T] z = E[z^T XX^T z] = E[(z^T X)^2] ≥ 0
Introduction to spectral analysis
Wprowadzenie do analizy widmowej

Consider a continuous-time band-limited periodic signal

xc(tc) = sin(2πfc tc) = sin(ωc tc),   fc ≤ F0

where tc ∈ R denotes continuous time measured in seconds [sec] and
fc - frequency of a continuous-time signal expressed in cycles per second [Hz]
ωc - angular frequency of a continuous-time signal expressed in radians per second [rad/sec]

Suppose that the signal xc(tc) is sampled with the frequency Fs = 1/Ts ≥ 2F0

x(t) = xc(tc = tTs) = sin(2πf t) = sin(ωt)

where t ∈ T denotes normalized discrete time measured in samples [sa] and
f - normalized frequency / unormowana czȩstotliwość, expressed in cycles per sample [c/sa]
ω - normalized angular frequency / unormowana pulsacja, expressed in radians per sample [rad/sa]

f = fc/Fs ∈ [−1/2, 1/2],   ω = ωc/Fs ∈ [−π, π]
Introduction to spectral analysis

energy spectrum of a deterministic signal /


widmo energii sygnalu deterministycznego

Let {x(t)} be a finite energy deterministic signal

Ex = Σ_{t=−∞}^{∞} x^2(t) < ∞

Then {x(t)} possesses a discrete-time Fourier transform (DTFT)

X(ω) = Σ_{t=−∞}^{∞} x(t)e^{−jωt}

φX (ω) = |X(ω)|^2

energy spectral density / gȩstość widmowa energii

describes distribution of the signal energy over different frequency bands

Such interpretation stems from the fact that, according to Parseval's theorem

(1/2π) ∫_{−π}^{π} φX (ω)dω = Σ_{t=−∞}^{∞} x^2(t) = Ex
Introduction to spectral analysis

naı̈ve extension to random processes /


naiwne rozszerzenie na procesy losowe

Consider a zero-mean wide sense stationary and ergodic random process X(t, ξ). It holds that

lim_{N→∞} (1/N) Σ_{t=1}^{N} X^2(t, ξ) = σX^2

power (mean energy) / moc (średnia energia)

Can we define power spectral density in the form analogous to the energy spectral density

SX (ω) = lim_{N→∞} PX (ω, ξ; N ) ?

where

PX (ω, ξ; N ) = (1/N) | Σ_{t=1}^{N} X(t, ξ)e^{−jωt} |^2

periodogram / periodogram

No, in the general case the limit above does not exist and/or it is realization-dependent, i.e., it is a random variable !
Power spectral density
Widmowa gȩstość mocy
power spectral density shows distribution of the signal
power over different frequency bands
Definition 1

SX (ω) = lim_{N→∞} E[PX (ω, ξ; N )]

mean periodogram

Definition 2 (Wiener - Khintchine)

SX (ω) = F[RX (τ )] = Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ}

Fourier transform of the autocorrelation function

THEOREM
Both definitions given above are equivalent provided that the autocorrelation function of {X(t)} decays sufficiently rapidly with growing lag, namely

lim_{N→∞} (1/N) Σ_{τ=−N}^{N} |τ | RX (τ ) = 0
Power spectral density

PROOF

lim_{N→∞} E[PX (ω, ξ; N )]

= lim_{N→∞} (1/N) E[ | Σ_{t=1}^{N} X(t)e^{−jωt} |^2 ]

= lim_{N→∞} (1/N) Σ_{t1=1}^{N} Σ_{t2=1}^{N} E[X(t1)X(t2)] e^{−jω(t1−t2)}

= lim_{N→∞} (1/N) Σ_{τ=−(N−1)}^{N−1} (N − |τ |) RX (τ )e^{−jωτ}

= Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ} − lim_{N→∞} (1/N) Σ_{τ=−(N−1)}^{N−1} |τ | RX (τ )e^{−jωτ}

= Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ} = SX (ω)

note: the power spectral density function carries the same information about a random process as its autocorrelation function because

RX (τ ) = F^{−1}[SX (ω)] = (1/2π) ∫_{−π}^{π} SX (ω)e^{jωτ} dω
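A numerical illustration of Definition 1 (Python/NumPy sketch): the periodogram averaged over many realizations of a first-order autoregressive process is compared with SX (ω) = σn^2/|1 − a e^{−jω}|^2, the standard closed-form spectrum of that example (the parameters a, σn, frame length and number of realizations are illustrative).

```python
import numpy as np

# Mean periodogram of an AR(1) process versus its theoretical spectrum.
rng = np.random.default_rng(0)
a, sigma_n, N, reps = 0.8, 1.0, 1024, 200
freqs = 2 * np.pi * np.arange(N) / N

mean_per = np.zeros(N)
for _ in range(reps):
    e = sigma_n * rng.standard_normal(N + 200)
    x = np.zeros_like(e)
    for t in range(1, len(e)):
        x[t] = a * x[t - 1] + e[t]
    x = x[200:]                                   # drop the start-up transient
    mean_per += np.abs(np.fft.fft(x)) ** 2 / N    # periodogram P_X(w_i; N)
mean_per /= reps

s_theory = sigma_n ** 2 / np.abs(1.0 - a * np.exp(-1j * freqs)) ** 2
for i in (0, N // 8, N // 4, N // 2):
    print(f"w = {freqs[i]:.3f}   mean periodogram = {mean_per[i]:7.3f}   S_X = {s_theory[i]:7.3f}")
```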
Properties of power spectral density function
Wlasności funkcji widmowej gȩstości mocy

σX^2 = (1/2π) ∫_{−π}^{π} SX (ω)dω

The power spectral density function of a wide sense stationary random process X(t)

• is an even function, that is, SX (ω) = SX (−ω), ∀ω

• is a nonnegative function, that is, SX (ω) ≥ 0, ∀ω

• is a periodic function, namely SX (ω + 2kπ) = SX (ω), for every integer k

interpretation of different frequency bands:


ω ∈ [0, π] - region of physical significance
ω ∈ [−π, 0) - symmetric fold
|ω| > π - periodic extension
DFT-free assessment of spectral density
function

woman, age 19, healthy

Doctor’s evaluation: persistent, regular alpha


activity with a dominant 9 Hz component

DFT-free approaches:

• Count cycles per second; take into account the relative


amplitudes of different rhythmic components [qualitative
assessment].

• Pass the analyzed signal through a bank of narrowband


bandpass filters with different center frequencies ωi ∈
[0, π], i = 1, . . . , k. Evaluate SX (ωi) as the mean power
of the signal observed at the output of the i-th filter
[quantitative assessment].
Power cepstrum of a signal
Cepstrum mocy sygnalu
A cepstrum is the result of taking the Fourier transform
of the log signal spectrum. Effectively we are treating the
signal spectrum as another signal and looking for periodicity
in the spectrum itself. The term cepstrum was derived by
reversing the first four letters of “spectrum”. The cepstrum
is so-called because it turns the spectrum inside-out. The
x-axis of the cepstrum has units of quefrency, and peaks in
the cepstrum (which relate to periodicities in the spectrum)
are called rahmonics.
KX (q) = F^{−1}[ log |X(ω)| ] = (1/2π) ∫_{−π}^{π} log |X(ω)| e^{jωq} dω

real cepstrum / cepstrum rzeczywiste


Relationships between random processes
Zwia̧zki pomiȩdzy procesami losowymi

X(t, ξ) ←→ Y (t, ξ)

cross-correlation function / funkcja korelacji wzajemnej

RXY (t1, t2) = E{X(t1)Y (t2)}

cross-covariance function / funkcja kowariancji wzajemnej

CXY (t1, t2) = E{[X(t1) − mX (t1)][Y (t2) − mY (t2)]} = RXY (t1, t2) − mX (t1)mY (t2)

normalized cross-covariance function / unormowana funkcja kowariancji wzajemnej

ρXY (t1, t2) = CXY (t1, t2) / ( σX (t1)σY (t2) ) ∈ [−1, 1]

uncorrelated processes / procesy nieskorelowane

CXY (t1, t2) = 0, ∀t1, t2

orthogonal processes / procesy ortogonalne

RXY (t1, t2) = 0, ∀t1, t2


Jointly wide sense stationary
random processes
Procesy losowe stacjonarne la̧cznie w
szerokim sensie

Processes X(t) and Y (t) are called jointly wide sense


stationary if each of them is wide sense stationary and

RXY (t1, t2) = RXY (τ )

where τ = t2 − t1.

For jointly wide sense stationary processes X(t) and Y (t)


one can define

cross spectral density function / wzajemna widmowa


gȩstość mocy


SXY (ω) = F[RXY (τ )] = Σ_{τ=−∞}^{∞} RXY (τ ) e^{−jωτ}

note: since in general RXY (τ ) 6= RY X (τ ), the cross


spectral density function SXY (ω) is complex-valued
Independent random processes
Procesy losowe niezależne

Two discrete-time random processes X(t) and Y (t) are


statistically independent if the group of random variables
X(t1), . . . , X(tn) is statistically independent of random
variables Y (s1), . . . , Y (sm) for any values of n, m ∈ N ,
t1, . . . , tn ∈ T and s1, . . . , sm ∈ T :

p(x1, . . . , xn, y1 , . . . , ym; t1, . . . , tn, s1, . . . , sm)


= p(x1, . . . , xn; t1, . . . , tn)
× p(y1, . . . , ym; s1, . . . , sm)

note: two jointly Gaussian and mutually uncorrelated


random processes are independent
Relationships for the sum of two
random processes
Zależności dla sumy dwóch
procesów losowych

Suppose that X(t) and Y (t) are jointly wide sense stationary
random processes and

Z(t) = X(t) + Y (t)

Then

RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RY X (τ )
SZ (ω) = SX (ω) + SY (ω) + SXY (ω) + SY X (ω)

When the processes X(t) and Y (t) are also orthogonal, it holds that

RZ (τ ) = RX (τ ) + RY (τ )
SZ (ω) = SX (ω) + SY (ω)
Linear transformations of random processes
Liniowe przeksztalcenia procesów losowych

Y (t, ξ) = L[X(t, ξ)]

mX (t), RX (t1, t2) −→ mY (t), RY (t1, t2), RXY (t1, t2)

condition of linearity:
L[k1x1 (t) + k2x2(t)] = k1L[x1 (t)] + k2L[x2(t)]
∀ t, k1, k2, {x1(·)}, {x2(·)}

y(t) = Σ_{i=0}^{t} k(i)x(t − i)

{k(i), i = 0, 1, 2, . . .} – impulse response of a dynamic system

mY (t) = E[Y (t, ξ)] = E[ Σ_{i=0}^{t} k(i)X(t − i, ξ) ] = Σ_{i=0}^{t} k(i)E[X(t − i, ξ)] = Σ_{i=0}^{t} k(i)mX (t − i)
Linear transformations of random processes
In an analogous way one can show that:

RXY (t1, t2) = E[X(t1, ξ)Y (t2, ξ)] = Σ_{i=0}^{t2} k(i)RX (t1, t2 − i)

RY (t1, t2) = E[Y (t1, ξ)Y (t2, ξ)] = Σ_{i=0}^{t1} Σ_{j=0}^{t2} k(i)k(j)RX (t1 − i, t2 − j)

steady state relationships for a wide sense stationary process X(t): mX (t) = mX , RX (t1, t2) = RX (τ )

y(t) = Σ_{i=0}^{∞} k(i)x(t − i)

mY = mX Σ_{i=0}^{∞} k(i)

RXY (τ ) = Σ_{i=0}^{∞} k(i)RX (τ − i)

RY (τ ) = Σ_{i=0}^{∞} Σ_{j=0}^{∞} k(i)k(j)RX (τ + i − j)

note: the output process Y (t) is also wide sense stationary


Linear time-invariant systems
Uklady liniowe niezmiennicze w czasie
conditions of preservation of stationarity
at the output of a linear system

THEOREM
Suppose that a linear system L[·] is excited by a strictly
(or wide sense) stationary random process X(t). When the
system is time-invariant, i.e., the relationship y(t) = L[x(t)]
entails

y(t + ∆t) = L[x(t + ∆t)], ∀t, ∆t, {x(·)}

then the output random signal Y (t) is also strictly (or wide
sense) stationary.

The system governed by y(t) = L[x(t)] = Σ_{i=0}^{t} k(i)x(t − i) is not time-invariant because

y(t + ∆t) = Σ_{i=0}^{t+∆t} k(i)x(t + ∆t − i) ≠ L[x(t + ∆t)] = Σ_{i=0}^{t} k(i)x(t + ∆t − i)

Show that the system governed by y(t) = L[x(t)] = Σ_{i=0}^{∞} k(i)x(t − i) is time-invariant.
Relationships between spectral
density functions
Zwia̧zki pomiȩdzy funkcjami widmowej
gȩstości mocy
Suppose that X(t) is a wide sense stationary random process with a spectral density function SX (ω) and Y (t) is the process observed at the output of a linear time-invariant system

y(t) = Σ_{i=0}^{∞} k(i)x(t − i)

Derive expressions for SXY (ω) and SY (ω).

SX (ω) = Σ_{τ=−∞}^{∞} RX (τ )e^{−jωτ}

SXY (ω) = Σ_{τ=−∞}^{∞} RXY (τ )e^{−jωτ}

        = Σ_{τ=−∞}^{∞} e^{−jωτ} Σ_{s=0}^{∞} k(s)RX (τ − s)

        = Σ_{s=0}^{∞} e^{−jωs} k(s) [ Σ_{τ=−∞}^{∞} e^{−jω(τ−s)} RX (τ − s) ]

        = Σ_{s=0}^{∞} e^{−jωs} k(s) SX (ω) = K(e^{−jω}) SX (ω)
Relationships between spectral
density functions
where

K(e^{−jω}) = Σ_{s=0}^{∞} e^{−jωs} k(s)

denotes the transfer function of the excited system.

SY (ω) = Σ_{τ=−∞}^{∞} RY (τ )e^{−jωτ}

       = Σ_{τ=−∞}^{∞} e^{−jωτ} Σ_{s1=0}^{∞} Σ_{s2=0}^{∞} k(s1)k(s2)RX (τ − s2 + s1)

       = Σ_{s1=0}^{∞} e^{jωs1} k(s1) Σ_{s2=0}^{∞} e^{−jωs2} k(s2) [ Σ_{τ=−∞}^{∞} e^{−jω(τ−s2+s1)} RX (τ − s2 + s1) ]

       = K(e^{jω}) K(e^{−jω}) SX (ω) = |K(e^{−jω})|^2 SX (ω)

summary:

SXY (ω) = K(e^{−jω}) SX (ω)

SY (ω) = |K(e^{−jω})|^2 SX (ω)
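A quick numerical check of the second relationship (Python/NumPy sketch): white noise with SX (ω) = σ^2 passed through a short FIR filter, so that K(e^{−jω}) is a finite sum (the filter coefficients and lengths are illustrative).

```python
import numpy as np

# Averaged periodogram of FIR-filtered white noise versus |K(e^{-jw})|^2 * sigma^2.
rng = np.random.default_rng(0)
k = np.array([1.0, 0.5, 0.25])
sigma2, N, reps = 1.0, 1024, 200
freqs = 2 * np.pi * np.arange(N) / N
K = np.array([np.sum(k * np.exp(-1j * w * np.arange(len(k)))) for w in freqs])

mean_per = np.zeros(N)
for _ in range(reps):
    x = np.sqrt(sigma2) * rng.standard_normal(N + len(k))
    y = np.convolve(x, k, mode="full")[len(k):len(k) + N]   # steady-state output samples
    mean_per += np.abs(np.fft.fft(y)) ** 2 / N
mean_per /= reps

for i in (0, N // 4, N // 2):
    print(f"w = {freqs[i]:.3f}   S_Y estimate = {mean_per[i]:6.3f}   "
          f"|K|^2 * sigma^2 = {np.abs(K[i]) ** 2 * sigma2:6.3f}")
```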
Spectral subtraction
Metoda odejmowania widmowego

Goal:

Recover signal x(t) based on its noisy measurements y(t)

y(t) = x(t) + z(t)

Assumptions:

1. X(t) (recovered signal) and Z(t) (noise) are locally


wide-sense stationary random processes.

2. Processes X(t) and Z(t) are orthogonal.

3. Spectral density function of noise SZ (ω) is known or


can be estimated based on those fragments of Y (t) that
contain noise only.

S.F. Boll, “Suppression of acoustic noise in speech using spectral subtraction”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, pp. 113-120, 1979.
Spectral subtraction

note: our perception of sounds depends primarily on their


spectral content – two sounds with the same spectral
density functions are perceived as (almost) identical

Under the assumptions made it holds that (locally)

SY (ω) = SX (ω) + SZ (ω)

Can one design a linear filter K(z^{−1}) that would transform a process Y (t) with spectral density function SY (ω) into the process X(t) with spectral density function SX (ω)?

SX (ω) = |K(e^{−jω})|^2 SY (ω)

Therefore the amplitude characteristic of the transforming filter can be obtained from

A(ω) = |K(e^{−jω})| = √( SX (ω)/SY (ω) ) = √( ( SY (ω) − SZ (ω) ) / SY (ω) )

power spectral subtraction


Discrete Fourier transform (DFT)
Dyskretna transformata Fouriera

X(ωi) = Σ_{t=0}^{N−1} x(t)e^{−jωi t},   ωi = 2πi/N,   i = 0, . . . , N − 1

{X(ω0), . . . , X(ωN−1)} = DFT{x(0), . . . , x(N − 1)}

Inverse Discrete Fourier Transform (IDFT)

x(t) = (1/N) Σ_{i=0}^{N−1} X(ωi)e^{jωi t},   t = 0, . . . , N − 1

{x(0), . . . , x(N − 1)} = IDFT{X(ω0), . . . , X(ωN−1)}

(Inverse) Fast Fourier Transform

N = 2^k −→ FFT, IFFT
Power spectral subtraction
The signal is divided into frames (segments) of identical width N (N should be short enough to guarantee local signal stationarity, e.g., in case of speech signals it should cover no more than 10-20 ms of speech). Each frame is “denoised” separately and the results are combined for the entire recording. Consider any frame (to simplify notation the number of the frame was suppressed)

{y(0), . . . , y(N − 1)}

Step 1
Use a fragment of the signal that contains noise only [y(t) = z(t)] to estimate the power spectral density of the noise

ŜZ (ωi) = (1/N) |Z(ωi)|^2 ,   i = 0, . . . , N/2

If many noise frames are available, apply averaging.

Step 2
Estimate the power spectral density function of the noisy signal

ŜY (ωi) = (1/N) |Y (ωi)|^2 ,   i = 0, . . . , N/2

Step 3
Estimate the power spectral density function of the noiseless signal

ŜX (ωi) = { ŜY (ωi) − ŜZ (ωi)   when nonnegative
          { 0                    otherwise

i = 0, . . . , N/2

Step 4
Design the denoising filter

Â(ωi) = √( ŜX (ωi) / ŜY (ωi) ),   i = 0, . . . , N/2

Â(ωi) = Â(ωN−i),   i = N/2 + 1, . . . , N − 1

Step 5
Evaluate the denoised signal

X̂(ωi) = Â(ωi)Y (ωi),   i = 0, . . . , N − 1

{x̂(0), . . . , x̂(N − 1)} = IDFT{X̂(ω0), . . . , X̂(ωN−1)}

Read next frame and return to Step 2.
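A minimal single-frame sketch of Steps 1-5 (Python/NumPy; a synthetic sinusoid buried in white noise stands in for a recorded frame, and the frame length, signal frequency and noise level are illustrative — a real recording would be processed frame by frame as described above):

```python
import numpy as np

# Single-frame power spectral subtraction on synthetic data.
rng = np.random.default_rng(0)
N = 512
t = np.arange(N)
noise_only = 0.5 * rng.standard_normal(N)                            # noise-only fragment
y = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.standard_normal(N)      # noisy frame

S_Z = np.abs(np.fft.fft(noise_only)) ** 2 / N        # Step 1: noise PSD estimate
Y = np.fft.fft(y)
S_Y = np.abs(Y) ** 2 / N                             # Step 2: noisy-signal PSD estimate
S_X = np.maximum(S_Y - S_Z, 0.0)                     # Step 3: subtract, clip at zero
A = np.sqrt(S_X / np.maximum(S_Y, 1e-12))            # Step 4: denoising filter gain
x_hat = np.real(np.fft.ifft(A * Y))                  # Step 5: denoised frame

print("noisy frame power:   ", np.mean(y ** 2))
print("denoised frame power:", np.mean(x_hat ** 2))
```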


Remarks and extensions

1. When using spectral subtraction one modifies the


amplitude distribution of noisy speech in different
frequency bands, but preserves its phase distribution.

2. When intensity of wideband noise corrupting speech is


large, the results of denoising may suffer from the so-
called musical noise (tin-like sound consisting of many
short-lived narrowband components appearing at random
frequencies). Musical noise can be reduced by eliminating
tones (i.e., periodogram spectral lines) that appear only
in isolated segments.

3. To eliminate signal discontinuities that may occur at the


beginning/end of each denoised speech segment, one
may use the overlap-add technique, e.g. for segments of
length N = 2n + 1 and 50% overlap one can use the
following data windows

w(t) = 1 − |t − n|/n ,   t = 0, . . . , 2n

or

w(t) = (1/2) [ 1 − cos(πt/n) ] ,   t = 0, . . . , 2n

w(t) + w(t − n) = 1,   t = n, . . . , 2n
Overlap-add synthesis

x̂w (t) = { x̂(t)w(t)   t = 0, . . . , 2n
         { 0           elsewhere

x̂w^k (t) – the windowed output of the k-th frame, occupying samples t = kn, . . . , (k + 2)n

x̂(t) = . . . + x̂w^{k−1}(t) + x̂w^k (t) + x̂w^{k+1}(t) + . . .

4. Generalized spectral subtraction

X̂(ωi) = { [ |Y (ωi)|^β − α · avg(|Z(ωi)|^β) ]_+ / |Y (ωi)|^β }^{1/β} · Y (ωi)

where β ≥ 1, α > 0 is a user-dependent coefficient, avg(·) denotes time averaging, and [·]_+ denotes the “round to nonnegative” operation.

β = 1 −→ magnitude spectral subtraction

β = 2 −→ power spectral subtraction

NOTE: in practice one often takes α > 1.


Project 2

Create a noisy audio recording (add artificially generated


noise to a clean music or speech signal). The first second
of the recording should contain noise only. Then denoise
recording using the method of spectral subtraction.
THE END
The longest run of heads and tails

Mark F. Schilling, “The Longest Run of Heads,”


Mathematical Association of America

http://shiny.stat.calpoly.edu/LongRun/
