Gdańsk University of Technology
Random Processes – Theory for the Practician
by
Maciej Niedźwiecki
Gdańsk, Poland
Does God play dice?
"Quantum mechanics is certainly imposing. But an inner voice tells me that it is not yet the real thing. The theory says a lot, but does not really bring us any closer to the secret of the 'old one.' I, at any rate, am convinced that He does not throw dice."

Albert Einstein
Yes, He does!

For the modern view, watch Richard Feynman's Douglas Robb Memorial Lectures (1979):
http://vega.org.uk/video/subseries/8
$$X : \Xi \longrightarrow \Omega_X \subseteq \mathbb{R}, \qquad \{X(\xi),\ \xi \in \Xi\}$$
where:
$\Xi$ – sample space
$\xi$ – outcome
$\Omega_X$ – observation space

$$F_X(x) = P(X \le x)$$
where $P(\cdot)$, $0 \le P \le 1$, denotes the probability of an event.
Properties:
• $F_X(-\infty) = 0$, $F_X(\infty) = 1$
• $0 \le F_X(x) \le 1$
• if $x_1 < x_2$ then $F_X(x_1) \le F_X(x_2)$
probability density function: $p_X(x) = \dfrac{dF_X(x)}{dx}$

Properties:
• $p_X(x) \ge 0$
• $F_X(x) = \int_{-\infty}^{x} p_X(z)\,dz$
• $\int_{-\infty}^{\infty} p_X(x)\,dx = 1$
• $P(x_1 < X \le x_2) = \int_{x_1}^{x_2} p_X(z)\,dz$
Is it possible that $p_X(x) > 1$?
Is it possible that $P(X = x) \ne 0$?
Characteristics of random variables
mean value:
$$m_X = E[X] = \int_{-\infty}^{\infty} x\,p_X(x)\,dx$$
median:
$$\alpha_X = \operatorname{med}[X]: \quad P(X \le \alpha_X) = P(X \ge \alpha_X), \qquad \int_{-\infty}^{\alpha_X} p_X(x)\,dx = \frac{1}{2}$$
variance:
$$\sigma_X^2 = \operatorname{var}[X] = E[(X - m_X)^2] = \int_{-\infty}^{\infty} (x - m_X)^2\,dF_X(x) = \int_{-\infty}^{\infty} (x - m_X)^2\,p_X(x)\,dx$$
[Figure: probability density functions with negative ($\gamma_X < 0$) and positive ($\gamma_X > 0$) skewness, and super-Gaussian (leptokurtic, $\kappa_X = 3$) [left], Gaussian ($\kappa_X = 0$) [middle], and sub-Gaussian (platykurtic, $\kappa_X = -1.5$) [right] densities.]
Important random variables

uniform distribution, $X \sim U(a, b)$:
$$m_X = \frac{a + b}{2}, \qquad \sigma_X^2 = \frac{(b - a)^2}{12}$$
MATLAB: $U(0, 1)$ = rand
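A quick numerical sanity check of these moments (a minimal MATLAB sketch using only the built-in rand; the sample size is arbitrary):

    % check the U(0,1) moments: m_X = 1/2, sigma_X^2 = 1/12
    n = 1e6;
    x = rand(n, 1);                          % n independent U(0,1) samples
    fprintf('mean: %.4f (theory %.4f)\n', mean(x), 0.5);
    fprintf('var : %.4f (theory %.4f)\n', var(x), 1/12);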
Gaussian (normal) distribution, $X \sim N(m_X, \sigma_X^2)$, $\sigma_X^2 > 0$:
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left\{-\frac{(x - m_X)^2}{2\sigma_X^2}\right\}$$
$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left\{-\frac{(z - m_X)^2}{2\sigma_X^2}\right\} dz = \Phi\left(\frac{x - m_X}{\sigma_X}\right)$$
where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\{-z^2/2\}\,dz$
MATLAB: $N(0, 1)$ = randn
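Samples from $N(m_X, \sigma_X^2)$ can be obtained from randn by scaling and shifting; a minimal sketch (the values of m and sigma below are arbitrary):

    % if Z ~ N(0,1) then m + sigma*Z ~ N(m, sigma^2)
    m = 2; sigma = 3; n = 1e6;
    x = m + sigma * randn(n, 1);
    fprintf('mean: %.3f (theory %g)\n', mean(x), m);
    fprintf('std : %.3f (theory %g)\n', std(x), sigma);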
THEOREM (central limit theorem)
If $X_1, X_2, \ldots$ are independent, identically distributed random variables with mean $m_X$ and finite variance $\sigma_X^2$, then the distribution of
$$Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - m_X}{\sigma_X}$$
converges to $N(0, 1)$ as $n \to \infty$.
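The theorem can be visualized by summing uniform variables; a sketch (assuming a MATLAB version with the histogram function; n and the number of trials are arbitrary):

    % normalized sums of i.i.d. U(0,1) variables approach N(0,1)
    n = 50; trials = 1e5;
    X = rand(n, trials);                            % X_i ~ U(0,1): m_X = 1/2, sigma_X^2 = 1/12
    Zn = sum((X - 0.5) / sqrt(1/12), 1) / sqrt(n);  % Z_n, one value per trial
    histogram(Zn, 'Normalization', 'pdf'); hold on;
    z = -4:0.01:4;
    plot(z, exp(-z.^2/2) / sqrt(2*pi), 'r');        % N(0,1) density
    hold off;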
$$m_X = a, \qquad \sigma_X^2 = b^2/2$$
Cauchy distribution, $X \sim C(a, b)$:
$$p_X(x) = \frac{b/\pi}{b^2 + (x - a)^2}$$
$$F_X(x) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1}\left(\frac{x - a}{b}\right)$$
$$m_X = \ ?, \qquad \sigma_X^2 = \ ?$$
$$X \sim C(0, 1): \quad p_X(x) = \frac{1}{\pi(1 + x^2)}$$
mean value:
$$\int_{-\infty}^{\infty} x\,p_X(x)\,dx = \int_{0}^{\infty} x\,p_X(x)\,dx - \int_{-\infty}^{0} |x|\,p_X(x)\,dx = \infty - \infty$$
variance:
$$E[X^2] = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x^2}{1 + x^2}\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} dx - 1 = \infty$$
Hence neither the mean value nor the variance of the Cauchy distribution exists.
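The lack of a mean shows up empirically: the running sample mean of $C(0,1)$ data never settles down. A sketch using inverse-CDF sampling (which follows directly from $F_X$ above):

    % C(0,1) samples via x = F_X^{-1}(u) = tan(pi*(u - 1/2)), u ~ U(0,1)
    n = 1e5;
    x = tan(pi * (rand(n, 1) - 0.5));
    runmean = cumsum(x) ./ (1:n)';       % running sample mean
    plot(runmean);                       % keeps jumping; no convergence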
Pairs of random variables

$$X(\xi) \longleftrightarrow Y(\xi)$$
$$F_{XY}(x, y) = P(X \le x, Y \le y)$$
Properties:
• $F_{XY}(\infty, \infty) = 1$
• $0 \le F_{XY}(x, y) \le 1$
$$p_{XY}(x, y) = \lim_{\epsilon_x \to 0,\ \epsilon_y \to 0} \frac{P(x \le X < x + \epsilon_x,\ y \le Y < y + \epsilon_y)}{\epsilon_x \epsilon_y} = \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)$$
• $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p_{XY}(x, y)\,dx\,dy = 1$
• if $x_1 < x_2$ and $y_1 < y_2$ then
$$P(x_1 < X \le x_2,\ y_1 < Y \le y_2) = \int_{x_1}^{x_2}\!\int_{y_1}^{y_2} p_{XY}(x, y)\,dy\,dx \ge 0$$
$X\,|\,Y(\xi) = y$:
$$p_{X|Y}(x|y) = \frac{p_{XY}(x, y)}{p_Y(y)}$$
covariance:
$$c_{XY} = E[(X - m_X)(Y - m_Y)] = r_{XY} - m_X m_Y, \qquad r_{XY} = E[XY]$$
normalized variables:
$$\widetilde{X} = \frac{X - m_X}{\sigma_X}, \qquad \widetilde{Y} = \frac{Y - m_Y}{\sigma_Y}$$
$$m_{\widetilde{X}} = m_{\widetilde{Y}} = 0, \qquad \sigma_{\widetilde{X}}^2 = \sigma_{\widetilde{Y}}^2 = 1$$
correlation coefficient:
$$\rho_{XY} = E[\widetilde{X}\widetilde{Y}] = \frac{c_{XY}}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{XY} \le 1$$
Pairs of random variables

THEOREM
$$-1 \le \rho_{XY} \le 1$$
PROOF
Since $0 \le E[(\widetilde{X} \pm \widetilde{Y})^2] = \sigma_{\widetilde{X}}^2 + \sigma_{\widetilde{Y}}^2 \pm 2E[\widetilde{X}\widetilde{Y}] = 2(1 \pm \rho_{XY})$, it follows that $-1 \le \rho_{XY} \le 1$.
For a jointly Gaussian pair $(X, Y)$:
$$p_{XY}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho_{XY}^2}} \exp\left\{ -\frac{\left(\frac{x - m_X}{\sigma_X}\right)^2 - 2\rho_{XY}\,\frac{x - m_X}{\sigma_X}\,\frac{y - m_Y}{\sigma_Y} + \left(\frac{y - m_Y}{\sigma_Y}\right)^2}{2(1 - \rho_{XY}^2)} \right\}$$
if $X \sim N(m_X, \sigma_X^2)$ then
$$P(X \in [m_X - \sigma_X,\ m_X + \sigma_X]) = 0.68$$
if $(X, Y) \sim N(m_X, m_Y; \sigma_X^2, \sigma_Y^2, \rho_{XY})$ then
$$P[(X, Y) \in D_{XY}] = 0.68$$
$$[X\,|\,Y(\xi) = y] \sim N(m_{X|Y}, \sigma_{X|Y}^2)$$
$$m_{X|Y} = m_X + \rho_{XY}\,\frac{\sigma_X}{\sigma_Y}\,(y - m_Y), \qquad \sigma_{X|Y}^2 = (1 - \rho_{XY}^2)\,\sigma_X^2$$
Jointly Gaussian random variables

note: two uncorrelated jointly Gaussian variables are independent

if $\rho_{XY} = 0$ then
$$p_{XY}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y} \exp\left\{ -\frac{\left(\frac{x - m_X}{\sigma_X}\right)^2 + \left(\frac{y - m_Y}{\sigma_Y}\right)^2}{2} \right\}$$
$$= \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left\{-\frac{(x - m_X)^2}{2\sigma_X^2}\right\} \times \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left\{-\frac{(y - m_Y)^2}{2\sigma_Y^2}\right\} = p_X(x)\,p_Y(y)$$
EXAMPLE
Let $X \sim N(0, 1)$. Define $Y$ in the form
$$Y = \begin{cases} -X & \text{if } |X| < c \\ X & \text{if } |X| \ge c \end{cases} \qquad c > 0$$
Note that
$$P(Y \le x) = P(\{|X| < c \text{ and } -X < x\} \text{ or } \{|X| \ge c \text{ and } X < x\})$$
$$= P(|X| < c \text{ and } -X < x) + P(|X| \ge c \text{ and } X < x)$$
$$= P(|X| < c \text{ and } X < x) + P(|X| \ge c \text{ and } X < x) = P(X < x)$$
(the third line uses the symmetry of the $N(0,1)$ distribution). Hence $Y \sim N(0, 1)$, even though $X$ and $Y$ are not jointly Gaussian: $P(X + Y = 0) = P(|X| < c) > 0$, so the linear combination $X + Y$ is not Gaussian.
$$\hat{\theta}(n) = \hat{\theta}(X_n)$$

Consistency

The estimator $\hat{\theta}(n)$ is called consistent if it converges, in a stochastic sense (e.g., with probability one), to the true parameter value as the number of data points increases to infinity:
$$\lim_{n \to \infty} \hat{\theta}(n) = \theta \quad \text{w.p.}\,1$$
Stochastic convergence modes

Convergence in probability
$$\lim_{n \to \infty} P\left(|\hat{\theta}(n) - \theta| > \varepsilon\right) = 0$$

Mean
$$\hat{m}_X(n) = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Note that
$$E[\hat{m}_X(n)] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = m_X$$
which means that the estimator $\hat{m}_X(n)$ is unbiased.
Variance
$$\hat{\sigma}_X^2(n) = \begin{cases} \dfrac{1}{n} \sum_{i=1}^{n} (x_i - m_X)^2 & \text{if } m_X \text{ is known} \\[6pt] \dfrac{1}{n-1} \sum_{i=1}^{n} [x_i - \hat{m}_X(n)]^2 & \text{otherwise} \end{cases}$$
Note that
$$E_{X_n}\left[\hat{\sigma}_X^2(n)\right] = E\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - m_X)^2\right] = \frac{1}{n} \sum_{i=1}^{n} E[(X_i - m_X)^2] = \sigma_X^2$$
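The two normalizations can be compared numerically; a sketch (note that MATLAB's var(x,1) divides by n but uses the sample mean, so unlike the known-mean estimator above it is biased, with expectation $(n-1)\sigma_X^2/n$):

    % biased vs unbiased variance estimators on N(0,1) data (sigma_X^2 = 1)
    n = 10; trials = 1e5;
    X = randn(n, trials);
    fprintf('1/n     normalization: %.3f\n', mean(var(X, 1)));  % ~ (n-1)/n = 0.9
    fprintf('1/(n-1) normalization: %.3f\n', mean(var(X)));     % ~ 1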
$$F_X(x) = P(X \le x)$$
$$Y_i(x) = \begin{cases} 1 & \text{if } X_i \le x \\ 0 & \text{if } X_i > x \end{cases}$$
$$\hat{F}_X(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i(x)$$
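A sketch of this empirical distribution function for $N(0,1)$ data, compared with $\Phi(x)$ computed via erf (assumes a MATLAB version with implicit array expansion):

    n = 1000;
    x = randn(n, 1);                        % sample X_1, ..., X_n
    xs = -3:0.05:3;                         % evaluation grid
    Fhat = mean(x <= xs);                   % (1/n) * sum of indicators Y_i(x)
    plot(xs, Fhat, xs, 0.5*(1 + erf(xs/sqrt(2))));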
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
sources:
$$X_1 \sim U(-0.5, 0.5), \qquad X_2 \sim U(-0.5, 0.5)$$
$$m_{X_1} = m_{X_2} = 0, \quad \sigma_{X_1}^2 = \sigma_{X_2}^2 = 1/12, \quad c_{X_1 X_2} = 0$$
mixtures:
$$Y_1 = \tfrac{1}{\sqrt{2}} X_1 + \tfrac{1}{\sqrt{2}} X_2, \qquad Y_2 = \tfrac{1}{\sqrt{2}} X_1 - \tfrac{1}{\sqrt{2}} X_2$$
$$m_{Y_1} = m_{Y_2} = 0, \quad \sigma_{Y_1}^2 = \sigma_{Y_2}^2 = 1/12, \quad c_{Y_1 Y_2} = 0$$
One of the possible approaches to ICA
Consider $n > 1$ sources and $n$ mixtures
$$Y_i = \sum_{j=1}^{n} a_{ij} X_j, \quad i = 1, \ldots, n$$
Two-step procedure:
FastICA algorithm
Picture guide to ICA

$$\mathbf{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}_{n \times 1}, \qquad \mathbf{x} = \mathbf{X}(\xi)$$
$$\Sigma_X \ge 0 \iff \mathbf{y}^{\mathrm{T}} \Sigma_X \mathbf{y} \ge 0, \ \forall \mathbf{y}$$
Vector random variables
$$(\mathbf{x} - \mathbf{m}_X)^{\mathrm{T}} \Sigma_X^{-1} (\mathbf{x} - \mathbf{m}_X) = 1$$
• ellipsoid centered at $\mathbf{m}_X$
$$\Sigma_X \overset{?}{<} \Sigma_Y, \quad \Sigma_X \overset{?}{\le} \Sigma_Y, \quad \Sigma_X \overset{?}{>} \Sigma_Y, \quad \Sigma_X \overset{?}{\ge} \Sigma_Y$$
$$D_X = \{\mathbf{x} : \mathbf{x}^{\mathrm{T}} \Sigma_X^{-1} \mathbf{x} \le 1\}, \qquad D_Y = \{\mathbf{y} : \mathbf{y}^{\mathrm{T}} \Sigma_Y^{-1} \mathbf{y} \le 1\}$$
$$\Sigma_X \le \Sigma_Y \iff \Sigma_Y - \Sigma_X \ge 0 \iff D_X \subseteq D_Y$$
$$\mathbf{X} \sim N(\mathbf{m}_X, \Sigma_X), \quad \Sigma_X > 0$$
$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^n |\Sigma_X|}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \mathbf{m}_X)^{\mathrm{T}} \Sigma_X^{-1} (\mathbf{x} - \mathbf{m}_X) \right\}$$
$$\Sigma_{XY} = E\left[(\mathbf{X} - \mathbf{m}_X)(\mathbf{Y} - \mathbf{m}_Y)^{\mathrm{T}}\right] = \Sigma_{YX}^{\mathrm{T}}$$
EXAMPLE
Consider 4 zero-mean and jointly normally distributed random variables $X_1, \ldots, X_4$. It holds that
$$E[X_1 X_2 X_3 X_4] = E[X_1 X_2]\,E[X_3 X_4] + E[X_1 X_3]\,E[X_2 X_4] + E[X_1 X_4]\,E[X_2 X_3]$$
$$\mathbf{X}_{n \times 1}, \quad \mathbf{Y}_{m \times 1}, \qquad \mathbf{x} = \mathbf{X}(\xi), \quad \mathbf{y} = \mathbf{Y}(\xi)$$
Suppose that
$$\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}, \quad \mathbf{A}_{m \times n},\ \mathbf{b}_{m \times 1}, \qquad \mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$$
then
$$E[\mathbf{Y}] = \mathbf{A}\,E[\mathbf{X}] + \mathbf{b} \implies \mathbf{m}_Y = \mathbf{A}\mathbf{m}_X + \mathbf{b}$$
$$\mathbf{Y} - \mathbf{m}_Y = \mathbf{A}(\mathbf{X} - \mathbf{m}_X)$$
and
$$\Sigma_Y = E\left[(\mathbf{Y} - \mathbf{m}_Y)(\mathbf{Y} - \mathbf{m}_Y)^{\mathrm{T}}\right] = \mathbf{A}\,\Sigma_X\,\mathbf{A}^{\mathrm{T}}$$
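A Monte Carlo check of both relations (the matrices A and b below are arbitrary example values; X is taken standard normal so that $\Sigma_X = \mathbf{I}$):

    n = 1e5;
    A = [1 2; 0 1; 3 -1];          % 3x2
    b = [1; 0; -2];
    X = randn(2, n);               % m_X = 0, Sigma_X = I
    Y = A*X + b;
    disp(mean(Y, 2));              % ~ A*m_X + b = b
    disp(cov(Y'));                 % ~ A*Sigma_X*A' = A*A'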
$$\operatorname{avg}\{x(t)\} = \frac{1}{N} \sum_{t=1}^{N} x(t)$$
$$\mathbf{w}_i = \frac{\widetilde{\mathbf{w}}_i}{\|\widetilde{\mathbf{w}}_i\|}$$
6. Determine a unit-norm vector $\mathbf{v}_i$ orthogonal to $\mathbf{w}_i$:
$$\mathbf{w}_i^{\mathrm{T}} \mathbf{v}_i = 0, \qquad \|\mathbf{v}_i\| = 1$$
Facts about matrices

Fact 1
$$\mathbf{A}\mathbf{v} = \lambda\mathbf{v}, \qquad \det(\mathbf{A} - \lambda\mathbf{I}) = 0$$
Fact 2
$$\mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$$
Fact 3 (for symmetric $\mathbf{A} = \mathbf{A}^{\mathrm{T}}$)
$$\mathbf{w}_i^{\mathrm{T}} \mathbf{w}_j = 0, \ \forall i \ne j, \qquad \mathbf{w}_i^{\mathrm{T}} \mathbf{w}_i = \|\mathbf{w}_i\|^2 = 1, \ \forall i$$
$$\mathbf{A} = \mathbf{W}\mathbf{\Lambda}\mathbf{W}^{\mathrm{T}}, \qquad \mathbf{w}_i = \frac{\mathbf{v}_i}{\|\mathbf{v}_i\|}, \quad i = 1, \ldots, n$$
Fact 4
Consider a zero-mean vector random variable $\mathbf{X}$ with covariance matrix
$$\Sigma_X = E[\mathbf{X}\mathbf{X}^{\mathrm{T}}] = \mathbf{W}\mathbf{\Lambda}\mathbf{W}^{\mathrm{T}}$$
$$\widehat{\Sigma}_X(n) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \mathbf{x}_i^{\mathrm{T}}$$
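Fact 4 in action: a sketch that estimates $\Sigma_X$ from data, eigendecomposes it, and whitens the data (the mixing matrix A below is an arbitrary example):

    n = 1e5;
    A = [2 1; 1 1];
    X = A * randn(2, n);               % zero-mean, Sigma_X = A*A'
    SigmaHat = (X * X') / n;           % (1/n) * sum of x_i x_i'
    [W, Lambda] = eig(SigmaHat);       % SigmaHat = W*Lambda*W'
    Z = diag(1 ./ sqrt(diag(Lambda))) * W' * X;   % whitened data
    disp((Z * Z') / n);                % ~ identity matrix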
Discrete-time random (stochastic) processes

$$\{X(t, \xi),\ \xi \in \Xi,\ t \in T\}, \qquad T = \{\ldots, -1, 0, 1, \ldots\}$$
$$X(t, \xi) = X_c(t_c = t T_s, \xi)$$
$$x(t) = X(t, \xi)$$
$$x(t) = A \sin(\omega t + \varphi)$$
transition probabilities:
$$P_{11} = 0.8, \quad P_{12} = 0.2, \quad P_{22} = 0.6, \quad P_{21} = 0.4$$
Electroencephalographic (EEG) signals

[Figure: vocabulary template — hello, yes, no, bye]
variance:
$$\sigma_X^2(t) = \operatorname{var}[X(t)] = E\{[X(t) - m_X(t)]^2\} = \int_{-\infty}^{\infty} [x(t) - m_X(t)]^2\, p(x; t)\,dx = C_X(t, t)$$
$$\rho_X(t_1, t_2) = \frac{C_X(t_1, t_2)}{\sigma_X(t_1)\,\sigma_X(t_2)}, \qquad -1 \le \rho_X(t_1, t_2) \le 1$$
Stationary random processes

For a (wide sense) stationary process
$$m_X(t) = m_X$$
$$R_X(t_1, t_2) = R_X(\tau), \qquad C_X(t_1, t_2) = C_X(\tau)$$
where $\tau = t_2 - t_1$, and
$$\sigma_X^2(t) = C_X(0) = \sigma_X^2$$
$$m_X = E[X(t)] = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} X(t, \xi_i) \qquad \text{(ensemble average)}$$
$$\hat{m}_X(T, \xi) = \frac{1}{T} \sum_{t=1}^{T} X(t, \xi) \qquad \text{(time average)}$$
PROOF
$$E\{[\hat{m}_X(T) - m_X]^2\} = E\left[\frac{1}{T} \sum_{t=1}^{T} X(t) - m_X\right]^2 = E\left\{\frac{1}{T^2} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} [X(t_1) - m_X][X(t_2) - m_X]\right\}$$
$$= \frac{1}{T^2} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} E\{[X(t_1) - m_X][X(t_2) - m_X]\} = \frac{1}{T^2} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} C_X(t_1, t_2) = \frac{1}{T^2} \sum_{t_1=1}^{T} \sum_{t_2=1}^{T} C_X(t_2 - t_1)$$
sufficient condition:
$$\lim_{\tau \to \infty} |C_X(\tau)| = 0$$
Ergodic processes

mean-square ergodicity in the second-order moments:
$$R_X(\tau) = E[X(t) X(t + \tau)] = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} X(t; \xi_i)\, X(t + \tau; \xi_i)$$
$$\hat{R}_X(\tau, T, \xi) = \frac{1}{T} \sum_{t=1}^{T} X(t, \xi)\, X(t + \tau, \xi)$$
sufficient condition:
$$\lim_{\tau \to \infty} |C_X(\tau)| = 0$$
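A sketch of the time-average estimator $\hat{R}_X(\tau)$ on a single realization (the AR(1) process used here is an assumed example; for $X(t) = aX(t-1) + e(t)$ driven by unit-variance white noise, $R_X(\tau) = a^{|\tau|}/(1 - a^2)$):

    T = 1e5; tauMax = 20; a = 0.8;
    x = filter(1, [1 -a], randn(T, 1));       % AR(1) realization
    Rhat = zeros(tauMax + 1, 1);
    for tau = 0:tauMax
        Rhat(tau + 1) = mean(x(1:T - tau) .* x(1 + tau:T));
    end
    Rtrue = a.^(0:tauMax)' / (1 - a^2);       % theoretical R_X(tau)
    plot(0:tauMax, Rhat, 'o', 0:tauMax, Rtrue, '-');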
The autocorrelation function is nonnegative definite: for any $n$, any time instants $t_1, \ldots, t_n$ and any $z_1, \ldots, z_n$
$$\sum_{i=1}^{n} \sum_{j=1}^{n} z_i z_j R_X(t_j - t_i) \ge 0$$
which is equivalent to
$$\mathcal{F}[R_X(\tau)] \ge 0$$
$$\mathbf{X} = \begin{bmatrix} X(t_1) \\ \vdots \\ X(t_n) \end{bmatrix}, \qquad \mathbf{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}$$
$$\mathbf{R} = E[\mathbf{X}\mathbf{X}^{\mathrm{T}}] = \begin{bmatrix} R_X(t_1 - t_1) & R_X(t_2 - t_1) & \cdots & R_X(t_n - t_1) \\ R_X(t_1 - t_2) & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ R_X(t_1 - t_n) & \cdots & & R_X(t_n - t_n) \end{bmatrix}$$
$$\sum_{i=1}^{n} \sum_{j=1}^{n} z_i z_j R_X(t_j - t_i) = \mathbf{z}^{\mathrm{T}} \mathbf{R} \mathbf{z} = \mathbf{z}^{\mathrm{T}} E[\mathbf{X}\mathbf{X}^{\mathrm{T}}] \mathbf{z} = E[\mathbf{z}^{\mathrm{T}} \mathbf{X} \mathbf{X}^{\mathrm{T}} \mathbf{z}] = E[(\mathbf{z}^{\mathrm{T}} \mathbf{X})^2] \ge 0$$
Introduction to spectral analysis

$$E_x = \sum_{t=-\infty}^{\infty} x^2(t) < \infty$$
$$\phi_X(\omega) = |X(\omega)|^2$$
$$\lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} X^2(t, \xi) = \sigma_X^2$$
Does the limit $\lim_{N \to \infty} P_X(\omega, \xi; N)$ exist, where
$$P_X(\omega, \xi; N) = \frac{1}{N} \left| \sum_{t=1}^{N} X(t, \xi)\, e^{-j\omega t} \right|^2$$
is the periodogram?

No, in the general case the limit above does not exist and/or it is realization-dependent, i.e., it is a random variable!
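This can be seen numerically: for white noise ($S_X(\omega) = 1$) a single periodogram stays erratic no matter how large $N$ is, while the average over realizations is flat. A sketch:

    N = 1024; K = 200;
    Px = zeros(K, N);
    for k = 1:K
        x = randn(N, 1);                 % white noise, sigma_X^2 = 1
        Px(k, :) = abs(fft(x)).^2 / N;   % periodogram of one realization
    end
    plot(1:N/2, Px(1, 1:N/2)); hold on;              % one realization: erratic
    plot(1:N/2, mean(Px(:, 1:N/2)), 'r'); hold off;  % mean ~ 1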
Power spectral density

The power spectral density shows the distribution of the signal power over different frequency bands.

Definition 1 (mean periodogram)
$$S_X(\omega) = \lim_{N \to \infty} E[P_X(\omega; N)]$$
Definition 2
$$S_X(\omega) = \mathcal{F}[R_X(\tau)] = \sum_{\tau=-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}$$
THEOREM
Both definitions given above are equivalent provided that the autocorrelation function of $\{X(t)\}$ decays sufficiently rapidly with growing lag, namely
$$\lim_{N \to \infty} \frac{1}{N} \sum_{\tau=-N}^{N} |\tau|\, R_X(\tau) = 0$$
PROOF
$$\lim_{N \to \infty} E[P_X(\omega; N)] = \lim_{N \to \infty} \frac{1}{N} \sum_{t_1=1}^{N} \sum_{t_2=1}^{N} E[X(t_1) X(t_2)]\, e^{-j\omega(t_1 - t_2)}$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{\tau=-(N-1)}^{N-1} (N - |\tau|)\, R_X(\tau)\, e^{-j\omega\tau}$$
$$= \sum_{\tau=-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau} - \lim_{N \to \infty} \frac{1}{N} \sum_{\tau=-(N-1)}^{N-1} |\tau|\, R_X(\tau)\, e^{-j\omega\tau}$$
$$= \sum_{\tau=-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau} = S_X(\omega)$$

$$\sigma_X^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(\omega)\, d\omega$$
DFT-free approaches:
$$X(t, \xi) \longleftrightarrow Y(t, \xi)$$
$$R_{XY}(t_1, t_2) = E[X(t_1)\, Y(t_2)] = R_{XY}(\tau)$$
where $\tau = t_2 - t_1$.
$$S_{XY}(\omega) = \mathcal{F}[R_{XY}(\tau)] = \sum_{\tau=-\infty}^{\infty} e^{-j\omega\tau} R_{XY}(\tau)$$
Suppose that $X(t)$ and $Y(t)$ are jointly wide sense stationary random processes and
$$Z(t) = X(t) + Y(t)$$
Then
$$R_Z(\tau) = R_X(\tau) + R_Y(\tau) + R_{XY}(\tau) + R_{YX}(\tau)$$
$$S_Z(\omega) = S_X(\omega) + S_Y(\omega) + S_{XY}(\omega) + S_{YX}(\omega)$$
If, in addition, $R_{XY}(\tau) = R_{YX}(\tau) \equiv 0$, then
$$R_Z(\tau) = R_X(\tau) + R_Y(\tau), \qquad S_Z(\omega) = S_X(\omega) + S_Y(\omega)$$
Linear transformations of random processes

condition of linearity:
$$L[k_1 x_1(t) + k_2 x_2(t)] = k_1 L[x_1(t)] + k_2 L[x_2(t)], \qquad \forall\, t, k_1, k_2, \{x_1(\cdot)\}, \{x_2(\cdot)\}$$
$$y(t) = \sum_{i=0}^{t} k(i)\, x(t - i)$$
$\{k(i),\ i = 0, 1, 2, \ldots\}$ – impulse response of a dynamic system
$$m_Y(t) = E[Y(t, \xi)] = E\left[\sum_{i=0}^{t} k(i)\, X(t - i, \xi)\right] = \sum_{i=0}^{t} k(i)\, E[X(t - i, \xi)] = \sum_{i=0}^{t} k(i)\, m_X(t - i)$$
Linear transformations of random processes

In an analogous way one can show that:
$$m_Y = m_X \sum_{i=0}^{\infty} k(i)$$
$$R_{XY}(\tau) = \sum_{i=0}^{\infty} k(i)\, R_X(\tau - i)$$
$$R_Y(\tau) = \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} k(i)\, k(j)\, R_X(\tau + i - j)$$
THEOREM
Suppose that a linear system $L[\cdot]$ is excited by a strictly (or wide sense) stationary random process $X(t)$. When the system is time-invariant, i.e., the relationship $y(t) = L[x(t)]$ entails
$$y(t + \Delta t) = L[x(t + \Delta t)], \quad \forall \Delta t$$
then the output random signal $Y(t)$ is also strictly (or wide sense) stationary.

The system governed by $y(t) = L[x(t)] = \sum_{i=0}^{t} k(i)\, x(t - i)$ is not time-invariant because
$$y(t + \Delta t) = \sum_{i=0}^{t + \Delta t} k(i)\, x(t + \Delta t - i) \ne L[x(t + \Delta t)] = \sum_{i=0}^{t} k(i)\, x(t + \Delta t - i)$$
Show that the system governed by $y(t) = L[x(t)] = \sum_{i=0}^{\infty} k(i)\, x(t - i)$ is time-invariant.
Relationships between spectral density functions

Suppose that $X(t)$ is a wide sense stationary random process with a spectral density function $S_X(\omega)$ and $Y(t)$ is the process observed at the output of a linear time-invariant system
$$y(t) = \sum_{i=0}^{\infty} k(i)\, x(t - i)$$
$$S_X(\omega) = \sum_{\tau=-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}$$
$$S_{XY}(\omega) = \sum_{\tau=-\infty}^{\infty} R_{XY}(\tau)\, e^{-j\omega\tau} = \sum_{\tau=-\infty}^{\infty} e^{-j\omega\tau} \sum_{s=0}^{\infty} k(s)\, R_X(\tau - s)$$
$$= \sum_{s=0}^{\infty} e^{-j\omega s} k(s) \left[\sum_{\tau=-\infty}^{\infty} e^{-j\omega(\tau - s)}\, R_X(\tau - s)\right] = \sum_{s=0}^{\infty} e^{-j\omega s} k(s)\, S_X(\omega) = K(e^{-j\omega})\, S_X(\omega)$$
where
$$K(e^{-j\omega}) = \sum_{s=0}^{\infty} e^{-j\omega s} k(s)$$
summary:
$$S_{XY}(\omega) = K(e^{-j\omega})\, S_X(\omega), \qquad S_Y(\omega) = \left|K(e^{-j\omega})\right|^2 S_X(\omega)$$

Goal: Can one design a linear filter $K(z^{-1})$ that would transform a process $Y(t)$ with spectral density function $S_Y(\omega)$ into the process $X(t)$ with spectral density function $S_X(\omega)$?
$$S_X(\omega) = \left|K(e^{-j\omega})\right|^2 S_Y(\omega)$$
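A shaping-filter sketch of the relation $S_Y(\omega) = |K(e^{-j\omega})|^2 S_X(\omega)$: white noise with unit spectral density passed through the (assumed example) filter $K(z^{-1}) = 1/(1 - 0.9z^{-1})$ acquires the spectrum $|K(e^{-j\omega})|^2$:

    N = 1e5;
    y = randn(N, 1);                            % white noise, S(omega) = 1
    x = filter(1, [1 -0.9], y);                 % x(t) = 0.9*x(t-1) + y(t)
    w = 2*pi*(0:N/2 - 1)/N;
    Sx = abs(1 ./ (1 - 0.9*exp(-1j*w))).^2;     % |K(e^{-jw})|^2 * 1
    Px = abs(fft(x)).^2 / N;                    % periodogram of the output
    plot(w, Px(1:N/2), '.', w, Sx, 'r');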
$$X(\omega_i) = \sum_{t=0}^{N-1} x(t)\, e^{-j\omega_i t}, \qquad \omega_i = \frac{2\pi i}{N}, \quad i = 0, \ldots, N - 1$$
$$x(t) = \frac{1}{N} \sum_{i=0}^{N-1} X(\omega_i)\, e^{j\omega_i t}, \qquad t = 0, \ldots, N - 1$$
$$N = 2^k \longrightarrow \text{FFT, IFFT}$$
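A round-trip check with the fast transforms (a minimal sketch; MATLAB's fft/ifft implement the pair above, with 0-based time indices mapped to 1-based array indices):

    N = 2^10;
    x = randn(N, 1);
    X = fft(x);                  % X(omega_i) = sum_t x(t) e^{-j*omega_i*t}
    xr = ifft(X);                % x(t) = (1/N) sum_i X(omega_i) e^{j*omega_i*t}
    disp(max(abs(x - xr)));      % ~ 1e-15: reconstruction up to rounding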
Power spectral subtraction

The signal is divided into frames (segments) of identical width $N$ ($N$ should be short enough to guarantee local signal stationarity; e.g., in the case of speech signals it should cover no more than 10-20 ms). Each frame is "denoised" separately and the results are combined for the entire recording. Consider any frame (to simplify notation the frame number is suppressed).

Step 1
Use a fragment of the signal that contains noise only [$y(t) = z(t)$] to estimate the power spectral density of the noise:
$$\widehat{S}_Z(\omega_i) = \frac{1}{N} |Z(\omega_i)|^2, \quad i = 0, \ldots, \frac{N}{2}$$

Step 2
Compute the DFT $Y(\omega_i)$ of the noisy frame and the corresponding power spectral density estimate $\widehat{S}_Y(\omega_i) = \frac{1}{N}|Y(\omega_i)|^2$.

Step 3
Estimate the power spectral density of the clean signal by subtraction, clipping negative values at zero: $\widehat{S}_X(\omega_i) = \max\{\widehat{S}_Y(\omega_i) - \widehat{S}_Z(\omega_i),\ 0\}$.

Step 4
Evaluate the spectral gains
$$\widehat{A}(\omega_i) = \sqrt{\frac{\widehat{S}_X(\omega_i)}{\widehat{S}_Y(\omega_i)}}, \quad i = 0, \ldots, \frac{N}{2}$$
$$\widehat{A}(\omega_i) = \widehat{A}(\omega_{N-i}), \quad i = \frac{N}{2} + 1, \ldots, N - 1$$
Step 5
Evaluate the denoised signal
$$\widehat{X}(\omega_i) = \widehat{A}(\omega_i)\, Y(\omega_i), \quad i = 0, \ldots, N - 1$$
$$\{\widehat{x}(0), \ldots, \widehat{x}(N-1)\} = \text{IDFT}\{\widehat{X}(\omega_0), \ldots, \widehat{X}(\omega_{N-1})\}$$
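A single-frame sketch of Steps 2-5 (the sinusoidal clean frame, the noise level, and the known noise PSD are assumed example values; in practice $\widehat{S}_Z$ comes from Step 1):

    N = 256;
    s = sin(2*pi*0.05*(0:N-1)');          % hypothetical clean frame
    y = s + 0.5*randn(N, 1);              % noisy frame y(t) = s(t) + z(t)
    Sz = 0.25 * ones(N, 1);               % noise PSD (white noise, variance 0.25)
    Y  = fft(y);
    Sy = abs(Y).^2 / N;                   % PSD estimate of the noisy frame
    Sx = max(Sy - Sz, 0);                 % spectral subtraction, clipped at 0
    A  = sqrt(Sx ./ max(Sy, eps));        % spectral gains
    xhat = real(ifft(A .* Y));            % denoised frame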
$$w(t) = 1 - \frac{|t - n|}{n}, \quad t = 0, \ldots, 2n$$
or
$$w(t) = \frac{1}{2}\left(1 - \cos\frac{\pi t}{n}\right), \quad t = 0, \ldots, 2n$$
$$w(t) + w(t - n) = 1, \quad t = n, \ldots, 2n$$
Overlap-add synthesis
$$\widehat{x}_w(t) = \begin{cases} \widehat{x}(t)\, w(t) & t = 0, \ldots, 2n \\ 0 & \text{elsewhere} \end{cases}$$
$$\widehat{x}_w^k(t) = \widehat{x}_w(t), \quad t = kn, \ldots, (k+2)n$$
$$\widehat{x}(t) = \ldots + \widehat{x}_w^{k-1}(t) + \widehat{x}_w^k(t) + \widehat{x}_w^{k+1}(t) + \ldots$$
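A quick check that the cosine window above satisfies the overlap condition $w(t) + w(t - n) = 1$ (MATLAB array indices are shifted by one relative to $t = 0, \ldots, 2n$):

    n = 128; t = 0:2*n;
    w = 0.5 * (1 - cos(pi*t/n));                % cosine window from above
    overlap = w(n+1:2*n+1) + w(1:n+1);          % w(t) + w(t-n), t = n..2n
    disp(max(abs(overlap - 1)));                % = 0 up to rounding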
http://shiny.stat.calpoly.edu/LongRun/