ASP Lecture 1 Slides 2018
Introduction Recap
Discrete Random Signals:
Standardisation and normalisation
(e.g. to be invariant of amplifier gain or the quality of sensor contact)
• Centred data: xC = x − µ
• Centred and scaled (normalised) data: xCS = xC / σ (so that µ = 0, σ = 1)
Normalised data can be standardised to any mean µ̄ and variance σ̄² by
    xST = σ̄ xCS + µ̄
or bounded to any lower l and upper u bounds by
    xST = (u − l) (x(n) − xmin)/(xmax − xmin) + l
Standardise to zero mean and range [−1, 1] # x(n) = 2 (x(n) − xmin)/(xmax − xmin) − 1
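A minimal MATLAB sketch of these operations (the data x and the targets muBar, sigmaBar, l, u are illustrative):

x = 3 + 2*randn(1000,1);             % example data with mean 3, std 2
xC  = x - mean(x);                   % centred: zero mean
xCS = xC / std(x);                   % centred and scaled: zero mean, unit variance
muBar = 5; sigmaBar = 0.1;           % illustrative target statistics
xST = sigmaBar*xCS + muBar;          % standardised to mean muBar, variance sigmaBar^2
l = -1; u = 1;                       % illustrative bounds
xB = (u - l)*(x - min(x))/(max(x) - min(x)) + l;   % bounded to [l, u]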
Standardisation: Example 1
Autocorrelation under centering and normalisation
Consider an AR(2) signal with parameters a = [0.2, −0.1]^T.

Figure: an AR(2) realisation and its ACF over 100 lags (top), and the ACFs of the centred and of the normalised signal (bottom).
For σ < 1, normalisation will increase the magnitude of the ACF; for this example σ = 0.5.
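A sketch reproducing this example (assuming a white Gaussian driving noise and a biased ACF estimator):

a = [0.2, -0.1];                     % AR(2): x[n] = 0.2x[n-1] - 0.1x[n-2] + w[n]
w = 0.5*randn(1000,1);               % driving noise with sigma = 0.5
x = filter(1, [1, -a], w);           % AR(2) realisation
N = length(x); r = zeros(101,1);
for m = 0:100                        % biased ACF estimate for lags 0..100
    r(m+1) = (x(1:N-m)' * x(1+m:N)) / N;
end
rNorm = r / var(x);                  % ACF of the normalised signal: if std(x) < 1,
                                     % dividing by var(x) inflates the ACF magnitude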
Standardisation: Example 2
The bars denote the amplitudes of the samples of signals x1-x4
For the raw measurements: {x1[n], x2[n], x3[n], x4[n]}n=1:N
Figure: bar plots of the amplitudes of x1–x4, shown in three panels (raw and transformed versions).
How do we describe a signal, statistically?
Probability distribution functions → very convenient!
NB: For random signals, for two time instants n1 and n2, the pdf of
x[n1] need not be identical to that of x[n2], e.g. sin(n) + w(n).
Statistical distributions: Uniform distribution
Important: Recall that a probability density integrates to unity
    ∫_{−∞}^{∞} p(x[n]) dx[n] = 1
and that the connection between the pdf and its cumulative distribution function (CDF) is
    F(x[n]) = ∫_{−∞}^{x[n]} p(z) dz,   also   lim_{x[n]→∞} F(x[n]) = 1
Figure: pdf p(x[n]) and CDF F(x[n]) for the uniform distribution on [0, 1]. In MATLAB – function rand
Gaussian probability density and cumulative distribution functions
How does the variance σ² influence the shape of the CDF and pdf?
    p(x; µ, σ²) = (1/(σ√(2π))) exp( −(x − µ)²/(2σ²) ),    P(x; µ, σ²) = ½ [ 1 + erf( (x − µ)/(√2 σ) ) ]
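These two expressions can be evaluated directly in MATLAB (erf is built in; the grid and σ values are illustrative):

x = linspace(-5, 5, 501); mu = 0;
for sigma = [0.5, 1, 2]
    p = exp(-(x - mu).^2 / (2*sigma^2)) / (sigma*sqrt(2*pi));   % pdf
    P = 0.5 * (1 + erf((x - mu) / (sqrt(2)*sigma)));            % CDF
    plot(x, p, x, P, '--'); hold on    % smaller sigma: taller pdf, steeper CDF
end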
Statistical distributions: Gaussian # randn in Matlab
Very convenient (mathematical tractability) - especially in terms of
log-likelihood log p(x[n])
    p(x[n]) = (1/√(2πσx²)) e^{−(x[n]−µx)²/(2σx²)}  ⇒  log p(x[n]) = −(x[n]−µx)²/(2σx²) − ½ log(2πσx²)
Figure: sample densities of x[n] ∼ N(µx, σx²) for realisation lengths N = 100 and N = 10000 (µx → mean, σx² → variance), together with the pdf and CDF of a bipolar distribution. The sample densities approach the true density as N grows.
Multi-dimensionality versus multi-variability
Univariate vs. Multivariate vs. Multidimensional
◦ Single input single output (SISO) e.g. single-sensor system
◦ Multiple input multiple output (MIMO) (arrays of transmitters and
receivers) can measure one source with many sensors
◦ Multidimensional processes (3D inertial body motion sensors, radar,
vector fields, wind anemometers) – intrinsically multidimensional
Joint distributions of delayed samples (temporal)
Joint distribution (bivariate CDF)
F (x[n1], x[n2]) = Prob (X[n1] ≤ x[n1], X[n2] ≤ x[n2])
Example 1.1. Bivariate pdf
Notice the change in indices (assuming discrete time signals)
CDF : F (x[n], x[m]) = Prob {X[n] ≤ x[n], X[m] ≤ x[m]}
PDF : p(x[n], x[m]) = ∂²F(x[n], x[m]) / (∂x[n] ∂x[m])
Figure: the bivariate pdf p(x[n], x[m]) consists of four point masses of 1/4 at (x[n], x[m]) ∈ {(−1, −1), (−1, 1), (1, −1), (1, 1)}.
Homework: Plot the CDF for this case, what would happen in C?
Properties of the statistical expectation operator
P1: Linearity:
E{ax[n] + by[m]} = aE{x[n]} + bE{y[m]}
Example 1.2. Mean for linear systems (use P1 & P2)
Consider a general linear system given by z[n] = ax[n] + by[n]. Find the
mean (E{x[n]} = µx, E{y[n]} = µy , and x ⊥ y).
Solution:
    E{z[n]} = E{ax[n] + by[n]} = aE{x[n]} + bE{y[n]} = aµx + bµy

Figure: block diagram of z[n] = ax[n] + by[n] (gains a and b, summed in Σ).

This is a consequence of the linearity of the E{·} operator.
Example 1.3. Mean for nonlinear systems (use P3)
think about e.g. estimating the variance empirically
    z[n] = x²[n]  ⇒  E{z[n]} = E{x²[n]} = σx² + µx²
This is extremely useful, since most real–world signals are observed through sensors, e.g. microphones, geophones, various probes ...
Dealing with ensembles of random processes
Ensemble # collection of all possible realisations of a random signal

The ensemble mean
    µx(n) = (1/N) Σ_{i=1}^{N} xi[n]
where xi[n] # outcome of the i-th realisation. For N → ∞ we have
    µx(n) = lim_{N→∞} (1/N) Σ_{i=1}^{N} xi[n]
Average both along one and across all realisations?
Statistical average:  E{x[n]} = µx = ∫_{−∞}^{∞} x[n] p(x[n]) dx[n]

Figure: realisations of the random process sin(2*x) + 2*rand.
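A sketch of the ensemble mean for this process (N and the time grid are illustrative; note E{2*rand} = 1):

x = linspace(-5, 5, 200);                        % time axis
N = 100;                                         % number of realisations
X = repmat(sin(2*x), N, 1) + 2*rand(N, numel(x)); % one realisation per row
muHat = mean(X, 1);                              % ensemble mean at each time instant
plot(x, muHat, x, sin(2*x) + 1, '--')            % approaches sin(2x) + 1 as N grows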
Our old noisy sine example
a stochastic process is a collection of random variables
Figure: ensemble average for the noisy sinewave sin(2x) + 2*randn + 1, with the ensemble histogram. Left: 6 realisations; Right: 100 realisations (and the overlay plot).

The pdf at time instant n is different from that at m, in particular
    µ(n) ≠ µ(m) for m ≠ n
Ensemble average of a noisy sinewave
A more precise probability distribution for every sample
Every sample in our ensemble average is a random variable and has its pdf.

Figure. Left: area under the Gaussian pdf – 68% of the area lies within µ ± σ, 95% within µ ± 2σ, and >99% within µ ± 3σ. Right: 100 realisations of the random process sin(2*x) + 2*rand, with the ensemble average and a histogram for each random sample.
Second order statistics: 1) Correlation
• Correlation (also known as Autocorrelation Function (ACF))
r(m, n) = E{x[m]x[n]}, that is
    r(m, n) = ∫∫_{−∞}^{∞} x[m] x[n] p(x[m], x[n]) dx[m] dx[n]
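For the noisy-sinewave ensemble above, r(m, n) can be approximated by averaging xi[m] xi[n] across realisations (the indices m, n are illustrative):

x = linspace(-5, 5, 200); N = 1000;
X = repmat(sin(2*x), N, 1) + 2*rand(N, numel(x));
m = 50; n = 60;                       % two time instants
r_mn = mean(X(:,m) .* X(:,n))         % ensemble estimate of E{x[m]x[n]}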
Example 1.4. Autocorrelation of sinewaves
(the need for a covariance function)
Figure: the signals sin(2x), sin(2x) + 2*randn and sin(2x) + 4 (top row) and their ACF estimates (bottom row).
Second order statistics: 2) Covariance
• Covariance is defined as
    c(m, n) = E{ (x[m] − µ[m]) (x[n] − µ[n]) }
• Properties:
◦ c(m, n) = c(n, m) ⇒ the covariance matrix for x = [x[0], . . . , x[N − 1]]^T is symmetric and is given by
    C = {c(m, n)} = E{x̃ x̃^H},  where x̃ = x − µ
Higher order moments
For a zero-mean stochastic process {x[n]}:
• Third and fourth order moments
~ In many applications the signals are assumed to be, or are reduced to,
zero-mean stochastic processes.
Example 1.5. Use of statistics in system identification
(statistical rather than transfer function based analysis)
Figure: system identification configuration – the unknown system and the model {h(n)} are driven by the same input x[n]; the error e[n] = d[n] − y[n] is formed between the unknown system output d[n] and the model output y[n].
Solution: Minimise the error power E{e²[n]} by selecting suitable {h(k)}
• Cost function: J = E{ ( d[n] − Σ_k h(k) x[n − k] )² }
• Setting ∇_h J = 0 for each h = h(i) gives (you will see more detail later)
    E{d[n]x[n − i]} − Σ_k h(k) E{x[n − k]x[n − i]} = 0
• The solution rdx(−i) = Σ_k h(k) rxx(i − k), in vector form, is
    h = R⁻¹ rdx
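A sketch of this solution on synthetic data (hTrue, the noise level and the model order p are hypothetical):

hTrue = [0.8; -0.4; 0.2];                  % hypothetical unknown system
N = 5000; x = randn(N,1);
d = filter(hTrue, 1, x) + 0.01*randn(N,1); % observed desired signal
p = 3;                                     % assumed model order
r = zeros(p,1); rdx = zeros(p,1);
for k = 0:p-1
    r(k+1)   = (x(1:N-k)' * x(1+k:N)) / N;   % rxx(k)
    rdx(k+1) = (d(1+k:N)' * x(1:N-k)) / N;   % E{d[n]x[n-k]}
end
R = toeplitz(r);
h = R \ rdx                                % normal equations: h = R^{-1} rdx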
Independence, uncorrelatedness and orthogonality
• Two RVs are independent if the realisation of one does not affect the
distribution of the other, consequently, the joint density is separable:
p(x, y) = p(x)p(y)
Independence, uncorrelatedness and orthogonality – Properties
• Independent RVs are always uncorrelated
Stationarity: Strict and wide sense
• Strict Sense Stationarity (SSS): The process {x[n]} is SSS if for all
k the joint distribution p(x[n1], . . . , x[nk ]) is invariant under time
shifts, i.e. (all moments considered)
Autocorrelation function r(m) of WSS processes
i) Time/shift invariant: r(m, n) = r(m − n, 0) = r(m − n) (follows
from the covariance WSS requirement)
Properties of ACF – continued
iv) The AC matrix for a stationary x = [x[0], . . . , x[L − 1]]^T is

R = E{xx^T} = [ r(0)       r(1)      · · ·  r(L − 1)
                r(1)       r(0)      · · ·  r(L − 2)
                ..         ..        ...    ..
                r(L − 1)   r(L − 2)  · · ·  r(0)     ]

v) R is positive semidefinite:  a^T R a ≥ 0  ∀ a ≠ 0
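A quick numerical illustration of the Toeplitz structure and of property v) (the ACF values are illustrative):

r = [1, 0.5, 0.2];        % illustrative r(0), r(1), r(2)
R = toeplitz(r)           % symmetric Toeplitz autocorrelation matrix
eig(R)                    % eigenvalues >= 0  <=>  a'*R*a >= 0 for all a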
Properties of r(m) – contd II
vi) The autocorrelation function reflects the basic shape of a signal, for
instance if the signal is periodic, the autocorrelation function will also
be periodic
Figure: sin(x) over 150 points (≈ 15 s) and over 600 points (≈ 60 s), together with their ACFs – the ACF of a periodic signal is itself periodic.
Properties of the crosscorrelation
i) rxy(m) = E{x[n]y[n + m]} = ryx(−m) (accounts for the lead/lag between the signals; see also the radar principle in Example 1.6)
iii) r²xy(m) ≤ rxx(0) ryy(0) (same as ACF property (iii) when x = y)
Example 1.6. The use of (cross-)correlation
Detection of tones in noise: consider a noisy tone
    y[n] = A cos(ωn + θ) + w[n]
with ACF R(m) = E{y[n]y[n + m]}.

Principle of radar: the received signal (see the previous slide) is
    y[n] = a x[n − T0] + w[n],  so that  Rxy(τ) = E{x(n)y(n + τ)}
peaks at the round-trip delay τ = T0.

Figure. Left: Rxy(τ) for α = 0.1, ω = 0.5, A = 1 (curves for B = 2 and B = 4). Right: the received waveform over time.
Example 1.7. Range of a radar
Unbiased estimate of a true radar delay δ0 that has distribution δ ∼ N(δ0, σ0²).
Q: What is the distribution of the range estimate, and how should the radar be designed (i.e. what should σ0 be) so that the range estimate is within 100 m of the actual range with a probability of 99%?
A: The range is given by R = δ C/2, therefore R ∼ N(δ0 C/2, σ0² C²/4), where R0 = δ0 C/2 is the actual true range.
To fulfil the radar design requirement we need P{|R − R0| < 100} = 0.99, or equivalently (due to the symmetry of the RV R)
    P{ (R − R0)/(σ0 C/2) < 100/(σ0 C/2) } = 0.995,
and as (R − R0)/(σ0 C/2) ∼ N(0, 1), the Gaussian CDF from the earlier slide gives
    100/(σ0 C/2) = 2.58  ⇒  σ0 = 200/(2.58 × 3 × 10⁸) ≈ 0.26 microseconds
NB: By dividing N(0, σ) by σ we standardise the pdf to unit variance, N(0, 1).
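A Monte Carlo check of this design (δ0 is a hypothetical true delay; σ0 follows from the derivation above):

C = 3e8; delta0 = 1e-4;                 % hypothetical true delay (s)
sigma0 = 200 / (2.58*C);                % ~0.26 microseconds, from above
delta = delta0 + sigma0*randn(1e6,1);   % delay estimates
R = delta*C/2; R0 = delta0*C/2;
mean(abs(R - R0) < 100)                 % ~0.99, as required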
Power spectral density (PSD)
The power spectrum or power spectral density S(f) of a process {x[n]} is the Fourier transform of its ACF (Wiener–Khintchine theorem)
    S(f) = F{rxx(m)} = Σ_{m=−∞}^{∞} rxx(m) e^{−j2πfm},   f ∈ (−1/2, 1/2],  ω ∈ (−π, π]
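The theorem can be verified numerically: the inverse FFT of the periodogram recovers the biased sample ACF (the AR(1) example signal is illustrative):

N = 512; x = filter(1, [1, -0.9], randn(N,1));              % AR(1) example
r = zeros(N,1);
for m = 0:N-1, r(m+1) = (x(1:N-m)' * x(1+m:N)) / N; end     % biased ACF
S = abs(fft(x, 2*N-1)).^2 / N;          % periodogram on a (2N-1)-point grid
rBack = real(ifft(S));
max(abs(rBack(1:N) - r))                % ~1e-15: ACF and PSD are an FT pair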
PSD properties
i) S(f) is a positive real function (it is a distribution) ⇒ S(f) = S*(f).
Since r(−m) = r(m) we can write
    S(f) = Σ_{m=−∞}^{∞} rxx(−m) e^{j2πfm} = Σ_{m=−∞}^{∞} rxx(m) e^{−j2πfm}
and hence
    S(f) = Σ_{m=−∞}^{∞} rxx(m) cos(2πfm) = rxx(0) + 2 Σ_{m=1}^{∞} rxx(m) cos(2πfm)
ii) S(f) is a symmetric function, S(−f) = S(f). This follows from the last expression.
iii) r(0) = ∫_{−1/2}^{1/2} S(f) df = E{x²[n]} ≥ 0
⇒ the area below the PSD (power spectral density) curve = signal power.
Linear systems
Linear systems are described by their impulse response h(n) or by the transfer function
    H(z) = Y(z)/X(z)
that is,
    y[n] = Σ_{r=−∞}^{∞} h(r) x[n − r] = (h ∗ x)[n]

Figure: a linear system H(z) with input x(k) ↔ X(z) and output y(k) ↔ Y(z); either the input or the output (or the system) may be known or unknown.
Example 1.8. Linear systems – statistical properties #
mean and variance
i) Mean
    E{y[n]} = E{ Σ_{r=−∞}^{∞} h(r) x[n − r] } = Σ_{r=−∞}^{∞} h(r) E{x[n − r]}
    ⇒ µy = µx Σ_{r=−∞}^{∞} h(r) = µx H(0)
[ NB: H(θ) = Σ_{r=−∞}^{∞} h(r) e^{−jrθ}. For θ = 0, H(0) = Σ_{r=−∞}^{∞} h(r) ]
ii) Cross–correlation
    ryx(m) = E{y[n]x[n + m]} = Σ_{r=−∞}^{∞} h(r) E{x[n − r]x[n + m]}
           = Σ_{r=−∞}^{∞} h(r) rxx(m + r)   (a convolution of the input ACF with the time-reversed {h})
Example 1.9. Linear systems – statistical properties #
crosscorrelation (this will be used in AR spectrum)
From rxy(m) = ryx(−m) we have rxy(m) = Σ_{r=−∞}^{∞} h(r) rxx(m − r). Now we write
    ryy(m) = E{y[n]y[n + m]} = Σ_{r=−∞}^{∞} h(r) E{x[n − r]y[n + m]}
           = Σ_{r=−∞}^{∞} h(r) rxy(m + r) = Σ_{r=−∞}^{∞} h(−r) rxy(m − r)
or, in the frequency domain,
    Syy(f) = H(f) H(−f) Sxx(f) = |H(f)|² Sxx(f)
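A deterministic check of Syy(f) = |H(f)|² Sxx(f) for unit-variance white input, for which Sxx(f) = 1 and ryy(m) = Σ_r h(r)h(r + m) (the FIR system h is hypothetical):

h = [1, 0.5, 0.25];                     % hypothetical FIR system
rhh = conv(h, fliplr(h));               % ryy(m) for white input, lags -(L-1)..L-1
M = 512; k = 0:M-1;
Syy = real(fft(rhh, M) .* exp(1j*2*pi*k*(numel(h)-1)/M));  % re-centre to lag 0
max(abs(Syy - abs(fft(h, M)).^2))       % ~1e-15: equals |H(f)|^2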
Crosscorrelation and cross–PSD (recap)
• CC of two jointly WSS discrete-time signals: rxy(m) = E{x[n]y[n + m]} (this is not symmetric: rxy(m) = ryx(−m))
• For z[n] = x[n] + y[n], where x[n] and y[n] are zero mean and independent, we have rxy(m) = ryx(m) = 0, therefore
    rzz(m) = rxx(m) + ryy(m)  and  Szz(f) = Sxx(f) + Syy(f)
Special signals: a) White noise
If the joint pdf is separable
p(x[0], x[1], . . . x[n]) = p(x[0])p(x[1]) · · · p(x[n]) ∀n
where the pdfs p(x[r]) are identical ∀r, then all the pairs x[n], x[m] are
independent and {x[n]} is said to be an independent identically
distributed (iid) signal.
Example 1.10. ACF and power spectrum of white noise
The Fourier transform of the ACF of WN is constant over all frequencies, hence "white".

Figure: S(f) = σx² for f ∈ (−1/2, 1/2].

• ACF and autocorrelation matrix:
    r(m) = σx² δ(m),   R = σx² I
Since E{x[n]x[n − m]} = 0 for m ≠ 0, the variance r(0) = σx² is the power of WN.
• The shape of the pdf p(x[n]) determines whether the white noise is called Gaussian (WGN), uniform (UWN), Poisson, Laplacian, etc.
From the Wiener–Khintchine theorem:  S(f) = σx²
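A quick check with WGN of variance σx² = 4 (length and lag range illustrative):

N = 1e5; x = 2*randn(N,1);              % WGN with variance 4
r = zeros(21,1);
for m = 0:20, r(m+1) = (x(1:N-m)' * x(1+m:N)) / N; end
stem(0:20, r)                           % r(0) ~ 4, all other lags ~ 0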
b) First order Markov signals: autoregressive modelling
(finite memory in the description of a random signal)
where p(a|b) is defined as the pdf of ”a” conditioned upon the (possibly
correlated) observation ”b”
c) Minimum phase signals
Let {x[n]} be observed for n = 0, 1, . . . , N − 1.
d) Gaussian random signals
A signal in which each of the L samples is Gaussian distributed:
    p(x[i]) = (1/√(2πσi²)) e^{−(x[i]−µ(i))²/(2σi²)},   i = 0, . . . , L − 1
e) Properties of a Gaussian distribution
1) If x and y are jointly Gaussian, then for any constants a and b the random variable
    z = ax + by
is Gaussian, with mean
    mz = a mx + b my
and variance
    σz² = a²σx² + b²σy² + 2abσxσy ρxy

2) If two jointly Gaussian random variables are uncorrelated (ρxy = 0) then they are statistically independent,
    f_{x,y}(x, y) = f(x)f(y)

Figure: the Gaussian pdf p(x) with peak 1/√(2πσ²); 68% of the area lies within µ ± σ, 95% within µ ± 2σ, and >99% within µ ± 3σ. For µ = 0, σ = 1 the inflection points are at ±1.
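Property 1) can be checked by simulation (the constants and correlation ρxy are illustrative):

a = 2; b = -1; rho = 0.3; sx = 1.5; sy = 0.8;
Cxy = [sx^2, rho*sx*sy; rho*sx*sy, sy^2];   % covariance of [x; y]
Lc = chol(Cxy, 'lower');
xy = Lc * randn(2, 1e6);                    % jointly Gaussian samples
z = a*xy(1,:) + b*xy(2,:);
[var(z), a^2*sx^2 + b^2*sy^2 + 2*a*b*sx*sy*rho]   % the two should agree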
Example 1.11. Rejecting outliers from cardiac data
◦ Failed detection of R peaks in ECG [left] causes outliers in R-R interval
(RRI, time difference between consecutive R peaks) [right]
Figure. Left: ECG segment in which an R peak is missed, and the resulting outlier in the RRI series; samples outside RRI ± 3σ_RRI are rejected. Right: the RRI series and their PSDs before and after outlier rejection.
f) Conditional mean estimator for Gaussian random
variables
3) If x and y are jointly Gaussian random variables then the optimum estimator for y, given by ŷ = g(x), is linear:
    ŷ = ax + b
Moments of a zero-mean Gaussian x:
    E{xⁿ} = 1 × 3 × 5 × · · · × (n − 1) σxⁿ  for n even,   E{xⁿ} = 0  for n odd
g) Ergodic signals
g) Ergodic signals – Example
However, for a single realisation of this process, for large N, the sample mean is approximately zero:
    mx ≈ 0,   N ≫ 1
g) Ergodicity in the mean
◦ Asymptotically unbiased:  lim_{N→∞} E{m̂x(N)} = mx
g) Ergodicity – Summary
In practice, it is necessary to assume that the single realisation of a discrete-time random signal satisfies ergodicity in the mean and autocorrelation.
Mean Ergodic Theorem: Let x[n] be a wide sense stationary (WSS) random process with autocovariance sequence cx(k). Sufficient conditions for x[n] to be ergodic in the mean are that cx(0) < ∞ and
    lim_{N→∞} (1/N) Σ_{k=0}^{N−1} cx(k) = 0
(a simpler sufficient condition is lim_{k→∞} cx(k) = 0).
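A sketch of ergodicity in the mean for a WSS AR(1) process (parameters illustrative): the time average of one long realisation approaches the ensemble mean.

N = 1e5;
x = 2 + filter(1, [1, -0.5], randn(N,1));   % WSS process with E{x[n]} = 2
mean(x)                                     % single-realisation time average ~ 2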
Taylor series expansion
Most ’smooth’ functions can be expanded into their Taylor Series
Expansion (TSE)
    f(x) = f(x0) + f'(x0)(x − x0)/1! + f''(x0)(x − x0)²/2! + · · · = Σ_{n=0}^{∞} f⁽ⁿ⁾(x0)(x − x0)ⁿ/n!
For a power series f(x) = Σ_n aₙ(x − x0)ⁿ, term-by-term differentiation gives
    d f(x)/dx = a1 + 2a2(x − x0) + 3a3(x − x0)² + 4a4(x − x0)³ + · · ·
choose x = x0 ⇒ a1 = df(x)/dx |_{x=x0}, and so on:  ak = (1/k!) dᵏf(x)/dxᵏ |_{x=x0}
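For instance, the Taylor coefficients of eˣ about x0 = 0 are ak = 1/k!, and the partial sums converge rapidly (the point x and order K are illustrative):

x = 0.5; K = 8;
terms  = x.^(0:K) ./ factorial(0:K);   % a_k (x - x0)^k with x0 = 0
approx = cumsum(terms);                % Taylor polynomials of orders 0..K
abs(approx - exp(x))                   % error shrinks with each added term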
Power series - contd.
Consider
    f(x) = Σ_{n=0}^{∞} aₙxⁿ  ⇒  f'(x) = Σ_{n=1}^{∞} n aₙ xⁿ⁻¹   and   ∫₀ˣ f(t) dt = Σ_{n=0}^{∞} aₙ xⁿ⁺¹/(n + 1)
Integrating the series for 1/(1 + t²) term by term gives atan(x) = Σ_{n=0}^{∞} (−1)ⁿ x²ⁿ⁺¹/(2n + 1).
For x = 1 we have π/4 = 1 − 1/3 + 1/5 − 1/7 + · · ·
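Partial sums of this series converge (slowly) to π/4:

n = 0:1e5;
s = cumsum((-1).^n ./ (2*n + 1));      % partial sums of 1 - 1/3 + 1/5 - ...
4*s(end)                               % ~3.1416 (convergence is slow)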
Numerical derivatives - examples
• Two-point approximation:
◦ Forward: f'(0) ≈ (f(1) − f(0))/h
◦ Backward: f'(0) ≈ (f(0) − f(−1))/h
• Three-point approximation:
◦ Central: f'(0) ≈ (f(1) − f(−1))/(2h); the related combination (f(1) − 2f(0) + f(−1))/h² approximates the second derivative f''(0)
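A numerical comparison on f = sin at x0 = 1 (the step h is illustrative):

f = @sin; x0 = 1; h = 1e-3;
dF = (f(x0+h) - f(x0)) / h;            % forward, error O(h)
dB = (f(x0) - f(x0-h)) / h;            % backward, error O(h)
dC = (f(x0+h) - f(x0-h)) / (2*h);      % central, error O(h^2)
[dF, dB, dC] - cos(x0)                 % central is markedly more accurate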
Constrained optimisation using Lagrange multipliers:
Basic principles
Consider a two-dimensional problem:
maximise f(x, y)   (function to max/min)
subject to g(x, y) = c   (constraint)

# we look for point(s) where the curves of f & g touch (but do not cross).
At those points the tangent lines of f and g are parallel ⇒ so too are the gradients, ∇_{x,y} f ∥ λ∇_{x,y} g, where λ is a scaling constant.
Although the two gradient vectors are parallel, they can have different magnitudes.
Therefore, we are looking for max or min points (x, y) of f(x, y) for which
    ∇_{x,y} f(x, y) = −λ ∇_{x,y} g(x, y),  where  ∇_{x,y} f = (∂f/∂x, ∂f/∂y)  and  ∇_{x,y} g = (∂g/∂x, ∂g/∂y)
We can now combine these conditions into one equation as
    F(x, y, λ) = f(x, y) − λ( g(x, y) − c )   and solve   ∇_{x,y,λ} F(x, y, λ) = 0
Obviously, ∇_λ F(x, y, λ) = 0 ⇔ g(x, y) = c.
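A quick illustrative example (not from the slides): maximise f(x, y) = xy subject to g(x, y) = x + y = 10. Here
    F(x, y, λ) = xy − λ(x + y − 10)
    F'_x = y − λ = 0,   F'_y = x − λ = 0,   F'_λ = −(x + y − 10) = 0
so x = y = λ = 5, and the constrained maximum is f = 25.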
The method of Lagrange multipliers in a nutshell
max/min of a function f (x, y, z) where x, y, z are coupled
Since x, y, z are not independent there exists a constraint g(x, y, z) = c
Solution: Form a new function
    F(x, y, z, λ) = f(x, y, z) − λ( g(x, y, z) − c )
calculate F'_x, F'_y, F'_z, F'_λ, set them to 0, and solve for the unknowns x, y, z, λ.