
Stochastic Signal Processing

Dr. Đỗ Hồng Tuấn

Department of Telecommunications, Faculty of Electrical and Electronics Engineering

Ho Chi Minh City University of Technology (HCMUT)

E-mail: do-hong@hcmut.edu.vn
Outline (1)

Chapter 1: Discrete-Time Signal Processing


z-Transform.
Linear Time-Invariant Filters.
Discrete Fourier Transform (DFT).

Chapter 2: Stochastic Processes and Models


Review of Probability and Random Variables.
Stochastic Models.
Stochastic Processes.

Chapter 3: Spectrum Analysis


Spectral Density.
Spectral Representation of Stochastic Process.
Spectral Estimation.
Other Statistical Characteristics of a Stochastic Process.

Outline (2)
Chapter 4: Eigenanalysis
Properties of Eigenvalues and Eigenvectors.
Chapter 5: Wiener Filters
Minimum Mean-Squared Error.
Wiener-Hopf Equations.
Channel Equalization.
Linearly Constrained Minimum Variance (LCMV) Filter.
Chapter 6: Linear Prediction
Forward Linear Prediction.
Backward Linear Prediction.
Levinson-Durbin Algorithm.
Chapter 7: Kalman Filters
Kalman Algorithm.
Applications of Kalman Filter: Tracking Trajectory of Object
and System Identification.

Outline (3)

Chapter 8: Linear Adaptive Filtering


Adaptive Filters and Applications.
Method of Steepest Descent.
Least-Mean-Square Algorithm.
Recursive Least-Squares Algorithm.
Examples of Adaptive Filtering: Adaptive Equalization
and Adaptive Beamforming.

Chapter 9: Estimation Theory


Fundamentals.
Minimum Variance Unbiased Estimators.
Maximum Likelihood Estimation.
Eigenanalysis Algorithms for Spectral Estimation.
Example: Direction-of-Arrival (DoA) Estimation.

References

[1] Simon Haykin, Adaptive Filter Theory, Prentice Hall, 1996 (3rd Ed.), 2001 (4th Ed.).

[2] Steven M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, 1993.

[3] Alan V. Oppenheim, Ronald W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 1989.

[4] Athanasios Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 1991 (3rd Ed.), 2001 (4th Ed.).

[5] Dimitris G. Manolakis, Vinay K. Ingle, Stephen M. Kogon, Statistical and Adaptive Signal Processing, Artech House, 2005.

Goal of the course
Introduction to the theory and algorithms used for the analysis and
processing of random signals (stochastic signals) and their
applications to communications problems.

ƒ Understanding random signals via their statistical description (theory of probability, random variables, and stochastic processes), their modeling, and the dependence between the samples of one or more discrete-time random signals.

ƒ Developing theoretical methods and practical techniques for processing random signals to achieve a predefined, application-dependent objective.
• Major applications in communications: signal modeling, spectral estimation, frequency-selective filtering, adaptive filtering, array processing, ...

Random Signals vs. Deterministic Signals
Example: Discrete-time random signals (a) and the dependence
between the samples (b), [5].

Techniques for Processing Random Signals
‰ Signal analysis (signal modeling, spectral estimation): the primary goal is to extract useful information for understanding and classifying the signals.
Typical applications: detection of useful information from received signals, system modeling/identification, detection and classification of radar and sonar targets, speech recognition, signal representation for data compression, ...

‰ Signal filtering (frequency-selective filtering, adaptive filtering, array processing): the main objective is to improve the quality of a signal according to a criterion of performance.
Typical applications: noise and interference cancellation, echo cancellation, channel equalization, active noise control, ...

Applications of Adaptive Filters (1)
System identification [5]:

System inversion [5]:

Applications of Adaptive Filters (2)
Signal prediction [5]:

Interference cancellation [5]:

Applications of Adaptive Filters (3)
Simple model of a digital communications system with channel
equalization [5]:

Pulse trains: (a) without intersymbol interference (ISI) and (b) with ISI.
ISI distortion: the tails of adjacent pulses interfere with the current pulse and can lead to an incorrect decision.
The equalizer can compensate for the ISI distortion.

Applications of Adaptive Filters (4)
Block diagram of the basic components of an active noise control system
[5]:

Applications of Array Processing (1)
Beamforming [5]:

Applications of Array Processing (2)
Example of (adaptive) beamforming with an airborne radar for interference mitigation [5]:

Applications of Array Processing (3)
(Adaptive) Sidelobe canceler [5]:

Chapter 1:
Discrete-Time Signal Processing

‰ z-Transform.

‰ Linear Time-Invariant Filters.

‰ Discrete Fourier Transform (DFT).

1. Z-transform (1)
‰ Discrete-time signals → signals described as a time series: a sequence of uniformly spaced samples whose varying amplitudes carry the useful information content of the signal.

‰ Consider a time series (sequence) {u(n)}, or u(n), denoted by its samples u(n), u(n-1), u(n-2), …, where n is the discrete time index.
ƒ z-transform of u(n):

U(z) = \mathcal{Z}[u(n)] = \sum_{n=-\infty}^{\infty} u(n) z^{-n}    (1.1)

z: complex variable.
z-transform pair: u(n) ↔ U(z).
ƒ Region of convergence (ROC): the set of values of z for which U(z) converges uniformly.

1. Z-transform (2)
‰ Properties:
ƒ Linear transform (superposition):

a u_1(n) + b u_2(n) \;\leftrightarrow\; a U_1(z) + b U_2(z)    (1.2)

ROC of (1.2): intersection of the ROC of U_1(z) and the ROC of U_2(z).
ƒ Time shifting:

u(n) \leftrightarrow U(z) \;\Rightarrow\; u(n - n_0) \leftrightarrow z^{-n_0} U(z)    (1.3)

n_0: integer. ROC of (1.3): same as the ROC of U(z).
Special case: n_0 = 1 ⇒ u(n-1) ↔ z^{-1} U(z); z^{-1} is the unit-delay element.
ƒ Convolution theorem:

\sum_{i=-\infty}^{\infty} u_1(i)\, u_2(n-i) \;\leftrightarrow\; U_1(z)\, U_2(z)    (1.4)

ROC of (1.4): intersection of the ROC of U_1(z) and the ROC of U_2(z).
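A minimal numerical sketch of the convolution theorem (1.4) for finite-length sequences (the sample values below are arbitrary, chosen only for illustration): convolving two sequences in the time domain matches multiplying their z-transform polynomials in z^{-1}.

```python
import numpy as np

# Two arbitrary finite-length sequences (illustrative values only)
u1 = np.array([1.0, 2.0, 3.0])     # u1(0), u1(1), u1(2)
u2 = np.array([0.5, -1.0, 4.0])    # u2(0), u2(1), u2(2)

# Time-domain convolution: sum_i u1(i) u2(n-i)
conv_time = np.convolve(u1, u2)

# z-domain product: for finite sequences U(z) is a polynomial in z^{-1},
# so U1(z)U2(z) is just polynomial multiplication of the coefficient lists.
conv_z = np.polymul(u1, u2)

print(conv_time)                        # [ 0.5  0.   3.5  5.  12. ]
print(np.allclose(conv_time, conv_z))   # True: (1.4) holds
```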

1. Linear Time-Invariant (LTI) Filters (1)
‰ Definition: an LTI filter maps an input v(n) to an output u(n) such that
ƒ Linearity: if v_1(n) → u_1(n) and v_2(n) → u_2(n), then a v_1(n) + b v_2(n) → a u_1(n) + b u_2(n).
ƒ Time invariance: if v(n) → u(n), then v(n-k) → u(n-k).

‰ Impulse response h(n): the output of the LTI filter when the input is a unit impulse applied at n = 0.

For an arbitrary input v(n), the output is given by the convolution sum

u(n) = \sum_{i=-\infty}^{\infty} h(i)\, v(n-i)    (1.5)

1. LTI Filters (2)
‰ Transfer function:
ƒ Applying the z-transform to both sides of (1.5):

U(z) = H(z) V(z)    (1.6)

where U(z) ↔ u(n), V(z) ↔ v(n), H(z) ↔ h(n). H(z) is the transfer function of the filter:

H(z) = \frac{U(z)}{V(z)}    (1.7)

ƒ When the input sequence v(n) and the output sequence u(n) are related by a difference equation of order N,

\sum_{j=0}^{N} a_j\, u(n-j) = \sum_{j=0}^{N} b_j\, v(n-j)    (1.8)

with constant coefficients a_j, b_j, applying the z-transform gives

H(z) = \frac{U(z)}{V(z)} = \frac{\sum_{j=0}^{N} b_j z^{-j}}{\sum_{j=0}^{N} a_j z^{-j}} = \frac{b_0 \prod_{k=1}^{N} (1 - c_k z^{-1})}{a_0 \prod_{k=1}^{N} (1 - d_k z^{-1})}    (1.9)

where the c_k are the zeros and the d_k are the poles of H(z).

1. LTI Filters (3)
‰ From (1.9), two distinct types of LTI filters:
ƒ Finite-duration impulse response (FIR) filters: d_k = 0 for all k → all-zero filter; h(n) has finite duration.
ƒ Infinite-duration impulse response (IIR) filters: H(z) has at least one nonzero pole; h(n) has infinite duration. When c_k = 0 for all k → all-pole filter.
ƒ Examples of FIR and IIR filter structures appear on the next two slides; a short numerical illustration is sketched below.
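As a small illustration of the difference-equation form (1.8), the sketch below filters a test sequence with an FIR (all-zero) filter and an IIR (all-pole) filter using SciPy; the coefficient values are arbitrary and chosen only for the demonstration (SciPy's lfilter uses the same Σ a_k u(n-k) = Σ b_k v(n-k) convention).

```python
import numpy as np
from scipy import signal

v = np.random.default_rng(0).standard_normal(64)   # arbitrary input sequence

# FIR (all-zero) filter: u(n) = 0.5 v(n) + 0.3 v(n-1) + 0.2 v(n-2)
b_fir, a_fir = [0.5, 0.3, 0.2], [1.0]
u_fir = signal.lfilter(b_fir, a_fir, v)

# IIR (all-pole) filter: u(n) - 0.9 u(n-1) = v(n)
b_iir, a_iir = [1.0], [1.0, -0.9]
u_iir = signal.lfilter(b_iir, a_iir, v)

# Pole/zero check: the all-pole filter has a single pole at z = 0.9
z, p, k = signal.tf2zpk(b_iir, a_iir)
print(p)   # [0.9] -> inside the unit circle, hence stable
```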

1. LTI Filters (4)

[Figure: FIR (transversal) filter — the input v(n) passes through a tapped delay line of unit delays z^{-1}, producing v(n-1), …, v(n-M); the delayed samples are weighted by a_1, …, a_M and summed to form the output u(n).]

1. LTI Filters (5)
[Figure: IIR (all-pole, feedback) filter — the output u(n) is formed from v(n) plus the delayed outputs u(n-1), …, u(n-M), weighted by a_1, …, a_M and fed back through unit delays z^{-1}.]

1. LTI Filters (6)
‰ Causality and stability:
ƒ An LTI filter is causal if

h(n) = 0 \ \text{for} \ n < 0    (1.10)

ƒ An LTI filter is stable if the output sequence is bounded for every bounded input sequence. From (1.5), a necessary and sufficient condition is

\sum_{k=-\infty}^{\infty} |h(k)| < \infty    (1.11)

[Figure: the z-plane with the unit circle; the region of stability for the poles of a causal filter is the interior of the unit circle.]

A causal LTI filter is stable if and only if all of the poles of the filter's transfer function lie inside the unit circle in the z-plane (see [3] for more details).
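A minimal numerical check of this pole criterion, assuming an illustrative transfer function (the coefficients below are arbitrary): compute the poles with NumPy and test whether they all lie inside the unit circle.

```python
import numpy as np

# Illustrative H(z) = (1 + 0.5 z^-1) / (1 - 1.2 z^-1 + 0.5 z^-2)
b = np.array([1.0, 0.5])          # numerator coefficients
a = np.array([1.0, -1.2, 0.5])    # denominator coefficients

# For Σ a_j z^{-j}, the poles are the roots of a_0 z^N + a_1 z^{N-1} + ... + a_N
poles = np.roots(a)

print(poles)                        # complex-conjugate pair, |pole| ≈ 0.707
print(np.all(np.abs(poles) < 1))    # True -> the causal filter is stable
```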
1. Discrete Fourier Transform (DFT) (1)
‰ The Fourier transform of a sequence u(n) is obtained from the z-transform by setting z = exp(j2πf), where f is the real frequency variable.

ƒ When u(n) has finite duration, its Fourier representation leads to the discrete Fourier transform (DFT).

ƒ For numerical computation of the DFT, the efficient fast Fourier transform (FFT) is used.

‰ Let u(n) be a finite-duration sequence of length N. The DFT of u(n) is

U(k) = \sum_{n=0}^{N-1} u(n) \exp\!\left(-\frac{j 2\pi k n}{N}\right), \quad k = 0, \ldots, N-1    (1.12)

The inverse DFT (IDFT) of U(k) is

u(n) = \frac{1}{N} \sum_{k=0}^{N-1} U(k) \exp\!\left(\frac{j 2\pi k n}{N}\right), \quad n = 0, \ldots, N-1    (1.13)

u(n) and U(k) have the same length N → "N-point DFT".
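A short numerical sketch of (1.12)-(1.13) on an arbitrary length-4 sequence: the direct DFT sum agrees with NumPy's FFT, and the IDFT recovers the original samples.

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0, -1.0])        # arbitrary length-4 sequence
N = len(u)
n = np.arange(N)

# Direct evaluation of (1.12)
U_direct = np.array([np.sum(u * np.exp(-1j * 2 * np.pi * k * n / N)) for k in range(N)])

# FFT gives the same N-point DFT, and ifft implements (1.13)
U_fft = np.fft.fft(u)
print(np.allclose(U_direct, U_fft))         # True
print(np.allclose(np.fft.ifft(U_fft), u))   # True
```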

Chapter 2:
Stochastic Processes and Models

‰ Review of Probability and Random Variables.

‰ Review of Stochastic Processes and Stochastic Models.

2. Review of Probability and Random Variables

‰ Axioms of Probability.

‰ Repeated Trials.

‰ Concepts of Random Variables.

‰ Functions of Random Variables.

‰ Moments and Conditional Statistics.

‰ Sequences of Random Variables.

2. Review of Probability Theory: Basics (1)
‰ Probability theory deals with the study of random phenomena, which
under repeated experiments yield different outcomes that have
certain underlying patterns about them. The notion of an experiment
assumes a set of repeatable conditions that allow any number of
identical repetitions. When an experiment is performed under these
conditions, certain elementary events ξi occur in different but
completely uncertain ways. We can assign nonnegative number P(ξi ),
as the probability of the event ξi in various ways:
Laplace's Classical Definition: The probability of an event A is defined a priori, without actual experimentation, as

P(A) = \frac{\text{Number of outcomes favorable to } A}{\text{Total number of possible outcomes}},

provided all these outcomes are equally likely.

2. Review of Probability Theory: Basics (2)
Relative Frequency Definition: The probability of an event A is defined
as
P(A) = \lim_{n \to \infty} \frac{n_A}{n},

where nA is the number of occurrences of A and n is the total number of


trials.

The axiomatic approach to probability, due to Kolmogorov, developed


through a set of axioms (below) is generally recognized as superior to the
above definitions, as it provides a solid foundation for complicated
applications.

2. Review of Probability Theory: Basics (3)
The totality of all ξ_i, known a priori, constitutes a set Ω, the set of all experimental outcomes:

Ω = {ξ_1, ξ_2, …, ξ_k, …}.

Ω has subsets A, B, C, …. Recall that if A is a subset of Ω, then ξ ∈ A implies ξ ∈ Ω. From A and B, we can generate other related subsets A ∪ B, A ∩ B, Ā, B̄, …:

A ∪ B = {ξ | ξ ∈ A or ξ ∈ B},
A ∩ B = {ξ | ξ ∈ A and ξ ∈ B},
and
Ā = {ξ | ξ ∉ A}.

2. Review of Probability Theory: Basics (4)

[Figure: Venn diagrams of A ∪ B, A ∩ B, and the complement Ā.]

If A ∩ B = φ (the empty set), then A and B are said to be mutually exclusive (M.E.).
A partition of Ω is a collection of mutually exclusive subsets of Ω such that their union is Ω:

A_i \cap A_j = \phi \ (i \neq j), \quad \text{and} \quad \bigcup_{i} A_i = \Omega.

[Figure: a partition A_1, A_2, …, A_n of Ω, and two mutually exclusive sets A, B with A ∩ B = φ.]
2. Review of Probability Theory: Basics (5)
De Morgan's Laws:

\overline{A \cup B} = \bar{A} \cap \bar{B}; \qquad \overline{A \cap B} = \bar{A} \cup \bar{B}

[Figure: Venn diagrams illustrating De Morgan's laws.]

Often it is meaningful to talk about at least some of the subsets of Ω as events, for which we must have a mechanism to compute their probabilities.

Example: Consider the experiment where two coins are tossed simultaneously. The various elementary events are

2. Review of Probability Theory: Basics (6)
ξ1 = (H, H ), ξ2 = (H, T ), ξ3 = (T , H ), ξ4 = (T , T )
and
Ω = { ξ 1 , ξ 2 , ξ 3 , ξ 4 }.
The subset A = { ξ 1 , ξ 2 , ξ 3 } is the same as “Head has occurred at least
once” and qualifies as an event.
Suppose two subsets A and B are both events, then consider
“Does an outcome belong to A or B = A∪ B ”
“Does an outcome belong to A and B = A∩ B ”
“Does an outcome fall outside A”?

2. Review of Probability Theory: Basics (7)
Thus the sets A ∪ B, A ∩ B, Ā, B̄, etc., also qualify as events. We shall formalize this using the notion of a field.

• Field: A collection of subsets of a nonempty set Ω forms a field F if
(i) Ω ∈ F
(ii) If A ∈ F, then Ā ∈ F
(iii) If A ∈ F and B ∈ F, then A ∪ B ∈ F.
Using (i)-(iii), it is easy to show that A ∩ B, Ā ∩ B, etc., also belong to F.

For example, from (ii) we have Ā ∈ F and B̄ ∈ F; using (iii), this gives Ā ∪ B̄ ∈ F; applying (ii) again, its complement \overline{\bar{A} \cup \bar{B}} = A ∩ B ∈ F, where we have used De Morgan's law from above.

2. Review of Probability Theory: Basics (8)
‰ Axioms of Probability
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions, which act as the axioms of probability:

(i) P(A) ≥ 0 (probability is a nonnegative number)
(ii) P(Ω) = 1 (probability of the whole set is unity)
(iii) If A ∩ B = φ, then P(A ∪ B) = P(A) + P(B).

(Note that (iii) applies only when A and B are mutually exclusive (M.E.) events.)

2. Review of Probability Theory: Basics (9)
The following conclusions follow from these axioms:
a. P(A ∪ Ā) = P(A) + P(Ā) = 1, or P(Ā) = 1 − P(A).
b. P{φ} = 0.
c. If A and B are not mutually exclusive (M.E.), then P(A ∪ B) = P(A) + P(B) − P(AB).

‰ Conditional Probability and Independence

In N independent trials, suppose N_A, N_B, N_{AB} denote the number of times events A, B and AB occur, respectively. For large N,

P(A) \approx \frac{N_A}{N}, \quad P(B) \approx \frac{N_B}{N}, \quad P(AB) \approx \frac{N_{AB}}{N}.

Among the N_A occurrences of A, only N_{AB} of them are also found among the N_B occurrences of B. Thus the ratio

\frac{N_{AB}}{N_B} = \frac{N_{AB}/N}{N_B/N} = \frac{P(AB)}{P(B)}
2. Review of Probability Theory: Basics (10)
is a measure of “the event A given that B has already occurred”. We
denote this conditional probability by
P(A|B) = Probability of “the event A given that B has occurred”.
We define

P(A | B) = \frac{P(AB)}{P(B)},

provided P(B) ≠ 0. As we show below, this definition satisfies all the probability axioms discussed earlier.

Independence: A and B are said to be independent events if

P(AB) = P(A) \cdot P(B).

Then

P(A | B) = \frac{P(AB)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A).

Thus, if A and B are independent, the event that B has occurred does not shed any more light on A: it makes no difference to A whether B has occurred or not.
2. Review of Probability Theory: Basics (11)
Let

A = A_1 \cup A_2 \cup A_3 \cup \cdots \cup A_n,    (*)

a union of n independent events. Then, by De Morgan's law,

\bar{A} = \bar{A}_1 \bar{A}_2 \cdots \bar{A}_n,

and, using their independence,

P(\bar{A}) = P(\bar{A}_1 \bar{A}_2 \cdots \bar{A}_n) = \prod_{i=1}^{n} P(\bar{A}_i) = \prod_{i=1}^{n} (1 - P(A_i)).

Thus, for any A as in (*),

P(A) = 1 - P(\bar{A}) = 1 - \prod_{i=1}^{n} (1 - P(A_i)),

a useful result.

2. Review of Probability Theory: Basics (12)
Bayes' theorem:

P(A | B) = \frac{P(B | A)}{P(B)} \cdot P(A)

Although simple enough, Bayes' theorem has an interesting interpretation: P(A) represents the a priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule above takes the new information ("B has occurred") into account and gives the a posteriori probability of A given B.

We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A through P(A). The new information is available in terms of B, and it should be used to improve our knowledge/understanding of A. Bayes' theorem gives the exact mechanism for incorporating such new information.

2. Review of Probability Theory: Basics (13)
Let A_1, A_2, …, A_n be pairwise disjoint events (A_i A_j = φ for i ≠ j) whose union covers Ω. Then

P(B) = \sum_{i=1}^{n} P(B A_i) = \sum_{i=1}^{n} P(B | A_i) P(A_i).

This gives a more general version of Bayes' theorem:

P(A_i | B) = \frac{P(B | A_i) P(A_i)}{P(B)} = \frac{P(B | A_i) P(A_i)}{\sum_{j=1}^{n} P(B | A_j) P(A_j)}.

2. Review of Probability Theory: Basics (14)
Example: Three switches connected in parallel operate independently.
Each switch remains closed with probability p. (a) Find the probability of
receiving an input signal at the output. (b) Find the probability that switch S1
is open given that an input signal is received at the output.

[Figure: three switches s1, s2, s3 connected in parallel between the input and the output.]

Solution:
a. Let Ai = “Switch Si is closed”. Then P ( Ai ) = p , i = 1 → 3 .
Since switches operate independently, we have
P ( Ai A j ) = P ( Ai ) P ( A j ); P ( A1 A2 A3 ) = P ( A1 ) P ( A2 ) P ( A3 ).

2. Review of Probability Theory: Basics (15)
Let R = "input signal is received at the output". For the event R to occur, either switch 1 or switch 2 or switch 3 must remain closed, i.e.,

R = A_1 \cup A_2 \cup A_3.

Using the union-of-independent-events result derived above,

P(R) = P(A_1 \cup A_2 \cup A_3) = 1 - (1 - p)^3 = 3p - 3p^2 + p^3.

b. We need P(\bar{A}_1 | R). From Bayes' theorem,

P(\bar{A}_1 | R) = \frac{P(R | \bar{A}_1) P(\bar{A}_1)}{P(R)} = \frac{(2p - p^2)(1 - p)}{3p - 3p^2 + p^3} = \frac{2 - 3p + p^2}{3 - 3p + p^2}.

Because of the symmetry of the switches, we also have

P(\bar{A}_1 | R) = P(\bar{A}_2 | R) = P(\bar{A}_3 | R).
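A quick Monte Carlo sanity check of this example (a sketch; p = 0.3 is an arbitrary choice): simulate the three independent switches and estimate P(R) and P(Ā1 | R).

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_trials = 0.3, 200_000

closed = rng.random((n_trials, 3)) < p        # switch i closed with probability p
R = closed.any(axis=1)                        # signal received if any switch is closed

P_R = R.mean()
P_A1open_given_R = (~closed[:, 0] & R).sum() / R.sum()

print(P_R, 3*p - 3*p**2 + p**3)                               # ~0.657 vs 0.657
print(P_A1open_given_R, (2 - 3*p + p**2) / (3 - 3*p + p**2))  # ~0.543 vs 0.543
```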

2. Review of Probability Theory: Repeated Trials (1)
‰ Consider two independent experiments with associated probability models
(Ω1, F1, P1) and (Ω2, F2, P2). Let ξ∈Ω1, η∈Ω2 represent elementary events.
A joint performance of the two experiments produces an elementary event ω = (ξ, η). How do we assign an appropriate probability to this "combined event"?
Consider the Cartesian product space Ω = Ω1× Ω2 generated from Ω1 and Ω2
such that if ξ ∈ Ω1 and η ∈ Ω2 , then every ω in Ω is an ordered pair of the
form ω = (ξ, η). To arrive at a probability model we need to define the
combined trio (Ω, F, P).
Suppose A∈F1 and B ∈ F2. Then A × B is the set of all pairs (ξ, η), where ξ
∈ A and η ∈ B. Any such subset of Ω appears to be a legitimate event for the
combined experiment. Let F denote the field composed of all such subsets A × B together with their unions and complements. In this combined
experiment, the probabilities of the events A × Ω2 and Ω1 × B are such that
P( A × Ω 2 ) = P1 ( A), P (Ω1 × B ) = P2 ( B ).
2. Review of Probability Theory: Repeated Trials (2)
Moreover, the events A × Ω2 and Ω1 × B are independent for any A ∈ F1 and
B ∈ F2 . Since
( A × Ω 2 ) ∩ (Ω1 × B ) = A × B,
then
P( A × B ) = P( A × Ω 2 ) ⋅ P(Ω1 × B ) = P1 ( A) P2 ( B )
for all A ∈ F1 and B ∈ F2 . This equation extends to a unique probability
measure P( ≡ P1 × P2 ) on the sets in F and defines the combined trio (Ω, F, P).

‰ Generalization: Given n experiments Ω_1, Ω_2, …, Ω_n and their associated F_i and P_i, i = 1 → n, let

Ω = Ω_1 × Ω_2 × ⋯ × Ω_n

represent their Cartesian product, whose elementary events are the ordered n-tuples (ξ_1, ξ_2, …, ξ_n), where ξ_i ∈ Ω_i. Events in this combined space are of the form

A_1 × A_2 × ⋯ × A_n,

where A_i ∈ F_i.

2. Review of Probability Theory: Repeated Trials (3)
If all these n experiments are independent, and P_i(A_i) is the probability of the event A_i in F_i, then as before

P(A_1 × A_2 × ⋯ × A_n) = P_1(A_1) P_2(A_2) ⋯ P_n(A_n).

Example: An event A has probability p of occurring in a single trial. Find the probability that A occurs exactly k times, k ≤ n, in n trials.

Solution: Let (Ω, F, P) be the probability model for a single trial. The outcome of n experiments is an n-tuple

ω = {ξ_1, ξ_2, …, ξ_n} ∈ Ω_0,

where every ξ_i ∈ Ω and Ω_0 = Ω × Ω × ⋯ × Ω. The event A occurs at trial #i if ξ_i ∈ A. Suppose A occurs exactly k times in ω. Then k of the ξ_i belong to A, say ξ_{i_1}, ξ_{i_2}, …, ξ_{i_k}, and the remaining n-k are contained in its complement Ā.

2. Review of Probability Theory: Repeated Trials (4)
Using independence, the probability of occurrence of such an ω is given by

P_0(\omega) = P(\{\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_n}\}) = P(\{\xi_{i_1}\})\,P(\{\xi_{i_2}\}) \cdots P(\{\xi_{i_n}\}) = \underbrace{P(A)\cdots P(A)}_{k}\;\underbrace{P(\bar{A})\cdots P(\bar{A})}_{n-k} = p^k q^{n-k}

(with q = 1 − p). However, the k occurrences of A can occur at any particular locations inside ω. Let ω_1, ω_2, …, ω_N represent all such events in which A occurs exactly k times. Then

"A \text{ occurs exactly } k \text{ times in } n \text{ trials}" = \omega_1 \cup \omega_2 \cup \cdots \cup \omega_N.

But all these ω_i are mutually exclusive and equiprobable. Thus

P("A \text{ occurs exactly } k \text{ times in } n \text{ trials}") = \sum_{i=1}^{N} P_0(\omega_i) = N P_0(\omega) = N p^k q^{n-k},

2. Review of Probability Theory: Repeated Trials (5)
Recall that, starting with n possible choices, the first object can be chosen in n different ways, and for every such choice the second one in (n-1) ways, …, and the kth one in (n-k+1) ways; this gives n(n-1)⋯(n-k+1) ordered choices of k objects out of n. But this count includes the k! orderings of the same k objects, which are indistinguishable here. As a result,

N = \frac{n(n-1)\cdots(n-k+1)}{k!} = \frac{n!}{(n-k)!\,k!} = \binom{n}{k}

represents the number of combinations, or choices, of n objects taken k at a time. Thus we obtain the Bernoulli formula

P_n(k) = P("A \text{ occurs exactly } k \text{ times in } n \text{ trials}") = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n.
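A small numerical sketch of the Bernoulli formula (n = 12 and p = 0.5 are arbitrary illustrative values): evaluate the binomial coefficients directly and compare with scipy.stats.binom.

```python
from math import comb
from scipy.stats import binom

n, p = 12, 0.5
pmf_direct = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
pmf_scipy = binom.pmf(range(n + 1), n, p)

print(max(abs(a - b) for a, b in zip(pmf_direct, pmf_scipy)))   # ~0: both agree
print(sum(pmf_direct))                                          # 1.0: the probabilities sum to one
```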

2. Review of Probability Theory: Repeated Trials (6)
Independent repeated experiments of this nature, where the outcome is either a "success" (= A) or a "failure" (= Ā), are characterized as Bernoulli trials, and the probability of k successes in n trials is given by the Bernoulli formula, where p represents the probability of "success" in any one trial.

‰ Bernoulli trial: consists of repeated independent and identical experiments, each of which has only two outcomes, A or Ā, with P(A) = p and P(Ā) = q. The probability of exactly k occurrences of A in n such trials is given by the Bernoulli formula. Let

X_k = "exactly k occurrences in n trials".

Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, …, n, either X_0 or X_1 or X_2 or … or X_n must occur in such an experiment. Thus

P(X_0 \cup X_1 \cup \cdots \cup X_n) = 1.
2. Review of Probability Theory: Repeated Trials (7)
But the X_i, X_j are mutually exclusive. Thus

P(X_0 \cup X_1 \cup \cdots \cup X_n) = \sum_{k=0}^{n} P(X_k) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p+q)^n = 1.

For a given n and p, what is the most likely value of k?

[Figure: the probabilities P_n(k) versus k for n = 12, p = 1/2.]

From the figure, the most probable value of k is the number that maximizes P_n(k) in the Bernoulli formula. To obtain this value, consider the ratio

\frac{P_n(k-1)}{P_n(k)} = \frac{n!\, p^{k-1} q^{n-k+1}}{(n-k+1)!\,(k-1)!} \cdot \frac{(n-k)!\,k!}{n!\, p^k q^{n-k}} = \frac{k}{n-k+1} \cdot \frac{q}{p}.

2. Review of Probability Theory: Repeated Trials (8)

From this ratio, P_n(k) ≥ P_n(k-1) if kq ≤ (n-k+1)p, i.e., k(1-p) ≤ (n-k+1)p, or k ≤ (n+1)p. Hence P_n(k), as a function of k, increases until

k = (n+1)p,

if (n+1)p is an integer, or up to the largest integer k_max less than (n+1)p otherwise; this value represents the most likely number of successes (or heads) in n trials.
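A quick check of this rule (a sketch; the (n, p) pairs are arbitrary and chosen to avoid ties): the k that maximizes the binomial pmf coincides with ⌊(n+1)p⌋.

```python
import math
from scipy.stats import binom

for n, p in [(12, 0.5), (20, 0.3), (7, 0.9)]:
    k_argmax = int(binom.pmf(range(n + 1), n, p).argmax())   # k maximizing P_n(k)
    print(k_argmax, math.floor((n + 1) * p))                 # the two values agree
```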

2. Review of Probability Theory: Random Variables (1)
‰ Let (Ω, F, P) be a probability model for an experiment, and X a function that maps every ξ ∈ Ω to a unique point x ∈ R, the set of real numbers. Since the outcome ξ is not certain, neither is the value X(ξ) = x. Thus, if B is some subset of R, we may want to determine the probability of "X(ξ) ∈ B". To determine this probability, we can look at the set A = X^{-1}(B) ⊂ Ω that contains all ξ ∈ Ω that map into B under the function X.

[Figure: the mapping X from Ω to R; the set A ⊂ Ω maps into the subset B ⊂ R.]

Obviously, if the set A = X^{-1}(B) also belongs to the associated field F, then it is an event and the probability of A is well defined; in that case we can say

Probability of the event "X(ξ) ∈ B" = P(X^{-1}(B)).

2. Review of Probability Theory: R. V (2)
However, X^{-1}(B) may not always belong to F for all B, thus creating difficulties. The notion of a random variable (r.v) makes sure that the inverse mapping always results in an event, so that we are able to determine the probability for any B ⊂ R.

‰ Random Variable (r.v): A finite single valued function X(.) that maps
the set of all experimental outcomes Ω into the set of real numbers R is said
to be a r.v, if the set { ζ | X(ζ) ≤ x } is an event ∈ F for every x in R.

Alternatively, X is said to be a r.v if X^{-1}(B) ∈ F, where B represents semi-infinite intervals of the form {-∞ < x ≤ a} and all other sets that can be constructed from these sets by performing the set operations of union, intersection and negation any number of times. Thus, if X is a r.v, then

{ξ | X(ξ) ≤ x} = {X ≤ x}

is an event for every x.

2. Review of Probability Theory: R. V (3)
What about {a < X ≤ b} and {X = a}? Are they also events?

In fact, with b > a, since {X ≤ a} and {X ≤ b} are events, {X ≤ a}^c = {X > a} is an event, and hence

{X > a} \cap \{X \leq b\} = \{a < X \leq b\}

is also an event. Thus {a - 1/n < X ≤ a} is an event for every n. Consequently,

\bigcap_{n=1}^{\infty} \left\{ a - \frac{1}{n} < X \leq a \right\} = \{X = a\}

is also an event. All events have well-defined probability. Thus the probability of the event {ξ | X(ξ) ≤ x} must depend on x. Denote

P\{\xi \mid X(\xi) \leq x\} = F_X(x) \geq 0,

which is referred to as the Probability Distribution Function (PDF) associated with the r.v X.
2. Review of Probability Theory: R. V (4)
‰ Distribution Function: If g(x) is a distribution function, then
(i) g(+∞) = 1; g(-∞) = 0
(ii) if x1 < x2, then g(x1) ≤ g(x2)
(iii) g(x+) = g(x) for all x.
It is shown that the PDF FX(x) satisfies these properties for any r.v X.

Additional Properties of a PDF

(iv) if FX(x0) = 0 for some x0, then FX(x) = 0 for x ≤ x0


(v) P{ X (ξ ) > x } = 1 − FX ( x ).
(vi) P{ x1 < X (ξ ) ≤ x2 } = FX ( x2 ) − FX ( x1 ), x2 > x1.
(vii) P ( X (ξ ) = x ) = FX ( x ) − FX ( x − ).

2. Review of Probability Theory: R. V (5)
X is said to be a continuous-type r.v if its distribution function FX(x) is
continuous. In that case FX(x -) = FX(x) for all x, and from property (vii) we
get P{X = x} = 0.

If FX(x) is constant except for a finite number of jump discontinuities


(piece-wise constant; step-type), then X is said to be a discrete-type r.v.
If xi is such a discontinuity point, then from (vii)

pi = P{X = xi } = FX ( xi ) − FX ( xi− ).

2. Review of Probability Theory: R. V (6)
Example: X is a r.v such that X(ζ) = c, ζ ∈ Ω. Find FX(x).
Solution: For x < c, {X(ξ) ≤ x} = {φ}, so that FX(x) = 0; and for x ≥ c, {X(ξ) ≤ x} = Ω, so that FX(x) = 1.

[Figure: the two staircase distribution functions for this and the next example — a unit step at x = c, and a step of height q = 1 − p at x = 0 followed by a jump to 1 at x = 1.]

Example: Toss a coin. Ω = {H, T}. Suppose the r.v X is such that X(T) = 0,
X(H) = 1. Find FX(x).
Solution: For x < 0, {X(ζ) ≤ x } = {φ}, so that FX(x) = 0.
0 ≤ x < 1, { X (ξ ) ≤ x } = { T }, so that FX ( x) = P{ T } = 1 − p,
x ≥ 1, { X (ξ ) ≤ x } = { H , T } = Ω, so that FX ( x) = 1.
2. Review of Probability Theory: R. V (7)
Example: A fair coin is tossed twice, and let the r.v X represent the number
of heads. Find FX(x).
Solution: In this case Ω = {HH, HT, TH, TT}, and X(HH) = 2, X(HT) =
X(TH) = 1, X(TT) = 0.
x < 0:      {X(ξ) ≤ x} = φ            ⇒ FX(x) = 0,
0 ≤ x < 1:  {X(ξ) ≤ x} = {TT}          ⇒ FX(x) = P{TT} = P(T)P(T) = 1/4,
1 ≤ x < 2:  {X(ξ) ≤ x} = {TT, HT, TH}  ⇒ FX(x) = P{TT, HT, TH} = 3/4,
x ≥ 2:      {X(ξ) ≤ x} = Ω             ⇒ FX(x) = 1.

P{X = 1} = FX(1) − FX(1⁻) = 3/4 − 1/4 = 1/2.

[Figure: the staircase FX(x) with jumps of 1/4 at x = 0, 1/2 at x = 1, and 1/4 at x = 2.]
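A minimal simulation sketch of this example: draw many pairs of fair coin tosses, count heads, and compare the empirical values of P{X = 1} and FX with the results above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100_000, 2)).sum(axis=1)   # number of heads in two fair tosses

print((X == 1).mean())                     # ~0.5  = P{X = 1}
print((X <= 0).mean(), (X <= 1).mean())    # ~0.25, ~0.75 = FX(0), FX(1)
```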
2. Review of Probability Theory: R. V (8)
‰ Probability Density Function (p.d.f): The derivative of the distribution function FX(x) is called the probability density function fX(x) of the r.v X. Thus

f_X(x) = \frac{dF_X(x)}{dx}.

Since

\frac{dF_X(x)}{dx} = \lim_{\Delta x \to 0} \frac{F_X(x + \Delta x) - F_X(x)}{\Delta x} \geq 0,

from the monotone nondecreasing nature of FX(x), it follows that fX(x) ≥ 0 for all x. fX(x) is a continuous function if X is a continuous-type r.v. However, if X is a discrete-type r.v, then its p.d.f has the general form

f_X(x) = \sum_i p_i \, \delta(x - x_i),

[Figure: a train of impulses of weights p_i located at the points x_i.]

2. Review of Probability Theory: R. V (9)
where the x_i represent the jump-discontinuity points in FX(x). Here fX(x) represents a collection of positive discrete masses, and it is known as the probability mass function (p.m.f) in the discrete case.

We also obtain

F_X(x) = \int_{-\infty}^{x} f_X(u) \, du.

Since FX(+∞) = 1, this yields

\int_{-\infty}^{+\infty} f_X(x) \, dx = 1,

which justifies the name density function. Further, we also get

P\{x_1 < X(\xi) \leq x_2\} = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x) \, dx.

2. Review of Probability Theory: R. V (10)
Thus the area under fX(x) over the interval (x1, x2) represents the probability P{x1 < X(ξ) ≤ x2} in the above equation.

[Figure: (a) the distribution function FX(x); (b) the density fX(x), with the shaded area between x1 and x2 equal to P{x1 < X(ξ) ≤ x2}.]

Often, r.vs are referred to by their specific density functions, both in the continuous and discrete cases, and in what follows we shall list a number of them in each category.

2. Review of Probability Theory: R. V (11)
‰ Continuous-type Random Variables
1. Normal (Gaussian): X is said to be a normal or Gaussian r.v if

f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2 / 2\sigma^2}.

This is a bell-shaped curve, symmetric around the parameter μ, and its distribution function is given by

F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(y-\mu)^2 / 2\sigma^2} \, dy = G\!\left(\frac{x-\mu}{\sigma}\right),

where G(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2} \, dy is often tabulated. Since fX(x) depends on the two parameters μ and σ², the notation X ∼ N(μ, σ²) will be used to represent fX(x).

[Figure: the bell-shaped density fX(x) centered at x = μ.]
2. Review of Probability Theory: R. V (12)
2. Uniform: X ∼ U(a, b), a < b, if

f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \leq x \leq b, \\ 0, & \text{otherwise.} \end{cases}

3. Exponential: X ∼ ε(λ), if

f_X(x) = \begin{cases} \dfrac{1}{\lambda} e^{-x/\lambda}, & x \geq 0, \\ 0, & \text{otherwise.} \end{cases}

4. Gamma: X ∼ G(α, β), α > 0, β > 0, if

f_X(x) = \begin{cases} \dfrac{x^{\alpha-1}}{\Gamma(\alpha)\,\beta^{\alpha}} e^{-x/\beta}, & x \geq 0, \\ 0, & \text{otherwise.} \end{cases}

If α = n, an integer, then Γ(n) = (n-1)!.
2. Review of Probability Theory: R. V (13)
5. Beta: X ∼ β(a, b), a > 0, b > 0, if

f_X(x) = \begin{cases} \dfrac{1}{\beta(a,b)} x^{a-1} (1-x)^{b-1}, & 0 < x < 1, \\ 0, & \text{otherwise,} \end{cases}

where the Beta function β(a, b) is defined as

\beta(a, b) = \int_{0}^{1} u^{a-1} (1-u)^{b-1} \, du.

6. Chi-square: X ∼ χ²(n), if

f_X(x) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma(n/2)} x^{n/2-1} e^{-x/2}, & x \geq 0, \\ 0, & \text{otherwise.} \end{cases}

Note that χ²(n) is the same as Gamma(n/2, 2).

2. Review of Probability Theory: R. V (14)
7. Rayleigh: X ∼ R(σ²), if

f_X(x) = \begin{cases} \dfrac{x}{\sigma^2} e^{-x^2/2\sigma^2}, & x \geq 0, \\ 0, & \text{otherwise.} \end{cases}

8. Nakagami-m distribution:

f_X(x) = \begin{cases} \dfrac{2}{\Gamma(m)} \left(\dfrac{m}{\Omega}\right)^{m} x^{2m-1} e^{-mx^2/\Omega}, & x \geq 0, \\ 0, & \text{otherwise.} \end{cases}

9. Cauchy: X ∼ C(α, μ), if

f_X(x) = \frac{\alpha/\pi}{\alpha^2 + (x-\mu)^2}, \quad -\infty < x < +\infty.
2. Review of Probability Theory: R. V (15)
10. Laplace:

f_X(x) = \frac{1}{2\lambda} e^{-|x|/\lambda}, \quad -\infty < x < +\infty.

11. Student's t-distribution with n degrees of freedom:

f_T(t) = \frac{\Gamma((n+1)/2)}{\sqrt{\pi n}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \quad -\infty < t < +\infty.

12. Fisher's F-distribution:

f_Z(z) = \begin{cases} \dfrac{\Gamma\{(m+n)/2\}\, m^{m/2} n^{n/2}\, z^{m/2-1}}{\Gamma(m/2)\,\Gamma(n/2)\,(n+mz)^{(m+n)/2}}, & z \geq 0, \\ 0, & \text{otherwise.} \end{cases}

2. Review of Probability Theory: R. V (16)
‰ Discrete-type Random Variables

1. Bernoulli: X takes the values (0, 1), and

P(X = 0) = q, \quad P(X = 1) = p.

2. Binomial: X ∼ B(n, p), if

P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n.

3. Poisson: X ∼ P(λ), if

P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots, \infty.

4. Discrete Uniform:

P(X = k) = \frac{1}{N}, \quad k = 1, 2, \ldots, N.
2. Review of Probability Theory: Function of R. V (1)
‰ Let X be a r.v defined on the model (Ω, F, P) and suppose g(x) is a function of the variable x. Define

Y = g(X).

Is Y necessarily a r.v? If so, what are its PDF FY(y) and p.d.f fY(y)?

Consider some of the following functions to illustrate the technical details.

Example 1: Y = aX + b
ƒ Suppose a > 0. Then

F_Y(y) = P(Y(\xi) \leq y) = P(aX(\xi) + b \leq y) = P\!\left(X(\xi) \leq \frac{y-b}{a}\right) = F_X\!\left(\frac{y-b}{a}\right),

and

f_Y(y) = \frac{1}{a} f_X\!\left(\frac{y-b}{a}\right).

2. Review of Probability Theory: Function of R. V (2)
ƒ If a < 0, then

F_Y(y) = P(Y(\xi) \leq y) = P(aX(\xi) + b \leq y) = P\!\left(X(\xi) > \frac{y-b}{a}\right) = 1 - F_X\!\left(\frac{y-b}{a}\right),

and hence

f_Y(y) = -\frac{1}{a} f_X\!\left(\frac{y-b}{a}\right).

ƒ For all a:

f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y-b}{a}\right).

2. Review of Probability Theory: Function of R. V (3)
Example 2: Y = X²

F_Y(y) = P(Y(\xi) \leq y) = P(X^2(\xi) \leq y).

If y < 0, then the event {X²(ξ) ≤ y} = φ, and hence

F_Y(y) = 0, \quad y < 0.

For y > 0, the event {Y(ξ) ≤ y} = {X²(ξ) ≤ y} is equivalent to {x_1 < X(ξ) ≤ x_2} with x_1 = -√y and x_2 = +√y. Hence

F_Y(y) = P(x_1 < X(\xi) \leq x_2) = F_X(x_2) - F_X(x_1) = F_X(\sqrt{y}) - F_X(-\sqrt{y}), \quad y > 0.

By direct differentiation, we get

f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}} \left( f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right), & y > 0, \\ 0, & \text{otherwise.} \end{cases}

2. Review of Probability Theory: Function of R. V (4)
If fX(x) is an even function, then fY(y) reduces to

f_Y(y) = \frac{1}{\sqrt{y}} f_X(\sqrt{y}) \, U(y).

In particular, if X ∼ N(0, 1), so that

f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},

then we obtain the p.d.f of Y = X² to be

f_Y(y) = \frac{1}{\sqrt{2\pi y}} e^{-y/2} U(y).

Notice that this represents a Chi-square r.v with n = 1, since Γ(1/2) = √π. Thus, if X is a Gaussian r.v with μ = 0, then Y = X² represents a Chi-square r.v with one degree of freedom (n = 1).
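A short simulation sketch of this result: square standard normal samples and test them against the χ²(1) distribution using SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000) ** 2          # Y = X^2 with X ~ N(0, 1)

# Kolmogorov-Smirnov test against the chi-square distribution with 1 degree of freedom
stat, pvalue = stats.kstest(y, stats.chi2(df=1).cdf)
print(stat, pvalue)    # small statistic / large p-value: consistent with chi2(1)
```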

2. Review of Probability Theory: Function of R. V (5)
Example 3: Let

Y = g(X) = \begin{cases} X - c, & X > c, \\ 0, & -c < X \leq c, \\ X + c, & X \leq -c. \end{cases}

[Figure: the piecewise-linear characteristic g(X), equal to zero on the dead zone (-c, c].]

In this case,

P(Y = 0) = P(-c < X(\xi) \leq c) = F_X(c) - F_X(-c).

For y > 0 we have x > c and Y(ξ) = X(ξ) - c, so that

F_Y(y) = P(Y(\xi) \leq y) = P(X(\xi) - c \leq y) = P(X(\xi) \leq y + c) = F_X(y + c), \quad y > 0.

Similarly, for y < 0 we have x ≤ -c and Y(ξ) = X(ξ) + c, so that

F_Y(y) = P(Y(\xi) \leq y) = P(X(\xi) + c \leq y) = P(X(\xi) \leq y - c) = F_X(y - c), \quad y < 0.

Thus

f_Y(y) = \begin{cases} f_X(y + c), & y > 0, \\ [F_X(c) - F_X(-c)]\,\delta(y), & y = 0, \\ f_X(y - c), & y < 0. \end{cases}
2. Review of Probability Theory: Function of R. V (6)
Example 4: Half-wave rectifier

Y = g(X); \quad g(x) = \begin{cases} x, & x > 0, \\ 0, & x \leq 0. \end{cases}

In this case,

P(Y = 0) = P(X(\xi) \leq 0) = F_X(0),

and for y > 0, since Y = X,

F_Y(y) = P(Y(\xi) \leq y) = P(X(\xi) \leq y) = F_X(y).

Thus

f_Y(y) = \begin{cases} f_X(y), & y > 0, \\ F_X(0)\,\delta(y), & y = 0, \\ 0, & y < 0, \end{cases} \;=\; f_X(y)\,U(y) + F_X(0)\,\delta(y).
2. Review of Probability Theory: Function of R. V (7)
‰ Note: As a general approach, given Y = g(X), first sketch the graph
y = g(x), and determine the range space of y. Suppose a < y < b is the range
space of y = g(x). Then clearly for y < a, FY(y) = 0, and for y > b, FY(y) = 1,
so that FY(y) can be nonzero only in a < y < b. Next, determine whether there
are discontinuities in the range space of y. If so, evaluate P(Y(ζ) = yi) at these
discontinuities. In the continuous region of y, use the basic approach
FY ( y ) = P (g ( X (ξ )) ≤ y )

and determine appropriate events in terms of the r.v X for every y. Finally,
we must have FY(y) for -∞ < y < + ∞ and obtain
f_Y(y) = \frac{dF_Y(y)}{dy} \quad \text{in } a < y < b.

2. Review of Probability Theory: Function of R. V (8)
‰ However, if Y = g(X) is a continuous function, it is easy to obtain fY(y) directly as

f_Y(y) = \sum_i \frac{1}{|dy/dx|_{x_i}} f_X(x_i) = \sum_i \frac{1}{|g'(x_i)|} f_X(x_i). \qquad (♦)

The summation index i in this equation depends on y: for every y, the equation y = g(x_i) must be solved to obtain the total number of solutions and the actual solutions x_1, x_2, …, all in terms of y.

For example, if Y = X², then for all y > 0, x_1 = -√y and x_2 = +√y represent the two solutions for each y. Notice that the solutions x_i are all in terms of y, so that the right side of (♦) is a function of y only. Moreover,

\frac{dy}{dx} = 2x, \quad \text{so that} \quad \left|\frac{dy}{dx}\right|_{x = x_i} = 2\sqrt{y}.

2. Review of Probability Theory: Function of R. V (9)
Using (♦), we obtain

f_Y(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\left( f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right), & y > 0, \\ 0, & \text{otherwise,} \end{cases}

which agrees with the result in Example 2.

Example 5: Y = 1/X. Find fY(y).

Here, for every y, x_1 = 1/y is the only solution, and

\frac{dy}{dx} = -\frac{1}{x^2}, \quad \text{so that} \quad \left|\frac{dy}{dx}\right|_{x = x_1} = \frac{1}{1/y^2} = y^2.

Then, from (♦), we obtain

f_Y(y) = \frac{1}{y^2} f_X\!\left(\frac{1}{y}\right).
2. Review of Probability Theory: Function of R. V (10)
In particular, suppose X is a Cauchy r.v with parameter α, so that

f_X(x) = \frac{\alpha/\pi}{\alpha^2 + x^2}, \quad -\infty < x < +\infty.

In this case, Y = 1/X has the p.d.f

f_Y(y) = \frac{1}{y^2} \cdot \frac{\alpha/\pi}{\alpha^2 + (1/y)^2} = \frac{(1/\alpha)/\pi}{(1/\alpha)^2 + y^2}, \quad -\infty < y < +\infty.

But this represents the p.d.f of a Cauchy r.v with parameter 1/α. Thus, if X ∼ C(α), then 1/X ∼ C(1/α).

Example 6: Suppose f_X(x) = 2x/π², 0 < x < π, and Y = sin X. Determine fY(y).

Since X has zero probability of falling outside the interval (0, π), y = sin x has zero probability of falling outside the interval (0, 1). Clearly fY(y) = 0 outside this interval.

2. Review of Probability Theory: Function of R. V (11)
For any 0 < y < 1, the equation y = sin x has an infinite number of solutions …, x_1, x_2, x_3, …, where x_1 = sin⁻¹ y is the principal solution. Moreover, using the symmetry, we also get x_2 = π − x_1, etc. Further,

\frac{dy}{dx} = \cos x = \sqrt{1 - \sin^2 x} = \sqrt{1 - y^2},

so that

\left|\frac{dy}{dx}\right|_{x = x_i} = \sqrt{1 - y^2}.

[Figure: (a) the density fX(x) on (0, π); (b) the curve y = sin x with the solutions x_{-1}, x_1, x_2, x_3, … of y = sin x marked.]
2. Review of Probability Theory: Function of R. V (12)
From (♦), we obtain, for 0 < y < 1,

f_Y(y) = \sum_{\substack{i=-\infty \\ i \neq 0}}^{+\infty} \frac{1}{\sqrt{1 - y^2}} f_X(x_i).

But from the figure, in this case fX(-x_1) = fX(x_3) = fX(x_4) = … = 0 (except for fX(x_1) and fX(x_2), the rest are all zero). Thus

f_Y(y) = \frac{1}{\sqrt{1 - y^2}} \left( f_X(x_1) + f_X(x_2) \right) = \frac{1}{\sqrt{1 - y^2}} \left( \frac{2x_1}{\pi^2} + \frac{2x_2}{\pi^2} \right) = \frac{2(x_1 + \pi - x_1)}{\pi^2 \sqrt{1 - y^2}} = \begin{cases} \dfrac{2}{\pi \sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}

[Figure: the density fY(y) = 2/(π√(1 − y²)) on (0, 1), which grows without bound as y → 1.]
2. Review of Probability Theory: Function of R. V (13)
‰ Functions of a discrete-type r.v

Suppose X is a discrete-type r.v with P(X = x_i) = p_i, x = x_1, x_2, …, x_i, …, and Y = g(X). Clearly Y is also of discrete type: when x = x_i, y_i = g(x_i), and for those y_i,

P(Y = y_i) = P(X = x_i) = p_i, \quad y = y_1, y_2, \ldots, y_i, \ldots

Example 7: Suppose X ∼ P(λ), so that

P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots

Define Y = X² + 1. Find the p.m.f of Y.

X takes the values 0, 1, 2, …, k, …, so Y only takes the values 1, 2, 5, …, k²+1, …, and P(Y = k²+1) = P(X = k). Hence, for j = k²+1,

P(Y = j) = P\!\left(X = \sqrt{j-1}\right) = e^{-\lambda} \frac{\lambda^{\sqrt{j-1}}}{(\sqrt{j-1})!}, \quad j = 1, 2, 5, \ldots, k^2 + 1, \ldots

2. Review of Probability Theory: Moments… (1)
‰ Mean or Expected Value of a r.v X is defined as

\eta_X = \bar{X} = E(X) = \int_{-\infty}^{+\infty} x \, f_X(x) \, dx.

If X is a discrete-type r.v, then we get

\eta_X = \bar{X} = E(X) = \int x \sum_i p_i \delta(x - x_i) \, dx = \sum_i x_i p_i \underbrace{\int \delta(x - x_i) \, dx}_{1} = \sum_i x_i p_i = \sum_i x_i P(X = x_i).

The mean represents the average value of the r.v in a very large number of trials. For example, if X ∼ U(a, b), then

E(X) = \int_a^b \frac{x}{b-a} \, dx = \frac{1}{b-a} \left. \frac{x^2}{2} \right|_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}

is the midpoint of the interval (a, b).
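A tiny numerical sketch of this mean calculation (a = 2, b = 5 are arbitrary): integrate x f_X(x) with SciPy and compare against (a + b)/2 and a sample average.

```python
import numpy as np
from scipy import integrate

a, b = 2.0, 5.0
mean_quad, _ = integrate.quad(lambda x: x / (b - a), a, b)    # ∫ x f_X(x) dx over [a, b]
mean_mc = np.random.default_rng(0).uniform(a, b, 100_000).mean()

print(mean_quad, mean_mc, (a + b) / 2)    # all ≈ 3.5
```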

2. Review of Probability Theory: Moments… (2)
On the other hand, if X is exponential with parameter λ, then

E(X) = \int_0^{\infty} \frac{x}{\lambda} e^{-x/\lambda} \, dx = \lambda \int_0^{\infty} y e^{-y} \, dy = \lambda,

implying that the parameter λ represents the mean value of the exponential r.v.

Similarly, if X is Poisson with parameter λ, we get

E(X) = \sum_{k=0}^{\infty} k P(X = k) = \sum_{k=0}^{\infty} k e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} = \lambda e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.

Thus the parameter λ also represents the mean of the Poisson r.v.

2. Review of Probability Theory: Moments… (3)
In a similar manner, if X is binomial, then its mean is given by

E(X) = \sum_{k=0}^{n} k P(X = k) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k} = \sum_{k=1}^{n} \frac{n!}{(n-k)!\,(k-1)!} p^k q^{n-k} = np \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-i-1)!\,i!} p^i q^{n-i-1} = np (p+q)^{n-1} = np.

Thus np represents the mean of the binomial r.v.

For the normal r.v,

E(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} x \, e^{-(x-\mu)^2/2\sigma^2} \, dx = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} (y + \mu) \, e^{-y^2/2\sigma^2} \, dy
     = \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} y \, e^{-y^2/2\sigma^2} \, dy}_{0} + \mu \cdot \underbrace{\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{+\infty} e^{-y^2/2\sigma^2} \, dy}_{1} = \mu.

2. Review of Probability Theory: Moments… (4)
Given X ∼ fX(x), suppose Y = g(X) defines a new r.v with p.d.f fY(y). The new r.v Y has a mean μ_Y given by

\mu_Y = E(Y) = E(g(X)) = \int_{-\infty}^{+\infty} y \, f_Y(y) \, dy = \int_{-\infty}^{+\infty} g(x) \, f_X(x) \, dx.

In the discrete case,

E(Y) = \sum_i g(x_i) P(X = x_i).

From these equations, fY(y) is not required in order to evaluate E(Y) for Y = g(X).

2. Review of Probability Theory: Moments… (5)
Example: Determine the mean of Y = X², where X is a Poisson r.v.

E(X^2) = \sum_{k=0}^{\infty} k^2 P(X = k) = \sum_{k=0}^{\infty} k^2 e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{(k-1)!} = e^{-\lambda} \sum_{i=0}^{\infty} (i+1) \frac{\lambda^{i+1}}{i!}
       = \lambda e^{-\lambda} \left( \sum_{i=0}^{\infty} i \frac{\lambda^i}{i!} + \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} \right) = \lambda e^{-\lambda} \left( \sum_{i=1}^{\infty} \frac{\lambda^i}{(i-1)!} + e^{\lambda} \right) = \lambda e^{-\lambda} \left( \lambda \sum_{m=0}^{\infty} \frac{\lambda^m}{m!} + e^{\lambda} \right) = \lambda e^{-\lambda} (\lambda e^{\lambda} + e^{\lambda}) = \lambda^2 + \lambda.

In general, E(X^k) is known as the kth moment of the r.v X. Thus, if X ∼ P(λ), its second moment is given by the above equation.

2. Review of Probability Theory: Moments… (6)
‰ For a r.v X with mean μ, X − μ represents the deviation of the r.v from its mean. Since this deviation can be either positive or negative, consider the quantity (X − μ)², whose average value E[(X − μ)²] represents the average mean-square deviation of X around its mean. Define

\sigma_X^2 = E[(X - \mu)^2] > 0.

With g(X) = (X − μ)², we get

\sigma_X^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f_X(x) \, dx > 0,

where σ_X² is known as the variance of the r.v X, and its square root σ_X = \sqrt{E[(X - \mu)^2]} is known as the standard deviation of X. Note that the standard deviation represents the root-mean-square spread of the r.v X around its mean μ.

2. Review of Probability Theory: Moments… (7)
Alternatively, the variance can be calculated by

\mathrm{Var}(X) = \sigma_X^2 = \int_{-\infty}^{+\infty} (x^2 - 2x\mu + \mu^2) f_X(x) \, dx = \int_{-\infty}^{+\infty} x^2 f_X(x) \, dx - 2\mu \int_{-\infty}^{+\infty} x f_X(x) \, dx + \mu^2 = E(X^2) - \mu^2 = E(X^2) - [E(X)]^2 = \overline{X^2} - \bar{X}^2.

Example: Determine the variance of the Poisson r.v:

\sigma_X^2 = \overline{X^2} - \bar{X}^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda.

Thus, for a Poisson r.v, the mean and variance are both equal to its parameter λ.
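A one-line sanity check by simulation (λ = 4 is an arbitrary value): the sample mean and sample variance of Poisson draws both approach λ.

```python
import numpy as np

lam = 4.0
x = np.random.default_rng(0).poisson(lam, 1_000_000)
print(x.mean(), x.var())    # both ≈ 4.0 = λ
```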

2. Review of Probability Theory: Moments… (8)
Example: Determine the variance of the normal r.v N(μ, σ²).

We have

\mathrm{Var}(X) = E[(X-\mu)^2] = \int_{-\infty}^{+\infty} (x-\mu)^2 \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2} \, dx.

Use the identity

\int_{-\infty}^{+\infty} f_X(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2} \, dx = 1

for the normal p.d.f. This gives

\int_{-\infty}^{+\infty} e^{-(x-\mu)^2/2\sigma^2} \, dx = \sqrt{2\pi}\,\sigma.

Differentiating both sides with respect to σ, we get

\int_{-\infty}^{+\infty} \frac{(x-\mu)^2}{\sigma^3} e^{-(x-\mu)^2/2\sigma^2} \, dx = \sqrt{2\pi},

or

\int_{-\infty}^{+\infty} (x-\mu)^2 \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2} \, dx = \sigma^2,

so that Var(X) = σ² for the normal r.v N(μ, σ²).
2. Review of Probability Theory: Moments… (9)
‰ Moments: In general,

m_n = \overline{X^n} = E(X^n), \quad n \geq 1,

are known as the moments of the r.v X, and

\mu_n = E[(X - \mu)^n]

are known as the central moments of X. Clearly, the mean μ = m_1 and the variance σ² = μ_2. It is easy to relate m_n and μ_n; in fact,

\mu_n = E[(X-\mu)^n] = E\!\left( \sum_{k=0}^{n} \binom{n}{k} X^k (-\mu)^{n-k} \right) = \sum_{k=0}^{n} \binom{n}{k} E(X^k)(-\mu)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} m_k (-\mu)^{n-k}.

In general, the quantities E[(X - a)^n] are known as the generalized moments of X about a, and E[|X|^n] are known as the absolute moments of X.

2. Review of Probability Theory: Moments… (10)
‰ Characteristic Function of a r.v X is defined as

\Phi_X(\omega) = E(e^{jX\omega}) = \int_{-\infty}^{+\infty} e^{jx\omega} f_X(x) \, dx.

Thus Φ_X(0) = 1 and |Φ_X(ω)| ≤ 1 for all ω.

For discrete r.vs, the characteristic function reduces to

\Phi_X(\omega) = \sum_k e^{jk\omega} P(X = k).

Example: If X ∼ P(λ), then its characteristic function is given by

\Phi_X(\omega) = \sum_{k=0}^{\infty} e^{jk\omega} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{j\omega})^k}{k!} = e^{-\lambda} e^{\lambda e^{j\omega}} = e^{\lambda(e^{j\omega} - 1)}.

Example: If X is a binomial r.v, its characteristic function is given by

\Phi_X(\omega) = \sum_{k=0}^{n} e^{jk\omega} \binom{n}{k} p^k q^{n-k} = \sum_{k=0}^{n} \binom{n}{k} (p e^{j\omega})^k q^{n-k} = (p e^{j\omega} + q)^n.

2. Review of Probability Theory: Moments… (11)
The characteristic function can be used to compute the mean, variance and other higher-order moments of any r.v X:

\Phi_X(\omega) = E(e^{jX\omega}) = E\!\left[ \sum_{k=0}^{\infty} \frac{(j\omega X)^k}{k!} \right] = \sum_{k=0}^{\infty} j^k \frac{E(X^k)}{k!} \omega^k = 1 + jE(X)\,\omega + j^2 \frac{E(X^2)}{2!}\omega^2 + \cdots + j^k \frac{E(X^k)}{k!}\omega^k + \cdots.

Taking the first derivative with respect to ω and setting ω = 0, we get

\left.\frac{\partial \Phi_X(\omega)}{\partial \omega}\right|_{\omega=0} = jE(X) \quad \text{or} \quad E(X) = \frac{1}{j} \left.\frac{\partial \Phi_X(\omega)}{\partial \omega}\right|_{\omega=0}.

Similarly, the second derivative gives

E(X^2) = \frac{1}{j^2} \left.\frac{\partial^2 \Phi_X(\omega)}{\partial \omega^2}\right|_{\omega=0},

and repeating this procedure k times, we obtain the kth moment of X as

E(X^k) = \frac{1}{j^k} \left.\frac{\partial^k \Phi_X(\omega)}{\partial \omega^k}\right|_{\omega=0}, \quad k \geq 1.

2. Review of Probability Theory: Moments… (12)
Example: If X ∼ P(λ), then

\frac{\partial \Phi_X(\omega)}{\partial \omega} = e^{-\lambda} e^{\lambda e^{j\omega}} \lambda j e^{j\omega},

so that E(X) = λ.

The second derivative gives

\frac{\partial^2 \Phi_X(\omega)}{\partial \omega^2} = e^{-\lambda} \left( e^{\lambda e^{j\omega}} (\lambda j e^{j\omega})^2 + e^{\lambda e^{j\omega}} \lambda j^2 e^{j\omega} \right),

so that

E(X^2) = \lambda^2 + \lambda.
2. Review of Probability Theory: Two r.vs (1)
‰ Let X and Y denote two random variables (r.v) based on a probability model (Ω, F, P). Then

P(x_1 < X(\xi) \leq x_2) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x) \, dx,

and

P(y_1 < Y(\xi) \leq y_2) = F_Y(y_2) - F_Y(y_1) = \int_{y_1}^{y_2} f_Y(y) \, dy.

What about the probability that the pair of r.vs (X, Y) belongs to an arbitrary region D? In other words, how does one estimate, for example,

P[(x_1 < X(\xi) \leq x_2) \cap (y_1 < Y(\xi) \leq y_2)] = ?

Towards this, we define the joint probability distribution function of X and Y as

F_{XY}(x, y) = P[(X(\xi) \leq x) \cap (Y(\xi) \leq y)] = P(X \leq x, Y \leq y) \geq 0,

where x and y are arbitrary real numbers.

2. Review of Probability Theory: Two r.vs (2)
‰ Properties

(i) F_{XY}(-\infty, y) = F_{XY}(x, -\infty) = 0, \quad F_{XY}(+\infty, +\infty) = 1.

(ii) P(x_1 < X(\xi) \leq x_2, \; Y(\xi) \leq y) = F_{XY}(x_2, y) - F_{XY}(x_1, y),
     P(X(\xi) \leq x, \; y_1 < Y(\xi) \leq y_2) = F_{XY}(x, y_2) - F_{XY}(x, y_1).

(iii) P(x_1 < X(\xi) \leq x_2, \; y_1 < Y(\xi) \leq y_2) = F_{XY}(x_2, y_2) - F_{XY}(x_2, y_1) - F_{XY}(x_1, y_2) + F_{XY}(x_1, y_1).

This is the probability that (X, Y) belongs to the rectangle R_0 = (x_1, x_2] × (y_1, y_2].
2. Review of Probability Theory: Two r.vs (3)
‰ Joint probability density function (joint p.d.f): By definition, the joint p.d.f of X and Y is given by

f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x \, \partial y},

and hence we obtain the useful formula

F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v) \, du \, dv.

Using property (i), we also get

\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{XY}(x, y) \, dx \, dy = 1.

The probability that (X, Y) belongs to an arbitrary region D is given by

P((X, Y) \in D) = \iint_{(x, y) \in D} f_{XY}(x, y) \, dx \, dy.

2. Review of Probability Theory: Two r.vs (4)
‰ Marginal Statistics: In the context of several r.vs, the statistics of each individual one are called marginal statistics. Thus FX(x) is the marginal probability distribution function of X, and fX(x) is the marginal p.d.f of X. It is interesting to note that all marginals can be obtained from the joint p.d.f. In fact,

F_X(x) = F_{XY}(x, +\infty), \qquad F_Y(y) = F_{XY}(+\infty, y).

Also,

f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y) \, dy, \qquad f_Y(y) = \int_{-\infty}^{+\infty} f_{XY}(x, y) \, dx.

If X and Y are discrete r.vs, then p_{ij} = P(X = x_i, Y = y_j) represents their joint p.d.f, and their respective marginal p.d.fs are given by

P(X = x_i) = \sum_j P(X = x_i, Y = y_j) = \sum_j p_{ij}, \qquad P(Y = y_j) = \sum_i P(X = x_i, Y = y_j) = \sum_i p_{ij}.

The joint PDF and/or the joint p.d.f represent complete information about the r.vs, and their marginal p.d.fs can be evaluated from the joint p.d.f. However, given only the marginals, it is (most often) not possible to reconstruct the joint p.d.f.
2. Review of Probability Theory: Two r.vs (5)
‰ Independence of r.vs
Definition: The random variables X and Y are said to be statistically independent if the events {X(ξ) ∈ A} and {Y(ξ) ∈ B} are independent events for any two sets A and B on the x and y axes, respectively. Applying this definition to the events {X(ξ) ≤ x} and {Y(ξ) ≤ y}, we conclude that, if the r.vs X and Y are independent, then

P((X(\xi) \leq x) \cap (Y(\xi) \leq y)) = P(X(\xi) \leq x) \, P(Y(\xi) \leq y),

i.e.,

F_{XY}(x, y) = F_X(x) F_Y(y),

or equivalently, if X and Y are independent, then we must have

f_{XY}(x, y) = f_X(x) f_Y(y).

If X and Y are discrete-type r.vs, then their independence implies

P(X = x_i, Y = y_j) = P(X = x_i) P(Y = y_j) \quad \text{for all } i, j.

2. Review of Probability Theory: Two r.vs (6)
The equations on the previous slide give us the procedure to test for independence: given f_{XY}(x, y), obtain the marginal p.d.fs fX(x) and fY(y) and examine whether the factorization holds. If so, the r.vs are independent; otherwise they are dependent.

Example: Given

f_{XY}(x, y) = \begin{cases} x y^2 e^{-y}, & 0 < y < \infty, \; 0 < x < 1, \\ 0, & \text{otherwise,} \end{cases}

determine whether X and Y are independent.

We have

f_X(x) = \int_0^{+\infty} f_{XY}(x, y) \, dy = x \int_0^{\infty} y^2 e^{-y} \, dy = x \left( \left[-y^2 e^{-y}\right]_0^{\infty} + 2\int_0^{\infty} y e^{-y} \, dy \right) = 2x, \quad 0 < x < 1.

Similarly,

f_Y(y) = \int_0^1 f_{XY}(x, y) \, dx = \frac{y^2}{2} e^{-y}, \quad 0 < y < \infty.

In this case, f_{XY}(x, y) = f_X(x) f_Y(y), and hence X and Y are independent r.vs.
2. Review of Probability Theory: Func. of 2 r.vs (1)
‰ Given two random variables X and Y and a function g(x, y), we form a new random variable Z = g(X, Y).

Given the joint p.d.f f_{XY}(x, y), how does one obtain fZ(z), the p.d.f of Z? Problems of this type are of interest from a practical standpoint. For example, a receiver output signal usually consists of the desired signal buried in noise, and the above formulation in that case reduces to Z = X + Y.

It is important to know the statistics of the incoming signal for proper receiver design. In this context, we shall analyze problems of the following type:

Z = g(X, Y) ∈ { X + Y, \; X - Y, \; XY, \; X/Y, \; \max(X, Y), \; \min(X, Y), \; \sqrt{X^2 + Y^2}, \; \tan^{-1}(X/Y) }.

2. Review of Probability Theory: Func. of 2 r.vs (2)
‰ Start with

F_Z(z) = P(Z(\xi) \leq z) = P(g(X, Y) \leq z) = P[(X, Y) \in D_z] = \iint_{(x, y) \in D_z} f_{XY}(x, y) \, dx \, dy,

where D_z in the xy-plane represents the region where g(x, y) ≤ z is satisfied. To determine FZ(z), it is enough to find the region D_z for every z and then evaluate the integral there.

Example 1: Z = X + Y. Find fZ(z).

F_Z(z) = P(X + Y \leq z) = \int_{y=-\infty}^{+\infty} \int_{x=-\infty}^{z-y} f_{XY}(x, y) \, dx \, dy,

since the region D_z of the xy-plane where x + y ≤ z is the half-plane to the left of the line x + y = z. Integrating over a horizontal strip along the x-axis first (inner integral), and then sliding that strip along the y-axis from -∞ to +∞ (outer integral), we cover the entire region.
2. Review of Probability Theory: Func. of 2 r.vs (3)
We can find fZ(z) by differentiating FZ(z) directly. In this context, it is useful
to recall the differentiation rule due to Leibnitz. Suppose
b( z )
H ( z) = ∫ h( x, z )dx.
a( z)

Then
dH ( z ) db( z ) da ( z ) b ( z ) ∂h ( x , z )
= h (b( z ), z ) − h (a ( z ), z ) + ∫ dx.
dz dz dz a ( z ) ∂z

Using the above equations, we get
f_Z(z) = \int_{-\infty}^{+\infty} \left( \frac{\partial}{\partial z} \int_{-\infty}^{z-y} f_{XY}(x,y)\,dx \right) dy = \int_{-\infty}^{+\infty} \left( f_{XY}(z-y, y) - 0 + \int_{-\infty}^{z-y} \frac{\partial f_{XY}(x,y)}{\partial z}\,dx \right) dy
       = \int_{-\infty}^{+\infty} f_{XY}(z-y, y)\,dy.

If X and Y are independent, so that fXY(x,y) = fX(x)fY(y), then we get
f_Z(z) = \int_{y=-\infty}^{+\infty} f_X(z-y)\, f_Y(y)\,dy = \int_{x=-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\,dx. \quad \text{(Convolution!)}
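As a rough numerical illustration of this convolution result (an added sketch; numpy is assumed, and the Exp(1) densities are only an example): for independent X, Y ~ Exp(1), discretizing f_X * f_Y reproduces the known density z e^{-z} of Z = X + Y.

```python
import numpy as np

dz = 0.01
z = np.arange(0, 20, dz)
f_exp = np.exp(-z)                        # Exp(1) density on the grid z >= 0

# discrete approximation of the convolution integral f_Z = f_X * f_Y
f_z = np.convolve(f_exp, f_exp)[:len(z)] * dz

f_exact = z * np.exp(-z)                  # exact density of Z = X + Y (Gamma(2,1))
print(np.max(np.abs(f_z - f_exact)))      # small discretization error (about dz)
```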

2. Review of Probability Theory: Func. of 2 r.vs (4)
Example 2: X and Y are independent normal r.vs with zero mean and
common variance σ². Determine f_Z(z) for Z = X² + Y².

We have
F_Z(z) = P(X^2 + Y^2 \le z) = \iint_{x^2 + y^2 \le z} f_{XY}(x,y)\,dx\,dy.

But X² + Y² ≤ z represents the interior of a circle with radius √z, and hence
F_Z(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} \int_{x=-\sqrt{z-y^2}}^{\sqrt{z-y^2}} f_{XY}(x,y)\,dx\,dy.
[Figure: the disc X² + Y² ≤ z in the xy plane.]
This gives, after differentiation (using the Leibnitz rule),
f_Z(z) = \int_{y=-\sqrt{z}}^{\sqrt{z}} \frac{1}{2\sqrt{z-y^2}} \left( f_{XY}\big(\sqrt{z-y^2},\, y\big) + f_{XY}\big(-\sqrt{z-y^2},\, y\big) \right) dy. \qquad (*)

2. Review of Probability Theory: Func. of 2 r.vs (5)
Moreover, X and Y are said to be jointly normal (Gaussian) distributed if
their joint p.d.f has the following form:
f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left\{ \frac{-1}{2(1-\rho^2)} \left( \frac{(x-\mu_X)^2}{\sigma_X^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right\},
-\infty < x < +\infty, \quad -\infty < y < +\infty, \quad |\rho| < 1,


and with zero mean
f_{XY}(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}} \exp\left\{ \frac{-1}{2(1-r^2)} \left( \frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} \right) \right\}.

With r = 0 and σ₁ = σ₂ = σ, direct substitution into (*) gives
f_Z(z) = \int_{-\sqrt{z}}^{\sqrt{z}} \frac{1}{2\sqrt{z-y^2}} \cdot 2 \cdot \frac{1}{2\pi\sigma^2}\, e^{-(z-y^2+y^2)/2\sigma^2}\,dy = \frac{e^{-z/2\sigma^2}}{\pi\sigma^2} \int_0^{\sqrt{z}} \frac{dy}{\sqrt{z-y^2}}
= \frac{e^{-z/2\sigma^2}}{\pi\sigma^2} \int_0^{\pi/2} \frac{\sqrt{z}\cos\theta}{\sqrt{z}\cos\theta}\,d\theta = \frac{1}{2\sigma^2}\, e^{-z/2\sigma^2}\, U(z), \qquad \text{with } y = \sqrt{z}\sin\theta.
2. Review of Probability Theory: Func. of 2 r.vs (6)
Thus, if X and Y are independent zero-mean Gaussian r.vs with common
variance σ², then X² + Y² is an exponential r.v with parameter 2σ².
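This is easy to check by simulation; the following is a minimal Monte Carlo sketch (numpy assumed, parameter values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 1.5, 200_000

x = rng.normal(0.0, sigma, n)
y = rng.normal(0.0, sigma, n)
z = x**2 + y**2                      # claimed: exponential with parameter (mean) 2*sigma**2

print(z.mean(), 2 * sigma**2)        # both close to 4.5
print(np.mean(z > 2 * sigma**2))     # ~ exp(-1) ~ 0.368, as for an exponential r.v
```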

Example 3: Let Z = \sqrt{X^2 + Y^2}. Find f_Z(z).
From the figure, the present case corresponds to the disc X² + Y² ≤ z², i.e., a circle with radius z. Thus
F_Z(z) = \int_{y=-z}^{z} \int_{x=-\sqrt{z^2-y^2}}^{\sqrt{z^2-y^2}} f_{XY}(x,y)\,dx\,dy,
[Figure: the disc X² + Y² ≤ z² in the xy plane.]
and hence
f_Z(z) = \int_{-z}^{z} \frac{z}{\sqrt{z^2-y^2}} \left( f_{XY}\big(\sqrt{z^2-y^2},\, y\big) + f_{XY}\big(-\sqrt{z^2-y^2},\, y\big) \right) dy.

Now suppose X and Y are independent Gaussian r.vs as in Example 2; we obtain
f_Z(z) = 2\int_0^{z} \frac{z}{\sqrt{z^2-y^2}} \cdot \frac{2}{2\pi\sigma^2}\, e^{-(z^2-y^2+y^2)/2\sigma^2}\,dy = \frac{2z\, e^{-z^2/2\sigma^2}}{\pi\sigma^2} \int_0^{z} \frac{dy}{\sqrt{z^2-y^2}}
= \frac{2z}{\pi\sigma^2}\, e^{-z^2/2\sigma^2} \int_0^{\pi/2} \frac{z\cos\theta}{z\cos\theta}\,d\theta = \frac{z}{\sigma^2}\, e^{-z^2/2\sigma^2}\, U(z), \qquad \text{with } y = z\sin\theta. \quad \text{(Rayleigh distribution!)}
2. Review of Probability Theory: Func. of 2 r.vs (7)
Thus, if W = X + iY, where X and Y are real, independent normal r.vs with
zero mean and equal variance, then the r.v |W| = \sqrt{X^2 + Y^2} has a Rayleigh
density. W is said to be a complex Gaussian r.v with zero mean, whose real
and imaginary parts are independent r.vs and whose magnitude has a Rayleigh
distribution.

⎛X ⎞
What about its phase θ = tan −1 ⎜ ⎟?
⎝Y ⎠
Clearly, the principal value of θ lies in the interval (-π/2, +π/2). If we let
U = tan θ = X/Y , then it is shown that U has a Cauchy distribution with
f_U(u) = \frac{1/\pi}{u^2 + 1}, \quad -\infty < u < \infty.
As a result,
f_\theta(\theta) = \frac{1}{|d\theta/du|}\, f_U(\tan\theta) = \sec^2\theta \cdot \frac{1/\pi}{\tan^2\theta + 1} = \begin{cases} 1/\pi, & -\pi/2 < \theta < \pi/2, \\ 0, & \text{otherwise.} \end{cases}

2. Review of Probability Theory: Func. of 2 r.vs (8)
To summarize, the magnitude and phase of a zero-mean complex Gaussian r.v
have Rayleigh and uniform distributions, respectively. Interestingly, as we will
see later, these two derived r.vs are also independent of each other!
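A brief Monte Carlo sketch of these facts (an addition, with numpy assumed and the slide's convention θ = tan⁻¹(X/Y)):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 2.0, 500_000

x = rng.normal(0.0, sigma, n)
y = rng.normal(0.0, sigma, n)

r = np.sqrt(x**2 + y**2)                       # magnitude: Rayleigh with parameter sigma
theta = np.arctan(x / y)                       # principal-value phase in (-pi/2, pi/2)

print(r.mean(), sigma * np.sqrt(np.pi / 2))    # Rayleigh mean
print(theta.std(), np.pi / np.sqrt(12))        # std of a uniform r.v on (-pi/2, pi/2)
print(np.corrcoef(r, theta)[0, 1])             # ~ 0, consistent with independence
```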

Example 4: Redo example 3, where X and Y have nonzero means μX and


μY respectively.
Since
f_{XY}(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-[(x-\mu_X)^2 + (y-\mu_Y)^2]/2\sigma^2},

we proceed as in Example 3 and obtain the Rician probability density function to be
f_Z(z) = \frac{z\, e^{-(z^2+\mu^2)/2\sigma^2}}{2\pi\sigma^2} \int_{-\pi/2}^{\pi/2} \left( e^{z\mu\cos(\theta-\phi)/\sigma^2} + e^{-z\mu\cos(\theta+\phi)/\sigma^2} \right) d\theta
= \frac{z\, e^{-(z^2+\mu^2)/2\sigma^2}}{2\pi\sigma^2} \left( \int_{-\pi/2}^{\pi/2} e^{z\mu\cos(\theta-\phi)/\sigma^2}\,d\theta + \int_{\pi/2}^{3\pi/2} e^{z\mu\cos(\theta-\phi)/\sigma^2}\,d\theta \right)
= \frac{z}{\sigma^2}\, e^{-(z^2+\mu^2)/2\sigma^2}\, I_0\!\left( \frac{z\mu}{\sigma^2} \right),
2. Review of Probability Theory: Func. of 2 r.vs (9)
where x = z\cos\theta, \ y = z\sin\theta, \ \mu = \sqrt{\mu_X^2 + \mu_Y^2}, \ \mu_X = \mu\cos\phi, \ \mu_Y = \mu\sin\phi,

and
I_0(\eta) = \frac{1}{2\pi} \int_0^{2\pi} e^{\eta\cos(\theta-\phi)}\,d\theta = \frac{1}{\pi} \int_0^{\pi} e^{\eta\cos\theta}\,d\theta

is the modified Bessel function of the first kind and zeroth order.

Thus, if X and Y have nonzero means μX and μY respectively, then Z = \sqrt{X^2 + Y^2} is said to be a Rician r.v. Such a scenario arises in a fading multipath situation where there is a dominant constant component (mean) in addition to a zero-mean Gaussian r.v. The constant component may be the line-of-sight signal, and the zero-mean Gaussian part could be due to random multipath components adding up incoherently (see diagram: the line-of-sight (constant) signal and the multipath/Gaussian noise are summed, giving a Rician output). The envelope of such a signal is said to have a Rician p.d.f.
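The derived density can be compared against a simulation; this is only an illustrative sketch (numpy assumed, means and variance chosen arbitrarily, np.i0 used for the modified Bessel function I0):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, mu_x, mu_y, n = 1.0, 2.0, 1.0, 400_000
mu = np.hypot(mu_x, mu_y)

z = np.hypot(mu_x + rng.normal(0, sigma, n), mu_y + rng.normal(0, sigma, n))

hist, edges = np.histogram(z, bins=60, range=(0, 7), density=True)
c = 0.5 * (edges[:-1] + edges[1:])
rician = (c / sigma**2) * np.exp(-(c**2 + mu**2) / (2 * sigma**2)) * np.i0(c * mu / sigma**2)
print(np.max(np.abs(hist - rician)))      # small: the histogram follows the Rician p.d.f
```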

2. Review of Probability Theory: Joint Moments… (1)
‰ Given two r.vs X and Y and a function g(x,y), define the r.v Z = g(X,Y). The mean of Z can be defined as
\mu_Z = E(Z) = \int_{-\infty}^{+\infty} z\, f_Z(z)\,dz,
or, more usefully,
E(Z) = \int_{-\infty}^{+\infty} z\, f_Z(z)\,dz = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x,y)\, f_{XY}(x,y)\,dx\,dy.

If X and Y are discrete-type r.vs, then


E[ g ( X , Y )] = ∑∑ g ( xi , y j ) P( X = xi , Y = y j ).
i j

Since expectation is a linear operator, we also get


⎛ ⎞
E ⎜ ∑ ak g k ( X , Y ) ⎟ = ∑ ak E[ g k ( X , Y )].
⎝ k ⎠ k

2. Review of Probability Theory: Joint Moments… (2)
If X and Y are independent r.vs, it is easy to see that V=g(X) and W=h(Y)
are always independent of each other. We get the interesting result
E[g(X)h(Y)] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} g(x)h(y)\, f_X(x) f_Y(y)\,dx\,dy = \int_{-\infty}^{+\infty} g(x) f_X(x)\,dx \int_{-\infty}^{+\infty} h(y) f_Y(y)\,dy = E[g(X)]\,E[h(Y)].

In the case of one random variable, we defined the parameters mean and
variance to represent its average behavior. How does one parametrically
represent similar cross-behavior between two random variables? Towards
this, we can generalize the variance definition given as

‰ Covariance: Given any two r.vs X and Y, define


Cov ( X , Y ) = E [( X − μ X )(Y − μY )].
or
Cov( X , Y ) = E ( XY ) − μ X μY = E ( XY ) − E ( X ) E (Y )
= \overline{XY} - \overline{X}\,\overline{Y}.
2. Review of Probability Theory: Joint Moments… (3)
It is easy to see that
Cov(X,Y) \le \sqrt{Var(X)\,Var(Y)}.
We define the normalized parameter
\rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}, \quad -1 \le \rho_{XY} \le 1,
and it represents the correlation coefficient between X and Y.

Uncorrelated r.vs: If ρXY = 0, then X and Y are said to be uncorrelated r.vs.


If X and Y are uncorrelated, then E(XY) = E(X)E(Y)

Orthogonality: X and Y are said to be orthogonal if E(XY) = 0

If either X or Y has zero mean, then orthogonality implies uncorrelatedness
and vice-versa. If X and Y are independent r.vs, then they are also uncorrelated.

2. Review of Probability Theory: Joint Moments… (4)
Naturally, if two random variables are statistically independent, then there
cannot be any correlation between them (ρXY = 0). However, the converse
is in general not true: as the sketch below illustrates, random variables can
be uncorrelated without being independent.
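A standard illustration of this point (added here; numpy assumed): with X ~ N(0,1) and Y = X², Cov(X,Y) = E(X³) = 0, so the pair is uncorrelated even though Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2                                  # a deterministic function of x

print(np.corrcoef(x, y)[0, 1])            # ~ 0: uncorrelated
print(np.corrcoef(x**2, y)[0, 1])         # 1.0: clearly dependent
```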

Example 5: Let Z = aX + bY. Determine the variance of Z in terms of σX, σY


and ρXY .
We have
\mu_Z = E(Z) = E(aX + bY) = a\mu_X + b\mu_Y,
\sigma_Z^2 = Var(Z) = E\left[(Z - \mu_Z)^2\right] = E\left[\left( a(X-\mu_X) + b(Y-\mu_Y) \right)^2\right]
           = a^2 E(X-\mu_X)^2 + 2ab\, E\left[(X-\mu_X)(Y-\mu_Y)\right] + b^2 E(Y-\mu_Y)^2
           = a^2\sigma_X^2 + 2ab\,\rho_{XY}\sigma_X\sigma_Y + b^2\sigma_Y^2.

In particular, if X and Y are independent, then ρXY = 0 and


σ Z2 = a 2σ X2 + b2σ Y2 .
2. Review of Probability Theory: Joint Moments… (5)
‰ Moments: the joint moment of order (k, m) of X and Y is defined as
E[X^k Y^m] = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^k y^m f_{XY}(x,y)\,dx\,dy.

Following the one random variable case, we can define the joint
characteristic function between two random variables which will turn out to
be useful for moment calculations.

‰ Joint characteristic function: the joint characteristic function of X and Y is defined as
\Phi_{XY}(u,v) = E\left( e^{j(Xu+Yv)} \right) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{j(xu+yv)} f_{XY}(x,y)\,dx\,dy.
Note that |\Phi_{XY}(u,v)| \le \Phi_{XY}(0,0) = 1.

It is easy to show that


E(XY) = \frac{1}{j^2} \left. \frac{\partial^2 \Phi_{XY}(u,v)}{\partial u\, \partial v} \right|_{u=0,\, v=0}.
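As an added symbolic check (sympy assumed), applying this formula to the jointly Gaussian characteristic function given on the next slide recovers E(XY) = μXμY + ρσXσY, i.e. Cov(X,Y) + E(X)E(Y):

```python
import sympy as sp

u, v = sp.symbols('u v', real=True)
mx, my, sx, sy, rho = sp.symbols('mu_X mu_Y sigma_X sigma_Y rho', real=True)

# jointly Gaussian characteristic function (see the next slide)
Phi = sp.exp(sp.I * (mx * u + my * v)
             - sp.Rational(1, 2) * (sx**2 * u**2 + 2 * rho * sx * sy * u * v + sy**2 * v**2))

EXY = sp.expand(sp.diff(Phi, u, v).subs({u: 0, v: 0}) / sp.I**2)
print(EXY)        # mu_X*mu_Y + rho*sigma_X*sigma_Y
```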

2. Review of Probability Theory: Joint Moments… (6)
If X and Y are independent r.vs, then we obtain
Φ XY (u, v ) = E ( e juX ) E ( e jvY ) = Φ X (u )ΦY ( v ).
Also
Φ X (u ) = Φ XY (u, 0), ΦY ( v ) = Φ XY (0, v ).

‰ More on Gaussian r.vs: the joint characteristic function of two jointly Gaussian r.vs is given by
\Phi_{XY}(u,v) = E\left( e^{j(Xu+Yv)} \right) = e^{\, j(\mu_X u + \mu_Y v) - \frac{1}{2}(\sigma_X^2 u^2 + 2\rho\sigma_X\sigma_Y uv + \sigma_Y^2 v^2)}.

Example 6: Let X and Y be jointly Gaussian r.vs with parameters


N(μX, μY, σX2, σY2, ρ). Define Z = aX + bY. Determine fZ(z).
In this case we can use characteristic function to solve.
Φ Z (u ) = E (e jZu ) = E ( e j ( aX +bY ) u ) = E ( e jauX + jbuY )
= Φ XY ( au, bu ).
2. Review of Probability Theory: Joint Moments… (7)
or
\Phi_Z(u) = e^{\, j(a\mu_X + b\mu_Y)u - \frac{1}{2}(a^2\sigma_X^2 + 2\rho ab\,\sigma_X\sigma_Y + b^2\sigma_Y^2)u^2} = e^{\, j\mu_Z u - \frac{1}{2}\sigma_Z^2 u^2},
where
\mu_Z \triangleq a\mu_X + b\mu_Y, \qquad \sigma_Z^2 \triangleq a^2\sigma_X^2 + 2\rho ab\,\sigma_X\sigma_Y + b^2\sigma_Y^2.
Thus, Z = aX + bY is also Gaussian with mean and variance as above.
We conclude that any linear combination of jointly Gaussian r.vs generates a
Gaussian r.v.

Example 7: Suppose X and Y are jointly Gaussian r.vs as in Example 6.
Define two linear combinations, Z = aX + bY and W = cX + dY. Determine
their joint distribution.

The characteristic function of Z and W is given by


Φ ZW (u, v ) = E ( e j ( Zu +Wv ) ) = E ( e j ( aX +bY ) u + j ( cX + dY ) v )
= E ( e jX ( au + cv )+ jY ( bu + dv ) ) = Φ XY ( au + cv, bu + dv ).

2. Review of Probability Theory: Joint Moments… (8)
Similar to Example 6, we get
\Phi_{ZW}(u,v) = e^{\, j(\mu_Z u + \mu_W v) - \frac{1}{2}(\sigma_Z^2 u^2 + 2\rho_{ZW}\sigma_Z\sigma_W uv + \sigma_W^2 v^2)},

where
μ Z = aμ X + bμY ,
μW = cμ X + dμY ,
σ Z2 = a 2σ X2 + 2abρσ X σ Y + b2σ Y2 ,
σ W2 = c 2σ X2 + 2cdρσ X σ Y + d 2σ Y2 ,
and
\rho_{ZW} = \frac{ac\,\sigma_X^2 + (ad+bc)\rho\sigma_X\sigma_Y + bd\,\sigma_Y^2}{\sigma_Z\,\sigma_W}.

Thus, Z and W are also jointly distributed Gaussian r.vs with means,
variances and correlation coefficient as above.

2. Review of Probability Theory: Joint Moments… (9)
To summarize, any two linear combinations of jointly Gaussian random
variables (independent or dependent) are also jointly Gaussian r.vs.

Gaussian input → Linear operator → Gaussian output

Gaussian r. vs are also interesting because of the following result:

‰ Central Limit Theorem: Suppose X1, X2, …, Xn are a set of zero-mean
independent, identically distributed (i.i.d) random variables with some
common distribution. Consider their scaled sum
Y = \frac{X_1 + X_2 + \cdots + X_n}{\sqrt{n}}.
Then, asymptotically (as n → ∞),
Y \to N(0, \sigma^2).

2. Review of Probability Theory: Joint Moments… (10)
The central limit theorem states that a large sum of independent random
variables, each with finite variance, tends to behave like a normal random
variable. Thus the individual p.d.fs become unimportant when analyzing the
collective sum behavior. If we model a noise phenomenon as the sum of
a large number of independent random variables (e.g., electron motion in
resistors), then this theorem allows us to conclude that the noise
behaves like a Gaussian r.v.
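A small simulation makes the theorem concrete (an added sketch; numpy assumed, with uniform samples chosen only as an example of a non-Gaussian common distribution):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 400, 20_000

x = rng.uniform(-0.5, 0.5, size=(trials, n))   # i.i.d., zero mean, sigma^2 = 1/12
y = x.sum(axis=1) / np.sqrt(n)                 # scaled sum from the theorem

print(y.mean(), y.var(), 1 / 12)               # mean ~ 0, variance ~ sigma^2
print(np.mean(np.abs(y) < np.sqrt(1 / 12)))    # ~ 0.683, as for a Gaussian within one std
```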

2. Review of Stochastic Processes and Models

‰ Mean, Autocorrelation.

‰ Stationarity.

‰ Deterministic Systems.

‰ Discrete Time Stochastic Process.

‰ Stochastic Models.

2. Review of Stochastic Processes: Introduction (1)
‰ Let ζ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ζ) is assigned. The collection of such waveforms forms a stochastic process. The set {ζk} and the time index t can be continuous or discrete (countably infinite or finite) as well. For fixed ζi ∈ S (the set of all experimental outcomes), X(t, ζi) is a specific time function. For fixed t, X1 = X(t1, ζi) is a random variable. The ensemble of all such realizations X(t, ζ) over time represents the stochastic process (or random process) X(t) (see the figure).
[Figure: realizations X(t, ζ1), X(t, ζ2), …, X(t, ζk), …, X(t, ζn) plotted versus t, with time instants t1 and t2 marked.]
For example,
X(t) = a\cos(\omega_0 t + \varphi),
where φ is a uniformly distributed random variable in (0, 2π), represents a stochastic process.

2. Review of Stochastic Processes: Introduction (2)
If X(t) is a stochastic process, then for fixed t, X(t) represents a random
variable. Its distribution function is given by
FX ( x, t ) = P{ X (t ) ≤ x}
Notice that FX(x, t) depends on t, since for a different t, we obtain a
different random variable. Further
f_X(x,t) \triangleq \frac{dF_X(x,t)}{dx}
represents the first-order probability density function of the process X(t).

For t = t1 and t = t2, X(t) represents two different random variables


X1 = X(t1) and X2 = X(t2), respectively. Their joint distribution is given by
FX ( x1 , x2 , t1 , t2 ) = P{ X (t1 ) ≤ x1 , X (t2 ) ≤ x2 }
and
f_X(x_1, x_2, t_1, t_2) \triangleq \frac{\partial^2 F_X(x_1, x_2, t_1, t_2)}{\partial x_1\, \partial x_2}
represents the second-order density function of the process X(t).
2. Review of Stochastic Processes: Introduction (3)
Similarly, fX(x1, x2,… xn, t1, t2,… tn) represents the nth order density function
of the process X(t). Complete specification of the stochastic process X(t)
requires the knowledge of fX(x1, x2,… xn, t1, t2,… tn) for all ti , i = 1, 2…, n and
for all n. (an almost impossible task in reality!).

‰ Mean of a stochastic process:


\mu(t) \triangleq E\{X(t)\} = \int_{-\infty}^{+\infty} x\, f_X(x,t)\,dx
represents the mean value of a process X(t). In general, the mean of
a process can depend on the time index t.

‰ Autocorrelation function of a process X(t) is defined as


R_{XX}(t_1, t_2) \triangleq E\{X(t_1) X^*(t_2)\} = \iint x_1 x_2^*\, f_X(x_1, x_2, t_1, t_2)\,dx_1\,dx_2
and it represents the interrelationship between the random variables
X1 = X(t1) and X2 = X(t2) generated from the process X(t).

2. Review of Stochastic Processes: Introduction (4)
Properties:

(i) R XX (t1 , t2 ) = R XX* (t2 , t1 ) = [ E{ X (t2 ) X * (t1 )}]*

(ii) R XX (t , t ) = E{| X (t ) |2 } > 0. (Average instantaneous power)

(iii) RXX(t1, t2) represents a nonnegative definite function, i.e., for any set of constants \{a_i\}_{i=1}^{n},
\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j^* R_{XX}(t_i, t_j) \ge 0.
This follows by noticing that E\{|Y|^2\} \ge 0 for Y = \sum_{i=1}^{n} a_i X(t_i).

The function
C XX (t1 , t 2 ) = RXX (t1 , t 2 ) − μ X (t1 ) μ *X (t 2 )
represents the autocovariance function of the process X(t).

2. Review of Stochastic Processes: Introduction (5)
Example:
X (t ) = a cos(ω 0 t + ϕ ), ϕ ~ U (0,2π ).
This gives
\mu_X(t) = E\{X(t)\} = a\,E\{\cos(\omega_0 t + \varphi)\} = a\cos\omega_0 t\, E\{\cos\varphi\} - a\sin\omega_0 t\, E\{\sin\varphi\} = 0,
since E\{\cos\varphi\} = \frac{1}{2\pi}\int_0^{2\pi} \cos\varphi\,d\varphi = 0 = E\{\sin\varphi\}.
Similarly,
R_{XX}(t_1, t_2) = a^2 E\{\cos(\omega_0 t_1 + \varphi)\cos(\omega_0 t_2 + \varphi)\}
= \frac{a^2}{2} E\{\cos\omega_0(t_1 - t_2) + \cos(\omega_0(t_1 + t_2) + 2\varphi)\}
= \frac{a^2}{2} \cos\omega_0(t_1 - t_2).
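These ensemble averages can be reproduced numerically by averaging over many realizations of φ; the snippet below is an added sketch (numpy assumed, the values of a, ω0, t1, t2 arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
a, w0, t1, t2 = 2.0, 2 * np.pi * 5.0, 0.13, 0.37
n_real = 200_000                          # number of realizations (values of phi)

phi = rng.uniform(0, 2 * np.pi, n_real)
x1 = a * np.cos(w0 * t1 + phi)
x2 = a * np.cos(w0 * t2 + phi)

print(x1.mean())                                             # ~ 0
print(np.mean(x1 * x2), a**2 / 2 * np.cos(w0 * (t1 - t2)))   # ~ R_XX(t1, t2)
```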

2. Review of Stochastic Processes: Stationary (1)
‰ Stationary processes exhibit statistical properties that are invariant to
shift in the time index. Thus, for example, second-order stationarity implies
that the statistical properties of the pairs {X(t1) , X(t2)} and {X(t1+c),X(t2+c)}
are the same for any c. Similarly, first-order stationarity implies that the
statistical properties of X(ti) and X(ti+c) are the same for any c.

In strict terms, the statistical properties are governed by the joint probability
density function. Hence a process is nth-order Strict-Sense Stationary
(S.S.S) if
f_X(x_1, x_2, \ldots, x_n, t_1, t_2, \ldots, t_n) \equiv f_X(x_1, x_2, \ldots, x_n, t_1+c, t_2+c, \ldots, t_n+c) \qquad (*)
for any c, where the left side represents the joint density function of the
random variables X1 = X(t1), X2 = X(t2), …, Xn = X(tn), and the right side
corresponds to the joint density function of the random variables X’1=X(t1+c),
X’2 = X(t2+c), …, X’n = X(tn+c). A process X(t) is said to be strict-sense
stationary if (*) is true for all ti, i = 1, 2, …, n; n = 1, 2, … and any c.

2. Review of Stochastic Processes: Stationary (2)
For a first-order strict sense stationary process, from (*) we have
f X ( x, t ) ≡ f X ( x, t + c )
for any c. In particular c = – t gives
f X ( x, t ) = f X ( x )
i.e., the first-order density of X(t) is independent of t. In that case
E[X(t)] = \int_{-\infty}^{+\infty} x\, f(x)\,dx = \mu, \quad \text{a constant.}

Similarly, for a second-order strict-sense stationary process we have


from (*)
f X ( x1 , x2 , t1 , t 2 ) ≡ f X ( x1 , x2 , t1 + c, t 2 + c)
for any c. For c = – t2 we get
f X ( x1 , x2 , t1 , t 2 ) ≡ f X ( x1 , x2 , t1 − t 2 )
i.e., the second order density function of a strict sense stationary process
depends only on the difference of the time indices t1 – t2 = τ .

2. Review of Stochastic Processes: Stationary (3)
In that case the autocorrelation function is given by
R_{XX}(t_1, t_2) \triangleq E\{X(t_1) X^*(t_2)\} = \iint x_1 x_2^*\, f_X(x_1, x_2, \tau = t_1 - t_2)\,dx_1\,dx_2 = R_{XX}(t_1 - t_2) \triangleq R_{XX}(\tau) = R_{XX}^*(-\tau),
i.e., the autocorrelation function of a second order strict-sense stationary
process depends only on the difference of the time indices τ .

However, the basic conditions for the first and second order stationarity are
usually difficult to verify. In that case, we often resort to a looser definition
of stationarity, known as Wide-Sense Stationarity (W.S.S). Thus, a process
X(t) is said to be Wide-Sense Stationary if
(i) E{ X (t )} = μ
and
(ii) E\{X(t_1) X^*(t_2)\} = R_{XX}(t_1 - t_2),

2. Review of Stochastic Processes: Stationary (4)
i.e., for wide-sense stationary processes, the mean is a constant and the
autocorrelation function depends only on the difference between the time
indices. Strict-sense stationarity always implies wide-sense stationarity.
However, the converse is not true in general, the only exception being the
Gaussian process. If X(t) is a Gaussian process, then
wide-sense stationarity (w.s.s) ⇒ strict-sense stationarity (s.s.s).

2. Review of Stochastic Processes: Systems (1)
‰ A deterministic system transforms each input waveform X(t, ζi ) into an
output waveform Y(t, ζi) = T[X(t, ζi)] by operating only on the time variable t.
A stochastic system operates on both the variables t and ζ .

Thus, in a deterministic system, a set of realizations at the input corresponding to a process X(t) generates a new set of realizations Y(t, ζ) at the output, associated with a new process Y(t):
X(t, ζi) → T[·] → Y(t, ζi),   i.e.   X(t) → T[·] → Y(t).

Our goal is to study the output process statistics in terms of the input
process statistics and the system function.

2. Review of Stochastic Processes: Systems (2)

Deterministic Systems:
- Memoryless systems: Y(t) = g[X(t)].
- Systems with memory: time-varying systems, time-invariant systems, and linear systems, Y(t) = L[X(t)].
- Linear time-invariant (LTI) systems (X(t) → h(t) → Y(t)):
  Y(t) = \int_{-\infty}^{+\infty} h(t-\tau) X(\tau)\,d\tau = \int_{-\infty}^{+\infty} h(\tau) X(t-\tau)\,d\tau.
2. Review of Stochastic Processes: Systems (3)
‰ Memoryless Systems:
The output Y(t) in this case depends only on the present value of the input X(t).
i.e., Y ( t ) = g{ X ( t )}

Strict-sense stationary input → Memoryless system → Strict-sense stationary output.
Wide-sense stationary input → Memoryless system → Output need not be stationary in any sense.
X(t) stationary Gaussian with R_XX(τ) → Memoryless system → Y(t) stationary, but not Gaussian, with R_XY(τ) = ηR_XX(τ).

2. Review of Stochastic Processes: Systems (4)
Theorem: If X(t) is a zero mean stationary Gaussian process, and
Y(t) = g[X(t)], where g(.) represents a nonlinear memoryless device, then
R XY (τ ) = ηR XX (τ ), η = E{g ′( X )}.
where g’(x) is the derivative with respect to x.

‰ Linear Systems: L[.] represents a linear system if


L{a1 X (t1 ) + a 2 X (t2 )} = a1 L{ X (t1 )} + a 2 L{ X (t2 )}.
Let Y(t) = L{X(t)} represent the output of a linear system.

‰ Time-Invariant System: L[.] represents a time-invariant system if


Y (t ) = L{ X (t )} ⇒ L{ X (t − t0 )} = Y (t − t0 )
i.e., shift in the input results in the same shift in the output also.

‰ If L[.] satisfies both conditions for linear and time-invariant, then it


corresponds to a linear time-invariant (LTI) system.

2. Review of Stochastic Processes: Systems (5)
LTI systems can be uniquely represented in terms of their output to a delta function (the impulse response of the system):
δ(t) → LTI → h(t).
Then, for an arbitrary input X(t) → LTI → Y(t),
Y(t) = \int_{-\infty}^{+\infty} h(t-\tau) X(\tau)\,d\tau = \int_{-\infty}^{+\infty} h(\tau) X(t-\tau)\,d\tau,
where
X(t) = \int_{-\infty}^{+\infty} X(\tau)\,\delta(t-\tau)\,d\tau.

2. Review of Stochastic Processes: Systems (6)
Thus
Y(t) = L\{X(t)\} = L\left\{ \int_{-\infty}^{+\infty} X(\tau)\,\delta(t-\tau)\,d\tau \right\}
     = \int_{-\infty}^{+\infty} L\{X(\tau)\,\delta(t-\tau)\}\,d\tau \qquad \text{(by linearity)}
     = \int_{-\infty}^{+\infty} X(\tau)\, L\{\delta(t-\tau)\}\,d\tau \qquad \text{(by time-invariance)}
     = \int_{-\infty}^{+\infty} X(\tau)\, h(t-\tau)\,d\tau = \int_{-\infty}^{+\infty} h(\tau)\, X(t-\tau)\,d\tau.

Then, the mean of the output process is given by
\mu_Y(t) = E\{Y(t)\} = \int_{-\infty}^{+\infty} E\{X(\tau)\}\, h(t-\tau)\,d\tau = \int_{-\infty}^{+\infty} \mu_X(\tau)\, h(t-\tau)\,d\tau = \mu_X(t) * h(t).

2. Review of Stochastic Processes: Systems (7)
Similarly, the cross-correlation function between the input and output
processes is given by
R XY (t1 , t2 ) = E{ X (t1 )Y * (t2 )}
+∞
= E{ X (t1 ) ∫ − ∞ X * (t2 − α )h * (α )dα }
+∞
= ∫ − ∞ E{ X (t1 ) X * (t2 − α )}h * (α )dα
+∞
= ∫ − ∞ R XX (t1 , t2 − α )h * (α )dα
= R XX (t1 , t2 ) ∗ h * (t2 ).

2. Review of Stochastic Processes: Systems (8)
Finally the output autocorrelation function is given by
RYY (t1 , t2 ) = E{Y (t1 )Y * (t2 )}
+∞
= E{∫ − ∞ X (t1 − β )h ( β )dβ Y * (t2 )}
+∞
= ∫ − ∞ E{ X (t1 − β )Y * (t2 )}h ( β )dβ
+∞
= ∫ − ∞ R XY (t1 − β , t2 )h ( β )dβ
= R XY (t1 , t2 ) ∗ h(t1 ),
or
RYY (t1 , t2 ) = R XX (t1 , t2 ) ∗ h * (t2 ) ∗ h (t1 ).

In particular, if X(t) is wide-sense stationary, then we have μX(t) = μX


Also,
R_{XY}(t_1, t_2) = \int_{-\infty}^{+\infty} R_{XX}(t_1 - t_2 + \alpha)\, h^*(\alpha)\,d\alpha = R_{XX}(\tau) * h^*(-\tau) \triangleq R_{XY}(\tau), \qquad \tau = t_1 - t_2.
2. Review of Stochastic Processes: Systems (9)
Thus X(t) and Y(t) are jointly w.s.s. Further, the output autocorrelation
simplifies to
+∞
RYY (t1 , t 2 ) = ∫ −∞ RXY (t1 − β − t 2 )h( β )dβ , τ = t1 − t 2
= RXY (τ ) ∗ h(τ ) = RYY (τ ).
or
RYY (τ ) = RXX (τ ) ∗ h* ( −τ ) ∗ h(τ ).

2. Review of Stochastic Processes: Systems (10)

(a) X(t) wide-sense stationary process → LTI system h(t) → Y(t) wide-sense stationary process.
(b) X(t) strict-sense stationary process → LTI system h(t) → Y(t) strict-sense stationary process.
(c) X(t) Gaussian process (also stationary) → Linear system → Y(t) Gaussian process (also stationary).

2. Review of Stochastic Processes: Systems (11)
‰ White Noise Process: W(t) is said to be a white noise process if
RWW (t1 , t 2 ) = q (t1 )δ (t1 − t 2 ),
i.e., E[W(t1) W*(t2)] = 0 unless t1 = t2.

W(t) is said to be wide-sense stationary (w.s.s) white noise if


E[W(t)] = constant, and
RWW (t1 , t 2 ) = qδ (t1 − t 2 ) = qδ (τ ).

If W(t) is also a Gaussian process (white Gaussian process), then all of


its samples are independent random variables (why?).

For a w.s.s. white noise input W(t), we have
E[N(t)] = \mu_W \int_{-\infty}^{+\infty} h(\tau)\,d\tau, \ \text{a constant, and} \quad R_{NN}(\tau) = q\,\delta(\tau) * h^*(-\tau) * h(\tau) = q\, h^*(-\tau) * h(\tau) = q\,\rho(\tau),
where \rho(\tau) = h(\tau) * h^*(-\tau) = \int_{-\infty}^{+\infty} h(\alpha)\, h^*(\alpha+\tau)\,d\alpha.

2. Review of Stochastic Processes: Systems (12)
Thus the output of a white noise process through an LTI system represents
a (colored) noise process.
Note: White noise need not be Gaussian.
“White” and “Gaussian” are two different concepts!

White noise W(t) → LTI h(t) → Colored noise N(t) = h(t) * W(t)

2. Review of Stochastic Processes: Discrete Time (1)
‰ A discrete time stochastic process (DTStP) Xn = X(nT) is a sequence of
random variables. The mean, autocorrelation and auto-covariance functions
of a discrete-time process are given by
μ n = E{ X ( nT )}
R ( n1 , n2 ) = E{ X ( n1T ) X * ( n2T )}
C ( n1 , n2 ) = R ( n1 , n2 ) − μ n1 μ n*2
respectively. As before strict sense stationarity and wide-sense stationarity
definitions apply here also. For example, X(nT) is wide sense stationary if
E{ X ( nT )} = μ , a constant
and
E[X\{(k+n)T\}\, X^*\{kT\}] \triangleq R(n) = r_n = r_{-n}^*,
i.e., R(n_1, n_2) = R(n_1 - n_2) = R^*(n_2 - n_1).

2. Review of Stochastic Processes: Discrete Time (2)
‰ If X(nT) represents a wide-sense stationary input to a discrete-time
system {h(nT)}, and Y(nT) the system output, then as before the cross
correlation function satisfies
R XY ( n ) = R XX ( n ) ∗ h * ( − n )
and the output autocorrelation function is given by
RYY ( n ) = R XY ( n ) ∗ h( n )
or
RYY (n ) = R XX ( n ) ∗ h * ( −n ) ∗ h( n ).

Thus wide-sense stationarity from input to output is preserved for discrete-


time systems also.
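A discrete-time illustration of these relations (an added sketch, numpy assumed, with a short real FIR filter picked arbitrarily): for white input with variance σ², the estimated R_YY(n) should match σ² Σ_k h(k)h(k+n).

```python
import numpy as np

rng = np.random.default_rng(6)
h = np.array([1.0, 0.5, -0.25])             # illustrative real FIR impulse response
sigma2, N = 2.0, 1_000_000

w = rng.normal(0.0, np.sqrt(sigma2), N)     # white input, R_WW(n) = sigma2 * delta(n)
y = np.convolve(w, h, mode='full')[:N]      # output of the discrete-time LTI system

for n in range(3):
    r_hat = np.mean(y[n:] * y[:N - n])                     # estimated R_YY(n)
    r_theory = sigma2 * np.sum(h[:len(h) - n] * h[n:])     # sigma2 * sum_k h(k) h(k+n)
    print(n, r_hat, r_theory)
```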

2. Review of Stochastic Processes: Discrete Time (3)
‰ The mean (or ensemble average μ) of a stochastic process is obtained by averaging across the process, while the time average is obtained by averaging along the process as
\hat{\mu} = \frac{1}{M} \sum_{n=0}^{M-1} X_n,
where M is the total number of time samples used in the estimation.

Consider a wide-sense stationary DTStP X_n. The time average converges to the ensemble average if
\lim_{M \to \infty} E\left[ (\mu - \hat{\mu})^2 \right] = 0,
in which case the process X_n is said to be mean ergodic (in the mean-square error sense).
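A quick numerical sketch of mean ergodicity (added, numpy assumed; the MA-filtered noise is only an example of a w.s.s. process with known mean):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = 3.0
M_values = [100, 10_000, 1_000_000]

w = rng.standard_normal(max(M_values) + 3)
x = mu + np.convolve(w, [0.5, 0.3, 0.2], mode='valid')   # w.s.s. process with mean mu

for M in M_values:
    mu_hat = np.mean(x[:M])                # time average over M samples
    print(M, abs(mu_hat - mu))             # typically shrinks as M grows (mean ergodic)
```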

2. Review of Stochastic Processes: Discrete Time (4)
‰ Let us define an (M×1) observation vector x_n that represents the elements of the time series X_n, X_{n-1}, …, X_{n-M+1}:
x_n = [X_n, X_{n-1}, …, X_{n-M+1}]^T

An (M×M) correlation matrix R (using the wide-sense stationarity condition) can be defined as
R = E[x_n x_n^H] = \begin{bmatrix} R(0) & R(1) & \cdots & R(M-1) \\ R(-1) & R(0) & \cdots & R(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(-M+1) & R(-M+2) & \cdots & R(0) \end{bmatrix}

Superscript H denotes Hermitian transposition.
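The definition can be illustrated by averaging outer products of observation vectors taken from data; a minimal sketch (numpy assumed, with an arbitrary complex MA(1) test process):

```python
import numpy as np

rng = np.random.default_rng(8)
M, N = 4, 200_000

w = (rng.standard_normal(N + M) + 1j * rng.standard_normal(N + M)) / np.sqrt(2)
u = np.convolve(w, [1.0, 0.6 + 0.2j], mode='valid')      # a complex w.s.s. test process

# stack x_n = [u(n), u(n-1), ..., u(n-M+1)]^T and average x_n x_n^H
X = np.array([u[n - M + 1:n + 1][::-1] for n in range(M - 1, len(u))])
R_hat = np.einsum('ki,kj->ij', X, X.conj()) / len(X)

print(np.allclose(R_hat, R_hat.conj().T))     # Hermitian
print(np.round(R_hat, 2))                     # (approximately) Toeplitz, as expected
```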

2. Review of Stochastic Processes: Discrete Time (5)
Properties:
(i) The correlation matrix R of a stationary DTStP is Hermitian: R^H = R, or R(-k) = R^*(k). Therefore
R = \begin{bmatrix} R(0) & R(1) & \cdots & R(M-1) \\ R^*(1) & R(0) & \cdots & R(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R^*(M-1) & R^*(M-2) & \cdots & R(0) \end{bmatrix}
(ii) The matrix R of a stationary DTStP is Toeplitz: all elements on the main diagonal are equal, and the elements on any subdiagonal are also equal.
(iii) Let x be an arbitrary (nonzero) (M×1) complex-valued vector; then x^H R x ≥ 0 (R is nonnegative definite).
(iv) If x_n^B is the backward arrangement of x_n, i.e., x_n^B = [x_{n-M+1}, x_{n-M+2}, …, x_n]^T, then E[x_n^B x_n^{BH}] = R^T.

2. Review of Stochastic Processes: Discrete Time (6)
(v) Consider the correlation matrices R_M and R_{M+1}, corresponding to M and M+1 observations of the process. These matrices are related by
R_{M+1} = \begin{bmatrix} R(0) & r^H \\ r & R_M \end{bmatrix}
or
R_{M+1} = \begin{bmatrix} R_M & r^{B*} \\ r^{BT} & R(0) \end{bmatrix}
where r^H = [R(1), R(2), …, R(M)] and r^{BT} = [R(-M), R(-M+1), …, R(-1)].

2. Review of Stochastic Processes: Discrete Time (7)
‰ Consider a time series consisting of complex sine wave plus noise:
un = u (n) = α exp( jωn) + v(n), n = 0,..., N − 1
The sources of the sine wave and the noise are independent. Assume that v(n) has zero mean and autocorrelation function given by
E[v(n)\, v^*(n-k)] = \begin{cases} \sigma_v^2, & k = 0 \\ 0, & k \ne 0 \end{cases}

For a lag k, the autocorrelation function of the process u(n) is
r(k) = E[u(n)\, u^*(n-k)] = \begin{cases} |\alpha|^2 + \sigma_v^2, & k = 0 \\ |\alpha|^2 e^{j\omega k}, & k \ne 0 \end{cases}

2. Review of Stochastic Processes: Discrete Time (8)
Therefore, the correlation matrix of u(n) is
R = |\alpha|^2 \begin{bmatrix} 1 + \frac{1}{\rho} & \exp(j\omega) & \cdots & \exp(j\omega(M-1)) \\ \exp(-j\omega) & 1 + \frac{1}{\rho} & \cdots & \exp(j\omega(M-2)) \\ \vdots & \vdots & \ddots & \vdots \\ \exp(-j\omega(M-1)) & \exp(-j\omega(M-2)) & \cdots & 1 + \frac{1}{\rho} \end{bmatrix}
where \rho = \frac{|\alpha|^2}{\sigma_v^2} is the signal-to-noise ratio (SNR).
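This correlation matrix can be cross-checked against a sample estimate; the snippet below is an added sketch (numpy assumed; α, ω, σv², M chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, omega, sigma2_v, M, N = 1.0 + 0.5j, 0.6, 0.4, 3, 500_000

n = np.arange(N)
v = np.sqrt(sigma2_v / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
u = alpha * np.exp(1j * omega * n) + v

# theoretical R: element (i, j) = |alpha|^2 exp(jw(j - i)) + sigma_v^2 delta(i - j)
k = np.arange(M)
R = abs(alpha)**2 * np.exp(1j * omega * (k[None, :] - k[:, None])) + sigma2_v * np.eye(M)

# sample estimate from observation vectors x_n = [u(n), u(n-1), ..., u(n-M+1)]^T
X = np.array([u[i - M + 1:i + 1][::-1] for i in range(M - 1, N)])
R_hat = np.einsum('ki,kj->ij', X, X.conj()) / len(X)

print(np.max(np.abs(R - R_hat)))          # small: the estimate matches the model
```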

2. Review of Stochastic Models (1)
‰ Consider an input – output representation
X(n) = -\sum_{k=1}^{p} a_k X(n-k) + \sum_{k=0}^{q} b_k W(n-k),

where X(n) may be considered as the output of a system {h(n)} driven by


the input W(n). Using the z-transform, this gives
X(z) \sum_{k=0}^{p} a_k z^{-k} = W(z) \sum_{k=0}^{q} b_k z^{-k}, \qquad a_0 \equiv 1,
or
H(z) = \sum_{k=0}^{\infty} h(k) z^{-k} = \frac{X(z)}{W(z)} = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_q z^{-q}}{1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p}} \triangleq \frac{B(z)}{A(z)}

represents the transfer function of the associated system response {h(n)} so


that
X(n) = \sum_{k=0}^{\infty} h(n-k) W(k). \qquad\qquad (W(n) \to h(n) \to X(n))
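Such a process can be generated directly from the difference equation; the following is a brief sketch (scipy assumed, and the coefficient values below are arbitrary illustrations with poles inside the unit circle):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(10)

a = [1.0, -0.5, 0.25]        # A(z) = 1 + a1 z^-1 + a2 z^-2  (here a1 = -0.5, a2 = 0.25)
b = [1.0, 0.4]               # B(z) = b0 + b1 z^-1

w = rng.standard_normal(100_000)     # uncorrelated zero-mean input W(n), sigma_W^2 = 1
x = lfilter(b, a, w)                 # ARMA(2,1) output: X(n) = -a1 X(n-1) - a2 X(n-2) + W(n) + 0.4 W(n-1)

print(x.mean(), x.var())             # mean ~ 0; variance is R_XX(0) of this model
```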

2. Review of Stochastic Models (2)
Notice that the transfer function H(z) is rational with p poles and q zeros that
determine the model order of the underlying system. The output X(n)
undergoes regression over p of its previous values and at the same time a
moving average based on W(n), W(n-1), …, W(n-q) of the input over (q + 1)
values is added to it, thus generating an Auto Regressive Moving Average
(ARMA (p, q)) process X(n).
Generally the input {W(n)} represents a sequence of uncorrelated random
variables of zero mean and constant variance so that
RWW (n) = σ W2δ (n).

If in addition, {W(n)} is normally distributed then the output {X(n)} also


represents a strict-sense stationary normal process.

If q = 0, then X(n) represents an Auto Regressive AR(p) process (all-pole process), and if p = 0, then X(n) represents a Moving Average MA(q) process (all-zero process).

2. Review of Stochastic Models (3)
AR(1) process: An AR(1) process has the form
X ( n ) = aX ( n − 1) + W ( n )
and the corresponding system transfer function

H(z) = \frac{1}{1 - a z^{-1}} = \sum_{n=0}^{\infty} a^n z^{-n},
provided |a| < 1. Thus
h(n) = a^n, \quad n \ge 0, \ |a| < 1,
represents the impulse response of a stable AR(1) system. We get the output autocorrelation sequence of an AR(1) process to be
R_{XX}(n) = \sigma_W^2\,\delta(n) * h(-n) * h(n) = \sigma_W^2 \sum_{k=0}^{\infty} a^{|n|+k} a^{k} = \sigma_W^2\, \frac{a^{|n|}}{1 - a^2}.
The normalized (in terms of R_{XX}(0)) output autocorrelation sequence is given by
\rho_X(n) = \frac{R_{XX}(n)}{R_{XX}(0)} = a^{|n|}, \quad |n| \ge 0.
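An added simulation sketch (numpy assumed, a = 0.8 arbitrary) confirms that the estimated normalized autocorrelation of an AR(1) process follows a^|n|:

```python
import numpy as np

rng = np.random.default_rng(11)
a, N = 0.8, 200_000

w = rng.standard_normal(N)
x = np.empty(N)
x[0] = w[0]
for n in range(1, N):                       # X(n) = a X(n-1) + W(n)
    x[n] = a * x[n - 1] + w[n]

r0 = np.mean(x * x)                         # ~ sigma_W^2 / (1 - a^2)
for n in range(4):
    rho_hat = np.mean(x[n:] * x[:N - n]) / r0
    print(n, rho_hat, a**n)                 # normalized autocorrelation ~ a^|n|
```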
2. Review of Stochastic Models (4)
It is instructive to compare an AR(1) model discussed above by
superimposing a random component to it, which may be an error term
associated with observing a first order AR process X(n). Thus
Y ( n) = X ( n) + V ( n)
where X(n) ~ AR(1), and V(n) is an uncorrelated random sequence with
zero mean and variance σ2V that is also uncorrelated with {W(n)}. Then,
we obtain the output autocorrelation of the observed process Y(n) to be
R_{YY}(n) = R_{XX}(n) + R_{VV}(n) = R_{XX}(n) + \sigma_V^2\,\delta(n) = \sigma_W^2\, \frac{a^{|n|}}{1 - a^2} + \sigma_V^2\,\delta(n),

so that its normalized version is given by


\rho_Y(n) \triangleq \frac{R_{YY}(n)}{R_{YY}(0)} = \begin{cases} 1, & n = 0 \\ c\, a^{|n|}, & n = \pm 1, \pm 2, \ldots \end{cases}
where
c = \frac{\sigma_W^2}{\sigma_W^2 + \sigma_V^2 (1 - a^2)} < 1.

2. Review of Stochastic Models (5)
The results demonstrate the effect of superimposing an error sequence on
an AR(1) model. For non-zero lags, the autocorrelation of the observed
sequence {Y(n)} is reduced by a constant factor compared to the original
process {X(n)}. The superimposed error sequence V(n) only affects the
corresponding term in Y(n) (term by term). However, a particular term in
the “input sequence” W(n) affects X(n) and Y(n) as well as all subsequent
observations.

[Plot: normalized autocorrelations versus lag k, with ρ_X(0) = ρ_Y(0) = 1 and ρ_X(k) > ρ_Y(k) for k ≠ 0.]

2. Review of Stochastic Models (6)
AR(2) Process: An AR(2) process has the form
X ( n) = a1 X ( n − 1) + a2 X (n − 2) + W (n)
and the corresponding transfer function is given by
H(z) = \sum_{n=0}^{\infty} h(n) z^{-n} = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}} = \frac{b_1}{1 - \lambda_1 z^{-1}} + \frac{b_2}{1 - \lambda_2 z^{-1}},
so that
h(0) = 1, h(1) = a1 , h(n) = a1h(n − 1) + a2 h(n − 2), n ≥ 2
and in terms of the poles of the transfer function, we have
h(n) = b1λ1n + b2 λn2 , n≥0
that represents the impulse response of the system. We also have
λ1 + λ2 = a1 , λ1λ2 = −a2 ,
b1 + b2 = 1, b1λ1 + b2 λ2 = a1.
and H(z) stable implies | λ1 |< 1, | λ2 |< 1.
2. Review of Stochastic Models (7)
Further, the output autocorrelations satisfy the recursion
R_{XX}(n) = E\{X(n+m) X^*(m)\} = E\{[a_1 X(n+m-1) + a_2 X(n+m-2)] X^*(m)\} + \underbrace{E\{W(n+m) X^*(m)\}}_{= 0}
          = a_1 R_{XX}(n-1) + a_2 R_{XX}(n-2),
and hence their normalized version is given by
\rho_X(n) \triangleq \frac{R_{XX}(n)}{R_{XX}(0)} = a_1 \rho_X(n-1) + a_2 \rho_X(n-2).
By direct calculation, the output autocorrelations are given by
R_{XX}(n) = R_{WW}(n) * h^*(-n) * h(n) = \sigma_W^2\, h^*(-n) * h(n) = \sigma_W^2 \sum_{k=0}^{\infty} h^*(n+k)\, h(k)
= \sigma_W^2 \left( \frac{|b_1|^2 (\lambda_1^*)^n}{1 - |\lambda_1|^2} + \frac{b_1^* b_2 (\lambda_1^*)^n}{1 - \lambda_1^* \lambda_2} + \frac{b_1 b_2^* (\lambda_2^*)^n}{1 - \lambda_1 \lambda_2^*} + \frac{|b_2|^2 (\lambda_2^*)^n}{1 - |\lambda_2|^2} \right).

2. Review of Stochastic Models (8)
Then, the normalized output autocorrelations may be expressed as
\rho_X(n) = \frac{R_{XX}(n)}{R_{XX}(0)} = c_1 (\lambda_1^*)^n + c_2 (\lambda_2^*)^n.

2. Review of Stochastic Models (9)
‰ An ARMA(p, q) system has only p + q + 1 independent coefficients (a_k, k = 1, …, p; b_i, i = 0, …, q), and hence its impulse response sequence {h_k} must also exhibit a similar dependence among its terms. In fact, a classical result due to Kronecker (1881) (see also P. Dienes, 1931) states that the necessary and sufficient condition for H(z) = \sum_{k=0}^{\infty} h_k z^{-k} to represent a rational (ARMA) system is that
\det H_n = 0, \quad n \ge N \ \text{(for all sufficiently large } n\text{)},
where
H_n \triangleq \begin{pmatrix} h_0 & h_1 & h_2 & \cdots & h_n \\ h_1 & h_2 & h_3 & \cdots & h_{n+1} \\ \vdots & \vdots & \vdots & & \vdots \\ h_n & h_{n+1} & h_{n+2} & \cdots & h_{2n} \end{pmatrix},
i.e., in the case of rational systems, for all sufficiently large n, the Hankel matrices H_n all have the same rank.

