
A First Course in Digital Communications


Ha H. Nguyen and E. Shwedyk

February 2009


Analog and Digital Amplitude Modulations


[Figure: an analog message and the corresponding AM signal, and a digital message and the corresponding BASK signal, each plotted versus t from 0 to 5.]

What is Digital Communication?


[Figure: four waveforms x(t) versus t, labeled (a)–(d); panels (c) and (d) involve samples spaced Ts apart.]


Why Digital Communications?


[Figure: transmitted AM, received AM, transmitted BASK and received BASK waveforms versus t from 0 to 5; the received waveforms are shown on a ±2 scale.]

Why Digital Communications?


[Figure: a second set of transmitted AM, received AM, transmitted BASK and received BASK waveforms versus t from 0 to 5; here the received waveforms are shown on a ±5 scale.]

Regenerative Repeater in Digital Communications

[Figure: a pulse at increasing propagation distances (1 to 5): original pulse, some distortion, degraded, severely degraded, regenerated.]

Digital communications: Transmitted signals belong to a finite set of waveforms → the distorted signal can be regenerated to its ideal shape, removing the effects of the noise.
Analog communications: Transmitted signals are analog waveforms, which can take an infinite variety of shapes → once the analog signal is distorted, the distortion cannot be removed.

Block Diagram of a Communication System


[Figure: (a) Source (User) → Transmitter → Channel → Receiver → Sink (User), with a Synchronization block; (b) the Transmitter expanded as Source Encoder → Channel Encoder → Modulator and the Receiver as Demodulator → Channel Decoder → Source Decoder.]
Note: “Synchronization” block is only present in a digital system.

Digital vs. Analog


Advantages:
Digital signals are much easier to regenerate.
Digital circuits are less subject to distortion and interference.
Digital circuits are more reliable and can be produced at a lower cost than analog circuits.
Digital hardware is more flexible to implement than analog hardware.
Digital signals can benefit from digital signal processing (DSP) techniques.
Disadvantages:
Heavy signal processing.
Synchronization is crucial.
Larger transmission bandwidth.
Non-graceful degradation.

Introduction
The main objective of a communication system is the transfer
of information over a channel.
The message signal is best modeled as a random signal: any signal that conveys information must have some uncertainty in it, otherwise its transmission is of no interest.
Two types of imperfections in a communication channel:
Deterministic imperfection, such as linear and nonlinear
distortions, inter-symbol interference, etc.
Nondeterministic imperfection, such as addition of noise,
interference, multipath fading, etc.
We are concerned with the methods used to describe and
characterize a random signal, generally referred to as a
random process (also commonly called stochastic process).
In essence, a random process is a random variable evolving in
time.

Sample Space and Probability

Random experiment: its outcome, for some reason, cannot be predicted with certainty.
Examples: throwing a die, flipping a coin and drawing a card
from a deck.
Sample space: the set of all possible outcomes, denoted by Ω.
Outcomes are denoted by ω’s and each ω lies in Ω, i.e., ω ∈ Ω.
A sample space can be discrete or continuous.
Events are subsets of the sample space for which measures of
their occurrences, called probabilities, can be defined or
determined.


Example of Throwing a Fair Die

Various events can be defined: “the outcome is an even number of dots”, “the outcome is smaller than 4 dots”, “the outcome is more than 3 dots”, etc.


Three Axioms of Probability

For a discrete sample space Ω, define a probability measure P on Ω as a set function that assigns nonnegative values to all events, denoted by E, in Ω, such that the following conditions are satisfied:
Axiom 1: 0 ≤ P (E) ≤ 1 for all E ∈ Ω (on a % scale
probability ranges from 0 to 100%. Despite popular sports
lore, it is impossible to give more than 100%).
Axiom 2: P (Ω) = 1 (when an experiment is conducted there
has to be an outcome).
Axiom 3: For mutually exclusive events¹ E1, E2, E3, . . . we have P (⋃_{i=1}^{∞} Ei) = Σ_{i=1}^{∞} P (Ei).

¹The events E1, E2, E3, . . . are mutually exclusive if Ei ∩ Ej = ∅ for all i ≠ j, where ∅ is the null set.

Important Properties of the Probability Measure

1. P (E c ) = 1 − P (E), where E c denotes the complement of E.


This property implies that P (E c ) + P (E) = 1, i.e., something
has to happen.
2. P (⊘) = 0 (again, something has to happen).
3. P (E1 ∪ E2 ) = P (E1 ) + P (E2 ) − P (E1 ∩ E2 ). Note that if
two events E1 and E2 are mutually exclusive then
P (E1 ∪ E2 ) = P (E1 ) + P (E2 ), otherwise the nonzero
common probability P (E1 ∩ E2 ) needs to be subtracted off.
4. If E1 ⊆ E2 then P (E1 ) ≤ P (E2 ). This says that if event E1
is contained in E2 then occurrence of E1 means E2 has
occurred but the converse is not true.


Conditional Probability
We observe or are told that event E1 has occurred but are actually interested in event E2: knowledge that E1 has occurred changes the probability of E2 occurring.
If it was P (E2 ) before, it now becomes P (E2 |E1 ), the
probability of E2 occurring given that event E1 has occurred.
This conditional probability is given by

    P (E2|E1) = P (E2 ∩ E1)/P (E1)   if P (E1) ≠ 0,
    P (E2|E1) = 0                    otherwise.

If P (E2|E1) = P (E2), or P (E2 ∩ E1) = P (E1)P (E2), then E1 and E2 are said to be statistically independent.
Bayes’ rule:

    P (E2|E1) = P (E1|E2)P (E2)/P (E1).

Total Probability Theorem


The events {Ei}, i = 1, . . . , n, partition the sample space Ω if:

    (i)  ⋃_{i=1}^{n} Ei = Ω,                                    (1a)
    (ii) Ei ∩ Ej = ∅ for all 1 ≤ i, j ≤ n and i ≠ j.            (1b)

If for an event A we have the conditional probabilities {P (A|Ei)}, i = 1, . . . , n, then P (A) can be obtained as

    P (A) = Σ_{i=1}^{n} P (Ei)P (A|Ei).

Bayes’ rule:

    P (Ei|A) = P (A|Ei)P (Ei)/P (A) = P (A|Ei)P (Ei) / Σ_{j=1}^{n} P (A|Ej)P (Ej).
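As a small illustration (the numbers here are assumed for the example, not taken from the text): let E1 = “a 0 is sent” with P (E1) = 0.6, E2 = “a 1 is sent” with P (E2) = 0.4, and A = “a 1 is received” with P (A|E1) = 0.1 and P (A|E2) = 0.9. Then

    P (A) = (0.6)(0.1) + (0.4)(0.9) = 0.42,
    P (E2|A) = (0.9)(0.4)/0.42 ≈ 0.857.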


Random Variables

[Figure: outcomes ω1, ω2, ω3, ω4 in the sample space are mapped to the points x(ω1), x(ω2), x(ω3), x(ω4) on the real line R.]

A random variable is a mapping from the sample space Ω to the set of real numbers.
We shall denote random variables by boldface, i.e., x, y, etc., while individual or specific values of the mapping x are denoted by x(ω).

Random Variable in the Example of Throwing a Fair Die

[Figure: the six faces of the die are mapped to the points 1, 2, 3, 4, 5, 6 on the real line R.]

There could be many other random variables defined to describe the outcome of this random experiment!


Cumulative Distribution Function (cdf)

The cdf gives a complete description of the random variable. It is defined as:

    Fx(x) = P (ω ∈ Ω : x(ω) ≤ x) = P (x ≤ x).

The cdf has the following properties:


1. 0 ≤ Fx (x) ≤ 1 (this follows from Axiom 1 of the probability
measure).
2. Fx (x) is nondecreasing: Fx (x1 ) ≤ Fx (x2 ) if x1 ≤ x2 (this is
because event x(ω) ≤ x1 is contained in event x(ω) ≤ x2 ).
3. Fx (−∞) = 0 and Fx (+∞) = 1 (x(ω) ≤ −∞ is the empty set,
hence an impossible event, while x(ω) ≤ ∞ is the whole
sample space, i.e., a certain event).
4. P (a < x ≤ b) = Fx (b) − Fx (a).


Typical Plots of cdf I

A random variable can be discrete, continuous or mixed.


[Figure (a): a typical cdf Fx(x), rising from 0 to 1.0 as x goes from −∞ to ∞.]


Typical Plots of cdf II


[Figures (b) and (c): two more typical cdfs Fx(x), each rising from 0 to 1.0 as x goes from −∞ to ∞.]


Probability Density Function (pdf)


The pdf is defined as the derivative of the cdf:

    fx(x) = dFx(x)/dx.
It follows that:

    P (x1 ≤ x ≤ x2) = P (x ≤ x2) − P (x ≤ x1) = Fx(x2) − Fx(x1) = ∫_{x1}^{x2} fx(x)dx.
Basic properties of the pdf:
1. fx(x) ≥ 0.
2. ∫_{−∞}^{∞} fx(x)dx = 1.
3. In general, P (x ∈ A) = ∫_A fx(x)dx.
For discrete random variables, it is more common to define the probability mass function (pmf): pi = P (x = xi).
Note that, for all i, one has pi ≥ 0 and Σ_i pi = 1.

Bernoulli Random Variable

[Figure: Bernoulli pmf fx(x) with masses (1 − p) at x = 0 and (p) at x = 1, and the corresponding staircase cdf Fx(x).]

A discrete random variable that takes the two values 1 and 0 with probabilities p and 1 − p, respectively.
A good model for a binary data source whose output is 1 or 0.
Can also be used to model channel errors.


Binomial Random Variable


[Figure: binomial pmf fx(x) for x = 0 to 6, with probabilities up to about 0.30.]
A discrete random variable that gives the number of 1’s in a sequence of n independent Bernoulli trials:

    fx(x) = Σ_{k=0}^{n} (n choose k) p^k (1 − p)^{n−k} δ(x − k),   where (n choose k) = n!/(k!(n − k)!).
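A quick simulation sketch (the values of n, p and the number of experiments are assumed purely for illustration) compares the empirical pmf of the number of 1’s in n Bernoulli trials against the formula above:

n=6; p=0.3; M=10^5;                       % assumed n, p and number of experiments
k_counts=sum(rand(M,n)<p,2);              % number of 1's in each experiment
emp_pmf=histc(k_counts,0:n)'/M;           % empirical pmf over k = 0..n
k=0:n;
theo_pmf=factorial(n)./(factorial(k).*factorial(n-k)).*p.^k.*(1-p).^(n-k);
disp([emp_pmf; theo_pmf]);                % the two rows should be close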


Uniform Random Variable

[Figure: uniform pdf fx(x) of height 1/(b − a) on [a, b], and the corresponding cdf Fx(x) rising linearly from 0 to 1 over [a, b].]

A continuous random variable that takes values between a and b with equal probabilities over intervals of equal length.
The phase of a received sinusoidal carrier is usually modeled as a uniform random variable between 0 and 2π. Quantization error is also typically modeled as uniform.


Gaussian (or Normal) Random Variable


[Figure: Gaussian pdf fx(x) with peak 1/√(2πσ²) at x = µ, and the corresponding cdf Fx(x), which equals 1/2 at x = µ.]

A continuous random variable whose pdf is:

    fx(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),

where µ and σ² are parameters. Usually denoted as N (µ, σ²).
The most important and frequently encountered random variable in communications.

Functions of A Random Variable

The function y = g(x) is itself a random variable.
From the definition, the cdf of y can be written as

    Fy(y) = P (ω ∈ Ω : g(x(ω)) ≤ y).

Assume that for all y, the equation g(x) = y has a countable number of solutions and at each solution point dg(x)/dx exists and is nonzero. Then the pdf of y = g(x) is:

    fy(y) = Σ_i fx(xi) / |dg(x)/dx|_{x=xi},

where {xi} are the solutions of g(x) = y.


A linear function of a Gaussian random variable is itself a
Gaussian random variable.
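As a quick check of this last statement, take y = ax + b with a ≠ 0 and x ∼ N (µ, σ²). The equation g(x) = y has the single solution x1 = (y − b)/a and |dg(x)/dx| = |a|, so the formula above gives

    fy(y) = (1/|a|) fx((y − b)/a) = (1/√(2πa²σ²)) exp(−(y − aµ − b)²/(2a²σ²)),

i.e., y ∼ N (aµ + b, a²σ²).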

Expectation of Random Variables I

Statistical averages, or moments, play an important role in the characterization of the random variable.
The expected value (also called the mean value, or first moment) of the random variable x is defined as

    mx = E{x} ≡ ∫_{−∞}^{∞} x fx(x)dx,

where E denotes the statistical expectation operator.
In general, the nth moment of x is defined as

    E{x^n} ≡ ∫_{−∞}^{∞} x^n fx(x)dx.

For n = 2, E{x²} is known as the mean-squared value of the random variable.

Expectation of Random Variables II

The nth central moment of the random variable x is:

    E{y} = E{(x − mx)^n} = ∫_{−∞}^{∞} (x − mx)^n fx(x)dx.

When n = 2 the central moment is called the variance, commonly denoted as σx²:

    σx² = var(x) = E{(x − mx)²} = ∫_{−∞}^{∞} (x − mx)² fx(x)dx.

The variance provides a measure of the variable’s “randomness”.
The mean and variance of a random variable give a partial description of its pdf.


Expectation of Random Variables III

Relationship between the variance and the first and second moments:

    σx² = E{x²} − [E{x}]² = E{x²} − mx².

An electrical engineering interpretation: the AC power equals the total power minus the DC power.
The square root of the variance is known as the standard deviation, and can be interpreted as the root-mean-squared (RMS) value of the AC component.
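For completeness, the relationship above follows from expanding the square:

    σx² = E{(x − mx)²} = E{x²} − 2mx E{x} + mx² = E{x²} − mx².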


The Gaussian Random Variable

(a) A muscle (emg) signal


[Figure: a recorded EMG waveform, signal amplitude (volts) versus t (sec) over 0 to 1.]


(b) Histogram and pdf fits


[Figure: histogram of the EMG samples together with Gaussian and Laplacian pdf fits, fx(x) (1/volts) versus x (volts).]
    fx(x) = (1/√(2πσx²)) e^{−(x−mx)²/(2σx²)}   (Gaussian)

    fx(x) = (a/2) e^{−a|x|}   (Laplacian)

Gaussian Distribution (Univariate)


[Figure: zero-mean Gaussian pdfs fx(x) for σx = 1, 2 and 5, plotted over −15 ≤ x ≤ 15.]

    Range (±kσx)                   k = 1    k = 2    k = 3    k = 4
    P (mx − kσx < x ≤ mx + kσx)    0.683    0.955    0.997    0.999

    Error probability              10⁻³     10⁻⁴     10⁻⁶     10⁻⁸
    Distance from the mean         3.09     3.72     4.75     5.61


Example

The noise voltage in an electric circuit can be modeled as a Gaussian random variable with zero mean and variance σ².
(a) Show that the probability that the noise voltage exceeds some level A can be expressed as Q(A/σ).
(b) Given that the noise voltage is negative, express in terms of the Q-function the probability that the noise voltage exceeds A, where A < 0.
(c) Using the MATLAB function Q.m available from the course’s website, evaluate the expressions found in (a) and (b) for σ² = 10⁻⁸ and A = −10⁻⁴.


Solution

    fx(x) = (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)},   which with µ = 0 becomes   fx(x) = (1/(√(2π)σ)) e^{−x²/(2σ²)}.

The Q-function is defined as:

    Q(t) = (1/√(2π)) ∫_{t}^{∞} e^{−λ²/2} dλ.

The probability that the noise voltage exceeds some level A is

    P (x > A) = ∫_{A}^{∞} fx(x)dx = ∫_{A}^{∞} (1/(√(2π)σ)) e^{−x²/(2σ²)} dx
              = (with λ = x/σ, so dλ = dx/σ)   (1/√(2π)) ∫_{A/σ}^{∞} e^{−λ²/2} dλ = Q(A/σ).

Given that the noise voltage is negative, the probability that the noise voltage exceeds some level A < 0 is:

    P (x > A|x < 0) = P (x > A, x < 0)/P (x < 0) = P (A < x < 0)/P (x < 0)
                    = [Q(A/σ) − Q(0)]/Q(0) = [Q(A/σ) − 1/2]/(1/2) = 2Q(A/σ) − 1.
[Figure: the conditional pdf fx(x|x < 0), with peak 2/√(2πσ²) and shaded area P (x > A|x < 0), alongside the unconditional pdf fx(x) with peak 1/√(2πσ²).]


With σ² = 10⁻⁸ and A = −10⁻⁴,

    P (x > −10⁻⁴) = Q(−10⁻⁴/√(10⁻⁸)) = Q(−1) = 0.8413,
    P (x > −10⁻⁴|x < 0) = 2Q(−1) − 1 = 0.6827.
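If the course’s Q.m is not at hand, the same numbers can be checked (a sketch, not part of the original solution) with MATLAB’s built-in erfc, using the identity Q(t) = 0.5 erfc(t/√2):

Q=@(t) 0.5*erfc(t/sqrt(2));   % Q-function via the complementary error function
sigma=sqrt(1e-8); A=-1e-4;
P_a=Q(A/sigma)                % = Q(-1) = 0.8413
P_b=2*Q(A/sigma)-1            % = 0.6827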


Example
The MATLAB function randn generates samples of a Gaussian random variable with zero mean and unit variance. Generate a vector of L = 10⁶ samples of a zero-mean Gaussian random variable x with variance σ² = 10⁻⁸. This can be done as follows: x=sigma*randn(1,L).
(a) Based on the sample vector x, verify the mean and variance of random variable x.
(b) Based on the sample vector x, compute the “probability” that x > A, where A = −10⁻⁴. Compare the result with that found in Question 2-(c).
(c) Using the MATLAB command hist, plot the histogram of sample vector x with 100 bins. Next obtain and plot the pdf from the histogram. Then also plot on the same figure the theoretical Gaussian pdf (note the value of the variance for the pdf). Do the histogram pdf and theoretical pdf fit well?

Solution

L=10^6;              % Length of the noise vector
sigma=sqrt(10^(-8)); % Standard deviation of the noise
A=-10^(-4);
x=sigma*randn(1,L);

%[2] (a) Verify mean and variance:
mean_x=sum(x)/L;                 % this is the same as mean(x)
variance_x=sum((x-mean_x).^2)/L; % this is the same as var(x)
% mean_x = -3.7696e-008
% variance_x = 1.0028e-008

%[3] (b) Compute P(x>A)
P=length(find(x>A))/L;
% P = 0.8410


%[5] (c) Histogram and Gaussian pdf fit
N_bins=100;
[y,x_center]=hist(x,N_bins); % This bins the elements of x into N_bins containers,
                             % returns the number of elements in each container,
                             % and stores the bin centers in x_center.
dx=(max(x)-min(x))/N_bins;   % This gives the width of each bin.
hist_pdf=(y/L)/dx;           % This approximates the pdf as a constant over each bin.

pl(1)=plot(x_center,hist_pdf,'color','blue','linestyle','--','linewidth',1.5);
hold on;

x0=[-5:0.001:5]*sigma; % this specifies the range of random variable x for the true pdf
true_pdf=1/(sqrt(2*pi)*sigma)*exp(-x0.^2/(2*sigma^2));
pl(2)=plot(x0,true_pdf,'color','r','linestyle','-','linewidth',1.5);

xlabel('{\itx}','FontName','Times New Roman','FontSize',16);
ylabel('{\itf}_{\bf x}({\itx})','FontName','Times New Roman','FontSize',16);
legend(pl,'pdf from histogram','true pdf',+1);
set(gca,'FontSize',16,'XGrid','on','YGrid','on','GridLineStyle',':');
% (The line widths, the ylabel font size and the grid-line style were cut off
%  on the original slide; the values above are reasonable assumptions.)


[Figure: the resulting plot — the pdf estimated from the histogram (dashed) and the true Gaussian pdf (solid), fx(x) versus x (×10⁻⁴); the two curves agree closely, with peak values near 4000.]


Multiple Random Variables I

Often encountered when dealing with combined experiments or repeated trials of a single experiment.
Multiple random variables are basically multidimensional functions defined on a sample space of a combined experiment.
Let x and y be two random variables defined on the same sample space Ω. The joint cumulative distribution function is defined as

    Fx,y(x, y) = P (x ≤ x, y ≤ y).

Similarly, the joint probability density function is:

    fx,y(x, y) = ∂²Fx,y(x, y)/(∂x∂y).


Multiple Random Variables II

When the joint pdf is integrated over one of the variables, one obtains the pdf of the other variable, called the marginal pdf:

    ∫_{−∞}^{∞} fx,y(x, y)dx = fy(y),
    ∫_{−∞}^{∞} fx,y(x, y)dy = fx(x).

Note that:

    ∫_{−∞}^{∞} ∫_{−∞}^{∞} fx,y(x, y)dxdy = Fx,y(∞, ∞) = 1,
    Fx,y(−∞, −∞) = Fx,y(−∞, y) = Fx,y(x, −∞) = 0.


Multiple Random Variables III


The conditional pdf of the random variable y, given that the value of the random variable x is equal to x, is defined as

    fy(y|x) = fx,y(x, y)/fx(x)   if fx(x) ≠ 0,
    fy(y|x) = 0                  otherwise.

Two random variables x and y are statistically independent if and only if

    fy(y|x) = fy(y),   or equivalently   fx,y(x, y) = fx(x)fy(y).

The joint moment is defined as

    E{x^j y^k} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^j y^k fx,y(x, y)dxdy.


Multiple Random Variables IV

The joint central moment is

    E{(x − mx)^j (y − my)^k} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − mx)^j (y − my)^k fx,y(x, y)dxdy,

where mx = E{x} and my = E{y}.
The most important moments are

    E{xy} ≡ ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fx,y(x, y)dxdy   (correlation),

    cov{x, y} ≡ E{(x − mx)(y − my)} = E{xy} − mx my   (covariance).


Multiple Random Variables V


Let σx² and σy² be the variances of x and y. The covariance normalized w.r.t. σxσy is called the correlation coefficient:

    ρx,y = cov{x, y}/(σxσy).

ρx,y indicates the degree of linear dependence between two random variables.
It can be shown that |ρx,y| ≤ 1.
ρx,y = ±1 implies an increasing/decreasing linear relationship.
If ρx,y = 0, x and y are said to be uncorrelated.
It is easy to verify that if x and y are independent, then ρx,y = 0: independence implies lack of correlation.
However, lack of correlation (no linear relationship) does not in general imply statistical independence.

Examples of Uncorrelated Dependent Random Variables


Example 1: Let x be a discrete random variable that takes on {−1, 0, 1} with probabilities {1/4, 1/2, 1/4}, respectively. The random variables y = x³ and z = x² are uncorrelated but dependent.
Example 2: Let x be a uniform random variable over [−1, 1]. Then the random variables y = x and z = x² are uncorrelated but dependent.
Example 3: Let x be a Gaussian random variable with zero
mean and unit variance (standard normal distribution). The
random variables y = x and z = |x| are uncorrelated but
dependent.
Example 4: Let u and v be two random variables (discrete or
continuous) with the same probability density function. Then
x = u − v and y = u + v are uncorrelated dependent random
variables.

Example 1
x ∈ {−1, 0, 1} with probabilities {1/4, 1/2, 1/4}
⇒ y = x³ ∈ {−1, 0, 1} with probabilities {1/4, 1/2, 1/4}
⇒ z = x² ∈ {0, 1} with probabilities {1/2, 1/2}

    my = (−1)(1/4) + (0)(1/2) + (1)(1/4) = 0;   mz = (0)(1/2) + (1)(1/2) = 1/2.

The joint pmf (similar to pdf) of y and z:

    P (y = −1, z = 0) = 0
    P (y = −1, z = 1) = P (x = −1) = 1/4
    P (y = 0, z = 0) = P (x = 0) = 1/2
    P (y = 0, z = 1) = 0
    P (y = 1, z = 0) = 0
    P (y = 1, z = 1) = P (x = 1) = 1/4

Therefore, E{yz} = (−1)(1)(1/4) + (0)(0)(1/2) + (1)(1)(1/4) = 0
⇒ cov{y, z} = E{yz} − my mz = 0 − (0)(1/2) = 0!
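A quick Monte-Carlo check of this example (a sketch; the sample size is an arbitrary choice): the sample covariance between y = x³ and z = x² comes out close to zero even though z is completely determined by y.

M=10^6;
u=rand(1,M);
x=-(u<0.25)+(u>=0.75);                  % x in {-1,0,1} with probabilities {1/4,1/2,1/4}
y=x.^3; z=x.^2;
sample_cov=mean(y.*z)-mean(y)*mean(z)   % close to 0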

Jointly Gaussian Distribution (Bivariate)



    fx,y(x, y) = [1/(2πσxσy√(1 − ρ²x,y))] exp{ −[1/(2(1 − ρ²x,y))] [ (x − mx)²/σx² − 2ρx,y(x − mx)(y − my)/(σxσy) + (y − my)²/σy² ] },

where mx, my are the means and σx², σy² the variances.
ρx,y is indeed the correlation coefficient.
Each marginal density is Gaussian: fx(x) ∼ N (mx, σx²) and fy(y) ∼ N (my, σy²).
When ρx,y = 0 → fx,y(x, y) = fx(x)fy(y) → the random variables x and y are statistically independent.
Uncorrelatedness thus means that jointly Gaussian random variables are statistically independent (a property that does not hold for random variables in general).
A weighted sum of two jointly Gaussian random variables is also Gaussian.
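A small simulation sketch (the parameter values are assumed for illustration): jointly Gaussian pairs with a prescribed ρx,y can be generated from two independent N(0, 1) sequences, and the sample correlation coefficient then recovers ρx,y.

rho=0.7; mx=0; my=0; sx=1; sy=1;      % assumed parameters
M=10^6;
u=randn(1,M); v=randn(1,M);           % independent N(0,1) samples
x=mx+sx*u;
y=my+sy*(rho*u+sqrt(1-rho^2)*v);      % jointly Gaussian, correlation rho with x
sample_rho=(mean(x.*y)-mean(x)*mean(y))/(std(x)*std(y))   % close to 0.7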

Joint pdf and Contours for σx = σy = 1 and ρx,y = 0

[Figure: surface plot of fx,y(x, y) and its circular contours for ρx,y = 0.]


Joint pdf and Contours for σx = σy = 1 and ρx,y = 0.3

[Figure: surface plot of fx,y(x, y), a cross-section, and slightly elliptical contours for ρx,y = 0.3.]


Joint pdf and Contours for σx = σy = 1 and ρx,y = 0.7

[Figure: surface plot of fx,y(x, y), a cross-section, and clearly elliptical contours for ρx,y = 0.7.]


Joint pdf and Contours for σx = σy = 1 and ρx,y = 0.95

[Figure: surface plot of fx,y(x, y) and highly elongated elliptical contours for ρx,y = 0.95.]


Multivariate Gaussian pdf

Define x⃗ = [x1, x2, . . . , xn], a vector of the means m⃗ = [m1, m2, . . . , mn], and the n × n covariance matrix C with Ci,j = cov(xi, xj) = E{(xi − mi)(xj − mj)}.
The random variables {xi}, i = 1, . . . , n, are jointly Gaussian if:

    fx1,x2,...,xn(x1, x2, . . . , xn) = [1/√(det(C)(2π)^n)] exp( −(1/2)(x⃗ − m⃗)C⁻¹(x⃗ − m⃗)^⊤ ).

If C is diagonal (i.e., the random variables {xi} are all uncorrelated), the joint pdf is a product of the marginal pdfs: uncorrelatedness implies statistical independence for multiple Gaussian random variables.


Random Processes I
[Figure: outcomes ω1, ω2, . . . , ωM of the sample space are mapped to time functions x1(t, ω1), x2(t, ω2), . . . , xM(t, ωM); at a fixed time tk the ensemble of values x(tk, ω) forms a random variable (a real number for each outcome).]

A mapping from a sample space to a set of time functions.



Random Processes II

Ensemble: The set of possible time functions that one sees.


Denote this set by x(t), where the time functions x1 (t, ω1 ),
x2 (t, ω2 ), x3 (t, ω3 ), . . . are specific members of the ensemble.
At any time instant, t = tk , we have random variable x(tk ).
At any two time instants, say t1 and t2 , we have two different
random variables x(t1 ) and x(t2 ).
Any relationship between them is described by the joint pdf
fx(t1 ),x(t2 ) (x1 , x2 ; t1 , t2 ).
A complete description of the random process is determined by
the joint pdf fx(t1 ),x(t2 ),...,x(tN ) (x1 , x2 , . . . , xN ; t1 , t2 , . . . , tN ).
The most important joint pdfs are the first-order pdf
fx(t) (x; t) and the second-order pdf fx(t1 )x(t2 ) (x1 , x2 ; t1 , t2 ).


Examples of Random Processes I

[Figure: sample functions versus t of (a) a thermal noise process and (b) a uniform-phase sinusoid.]


Examples of Random Processes II

[Figure: sample functions versus t of (c) a Rayleigh fading process and (d) binary random data taking the values ±V with bit duration Tb.]


Classification of Random Processes


Based on whether its statistics change with time: the process is non-stationary or stationary.
Different levels of stationarity:
Strictly stationary: the joint pdf of any order is independent of a shift in time.
Nth-order stationarity: the joint pdf does not depend on the time shift, but depends on the time spacings:

    fx(t1),x(t2),...,x(tN)(x1, x2, . . . , xN; t1, t2, . . . , tN) = fx(t1+t),x(t2+t),...,x(tN+t)(x1, x2, . . . , xN; t1 + t, t2 + t, . . . , tN + t).

First- and second-order stationarity:

    fx(t1)(x; t1) = fx(t1+t)(x; t1 + t) = fx(t)(x),
    fx(t1),x(t2)(x1, x2; t1, t2) = fx(t1+t),x(t2+t)(x1, x2; t1 + t, t2 + t) = fx(t1),x(t2)(x1, x2; τ),   τ = t2 − t1.

Statistical Averages or Joint Moments

Consider N random variables x(t1), x(t2), . . . , x(tN). The joint moments of these random variables are

    E{x^{k1}(t1) x^{k2}(t2) · · · x^{kN}(tN)} = ∫_{x1=−∞}^{∞} · · · ∫_{xN=−∞}^{∞} x1^{k1} x2^{k2} · · · xN^{kN} fx(t1),x(t2),...,x(tN)(x1, x2, . . . , xN; t1, t2, . . . , tN) dx1 dx2 . . . dxN,

for all integers kj ≥ 1 and N ≥ 1.
We shall only consider the first- and second-order moments, i.e., E{x(t)}, E{x²(t)} and E{x(t1)x(t2)}. They are the mean value, the mean-squared value and the (auto)correlation.


Mean Value or the First Moment

The mean value of the process at time t is

    mx(t) = E{x(t)} = ∫_{−∞}^{∞} x fx(t)(x; t)dx.

The average is across the ensemble, and if the pdf varies with time then the mean value is a (deterministic) function of time.
If the process is stationary then the mean is independent of t, i.e., a constant:

    mx = E{x(t)} = ∫_{−∞}^{∞} x fx(x)dx.


Mean-Squared Value or the Second Moment

This is defined as

    MSVx(t) = E{x²(t)} = ∫_{−∞}^{∞} x² fx(t)(x; t)dx   (non-stationary),
    MSVx = E{x²(t)} = ∫_{−∞}^{∞} x² fx(x)dx   (stationary).

The second central moment (or the variance) is:

    σx²(t) = E{[x(t) − mx(t)]²} = MSVx(t) − mx²(t)   (non-stationary),
    σx² = E{[x(t) − mx]²} = MSVx − mx²   (stationary).





Correlation
The autocorrelation function completely describes the power spectral density of the random process.
It is defined as the correlation between the two random variables x1 = x(t1) and x2 = x(t2):

    Rx(t1, t2) = E{x(t1)x(t2)} = ∫_{x1=−∞}^{∞} ∫_{x2=−∞}^{∞} x1 x2 fx1,x2(x1, x2; t1, t2)dx1dx2.

For a stationary process:

    Rx(τ) = E{x(t)x(t + τ)} = ∫_{x1=−∞}^{∞} ∫_{x2=−∞}^{∞} x1 x2 fx1,x2(x1, x2; τ)dx1dx2.

Wide-sense stationary (WSS) process: E{x(t)} = mx for any t, and Rx(t1, t2) = Rx(τ) for τ = t2 − t1.

Properties of the Autocorrelation Function


1. Rx (τ ) = Rx (−τ ). It is an even function of τ because the
same set of product values is averaged across the ensemble,
regardless of the direction of translation.
2. |Rx(τ)| ≤ Rx(0). The maximum always occurs at τ = 0, though there may be other values of τ for which it is as large. Furthermore, Rx(0) is the mean-squared value of the random process.
3. If for some τ0 we have Rx (τ0 ) = Rx (0), then for all integers
k, Rx (kτ0 ) = Rx (0).
4. If mx 6= 0 then Rx (τ ) will have a constant component equal
to m2x .
5. Autocorrelation functions cannot have an arbitrary shape.
The restriction on the shape arises from the fact that the
Fourier transform of an autocorrelation function must be
greater than or equal to zero, i.e., F{Rx (τ )} ≥ 0.

Power Spectral Density of a Random Process (I)


Taking the Fourier transform of the random process does not work.
[Figure: a time-domain ensemble x1(t, ω1), x2(t, ω2), . . . , xM(t, ωM) and the corresponding frequency-domain ensemble of magnitude spectra |X1(f, ω1)|, |X2(f, ω2)|, . . . , |XM(f, ωM)|.]


Power Spectral Density of a Random Process (II)

Need to determine how the average power of the process is distributed in frequency.
Define a truncated process:

    xT(t) = x(t) for −T ≤ t ≤ T,   and xT(t) = 0 otherwise.

Consider the Fourier transform of this truncated process:

    XT(f) = ∫_{−∞}^{∞} xT(t)e^{−j2πft}dt.

Average the energy over the total time, 2T:

    P = (1/2T) ∫_{−T}^{T} x²(t)dt = (1/2T) ∫_{−∞}^{∞} |XT(f)|²df   (watts).


Power Spectral Density of a Random Process (III)

Find the average value of P:

    E{P} = E{ (1/2T) ∫_{−T}^{T} x²(t)dt } = E{ (1/2T) ∫_{−∞}^{∞} |XT(f)|²df }.

Take the limit as T → ∞:

    lim_{T→∞} (1/2T) ∫_{−T}^{T} E{x²_T(t)}dt = lim_{T→∞} (1/2T) ∫_{−∞}^{∞} E{|XT(f)|²}df.

It follows that

    MSVx = lim_{T→∞} (1/2T) ∫_{−T}^{T} E{x²_T(t)}dt = ∫_{−∞}^{∞} lim_{T→∞} [E{|XT(f)|²}/(2T)] df   (watts).


Power Spectral Density of a Random Process (IV)

Finally,

    Sx(f) = lim_{T→∞} E{|XT(f)|²}/(2T)   (watts/Hz)

is the power spectral density of the process.
It can be shown that the power spectral density and the autocorrelation function are a Fourier transform pair:

    Rx(τ) ←→ Sx(f) = ∫_{τ=−∞}^{∞} Rx(τ)e^{−j2πfτ}dτ.


Time Averaging and Ergodicity

A process where any member of the ensemble exhibits the same statistical behavior as that of the whole ensemble.
All time averages on a single ensemble member are equal to the corresponding ensemble average:

    E{x^n(t)} = ∫_{−∞}^{∞} x^n fx(x)dx = lim_{T→∞} (1/2T) ∫_{−T}^{T} [xk(t, ωk)]^n dt,   ∀ n, k.

For an ergodic process: to measure various statistical averages, it is sufficient to look at only one realization of the process and find the corresponding time average.
For a process to be ergodic it must be stationary. The converse is not true.

Examples of Random Processes

(Example 3.4) x(t) = A cos(2πf0t + Θ), where Θ is a random variable uniformly distributed on [0, 2π]. This process is both stationary and ergodic.
(Example 3.5) x(t) = x, where x is a random variable uniformly distributed on [−A, A], where A > 0. This process is WSS, but not ergodic.
(Example 3.6) x(t) = A cos(2πf0t + Θ), where A is a zero-mean random variable with variance σA², and Θ is uniform in [0, 2π]. Furthermore, A and Θ are statistically independent. This process is not ergodic, but strictly stationary.
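As an illustrative sketch of Example 3.4 (the amplitude, frequency, lag and time grid below are assumed values), one can check the ergodicity claim by time-averaging a single realization and comparing with the ensemble values mx = 0 and Rx(τ) = (A²/2)cos(2πf0τ):

A=2; f0=10; Ts=1e-4; t=0:Ts:100;        % assumed amplitude, frequency and time grid
theta=2*pi*rand;                        % one realization of Theta
x=A*cos(2*pi*f0*t+theta);
time_mean=mean(x)                       % close to the ensemble mean 0
tau=0.013; k=round(tau/Ts);             % an arbitrary lag
time_corr=mean(x(1:end-k).*x(1+k:end))  % time-average estimate of Rx(tau)
ensemble_corr=(A^2/2)*cos(2*pi*f0*tau)  % ensemble value; the two should be close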


Random Processes and LTI Systems

[Figure: the random process x(t), with mean mx and autocorrelation Rx(τ) ←→ Sx(f), is applied to a linear, time-invariant (LTI) system h(t) ←→ H(f); the output y(t) has mean my, autocorrelation Ry(τ) ←→ Sy(f), and cross-correlation Rx,y(τ) with the input.]

    my = E{y(t)} = E{ ∫_{−∞}^{∞} h(λ)x(t − λ)dλ } = mx H(0),
    Sy(f) = |H(f)|² Sx(f),
    Ry(τ) = h(τ) ∗ h(−τ) ∗ Rx(τ).
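A discrete-time simulation sketch of the middle relation (the filter and the lengths are assumed choices): passing unit-variance white samples, whose PSD is flat, through a filter and averaging periodograms of the output should reproduce |H(f)|²:

N=4096; M=200;
h=ones(1,8)/8;                      % assumed filter: 8-tap moving average
Syy=zeros(1,N);
for m=1:M
    y=filter(h,1,randn(1,N));       % one output realization
    Syy=Syy+abs(fft(y)).^2/N;       % periodogram of this realization
end
Syy=Syy/M;                          % averaged periodogram, an estimate of Sy
H=fft(h,N); f=(0:N-1)/N;
plot(f,Syy,f,abs(H).^2);            % the two curves should nearly overlap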


Thermal Noise in Communication Systems

A natural noise source is thermal noise, whose amplitude statistics are well modeled to be Gaussian with zero mean.
The autocorrelation and PSD are well modeled as:

    Rw(τ) = kθG e^{−|τ|/t0}/t0   (watts),
    Sw(f) = 2kθG/(1 + (2πft0)²)   (watts/Hz),

where k = 1.38 × 10⁻²³ joule/°K is Boltzmann’s constant, G is conductance of the resistor (mhos); θ is temperature in degrees Kelvin; and t0 is the statistical average of time intervals between collisions of free electrons in the resistor (on the order of 10⁻¹² sec).


[Figure: (a) power spectral density Sw(f) (watts/Hz) of thermal noise versus f (GHz), compared with the flat level N0/2 of white noise; (b) autocorrelation Rw(τ) (watts) of thermal noise versus τ (pico-sec), compared with the impulse (N0/2)δ(τ) of white noise.]

The noise PSD is approximately flat over the frequency range of 0 to 10 GHz ⇒ let the spectrum be flat from 0 to ∞:

    Sw(f) = N0/2   (watts/Hz),

where N0 = 4kθG is a constant.
Noise that has a uniform spectrum over the entire frequency range is referred to as white noise.
The autocorrelation of white noise is

    Rw(τ) = (N0/2)δ(τ)   (watts).

Since Rw(τ) = 0 for τ ≠ 0, any two different samples of white noise, no matter how close in time they are taken, are uncorrelated.
Since the noise samples of white noise are uncorrelated, if the noise is both white and Gaussian (for example, thermal noise) then the noise samples are also independent.
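A discrete-time sketch of this property (independent unit-variance Gaussian samples are used as a stand-in for white noise; the lengths are arbitrary choices): the estimated autocorrelation is essentially zero at every nonzero lag.

N=10^5;
w=randn(1,N);                          % independent, zero-mean, unit-variance samples
r=zeros(1,21);
for k=0:20
    r(k+1)=mean(w(1:N-k).*w(1+k:N));   % autocorrelation estimate at lag k
end
stem(0:20,r);                          % ~1 at lag 0, ~0 at all other lags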

Example

Suppose that a (WSS) white noise process, x(t), of zero mean and power spectral density N0/2 is applied to the input of the filter.
(a) Find and sketch the power spectral density and autocorrelation function of the random process y(t) at the output of the filter.
(b) What are the mean and variance of the output process y(t)?

[Figure: the filter circuit, with input x(t), a series inductor L, and the output y(t) taken across the resistor R.]


    H(f) = R/(R + j2πfL) = 1/(1 + j2πfL/R).

    Sy(f) = (N0/2) · 1/(1 + (2πfL/R)²)   ←→   Ry(τ) = (N0R/(4L)) e^{−(R/L)|τ|}.

[Figure: Sy(f) (watts/Hz), a lowpass-shaped spectrum with peak N0/2 at f = 0, and Ry(τ) (watts), a double-sided exponential with peak N0R/(4L) at τ = 0.]
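A small plotting sketch of these two functions (the numerical values of N0, R and L are assumed purely for illustration):

N0=2; R=1000; L=1e-3;                    % assumed values for plotting only
f=linspace(-5*R/(2*pi*L),5*R/(2*pi*L),1000);
Sy=(N0/2)./(1+(2*pi*f*L/R).^2);
tau=linspace(-5*L/R,5*L/R,1000);
Ry=(N0*R/(4*L))*exp(-(R/L)*abs(tau));
subplot(2,1,1); plot(f,Sy);   xlabel('f (Hz)');      ylabel('S_y(f) (watts/Hz)');
subplot(2,1,2); plot(tau,Ry); xlabel('\tau (sec)');  ylabel('R_y(\tau) (watts)');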
