Part 2
Prof H Xu
FEB 28, 2013
Useful Maths
What is the definition of a random variable?
A variable whose values are random but whose
statistical distribution is known.
What is a probability density function (pdf)?
A probability density function describes how probability is
distributed over the values that the random variable can take.
Useful Maths
What is a cumulative distribution function (cdf)?
If the pdf is $f(x)$, then the cdf $F(x)$ is given by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$$
The relationship between cdf and pdf:
$$f(x) = \frac{dF(x)}{dx}$$
Joint probability distribution and density function
$$F_{xy}(x, y) = P(X \le x,\ Y \le y)$$
$$f_{xy}(x, y) = \frac{\partial^2 F_{xy}(x, y)}{\partial x\,\partial y}$$
Useful Maths
Let X be a random variable. Assume the probability density
function (pdf) of X is $f(x)$ ($-\infty < x < \infty$).
The mean of X is defined as
$$E[X] = \mu = \int_{-\infty}^{\infty} x\, f(x)\,dx$$
The variance of X is defined as
$$D[X] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
Reminder:
$$\int_{-\infty}^{+\infty} f(x)\,dx = 1$$
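As a numerical illustration of these definitions, the sketch below approximates the normalisation integral, the mean and the variance for one concrete pdf. The pdf $f(x) = e^{-x}$ for $x > 0$, the helper name `integrate`, and the truncation of the upper limit at 50 are assumptions made for this example only.

```python
import math


def integrate(func, lower, upper, steps=100000):
    """Approximate the integral of func over [lower, upper] (midpoint rule)."""
    dx = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * dx) for i in range(steps)) * dx


def pdf(x):
    """Assumed example pdf: f(x) = exp(-x) for x > 0, zero elsewhere."""
    return math.exp(-x) if x > 0 else 0.0


UPPER = 50.0  # assumption: the tail beyond 50 is negligible for this pdf

total = integrate(pdf, 0.0, UPPER)                                   # should be close to 1
mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)                   # E[X]
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, UPPER)  # D[X]

print(f"integral of pdf = {total:.6f}")
print(f"mean            = {mean:.6f}")
print(f"variance        = {variance:.6f}")
```

For this particular pdf both the mean and the variance are 1, so the three printed values should all be close to 1.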
Useful Maths
Conditional probability density function
$$f(x \mid y) = \frac{f_{xy}(x, y)}{f_y(y)}$$
Statistical Independence: Two random variables X,Y are
called statistically independent if and only if
$$f_{xy}(x, y) = f_x(x)\, f_y(y)$$
or
$$F_{xy}(x, y) = F_x(x)\, F_y(y)$$
Useful Maths
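Let X be a random variable. Assume the probability density
function (pdf) of X is $f(x)$ ($-\infty < x < \infty$).
Let $Y = g(X)$ also be a random variable, where $g$ is strictly
monotonic and differentiable. The pdf of Y is
$$f_y(y) = \begin{cases} f[h(y)]\,|h'(y)|, & \alpha < y < \beta \\ 0, & \text{otherwise} \end{cases}$$
where $h(y)$ is the inverse function of $g(x)$, and
$$\alpha = \min\{g(-\infty),\ g(+\infty)\}, \qquad \beta = \max\{g(-\infty),\ g(+\infty)\}$$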
Useful Maths
Example 1
Let $Y = aX$ ($a > 0$), where the pdf of X is
$$f_x(x) = e^{-x}, \quad x > 0$$
Find $f_y(y)$.
$$F_y(y) = P(aX \le y) = P(X \le y/a) = \int_{0}^{y/a} f_x(x)\,dx$$
$$f_y(y) = \frac{dF_y(y)}{dy} = \frac{1}{a}\, f_x(y/a) = \frac{1}{a}\, e^{-y/a}, \quad y > 0$$
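The result of Example 1 can be checked numerically. The sketch below follows the same two steps: it integrates $f_x$ up to $y/a$ to get $F_y(y)$, differentiates that numerically, and compares with $\frac{1}{a}e^{-y/a}$. The function names, the test point $y = 1.5$ and the value $a = 2$ are assumptions chosen for illustration.

```python
import math


def pdf_x(x):
    """pdf of X in Example 1: f_x(x) = exp(-x) for x > 0."""
    return math.exp(-x) if x > 0 else 0.0


def cdf_y(y, a, steps=20000):
    """F_y(y) = integral of f_x from 0 to y/a, by the midpoint rule."""
    if y <= 0:
        return 0.0
    upper = y / a
    dx = upper / steps
    return sum(pdf_x((i + 0.5) * dx) for i in range(steps)) * dx


def pdf_y_numeric(y, a, eps=1e-4):
    """f_y(y) as a central-difference derivative of F_y(y)."""
    return (cdf_y(y + eps, a) - cdf_y(y - eps, a)) / (2 * eps)


def pdf_y_closed_form(y, a):
    """Closed form derived in Example 1: (1/a) * exp(-y/a) for y > 0."""
    return math.exp(-y / a) / a if y > 0 else 0.0


a, y = 2.0, 1.5  # illustrative values only
print(pdf_y_numeric(y, a), pdf_y_closed_form(y, a))
```

The two printed numbers should agree to several decimal places.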
Summary of Useful Maths
What is a random variable?
The meaning of the pdf
Know how to find the mean and variance
Know how to find the pdf of a function of a random variable
Discrete Memoryless Channels
A discrete memoryless channel is a statistical model with an
input $X$ and an output $Y$, where $Y$ is a noisy version of $X$.
[Figure: view of a discrete memoryless channel]
$$p(y_k \mid x_j) = P(Y = y_k \mid X = x_j) \quad \text{for all } j \text{ and } k$$
Naturally, we have
$$0 \le p(y_k \mid x_j) \le 1 \quad \text{for all } j \text{ and } k$$
Discrete Memoryless Channels
For example:
$$p(y_0 \mid x_0) = 1 - q, \qquad p(y_1 \mid x_0) = q$$
$$p(y_0 \mid x_1) = q, \qquad p(y_1 \mid x_1) = 1 - q$$
Discrete Memoryless Channels
A convenient way of describing a discrete memoryless
channel is to arrange the various transition probabilities of the
channel in the form of a matrix as follows
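$$\mathbf{P} = \begin{bmatrix}
p(y_0 \mid x_0) & p(y_1 \mid x_0) & \cdots & p(y_{K-1} \mid x_0) \\
p(y_0 \mid x_1) & p(y_1 \mid x_1) & \cdots & p(y_{K-1} \mid x_1) \\
\vdots & \vdots & & \vdots \\
p(y_0 \mid x_{J-1}) & p(y_1 \mid x_{J-1}) & \cdots & p(y_{K-1} \mid x_{J-1})
\end{bmatrix}$$
$$\sum_{k=0}^{K-1} p(y_k \mid x_j) = 1 \quad \text{for all } j$$
For example, with $J = K = 2$:
$$\mathbf{P} = \begin{bmatrix}
p(y_0 \mid x_0) & p(y_1 \mid x_0) \\
p(y_0 \mid x_1) & p(y_1 \mid x_1)
\end{bmatrix}$$
$$p(y_0 \mid x_0) + p(y_1 \mid x_0) = 1$$
$$p(y_0 \mid x_1) + p(y_1 \mid x_1) = 1$$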
Discrete Memoryless Channels
The joint probability distribution of the random variables
$X$ and $Y$ is given by
$$p(x_j, y_k) = P(X = x_j,\ Y = y_k) = P(Y = y_k \mid X = x_j)\, P(X = x_j) = p(y_k \mid x_j)\, p(x_j)$$
where
$$p(x_j) = P(X = x_j) \quad \text{for } j = 0, 1, \ldots, J-1$$
For example:
$$p(y_0 \mid x_0) = 1 - q, \qquad p(y_1 \mid x_0) = q$$
$$p(y_0 \mid x_1) = q, \qquad p(y_1 \mid x_1) = 1 - q$$
Assume:
$$p(x_0) = p(x_1) = 1/2$$
Then
$$p(x_0, y_0) = p(y_0 \mid x_0)\, p(x_0) = \frac{1}{2}(1 - q)$$
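The joint distribution is simply each row of the channel matrix scaled by the corresponding input probability. The following minimal sketch shows this for the binary example; the layout `channel[j][k]` for $p(y_k \mid x_j)$, the function name, and the value $q = 0.1$ are assumptions made here for illustration.

```python
def joint_distribution(channel, prior):
    """Return joint[j][k] = p(x_j, y_k) = p(y_k | x_j) * p(x_j).

    channel[j][k] holds p(y_k | x_j); prior[j] holds p(x_j).
    """
    return [[channel[j][k] * prior[j] for k in range(len(channel[j]))]
            for j in range(len(prior))]


q = 0.1  # illustrative crossover probability
channel = [[1 - q, q],
           [q, 1 - q]]
prior = [0.5, 0.5]

# Each row of a channel matrix must sum to 1.
for row in channel:
    assert abs(sum(row) - 1.0) < 1e-12

joint = joint_distribution(channel, prior)
print(joint[0][0])  # p(x_0, y_0) = (1 - q) / 2
```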
Discrete Memoryless Channels
The marginal probability distribution of the output random
variable $Y$ is obtained by averaging out the dependence of
$p(x_j, y_k)$ on $x_j$, as shown by
$$p(y_k) = P(Y = y_k) = \sum_{j=0}^{J-1} P(Y = y_k \mid X = x_j)\, P(X = x_j)$$
$$= \sum_{j=0}^{J-1} p(y_k \mid x_j)\, p(x_j) \quad \text{for } k = 0, 1, \ldots, K-1$$
For example:
$$p(y_0 \mid x_0) = 1 - q, \qquad p(y_1 \mid x_0) = q$$
$$p(y_0 \mid x_1) = q, \qquad p(y_1 \mid x_1) = 1 - q$$
Assume:
$$p(x_0) = p(x_1) = 1/2$$
Then
$$p(y_0) = p(y_0 \mid x_0)\, p(x_0) + p(y_0 \mid x_1)\, p(x_1) = 1/2$$
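The marginalisation can be sketched in a few lines. As before, `channel[j][k]` standing for $p(y_k \mid x_j)$ is an assumed layout, and the non-uniform prior `[0.8, 0.2]` is a hypothetical second case added only to show that the output distribution depends on the input distribution.

```python
def output_distribution(channel, prior):
    """Return p(y_k) = sum over j of p(y_k | x_j) * p(x_j)."""
    num_outputs = len(channel[0])
    return [sum(channel[j][k] * prior[j] for j in range(len(prior)))
            for k in range(num_outputs)]


q = 0.1  # illustrative crossover probability
channel = [[1 - q, q],
           [q, 1 - q]]

p_y_uniform = output_distribution(channel, [0.5, 0.5])
p_y_skewed = output_distribution(channel, [0.8, 0.2])

print(p_y_uniform)  # both outputs equally likely, whatever q is
print(p_y_skewed)
```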
Discrete Memoryless Channels
For $J = K$, the average probability of symbol error, $P_e$, is
defined as the probability that the output random variable $Y_k$
is different from the input random variable $X_j$, averaged
over all $k \ne j$. We thus write
$$P_e = \sum_{\substack{k=0 \\ k \ne j}}^{K-1} P(Y = y_k) = \sum_{k=0}^{K-1} \sum_{\substack{j=0 \\ j \ne k}}^{J-1} p(y_k \mid x_j)\, p(x_j)$$
For example, with
$$p(x = 0) = p(x = 1) = \frac{1}{2}$$
$$p_e = p(y = 1 \mid x = 0)\, p(x = 0) + p(y = 0 \mid x = 1)\, p(x = 1)$$
Symmetric channel:
$$p_e = p(y = 1 \mid x = 0) = p$$
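The double sum over $k$ and $j \ne k$ translates directly into code. This is a small sketch under the assumption that `channel[j][k]` stores $p(y_k \mid x_j)$; the asymmetric matrix is a made-up example used only to show a case where the two error terms differ.

```python
def symbol_error_probability(channel, prior):
    """P_e = sum over k and all j != k of p(y_k | x_j) * p(x_j)."""
    num_outputs = len(channel[0])
    return sum(channel[j][k] * prior[j]
               for k in range(num_outputs)
               for j in range(len(prior))
               if j != k)


p = 0.1  # illustrative crossover probability of a symmetric channel
symmetric = [[1 - p, p],
             [p, 1 - p]]
asymmetric = [[0.9, 0.1],   # hypothetical channel, for illustration only
              [0.3, 0.7]]
prior = [0.5, 0.5]

pe_symmetric = symbol_error_probability(symmetric, prior)    # equals p
pe_asymmetric = symbol_error_probability(asymmetric, prior)  # 0.1/2 + 0.3/2
print(pe_symmetric, pe_asymmetric)
```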
Review of entropy
$$H(S) = E\{I_k\} = \sum_{k=1}^{K} p_k I_k = \sum_{k=1}^{K} p_k \log\!\left(\frac{1}{p_k}\right) = -\sum_{k=1}^{K} p_k \log p_k$$
Average amount of information per symbol
Average amount of surprise when observing the symbol
Uncertainty the observer has before seeing the symbol
Average number of bits needed to communicate the
symbol
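A minimal sketch of the entropy formula, assuming base-2 logarithms so that the result is in bits. The function name and the example distributions are illustrative choices, not part of the original slides.

```python
import math


def entropy(probabilities):
    """H = sum of p_k * log2(1 / p_k), skipping symbols with p_k = 0."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))   # fair binary source
print(entropy([0.25] * 4))   # four equally likely symbols
print(entropy([1.0]))        # a certain symbol carries no information
print(entropy([0.9, 0.1]))   # a biased source is less uncertain than a fair one
```

Terms with $p_k = 0$ are skipped because $p \log(1/p) \to 0$ as $p \to 0$.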
In the following discussion, use X to replace S .
Conditional entropy
$$H(X \mid y_k) = \sum_{j=0}^{J-1} p(x_j \mid y_k) \log_2\!\left[\frac{1}{p(x_j \mid y_k)}\right]$$
$$H(X \mid Y) = \sum_{k=0}^{K-1} H(X \mid y_k)\, p(y_k)$$
$$= \sum_{k=0}^{K-1} \sum_{j=0}^{J-1} p(y_k)\, p(x_j \mid y_k) \log_2\!\left[\frac{1}{p(x_j \mid y_k)}\right]$$
$$= \sum_{k=0}^{K-1} \sum_{j=0}^{J-1} p(x_j, y_k) \log_2\!\left[\frac{1}{p(x_j \mid y_k)}\right]$$
The joint entropy is
$$H(X, Y) = \sum_{j=0}^{J-1} \sum_{k=0}^{K-1} p(x_j, y_k) \log_2\!\left[\frac{1}{p(x_j, y_k)}\right]$$
$$H(X, Y) = -\sum_{X} \sum_{Y} p(x, y) \log[p(x, y)] = -E\{\log[p(X, Y)]\}$$
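Both quantities can be computed directly from a joint distribution table. The sketch below assumes the layout `joint[j][k]` for $p(x_j, y_k)$ and uses a binary symmetric channel with $q = 0.1$ and equally likely inputs as an illustrative case; it relies on $1/p(x_j \mid y_k) = p(y_k)/p(x_j, y_k)$.

```python
import math


def joint_entropy(joint):
    """H(X,Y) = sum of p(x,y) * log2(1 / p(x,y)) over all nonzero entries."""
    return sum(p * math.log2(1.0 / p) for row in joint for p in row if p > 0)


def conditional_entropy_x_given_y(joint):
    """H(X|Y) = sum of p(x,y) * log2(1 / p(x|y)), with p(x|y) = p(x,y) / p(y)."""
    num_outputs = len(joint[0])
    p_y = [sum(row[k] for row in joint) for k in range(num_outputs)]
    total = 0.0
    for row in joint:
        for k, p_xy in enumerate(row):
            if p_xy > 0:
                total += p_xy * math.log2(p_y[k] / p_xy)
    return total


q = 0.1  # illustrative crossover probability
joint = [[0.5 * (1 - q), 0.5 * q],
         [0.5 * q, 0.5 * (1 - q)]]

h_xy = joint_entropy(joint)
h_x_given_y = conditional_entropy_x_given_y(joint)
print(h_xy, h_x_given_y)
```

In this example $H(Y) = 1$ bit, so the two results differ by exactly one bit.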
Mutual Information
$$I(X; Y) = \sum_{X} \sum_{Y} p(x, y) \log\!\left[\frac{p(x, y)}{p(x)\, p(y)}\right] = E\!\left\{\log \frac{p(X, Y)}{p(X)\, p(Y)}\right\}$$
$$I(X; Y) = E\!\left\{\log \frac{p(X, Y)}{p(X)\, p(Y)}\right\} = E\!\left\{\log \frac{1}{p(X)}\right\} + E\{\log[p(X \mid Y)]\} = H(X) - H(X \mid Y)$$
$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$
$$H(X, Y) = H(Y, X)$$
For $I(X; Y) = H(Y) - H(Y \mid X)$, the concept of mutual information
is essentially a measure of how much information about the
random variable Y is contained in the random variable X.
Similarly,
$$I(X; Y) = E\!\left\{\log \frac{p(X, Y)}{p(X)\, p(Y)}\right\} = E\!\left\{\log \frac{1}{p(Y)}\right\} + E\{\log[p(Y \mid X)]\} = H(Y) - H(Y \mid X)$$
Mutual Information
$$I(X; Y) = H(X) - H(X \mid Y)$$
Reduction in the uncertainty of X due to the knowledge of Y
$$I(X; Y) = H(Y) - H(Y \mid X)$$
Reduction in the uncertainty of Y due to the knowledge of X
Mutual Information
Interpretation: on average, $H(X) - H(X \mid Y)$ bits will be saved
if both transmitter and receiver know Y.
Mutual Information
If there is no communication (X and Y are independent):
$$H(X \mid Y) = H(X), \qquad I(X; Y) = H(X) - H(X \mid Y) = 0$$
If the communication channel is perfect (X and Y are identical):
$$H(X \mid Y) = 0, \qquad I(X; Y) = H(X) = H(Y)$$
If the communication channel is not perfect:
$$0 < I(X; Y) < H(X)$$
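These three cases can be reproduced numerically. The sketch below computes $I(X;Y)$ from a joint distribution and evaluates it for a binary symmetric channel with equally likely inputs; the crossover values 0.5, 0 and 0.1 are assumptions chosen to stand for "no communication", "perfect channel" and "imperfect channel" respectively.

```python
import math


def mutual_information(joint):
    """I(X;Y) = sum of p(x,y) * log2( p(x,y) / (p(x) p(y)) ) in bits."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(row[k] for row in joint) for k in range(len(joint[0]))]
    total = 0.0
    for j, row in enumerate(joint):
        for k, p_xy in enumerate(row):
            if p_xy > 0:
                total += p_xy * math.log2(p_xy / (p_x[j] * p_y[k]))
    return total


def bsc_joint(q):
    """Joint distribution of a binary symmetric channel with uniform inputs."""
    return [[0.5 * (1 - q), 0.5 * q],
            [0.5 * q, 0.5 * (1 - q)]]


i_independent = mutual_information(bsc_joint(0.5))  # output tells nothing about input
i_perfect = mutual_information(bsc_joint(0.0))      # output equals input
i_noisy = mutual_information(bsc_joint(0.1))        # somewhere in between
print(i_independent, i_perfect, i_noisy)
```

With $H(X) = 1$ bit here, the three values are 0, 1, and a number strictly between 0 and 1.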
Channel Capacity
The channel capacity is the maximum value of $I(X; Y)$
over all input distributions $p(x)$.
Channel capacity:
$$C = \max_{p(x)} I(X; Y) \quad \text{bits/symbol}$$
If the symbol rate is $r$ symbols/sec, then
Channel capacity:
$$C = r \max_{p(x)} I(X; Y) \quad \text{bits/sec}$$
Examples of Channel Capacity
Example 1: noiseless binary channel
Assume p(x=0)=p(x=1)=1/2
$$C = \max I(X; Y) = 1 \text{ bit}$$
Examples of Channel Capacity
Example 2: noisy channel with non-overlapping outputs
Assume p(x=0)=p(x=1)=1/2
$$C = \max I(X; Y) = 1 \text{ bit}$$
Examples of Channel Capacity
Example 3: Binary Symmetric Channel
Assume p(x=0)=p(x=1)=1/2
$$C = \max I(X; Y) = 1 - H(p) \text{ bits}$$
Examples of Channel Capacity
Example 4: Binary Erasure Channel
Assume p(x=0)=p(x=1)=1/2
$$C = \max I(X; Y) = 1 - \alpha \text{ bits}$$
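The capacities in Examples 3 and 4 can be checked by maximising $I(X;Y)$ over the input distribution by brute force. The sketch below is one simple way to do that for two-input channels, using a grid search over $p(x=0)$; the grid search itself, the function names, and the values $p = 0.1$ and $\alpha = 0.25$ are assumptions for illustration and not a general capacity algorithm.

```python
import math


def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(channel, prior):
    """I(X;Y) in bits for channel[j][k] = p(y_k | x_j) and prior[j] = p(x_j)."""
    outputs = range(len(channel[0]))
    p_y = [sum(prior[j] * channel[j][k] for j in range(len(prior)))
           for k in outputs]
    total = 0.0
    for j, p_x in enumerate(prior):
        for k in outputs:
            p_xy = p_x * channel[j][k]
            if p_xy > 0:
                total += p_xy * math.log2(p_xy / (p_x * p_y[k]))
    return total


def capacity_by_grid_search(channel, steps=1000):
    """Maximise I(X;Y) over p(x=0) = w on a uniform grid (two inputs only)."""
    return max(mutual_information(channel, [i / steps, 1 - i / steps])
               for i in range(steps + 1))


p = 0.1       # illustrative BSC crossover probability
alpha = 0.25  # illustrative BEC erasure probability
bsc = [[1 - p, p],
       [p, 1 - p]]
bec = [[1 - alpha, alpha, 0.0],   # outputs ordered as: 0, erasure, 1
       [0.0, alpha, 1 - alpha]]

print(capacity_by_grid_search(bsc), 1 - binary_entropy(p))
print(capacity_by_grid_search(bec), 1 - alpha)
```

For both channels the maximum is reached at equally likely inputs, which is why the examples assume p(x=0)=p(x=1)=1/2.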
Examples of Channel Capacity
Example 5: Gaussian Channel
$$Y = X + Z$$
$$E[X^2] = \sigma_S^2, \qquad E[Z^2] = \sigma_N^2$$
$$H(Z) = \frac{1}{2} \log(2\pi e\, \sigma_N^2)$$
$$H(Y) \le \frac{1}{2} \log[2\pi e\, (\sigma_S^2 + \sigma_N^2)]$$
(To be proved later.)
Capacity of Gaussian Channel
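$$I(X; Y) = H(Y) - H(Z) \le \frac{1}{2} \log[2\pi e\, (\sigma_S^2 + \sigma_N^2)] - \frac{1}{2} \log[2\pi e\, \sigma_N^2]$$
$$= \frac{1}{2} \log\!\left[\frac{2\pi e\, \sigma_S^2 + 2\pi e\, \sigma_N^2}{2\pi e\, \sigma_N^2}\right] = \frac{1}{2} \log\!\left(1 + \frac{\sigma_S^2}{\sigma_N^2}\right) = \frac{1}{2} \log(1 + \gamma)$$
where
$$\gamma = \frac{\sigma_S^2}{\sigma_N^2}$$
is the signal-to-noise ratio per symbol.
Channel capacity:
$$C = \max_{p(x):\ E[X^2] \le \sigma_S^2} I(X; Y)$$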
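A small numerical sketch of the Gaussian channel result, assuming base-2 logarithms so that the answer is in bits per symbol. The function names and the variances $\sigma_S^2 = 3$, $\sigma_N^2 = 1$ are illustrative assumptions. It evaluates $\frac{1}{2}\log_2(1 + \sigma_S^2/\sigma_N^2)$ directly and also as the difference of the two Gaussian entropies.

```python
import math


def gaussian_entropy_bits(variance):
    """Differential entropy of a Gaussian: 0.5 * log2(2 * pi * e * variance)."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)


def gaussian_capacity_bits(signal_var, noise_var):
    """0.5 * log2(1 + SNR) bits per symbol, with SNR = signal_var / noise_var."""
    return 0.5 * math.log2(1 + signal_var / noise_var)


signal_var, noise_var = 3.0, 1.0  # illustrative variances (SNR = 3)

capacity = gaussian_capacity_bits(signal_var, noise_var)
entropy_difference = (gaussian_entropy_bits(signal_var + noise_var)
                      - gaussian_entropy_bits(noise_var))

print(capacity)            # 0.5 * log2(4)
print(entropy_difference)  # same value, obtained from the two entropies
```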
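Capacity of Gaussian Channel
Prove
$$H(Y) \le \frac{1}{2} \log[2\pi e\, (\sigma_S^2 + \sigma_N^2)]$$
The noise Z is Gaussian with pdf
$$p(z) = \frac{1}{\sqrt{2\pi \sigma_N^2}} \exp\!\left(-\frac{z^2}{2\sigma_N^2}\right)$$
so that
$$H(Z) = -E[\log_2 p(z)] = E\!\left[\frac{z^2}{2\sigma_N^2} \log_2 e + \frac{1}{2} \log_2(2\pi \sigma_N^2)\right]$$
$$= \frac{1}{2\sigma_N^2} (\log_2 e)\, E[z^2] + \frac{1}{2} \log_2(2\pi \sigma_N^2)$$
$$= \frac{1}{2} \log_2 e + \frac{1}{2} \log_2(2\pi \sigma_N^2) = \frac{1}{2} \log_2(2\pi e\, \sigma_N^2)$$
With X and Z independent and Z zero-mean, the cross term vanishes and
$$E[Y^2] = E[(X + Z)^2] = E[X^2] + E[Z^2] \le \sigma_S^2 + \sigma_N^2$$
For a given variance, entropy is maximized if Y is Gaussian, so we have
$$H(Y) \le \frac{1}{2} \log[2\pi e\, (\sigma_S^2 + \sigma_N^2)], \qquad H(Z) = \frac{1}{2} \log(2\pi e\, \sigma_N^2)$$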
Capacity of band-limited continuous Gaussian Channel
The capacity of a band-limited continuous Gaussian channel is
given by the Shannon-Hartley law (also called Shannon's 3rd
theorem or the channel capacity theorem)
$$C = B \log\!\left(1 + \frac{S}{N}\right)$$
where $B$ is the bandwidth and $S/N$ is the signal-to-noise ratio,
$$\frac{S}{N} = \frac{S}{N_0 B}$$
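A minimal sketch of the Shannon-Hartley law, assuming a base-2 logarithm so that the capacity comes out in bits per second. The function name, parameter names, and the numbers in the usage lines (a 3 kHz channel with a signal-to-noise ratio of 1023) are hypothetical values chosen so the arithmetic is easy to follow.

```python
import math


def shannon_hartley_capacity(bandwidth_hz, signal_power_w, noise_psd_w_per_hz):
    """C = B * log2(1 + S / (N0 * B)) in bits per second.

    bandwidth_hz:        B, channel bandwidth
    signal_power_w:      S, average signal power
    noise_psd_w_per_hz:  N0, noise power spectral density, so N = N0 * B
    """
    noise_power_w = noise_psd_w_per_hz * bandwidth_hz
    snr = signal_power_w / noise_power_w
    return bandwidth_hz * math.log2(1 + snr)


bandwidth_hz = 3000.0
noise_psd = 1e-9
signal_power = 1023 * noise_psd * bandwidth_hz  # chosen so that S/N = 1023

capacity_bps = shannon_hartley_capacity(bandwidth_hz, signal_power, noise_psd)
print(capacity_bps)  # about 3000 * log2(1024) = 30000 bits per second
```

Because $N = N_0 B$ grows with bandwidth, widening the channel at fixed signal power raises $B$ but lowers $S/N$, so capacity does not grow without bound.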