
ENTROPY OF X

X has an alphabet of size M, with Pr(X = j) = p_j.

H(X) = −∑_j p_j log p_j = E[−log p_X(X)]

−log p_X(X) is a rv, called the log pmf.

H(X) ≥ 0; equality if X is deterministic.

H(X) ≤ log M; equality if X is equiprobable.

If X and Y are independent random symbols, then the random symbol XY takes on sample value xy with probability p_{XY}(xy) = p_X(x) p_Y(y).

H(XY) = E[−log p_{XY}(XY)] = E[−log p_X(X) p_Y(Y)]
      = E[−log p_X(X) − log p_Y(Y)] = H(X) + H(Y)
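A quick numeric sanity check of these identities, as a minimal sketch (the pmfs below are arbitrary illustrative choices, not from the notes):

    import numpy as np

    def entropy(pmf):
        """Entropy in bits of a pmf given as a sequence of probabilities."""
        p = np.asarray(pmf, dtype=float)
        p = p[p > 0]                     # use the convention 0 log 0 = 0
        return float(-np.sum(p * np.log2(p)))

    p_X = [0.5, 0.25, 0.25]
    p_Y = [0.9, 0.1]
    p_XY = np.outer(p_X, p_Y).ravel()    # joint pmf of independent X and Y

    print(entropy(p_X))                  # 1.5 bits
    print(entropy(p_Y))                  # about 0.469 bits
    print(entropy(p_XY))                 # about 1.969 bits = H(X) + H(Y)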

For a discrete memoryless source, a block of n random symbols, X_1, . . . , X_n, can be viewed as a single random symbol X^n taking on the sample value x^n = x_1 x_2 · · · x_n with probability

p_{X^n}(x^n) = ∏_{j=1}^n p_X(x_j)

The random symbol X^n has the entropy

H(X^n) = E[−log p_{X^n}(X^n)] = E[−∑_{j=1}^n log p_X(X_j)]
       = ∑_{j=1}^n E[−log p_X(X_j)] = nH(X)
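A brute-force check of H(X^n) = nH(X) for a small n, enumerating every n-tuple (a sketch; the pmf is an arbitrary assumption):

    import itertools
    import math

    p_X = {'a': 0.5, 'b': 0.25, 'c': 0.25}
    n = 3

    H1 = -sum(p * math.log2(p) for p in p_X.values())

    # Entropy of the block X^n under the product (memoryless) distribution.
    Hn = 0.0
    for block in itertools.product(p_X, repeat=n):
        p = math.prod(p_X[x] for x in block)
        Hn -= p * math.log2(p)

    print(Hn, n * H1)   # both equal 4.5 bits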
Fixed-to-variable prefix-free codes

Segment the input into n-blocks X^n = X_1 X_2 · · · X_n.

Form a minimum expected length prefix-free code for X^n. This is called an n-to-variable-length code.

H(X^n) = nH(X)

H(X^n) ≤ E[L(X^n)]_min < H(X^n) + 1

L̄_min,n = E[L(X^n)]_min / n   bpss

H(X) ≤ L̄_min,n < H(X) + 1/n
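A sketch that builds Huffman codes on n-blocks and checks H(X) ≤ L̄_min,n < H(X) + 1/n (the pmf and block lengths are illustrative assumptions; the helper computes only codeword lengths):

    import heapq
    import itertools
    import math

    def huffman_lengths(pmf):
        """Codeword lengths of a Huffman code for pmf, a dict symbol -> probability."""
        heap = [(prob, i, {sym: 0}) for i, (sym, prob) in enumerate(pmf.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p1, _, d1 = heapq.heappop(heap)
            p2, _, d2 = heapq.heappop(heap)
            tie += 1
            merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
            heapq.heappush(heap, (p1 + p2, tie, merged))
        return heap[0][2]

    p_X = {'a': 0.6, 'b': 0.3, 'c': 0.1}
    H = -sum(p * math.log2(p) for p in p_X.values())

    for n in (1, 2, 4):
        blocks = {blk: math.prod(p_X[x] for x in blk)
                  for blk in itertools.product(p_X, repeat=n)}
        length = huffman_lengths(blocks)
        L_bar = sum(blocks[b] * length[b] for b in blocks) / n
        print(n, H, L_bar, H + 1 / n)    # H(X) <= L_bar < H(X) + 1/n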
WEAK LAW OF LARGE NUMBERS (WLLN)

Let Y_1, Y_2, . . . be a sequence of rvs with mean Ȳ and variance σ_Y².

The sum S = Y_1 + · · · + Y_n has mean nȲ and variance nσ_Y².

The sample average of Y_1, . . . , Y_n is

A_Y^n = S/n = (Y_1 + · · · + Y_n)/n

It has mean and variance

E[A_Y^n] = Ȳ ;   VAR[A_Y^n] = σ_Y²/n

Note: lim_{n→∞} VAR[S] = ∞;  lim_{n→∞} VAR[A_Y^n] = 0.
[Figure: distribution functions F_{A_Y^n}(y) and F_{A_Y^{2n}}(y), showing Pr[|A_Y^n − Ȳ| < ε] and Pr[|A_Y^{2n} − Ȳ| < ε] concentrating on the interval (Ȳ − ε, Ȳ + ε).]

The distribution of A_Y^n clusters around Ȳ, clustering more closely as n → ∞.

Chebyshev: for ε > 0,  Pr[|A_Y^n − Ȳ| ≥ ε] ≤ σ_Y²/(nε²)

For any ε, δ > 0 and large enough n,  Pr[|A_Y^n − Ȳ| ≥ ε] ≤ δ
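A quick simulation of this clustering and of the Chebyshev bound (a sketch; the Bernoulli source, ε, and sample counts are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    p, eps = 0.3, 0.05                 # Bernoulli(p) source, deviation threshold eps
    var = p * (1 - p)

    for n in (100, 1000, 10000):
        averages = rng.binomial(n, p, size=20000) / n        # 20000 sample averages A_Y^n
        empirical = np.mean(np.abs(averages - p) >= eps)      # Pr[|A_Y^n - Ybar| >= eps]
        chebyshev = var / (n * eps ** 2)                       # Chebyshev upper bound
        print(n, empirical, min(chebyshev, 1.0))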

ASYMPTOTIC EQUIPARTITION PROPERTY (AEP)

Let X_1, X_2, . . . be the output from a DMS.

Define the log pmf as w(x) = −log p_X(x).

w(x) maps source symbols into real numbers.

For each j, W(X_j) is a rv; it takes the value w(x) for X_j = x. Note that

E[W(X_j)] = ∑_x p_X(x)[−log p_X(x)] = H(X)

W(X_1), W(X_2), . . . is a sequence of iid rvs.
For X_1 = x_1, X_2 = x_2, the outcome for W(X_1) + W(X_2) is

w(x_1) + w(x_2) = −log p_X(x_1) − log p_X(x_2)
                = −log [p_X(x_1) p_X(x_2)]
                = −log p_{X_1 X_2}(x_1 x_2) = w(x_1 x_2)

where w(x_1 x_2) is the −log pmf of the event X_1 X_2 = x_1 x_2.

W(X_1 X_2) = W(X_1) + W(X_2)

X_1 X_2 is a random symbol in its own right (it takes values x_1 x_2). W(X_1 X_2) is the −log pmf of X_1 X_2.

Probabilities multiply, log pmfs add.



For X^n = x^n, x^n = (x_1, . . . , x_n), the outcome for W(X_1) + · · · + W(X_n) is

∑_{j=1}^n w(x_j) = −∑_{j=1}^n log p_X(x_j) = −log p_{X^n}(x^n)

The sample average of the log pmfs is

A_W^n = [W(X_1) + · · · + W(X_n)] / n = −log p_{X^n}(X^n) / n

The WLLN applies and gives

Pr( |A_W^n − E[W(X)]| ≥ ε ) ≤ σ_W²/(nε²),   i.e.,

Pr( |−log p_{X^n}(X^n)/n − H(X)| ≥ ε ) ≤ σ_W²/(nε²).



Define the typical set as

T_ε^n = { x^n :  |−log p_{X^n}(x^n)/n − H(X)| < ε }

[Figure: distribution functions F_{A_W^n}(w) and F_{A_W^{2n}}(w) of the sample-average log pmf, with Pr(T_ε^n) and Pr(T_ε^{2n}) concentrating on the interval (H − ε, H + ε).]

As n → ∞, the typical set approaches probability 1:

Pr(X^n ∈ T_ε^n) ≥ 1 − σ_W²/(nε²)
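A Monte Carlo estimate of this concentration, i.e., of Pr(X^n ∈ T_ε^n) (a sketch; the pmf, block lengths, and ε are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    p = np.array([0.6, 0.3, 0.1])            # pmf of the DMS over 3 symbols
    H = -np.sum(p * np.log2(p))
    eps = 0.1

    for n in (10, 100, 1000):
        blocks = rng.choice(len(p), size=(5000, n), p=p)      # 5000 independent n-blocks
        avg_w = -np.log2(p[blocks]).mean(axis=1)              # -log p_{X^n}(X^n) / n
        print(n, H, np.mean(np.abs(avg_w - H) < eps))         # Pr(X^n in T_eps^n) -> 1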


We can also express T_ε^n as

T_ε^n = { x^n :  n(H(X) − ε) < −log p_{X^n}(x^n) < n(H(X) + ε) }

T_ε^n = { x^n :  2^{−n(H(X)+ε)} < p_{X^n}(x^n) < 2^{−n(H(X)−ε)} }.

Typical elements are approximately equiprobable in the strange sense above.

The complementary, atypical set of strings satisfies

Pr[(T_ε^n)^c] ≤ σ_W²/(nε²)

For any ε, δ > 0 and large enough n, Pr[(T_ε^n)^c] < δ.

For all x^n ∈ T_ε^n,  p_{X^n}(x^n) > 2^{−n[H(X)+ε]}.

1 ≥ ∑_{x^n ∈ T_ε^n} p_{X^n}(x^n) > |T_ε^n| 2^{−n[H(X)+ε]}

|T_ε^n| < 2^{n[H(X)+ε]}

1 − δ ≤ ∑_{x^n ∈ T_ε^n} p_{X^n}(x^n) < |T_ε^n| 2^{−n[H(X)−ε]}

|T_ε^n| > (1 − δ) 2^{n[H(X)−ε]}

Summary:  Pr[(T_ε^n)^c] → 0,   |T_ε^n| ≈ 2^{nH(X)},   p_{X^n}(x^n) ≈ 2^{−nH(X)} for x^n ∈ T_ε^n.
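A small exhaustive check of these bounds for a binary DMS (a sketch; p_1, n, and ε are arbitrary assumptions):

    import itertools
    import math

    p1, n, eps = 0.25, 16, 0.1
    p = {1: p1, 0: 1 - p1}
    H = -sum(q * math.log2(q) for q in p.values())

    size_T, prob_T = 0, 0.0
    for xn in itertools.product((0, 1), repeat=n):
        pxn = math.prod(p[x] for x in xn)
        if abs(-math.log2(pxn) / n - H) < eps:               # x^n is typical
            size_T += 1
            prob_T += pxn

    print("Pr(T_eps^n):", round(prob_T, 4))
    print("|T_eps^n|  :", size_T)
    print("lower bound:", prob_T * 2 ** (n * (H - eps)))      # Pr(T) 2^{n(H-eps)} < |T|
    print("upper bound:", 2 ** (n * (H + eps)))               # |T| < 2^{n(H+eps)}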
EXAMPLE

Consider a binary DMS with Pr[X=1] = p_1 < 1/2.

H(X) = −p_1 log p_1 − p_0 log p_0

Consider a string x^n with n_1 ones and n_0 zeros.

p_{X^n}(x^n) = p_1^{n_1} p_0^{n_0}

−log p_{X^n}(x^n) / n = −(n_1/n) log p_1 − (n_0/n) log p_0

The typical set T_ε^n is the set of strings for which

H(X) − ε  <  −(n_1/n) log p_1 − (n_0/n) log p_0  <  H(X) + ε

In the typical set, n_1 ≈ p_1 n. For this binary case, a string is typical if it has about the right relative frequencies.
The probability of a typical n-tuple is about

p_1^{p_1 n} p_0^{p_0 n} = 2^{−nH(X)}.

The number of n-tuples with p_1 n ones is

n! / [(p_1 n)! (p_0 n)!]  ≈  2^{nH(X)}

Note that there are 2^n binary strings in all. Most of them are collectively very improbable.

The most probable strings have almost all zeros, but there aren't enough of them to matter.
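A numeric check of this counting argument (a sketch; p_1 and n are arbitrary assumptions; the binomial count matches 2^{nH(X)} only up to a polynomial Stirling factor):

    import math

    p1, n = 0.2, 100
    p0 = 1 - p1
    H = -p1 * math.log2(p1) - p0 * math.log2(p0)

    n1 = round(p1 * n)                          # number of ones in a "typical" string
    count = math.comb(n, n1)                    # number of n-tuples with n1 ones
    prob_each = p1 ** n1 * p0 ** (n - n1)       # probability of each such string

    print("log2(count)    :", math.log2(count))       # close to nH(X), up to O(log n)
    print("nH(X)          :", n * H)
    print("log2(prob_each):", math.log2(prob_each))   # close to -nH(X)
    print("log2(2^n)      :", n)                      # total number of binary strings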
Fixed-to-fixed-length source codes

For any ε, δ > 0, and any large enough n, assign a fixed-length codeword to each x^n ∈ T_ε^n.

Since |T_ε^n| < 2^{n[H(X)+ε]},  L̄ ≤ H(X) + ε + 1/n.

Pr{failure} ≤ δ.

Conversely, take L̄ ≤ H(X) − 2ε and n large. The probability of failure will then be almost 1.

For any ε > 0, the probability of failure will be almost 1 if L̄ ≤ H(X) − 2ε and n is large enough:

We can provide codewords for at most 2^{n[H(X)−2ε]} source n-tuples. Typical n-tuples have probability at most 2^{−n[H(X)−ε]}.

The aggregate probability of typical n-tuples assigned codewords is at most 2^{−nε}.

The aggregate probability of typical n-tuples not assigned codewords is at least 1 − δ − 2^{−nε}.

Pr{failure} ≥ 1 − δ − 2^{−nε}
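A numeric sketch of the two regimes: give fixed-length codewords to the 2^{nR} most probable n-tuples of a Bernoulli source and compute Pr{failure} exactly (p_1, the rates, and the block lengths are illustrative assumptions):

    import math

    def failure_prob(p1, n, R):
        """Pr{failure} when the 2^{nR} most probable n-tuples (those with the fewest
        ones, since p1 < 1/2) get codewords and every other n-tuple causes failure."""
        budget = 2.0 ** (n * R)                  # number of available codewords
        fail = 0.0
        for n1 in range(n + 1):                  # weights in order of decreasing probability
            count = math.comb(n, n1)
            prob_each = p1 ** n1 * (1 - p1) ** (n - n1)
            covered = min(count, budget)
            budget -= covered
            fail += (count - covered) * prob_each
        return fail

    p1 = 0.11
    H = -p1 * math.log2(p1) - (1 - p1) * math.log2(1 - p1)     # about 0.5 bits
    for n in (50, 200, 800):
        print(n, failure_prob(p1, n, H + 0.1), failure_prob(p1, n, H - 0.1))
        # rate above H(X): failure probability -> 0;  rate below H(X): -> 1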
General model: Visualize any kind of mapping from the sequence of source symbols X_1, X_2, . . . into a binary sequence Y_1, Y_2, . . . .

Visualize a decoder that observes the encoded bits one by one. For each n, let D_n be the number of observed bits required to decode X^n (decisions are final).

The rate rv, as a function of n, is D_n/n.

In order for the rate in bpss to be less than H(X) in any meaningful sense, we require that D_n/n be smaller than H(X) with high probability as n → ∞.
Theorem: For a DMS and any coding/decoding technique, let ε, δ > 0 be arbitrary. Then for large enough n,

Pr{ D_n ≤ n[H(X) − 2ε] } < δ + 2^{−nε}.

Proof: For the given n, let m = ⌊n[H(X) − 2ε]⌋. Suppose that x^n is decoded upon observation of y_1, . . . , y_j for some j ≤ m. Since decisions are final, only one x^n can be decoded from each y^m. There are only 2^m binary m-tuples, so there are only 2^m source n-tuples (and thus at most 2^m typical n-tuples) that can be decoded by time m. The previous result applies.
Questions about the relevance of the AEP and fixed-to-fixed-length source codes:

1) Are there important real DMS sources? No, but the DMS model provides a framework for sources with memory.

2) Are fixed-to-fixed codes at very long lengths practical? No, but view the length as the product lifetime to interpret bpss.

3) Do fixed-to-fixed codes with rare failures solve queueing issues? No; queueing issues arise only with real-time sources, and discrete sources are rarely real time.
MARKOV SOURCES

A finite-state Markov chain is a sequence S_0, S_1, . . . of discrete rvs from a finite alphabet 𝒮 where q_0(s) is a pmf on S_0 and, for n ≥ 1,

Q(s|s′) = Pr(S_n = s | S_{n−1} = s′)
        = Pr(S_n = s | S_{n−1} = s′, S_{n−2} = s_{n−2}, . . . , S_0 = s_0)

for all choices of s_{n−2}, . . . , s_0. We use the states to represent the memory in a discrete source with memory.


Example: Binary source X_1, X_2, . . . ;  S_n = (X_{n−1}, X_n)

[State diagram: four states 00, 01, 10, 11, each transition labeled "letter; probability".
From 00: 0; 0.9 back to 00 and 1; 0.1 to 01.  From 01: 0; 0.5 to 10 and 1; 0.5 to 11.
From 10: 0; 0.5 to 00 and 1; 0.5 to 01.  From 11: 0; 0.1 to 10 and 1; 0.9 back to 11.]

Each transition from a state has a single and distinct source letter.

The letter specifies the new state, and the new state specifies the letter.
Transitions in the graph imply positive probability.

A state s is accessible from state s′ if the graph has a path from s′ to s.

The period of s is the gcd of the lengths of paths from s back to s.

A finite-state Markov chain is ergodic if all states are aperiodic and accessible from all other states.

A Markov source X_1, X_2, . . . is the sequence of labeled transitions on an ergodic Markov chain.

Ergodic Markov chains have steady-state probabilities given by

q(s) = ∑_{s′∈𝒮} q(s′) Q(s|s′);   s ∈ 𝒮        (1)

∑_{s∈𝒮} q(s) = 1

Steady-state probabilities are approached asymptotically from any starting state, i.e., for all s, s′ ∈ 𝒮,

lim_{n→∞} Pr(S_n = s | S_0 = s′) = q(s)        (2)
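A sketch that computes the steady-state pmf q(s) of (1) for the example chain above by power iteration; the transition matrix encodes the figure's labels and should be treated as an assumption:

    import numpy as np

    states = ['00', '01', '10', '11']
    # Q[i, j] = Pr(next state = states[j] | current state = states[i])
    Q = np.array([
        [0.9, 0.1, 0.0, 0.0],   # from 00: emit 0 -> 00, emit 1 -> 01
        [0.0, 0.0, 0.5, 0.5],   # from 01: emit 0 -> 10, emit 1 -> 11
        [0.5, 0.5, 0.0, 0.0],   # from 10: emit 0 -> 00, emit 1 -> 01
        [0.0, 0.0, 0.1, 0.9],   # from 11: emit 0 -> 10, emit 1 -> 11
    ])

    q = np.full(4, 0.25)         # any starting pmf works for an ergodic chain
    for _ in range(1000):
        q = q @ Q                # converges to the solution of (1), as in (2)
    print(dict(zip(states, q.round(4))))     # {'00': 0.4167, '01': 0.0833, ...}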

Coding for Markov sources

Simplest approach: use a separate prefix-free code for each prior state.

If S_{n−1} = s, then encode X_n with the prefix-free code for s. The codeword lengths l(x, s) are chosen for the pmf p(x|s).

∑_x 2^{−l(x,s)} ≤ 1   for each s

The code can be chosen by the Huffman algorithm and satisfies

H[X|s] ≤ L̄_min(s) < H[X|s] + 1

where

H[X|s] = −∑_x P(x|s) log P(x|s)

If the pmf on S_0 is the steady-state pmf q(s), then the chain remains in steady state.

H[X|S] ≤ L̄_min < H[X|S] + 1,        (3)

where

L̄_min = ∑_{s∈𝒮} q(s) L̄_min(s)   and   H[X|S] = ∑_{s∈𝒮} q(s) H[X|s]

The encoder transmits s_0 followed by the codeword for x_1 using the code for s_0.

This specifies s_1, and x_2 is encoded with the code for s_1, etc.

This is prefix-free and can be decoded instantaneously.
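A self-contained sketch of per-state Huffman coding that checks bound (3); the three-letter Markov source below (next state = emitted letter) is a hypothetical example, not the one from the notes:

    import heapq
    import math

    # Hypothetical source: p[s][x] = Pr(X_n = x | S_{n-1} = s), next state = emitted letter.
    p = {'a': {'a': 0.7, 'b': 0.2, 'c': 0.1},
         'b': {'a': 0.1, 'b': 0.8, 'c': 0.1},
         'c': {'a': 0.3, 'b': 0.3, 'c': 0.4}}

    def huffman_lengths(pmf):
        """Codeword lengths of a Huffman code for pmf, a dict symbol -> probability."""
        heap = [(prob, i, {x: 0}) for i, (x, prob) in enumerate(pmf.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            p1, _, d1 = heapq.heappop(heap)
            p2, _, d2 = heapq.heappop(heap)
            tie += 1
            heapq.heappush(heap, (p1 + p2, tie,
                                  {x: l + 1 for x, l in {**d1, **d2}.items()}))
        return heap[0][2]

    # Steady-state pmf q(s), found by iterating q <- q Q.
    q = {s: 1 / 3 for s in p}
    for _ in range(500):
        q = {s: sum(q[t] * p[t][s] for t in p) for s in p}

    H_XS, L_min = 0.0, 0.0
    for s in p:
        length = huffman_lengths(p[s])
        H_XS += q[s] * -sum(w * math.log2(w) for w in p[s].values())
        L_min += q[s] * sum(p[s][x] * length[x] for x in p[s])

    print(H_XS, L_min, H_XS + 1)     # H[X|S] <= L_min < H[X|S] + 1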
Conditional Entropy

H[X|S] for a Markov source is like H[X] for a DMS.

H[X|S] = ∑_{s∈𝒮} ∑_x q(s) P(x|s) log (1 / P(x|s))

Note that

H[XS] = ∑_{s,x} q(s) P(x|s) log (1 / (q(s) P(x|s))) = H[S] + H[X|S]

Recall that

H[XS] ≤ H[S] + H[X]

Thus,

H[X|S] ≤ H[X]
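A numeric check of H[X|S] ≤ H[X] for the binary example above (a sketch; the conditional pmfs mirror the figure's labels and the steady-state pmf comes from (1), both taken as assumptions):

    import math

    # P(x|s) for states 00, 01, 10, 11 and the steady-state pmf q(s).
    P = {'00': {0: 0.9, 1: 0.1}, '01': {0: 0.5, 1: 0.5},
         '10': {0: 0.5, 1: 0.5}, '11': {0: 0.1, 1: 0.9}}
    q = {'00': 5 / 12, '01': 1 / 12, '10': 1 / 12, '11': 5 / 12}

    H_XS = -sum(q[s] * P[s][x] * math.log2(P[s][x]) for s in P for x in P[s])

    # Marginal pmf of X in steady state, and its entropy H[X].
    pX = {x: sum(q[s] * P[s][x] for s in P) for x in (0, 1)}
    H_X = -sum(w * math.log2(w) for w in pX.values())

    print(round(H_XS, 4), "<=", round(H_X, 4))       # 0.5575 <= 1.0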
Suppose we use n-to-variable-length codes for each state.

H[S_1, S_2, . . . , S_n | S_0] = nH[X|S]

H[X_1, X_2, . . . , X_n | S_0] = nH[X|S]

By using n-to-variable-length codes,

H[X|S] ≤ L̄_min,n < H[X|S] + 1/n

Thus, for Markov sources, H[X|S] is asymptotically achievable.

The AEP also holds for Markov sources.

L̄ < H[X|S] cannot be achieved, either in expected length or in fixed length with low probability of failure.
MIT OpenCourseWare
http://ocw.mit.edu
6.450 Principles of Digital Communication I
Fall 2009
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
