Entropy
Joint entropy & conditional entropy
Mutual information
Entropy (1/2)
Entropy (self-information)
    H(p) = H(X) = -\sum_x p(x) \log_2 p(x)
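The definition above can be sketched directly in code; this is a minimal helper (not from the slides) that computes entropy in bits, skipping zero-probability outcomes since 0 log 0 is taken as 0:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly one bit of uncertainty.
print(entropy([0.5, 0.5]))  # 1.0
```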
Entropy (2/2)
- Example: Simplified Polynesian
  letter frequencies:
      p     t     k     a     i     u
      1/8   1/4   1/8   1/4   1/8   1/8
  per-letter entropy:
      H(P) = -\sum_{i \in \{p,t,k,a,i,u\}} P(i) \log_2 P(i) = 2.5 bits
  coding:
      p     t     k     a     i     u
      100   00    101   01    110   111
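A quick sketch of this example, assuming the standard Simplified Polynesian frequencies (1/8 for p, k, i, u; 1/4 for t, a), which reproduce the 2.5 bits stated on the slide. The prefix code's expected length matches the entropy exactly, which is why this code is optimal:

```python
import math

# Letter distribution of Simplified Polynesian (assumed standard values).
freqs = {'p': 1/8, 't': 1/4, 'k': 1/8, 'a': 1/4, 'i': 1/8, 'u': 1/8}
H = -sum(p * math.log2(p) for p in freqs.values())
print(H)  # 2.5 bits

# The prefix code from the slide; its expected length also comes to 2.5 bits.
code = {'p': '100', 't': '00', 'k': '101', 'a': '01', 'i': '110', 'u': '111'}
avg_len = sum(freqs[c] * len(code[c]) for c in freqs)
print(avg_len)  # 2.5
```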
Joint Entropy & Conditional Entropy (1/4)
Joint Entropy
    H(X, Y) = -\sum_x \sum_y p(x, y) \log_2 p(x, y)
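The joint entropy definition, sketched as code (a minimal illustration, not from the slides), with a joint distribution represented as a dict keyed by (x, y) pairs:

```python
import math

def joint_entropy(pxy):
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y), in bits."""
    return -sum(p * math.log2(p) for p in pxy.values() if p > 0)

# Two independent fair coins: H(X,Y) = H(X) + H(Y) = 2 bits.
pxy = {(x, y): 0.25 for x in 'HT' for y in 'HT'}
print(joint_entropy(pxy))  # 2.0
```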
Joint Entropy & Conditional Entropy (2/4)
Conditional Entropy
    H(Y | X) = \sum_x p(x) H(Y | X = x) = -\sum_x \sum_y p(x, y) \log_2 p(y | x)
Chain rule
    H(X, Y) = H(X) + H(Y | X)
Joint Entropy & Conditional Entropy (3/4)
- Example: Simplified Polynesian Revisited
  syllable structure: all words consist of sequences of CV syllables
  (C: consonant, V: vowel)

  joint distribution p(C, V) with marginals:

            p      t      k
      a    1/16   3/8    1/16   1/2
      i    1/16   3/16   0      1/4
      u    0      3/16   1/16   1/4
           1/8    3/4    1/8

      H(C) = 1.061 bits

      H(V | C) = \sum_{c \in \{p,t,k\}} p(C = c) H(V | C = c)
               = \frac{1}{8} H(\frac{1}{2}, \frac{1}{2}, 0)
                 + \frac{3}{4} H(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})
                 + \frac{1}{8} H(\frac{1}{2}, 0, \frac{1}{2})
               = 1.375 bits

      H(C, V) = H(C) + H(V | C) = 2.44 bits
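The example above can be checked mechanically. This sketch computes H(C), H(V|C), and H(C,V) from the joint table and reproduces the slide's numbers:

```python
import math

def H(probs):
    """Entropy in bits, with 0 log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(C, V) over CV syllables, as in the slide's table.
p = {('p','a'): 1/16, ('t','a'): 3/8,  ('k','a'): 1/16,
     ('p','i'): 1/16, ('t','i'): 3/16, ('k','i'): 0,
     ('p','u'): 0,    ('t','u'): 3/16, ('k','u'): 1/16}

# Consonant marginal p(C): 1/8, 3/4, 1/8.
p_c = {c: sum(v for (ci, _), v in p.items() if ci == c) for c in 'ptk'}

H_C = H(p_c.values())                                    # ~1.061 bits
H_V_given_C = sum(p_c[c] * H([p[(c, v)] / p_c[c] for v in 'aiu'])
                  for c in 'ptk')                        # 1.375 bits
H_CV = H(p.values())                                     # ~2.44 bits
print(round(H_C, 3), H_V_given_C, round(H_CV, 2))
```

Note that computing H(C,V) directly from the joint table agrees with the chain rule H(C) + H(V|C), as the slide asserts.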
Joint Entropy & Conditional Entropy (4/4)
Entropy of a Language
    H_{rate}(L) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)
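As a small sanity check (a sketch, not part of the slides): for an i.i.d. source, H(X_1, ..., X_n) = n H(X), so the per-symbol entropy is constant in n and the limit is just the single-symbol entropy:

```python
import math
from itertools import product

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An i.i.d. source over three symbols with probabilities 1/2, 1/4, 1/4.
px = [0.5, 0.25, 0.25]
rates = []
for n in (1, 2, 3):
    # Joint probability of each length-n sequence is the product of terms.
    joint = [math.prod(seq) for seq in product(px, repeat=n)]
    rates.append(H(joint) / n)
print(rates)  # [1.5, 1.5, 1.5]
```

For a real language the X_i are far from independent, which is exactly why the limit is taken.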
Mutual Information (1/2)
Mutual Information
    I(X; Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)

    I(X; Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x) p(y)}

    (Venn diagram: I(X;Y) is the overlap of the H(X) and H(Y) circles;
    H(X|Y) and H(Y|X) are the non-overlapping parts; H(X,Y) is the union.)
- the reduction in uncertainty of one random variable due to knowing
  about another
- the amount of information one random variable contains about another
- a measure of independence
  I(X; Y) = 0 iff the two variables are independent
- grows both with the degree of dependence and with the entropy of
  the variables
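The sum formula above translates directly to code. This sketch (not from the slides) checks the independence property on two small joint distributions:

```python
import math

def mutual_information(pxy):
    """I(X;Y) = sum_{x,y} p(x,y) log2 [p(x,y) / (p(x) p(y))], in bits."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Independent fair coins: I = 0.  Perfectly correlated coins: I = 1 bit.
indep = {(x, y): 0.25 for x in 'HT' for y in 'HT'}
corr = {('H', 'H'): 0.5, ('T', 'T'): 0.5}
print(mutual_information(indep), mutual_information(corr))  # 0.0 1.0
```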
Mutual Information (2/2)
Chain Rule
    I(X_{1n}; Y) = I(X_1; Y) + \ldots + I(X_n; Y | X_1, \ldots, X_{n-1})
                 = \sum_{i=1}^{n} I(X_i; Y | X_1, \ldots, X_{i-1})
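A numerical spot-check of the chain rule for n = 2 (the joint distribution below is made up for illustration): I(X1, X2; Y) should equal I(X1; Y) + I(X2; Y | X1), where the conditional term averages I(X2; Y) under each value of X1:

```python
import math

def mi(pxy):
    """I(X;Y) from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# An arbitrary joint distribution p(x1, x2, y) over three binary variables.
p = {(0,0,0): 0.20, (0,0,1): 0.05, (0,1,0): 0.10, (0,1,1): 0.15,
     (1,0,0): 0.05, (1,0,1): 0.20, (1,1,0): 0.15, (1,1,1): 0.10}

# Left side: I(X1, X2; Y), treating the pair (x1, x2) as one variable.
lhs = mi({((x1, x2), y): q for (x1, x2, y), q in p.items()})

# Right side: I(X1; Y) + I(X2; Y | X1).
i1 = mi({(x1, y): sum(q for (a, _, b), q in p.items() if (a, b) == (x1, y))
         for x1 in (0, 1) for y in (0, 1)})
i2 = 0.0
for x1 in (0, 1):
    px1 = sum(q for (a, _, _), q in p.items() if a == x1)
    cond = {(x2, y): p[(x1, x2, y)] / px1
            for x2 in (0, 1) for y in (0, 1)}
    i2 += px1 * mi(cond)
print(lhs, i1 + i2)  # equal up to rounding
```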
The Noisy Channel Model (1/2)
Assumption
    the output of the channel depends probabilistically on the input
Channel capacity
    C = \max_{p(X)} I(X; Y)
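The binary symmetric channel is a standard illustration of this maximization (my addition, not on the slide). The sketch below grid-searches the input distribution for a channel that flips each bit with probability eps, and compares against the known closed form C = 1 - H(eps):

```python
import math

def mi(pxy):
    """I(X;Y) from a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)

# Binary symmetric channel with crossover probability eps:
# maximize I(X;Y) over input distributions p(X=0) = q by grid search.
eps = 0.1
best = max(
    mi({(x, y): (q if x == 0 else 1 - q) * (eps if x != y else 1 - eps)
        for x in (0, 1) for y in (0, 1)})
    for q in (i / 1000 for i in range(1, 1000)))

# Known closed form: C = 1 - H(eps), achieved at the uniform input.
closed = 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)
print(best, closed)  # both ~0.531 bits per channel use
```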
The Noisy Channel Model (2/2)
Applications
    machine translation (MT), POS tagging, OCR, speech recognition, ...
Derivation of Mutual Information
Mutual Information
    I(X; Y) = H(X) - H(X | Y)
            = H(X) + H(Y) - H(X, Y)
            = \sum_x p(x) \log_2 \frac{1}{p(x)} + \sum_y p(y) \log_2 \frac{1}{p(y)}
              + \sum_{x, y} p(x, y) \log_2 p(x, y)
            = \sum_{x, y} p(x, y) \log_2 \frac{1}{p(x)}
              + \sum_{x, y} p(x, y) \log_2 \frac{1}{p(y)}
              + \sum_{x, y} p(x, y) \log_2 p(x, y)
            = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x) p(y)}