
HARAMAYA UNIVERSITY

HARAMAYA INSTITUTE OF TECHNOLOGY


SCHOOL OF ELECTRICAL AND COMPUTER
ENGINEERING

Introduction to Information Theory and Coding

Course Coordinator:
Dr. Mulugeta Atlabachew (Ass. Professor), Guest Lecturer
INFORMATION
 Information can be described as one or more statements or facts that are
received by a human and that have some form of worth to the recipient.



 The working definition for information therefore must:
1. be something, although the exact nature (substance, energy, or
abstract concept) isn't clear;
2. provide "new" information: a repetition of previously received
messages isn't informative;
3. be "true": a lie or false or counterfactual statement is
misinformation, not information itself;
4. be "about" something.



INFORMATION

 Information can be thought of as the resolution of uncertainty;
 It answers the question of "What an entity is" and thus
defines both its essence and the nature of its characteristics.
 The concept of information has different meanings in different
contexts.
 Thus the concept becomes related to notions of constraint,
communication, control, data, form, education, knowledge,
meaning, understanding, mental stimuli, pattern, perception,
representation, and entropy.



INFORMATION

 Information is a stimulus that has meaning in some context for its receiver.
 When information is entered into and stored in a
computer, it is generally referred to as data.
 After processing (such as formatting and printing), output
data can again be perceived as information.
 When information is packaged or used for understanding
or doing something, it is known as knowledge.



Information

How to measure Information?



 The uncertainty of an event is measured by its probability of
occurrence and is inversely related to it: the less probable the event,
the more uncertain it is.
 More uncertain events require more information to resolve their
uncertainty.
 The mathematical theory of information is based on
probability theory and statistics, and measures information
with several quantities of information.



Information

How to measure Information?



 The choice of logarithmic base in the following formulae determines
the unit of information entropy that is used.
 The most common unit of information is the bit, based on the binary
logarithm.
 Other units include the nat, based on the natural logarithm or base e,
and the hartley, based on the base 10 or common logarithm.
 The bit is a typical unit of information. It is 'that which reduces
uncertainty by half'.



Why logarithm to measure Information?

 We want our information measure I(p) to have several properties:
1) Information is a non-negative quantity: I(p) ≥ 0.
2) If an event has probability 1, we get no information from the occurrence of
the event: I(1) = 0.
3) If two independent events occur, then the information we get from observing
the events is the sum of the two informations: I(p1 · p2) = I(p1) + I(p2).
4) We want our information measure to be a continuous (and, in fact,
monotonic) function of the probability (slight changes in probability should
result in slight changes in information).



Why logarithm to measure Information?

We can therefore derive the following: the only measure satisfying all of these
properties is of the form

$I(p) = \log_b(1/p) = -\log_b(p)$

for some logarithmic base $b$.
Why logarithm to measure Information?

 From this, we can derive the nice property between the logarithm and the
self-information content:

$I(x) = \log\frac{1}{p(x)} = -\log p(x)$

 This formula is called the surprise function; it represents the amount of
surprise, or the amount of self-information, that a particular symbol x of a
distribution holds.
 Intuitively, the definition can be understood as follows: we would be
surprised if a rare symbol (p(x) is small) is observed. Thus, the lower the
p(x), the higher the surprise, which is exactly what the definition above gives.
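As a minimal illustration (not from the slides; the symbol probabilities below are made-up values), the surprise function can be evaluated directly and checked against the properties listed earlier:

import math

def surprise(p, base=2):
    """Self-information I(p) = log(1/p) = -log(p), in units set by the base."""
    return math.log(1 / p, base)

# Hypothetical symbol probabilities, chosen only for illustration.
for p in (0.5, 0.25, 0.01):
    print(p, surprise(p))        # the rarer the symbol, the larger the surprise

print(surprise(1.0))             # a certain event carries no information: 0.0

# Independent events: I(p1 * p2) = I(p1) + I(p2)
p1, p2 = 0.5, 0.25
print(math.isclose(surprise(p1 * p2), surprise(p1) + surprise(p2)))   # True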



Why logarithm to measure Information?

 Thus, using different bases for the logarithm results in information
measures which are just constant multiples of each other, corresponding with
measurements in different units (see the conversion sketch after this list):
1. log2 units are bits (from 'binary'), the typical unit in information theory
2. log3 units are trits (from 'trinary')
3. loge units are nats (from 'natural logarithm') (we'll use ln(x) for loge(x))
4. log10 units are hartleys, after Ralph Hartley, an early worker in the field.
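As an assumed example (the probability 1/8 is an arbitrary choice, not from the slides), the sketch below shows that the same self-information expressed in different bases differs only by a constant factor:

import math

p = 1 / 8                             # arbitrary example probability

bits     = math.log2(1 / p)           # base 2  -> bits
trits    = math.log(1 / p, 3)         # base 3  -> trits
nats     = math.log(1 / p)            # base e  -> nats
hartleys = math.log10(1 / p)          # base 10 -> hartleys

print(bits, trits, nats, hartleys)    # 3.0, 1.892..., 2.079..., 0.903...

# Change of base: log_b(x) = log_2(x) / log_2(b), so every unit is a constant
# multiple of every other unit.
print(math.isclose(nats, bits * math.log(2)))         # True: 1 bit = ln 2 nats
print(math.isclose(hartleys, bits * math.log10(2)))   # True: 1 bit = log10(2) hartleys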



Why logarithm to measure Information?

In general,

 The Shannon information content (self-information content) of an outcome x
is defined to be

$h(x) = \log_2\frac{1}{p(x)}$

and it is measured in bits.

 The entropy of an ensemble X is defined to be the average Shannon
information content of an outcome:

$H(X) = \sum_x p(x)\log_2\frac{1}{p(x)}$

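A minimal Python sketch of this definition (the distribution used is a made-up example, not one from the slides):

import math

def entropy(probs, base=2):
    """Average Shannon information content: H = sum of p(x) * log(1/p(x))."""
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

# Hypothetical ensemble with outcome probabilities summing to 1.
P = [0.5, 0.25, 0.125, 0.125]
print(entropy(P))    # 1.75 bits per outcome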


Entropy

How to measure Information?



 In information theory, the entropy of a random variable is the average level
of "information", "surprise", or "uncertainty" inherent in the variable's
possible outcomes.
 The concept of information entropy was introduced by Claude Shannon in his
work published in 1948.
 Entropy should be a measure of how "surprising" the average outcome of a
variable is.
 Entropy is the average self-information content of the random variable.
 For a continuous random variable, differential entropy is analogous to
entropy.



Entropy

How to measure Information?



 Suppose now that we have n symbols {a1, a2, ..., an}, and some source is
providing us with a stream of these symbols.
 Suppose further that the source emits the symbols with probabilities
{p1, p2, ..., pn}, respectively.
 Assume that the symbols are emitted independently.

 What is the average amount of information we get from each symbol we see in
the stream from the source?



Entropy

How to measure Information?



 What we really want here is a weighted average.

 If we observe the symbol ai, we will be getting log(1/pi) information from
that particular observation.
 In a long run of (say) N observations, we will see (approximately) N*pi
occurrences of the symbol ai.
 Thus, in the N (independent) observations, we will get total information I of

$I = \sum_{i=1}^{n} N\,p_i \log\frac{1}{p_i}$



Entropy

How to measure Information?



 Then, the average information we get per symbol observed for the N
observations will be

$\frac{I}{N} = \sum_{i=1}^{n} p_i \log\frac{1}{p_i}$

 We define $p_i \log\frac{1}{p_i}$ to be zero when $p_i = 0$.

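To make the weighted-average argument concrete, here is an illustrative simulation (not part of the slides; the alphabet and probabilities are hypothetical) showing that the empirical average information per observed symbol approaches the sum above:

import math
import random
from collections import Counter

symbols = ["a1", "a2", "a3"]      # hypothetical source alphabet
probs   = [0.5, 0.3, 0.2]         # hypothetical emission probabilities
N       = 100_000

stream = random.choices(symbols, weights=probs, k=N)   # N independent symbols
counts = Counter(stream)

# Total information, counting log2(1/p_i) bits for every occurrence of a_i.
total_info = sum(counts[s] * math.log2(1 / p) for s, p in zip(symbols, probs))

print(total_info / N)                            # empirical average per symbol
print(sum(p * math.log2(1 / p) for p in probs))  # theoretical value, about 1.485 bits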


Entropy

 To generalize, let us suppose that we have a set of probabilities (a
probability distribution) P = {p1, p2, ..., pn}.


 We define the entropy of the distribution P by:

$H(P) = \sum_{i=1}^{n} p_i \log\frac{1}{p_i}$

 If we have a continuous rather than discrete probability distribution P(x):

$H(P) = \int P(x)\log\frac{1}{P(x)}\,dx$


Entropy

 Another way to think about entropy is in terms of expected value.

 Given a discrete probability distribution P = {p1, p2, ..., pn} with
$0 \le p_i \le 1$ and $\sum_i p_i = 1$, or a continuous distribution P(x) with
$P(x) \ge 0$ and $\int P(x)\,dx = 1$,
 we can define the expected value of an associated discrete set
F = {f1, f2, ..., fn} or function F(x) by:

$E(F) = \sum_{i=1}^{n} f_i\,p_i$

 or

$E(F) = \int F(x)\,P(x)\,dx$

 From this we can conclude that the entropy is the expected value of the
self-information:

$H(P) = E\left(\log\frac{1}{p}\right)$

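As a short numerical check (with a made-up distribution) that the entropy is just the expected value of the self-information, the two computations below agree:

import math

P = [0.5, 0.25, 0.25]                    # hypothetical distribution

# Direct definition: H(P) = sum of p_i * log2(1/p_i)
H_direct = sum(p * math.log2(1 / p) for p in P)

# Expected value of the set F = {I(p_1), ..., I(p_n)}: E(F) = sum of f_i * p_i
F = [math.log2(1 / p) for p in P]        # self-information of each outcome
H_expected = sum(f * p for f, p in zip(F, P))

print(H_direct, H_expected)              # both 1.5 bits
assert math.isclose(H_direct, H_expected)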


Entropy

 The entropy of a probability distribution is just the expected value of the
information of the distribution.

 Intuitively, the greater the expected surprise (the entropy) of a
distribution, the harder that distribution is to represent.

 Therefore, entropy represents the expected value of the surprise a
distribution holds.



Information Theory

 Information theory is the scientific study of the quantification, storage,
and communication of information.
 The field was fundamentally established by the works of
Harry Nyquist and Ralph Hartley, in the 1920s, and Claude
Shannon in the 1940s.
 The field is at the intersection of probability theory,
statistics, computer science, statistical mechanics,
information engineering, and electrical engineering.



Shannon Definition of Communication

 “The fundamental problem of communication is that of reproducing at one
point either exactly or approximately a message selected at another point.”

“Frequently the messages have meaning”

“... [which is] irrelevant to the engineering problem.”

Shannon Definition of Communication

 Shannon wants to find a way of “reliably” transmitting data through the
channel at the “maximal” possible rate.

[Block diagram: Information Source → Coding → Communication Channel → Decoding → Destination]

Model of a Digital Communication System
[Block diagram: Information Source (message, e.g. English symbols) → Encoder / Coding (e.g. English to 0,1 sequence) → Communication Channel (can have noise or distortion) → Decoder / Decoding (e.g. 0,1 sequence to English) → Destination]



Information

 Given a discrete random variable X, with possible outcomes x1, x2, ..., xn,
which occur with probabilities p(x1), p(x2), ..., p(xn), the entropy of X is
formally defined as:

$H(X) = -\sum_{x} p(x)\log_2 p(x)$

 This function is called the entropy function.



Information
Example
 Tossing a die:

 Outcomes are 1, 2, 3, 4, 5, 6
 Each occurs with probability 1/6
 The information provided by tossing the die is

$H = -\sum_{i=1}^{6} p(i)\log_2 p(i) = -\sum_{i=1}^{6} \frac{1}{6}\log_2\frac{1}{6} = \log_2 6 \approx 2.585 \text{ bits}$
ENTROPY-Properties
NOTATIONS
 We use capital letters X and Y to name random variables, and lower-case
letters x and y to refer to their respective outcomes.
 These are drawn from particular sets A and B: $x \in A$ and $y \in B$.

 The probability of a particular outcome is denoted $p(x = a_i)$, or simply
$p(a_i)$.


ENTROPY-Properties
JOINT ENSEMBLE
 A joint ensemble 'XY' is an ensemble whose outcomes are ordered pairs
(x, y) with $x \in A$ and $y \in B$.
 The joint ensemble XY defines a joint probability distribution $p(x, y)$
over all possible joint outcomes.
MARGINAL PROBABILITY
 From the Sum Rule, we can see that the probability of X taking on a
particular value $x = a_i$ is the sum of the joint probabilities of this
outcome for X and all possible outcomes for Y:

$p(x = a_i) = \sum_{y} p(x = a_i, y)$



ENTROPY-Properties
 In a simplified way this can be rewritten as

$p(x) = \sum_{y} p(x, y)$

 And similarly for Y:

$p(y) = \sum_{x} p(x, y)$

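A small sketch of the Sum Rule above, using a hypothetical 2 x 2 joint distribution (the numbers are made up for illustration):

# Hypothetical joint distribution p(x, y) over A = {a1, a2} and B = {b1, b2}.
joint = {
    ("a1", "b1"): 0.25, ("a1", "b2"): 0.25,
    ("a2", "b1"): 0.125, ("a2", "b2"): 0.375,
}

A = sorted({x for x, _ in joint})
B = sorted({y for _, y in joint})

# Marginals via the Sum Rule: p(x) = sum over y of p(x, y), and similarly for y.
p_x = {x: sum(joint[(x, y)] for y in B) for x in A}
p_y = {y: sum(joint[(x, y)] for x in A) for y in B}

print(p_x)   # {'a1': 0.5, 'a2': 0.5}
print(p_y)   # {'b1': 0.375, 'b2': 0.625}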


ENTROPY-Properties
Conditional probability
 From the Product Rule, we can easily see that the conditional probability
that $x = a_i$, given that $y = b_j$, is:

$p(x = a_i \mid y = b_j) = \frac{p(x = a_i, y = b_j)}{p(y = b_j)}$

 In a simplified way:

$p(x \mid y) = \frac{p(x, y)}{p(y)}$

 Similarly:

$p(y \mid x) = \frac{p(x, y)}{p(x)}$


ENTROPY-Properties
JOINT ENTROPY OF ENSEMBLES
 The joint entropy of the ensemble XY is

$H(X, Y) = \sum_{x, y} p(x, y)\log_2\frac{1}{p(x, y)}$

CONDITIONAL ENTROPY
 Conditional entropy of an ensemble X, given that y = bj, measures the
uncertainty remaining about random variable X after specifying that random
variable Y has taken on the particular value y = bj.
 It is defined naturally as the entropy of the probability distribution
$p(x \mid y = b_j)$:

$H(X \mid y = b_j) = \sum_{x} p(x \mid y = b_j)\log_2\frac{1}{p(x \mid y = b_j)}$


ENTROPY-Properties
CONDITIONAL ENTROPY
 Conditional entropy of an ensemble X, given an ensemble Y, is given by

$H(X \mid Y) = \sum_{y} p(y)\,H(X \mid y) = \sum_{x, y} p(x, y)\log_2\frac{1}{p(x \mid y)}$

CHAIN RULE FOR ENTROPY
 The joint entropy, conditional entropy, and marginal entropy for two
ensembles X and Y are related by:

$H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y)$

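A numerical sketch (using the same kind of made-up joint distribution as before) that verifies the Chain Rule H(X, Y) = H(X) + H(Y | X):

import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y).
joint = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25,
         ("a2", "b1"): 0.125, ("a2", "b2"): 0.375}
A = sorted({x for x, _ in joint})
B = sorted({y for _, y in joint})

p_x = {x: sum(joint[(x, y)] for y in B) for x in A}

H_XY = H(joint.values())
H_X  = H(p_x.values())
# H(Y|X) = sum over x of p(x) * H(p(y|x)), with p(y|x) = p(x, y) / p(x).
H_Y_given_X = sum(p_x[x] * H([joint[(x, y)] / p_x[x] for y in B]) for x in A)

print(math.isclose(H_XY, H_X + H_Y_given_X))   # True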


ENTROPY-Properties
 It should seem natural and intuitive that the joint entropy of a pair of
random variables is the entropy of one plus the conditional entropy of the
other (the uncertainty that it adds once its dependence on the first one has
been discounted by conditionalizing on it).

 If we have three random variables X, Y, Z, the conditionalizing of the
joint distribution of any two of them, upon the third, is also expressed by a
Chain Rule:

$H(X, Y \mid Z) = H(X \mid Z) + H(Y \mid X, Z)$



ENTROPY-Properties
INDEPENDENCE BOUND ON ENTROPY
 A consequence of the Chain Rule for Entropy is that if we have many
different random variables X1, X2, ..., Xn, then the sum of all their
individual entropies is an upper bound on their joint entropy:

$H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$

 Their joint entropy only reaches this upper bound if all of the random
variables are independent.



ENTROPY-Properties
MUTUAL INFORMATION
 Mutual information is a quantity that measures the relationship between two
random variables that are sampled simultaneously.
 In particular, it measures how much information is communicated, on
average, in one random variable about another.
 Intuitively, one might ask: how much does one random variable tell me about
another?

$I(X; Y) = \sum_{x, y} p(x, y)\log_2\frac{p(x, y)}{p(x)\,p(y)}$

 In this definition, p(x) and p(y) are the marginal distributions of X and Y
obtained through the marginalization process.



ENTROPY-Properties
MUTUAL INFORMATION
 Note that in case X and Y are independent random variables, the numerator
inside the logarithm equals the denominator.
 Then the log term vanishes, and the mutual information equals zero, as one
should expect.
 In the event that the two random variables are perfectly correlated, their
mutual information is the entropy of either one alone.

 The mutual information of a random variable with itself is just its entropy
or self-information.
ENTROPY-Properties
MUTUAL INFORMATION
 The mutual information I(X; Y) is the intersection between H(X) and H(Y).

 These properties are reflected in three equivalent definitions for the
mutual information between X and Y:

$I(X; Y) = H(X) - H(X \mid Y)$
$I(X; Y) = H(Y) - H(Y \mid X)$
$I(X; Y) = H(X) + H(Y) - H(X, Y)$

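A sketch (again with a made-up joint distribution) that checks numerically that the three definitions agree:

import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) and its marginals.
joint = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25,
         ("a2", "b1"): 0.125, ("a2", "b2"): 0.375}
A = sorted({x for x, _ in joint})
B = sorted({y for _, y in joint})
p_x = {x: sum(joint[(x, y)] for y in B) for x in A}
p_y = {y: sum(joint[(x, y)] for x in A) for y in B}

# Definition in terms of the joint and marginal distributions.
I_def = sum(pxy * math.log2(pxy / (p_x[x] * p_y[y]))
            for (x, y), pxy in joint.items() if pxy > 0)

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(joint.values())
H_X_given_Y = H_XY - H_Y     # from the Chain Rule
H_Y_given_X = H_XY - H_X

for value in (I_def, H_X - H_X_given_Y, H_Y - H_Y_given_X, H_X + H_Y - H_XY):
    print(round(value, 6))   # all four print the same value (about 0.0488 bits)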


Information



Assignment-1

 Using probability theory, show that there is only one way to measure
information, which is in terms of the number of bits.



Shannon’s 1st Source Coding Theorem
 Shannon showed that:
“To reliably store the information generated by some random source X, you
need no more/less than, on the average, H(X) bits for each outcome.”



Shannon’s 1st Source Coding Theorem
 If I toss a die 1,000,000 times and record the value from each trial
1, 3, 4, 6, 2, 5, 2, 4, 5, 2, 4, 5, 6, 1, ...
 In principle, I need 3 bits for storing each outcome, since 3 bits can
represent the values 1 to 8. So I need 3,000,000 bits for storing the
information.
 Using an ASCII representation, a computer needs 8 bits = 1 byte for storing
each outcome.
 The resulting file has size 8,000,000 bits.



Shannon’s 1st Source Coding Theorem
 You only need 2.585 bits, on average, for storing each outcome.
So the file can be compressed to a size of
2.585 x 1,000,000 = 2,585,000 bits
 The optimal compression ratio is:

$\frac{2{,}585{,}000}{8{,}000{,}000} \approx 0.3231 = 32.31\%$
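A quick arithmetic check of the figures above:

import math

N = 1_000_000
bits_per_outcome = math.log2(6)       # about 2.585 bits, the entropy of a fair die

compressed = bits_per_outcome * N     # about 2,585,000 bits
ascii_size = 8 * N                    # 8,000,000 bits

print(compressed / ascii_size)        # about 0.3231, i.e. 32.31 %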


