
Assignment EE 331

Principles of Communication

Lecture 35 Transcript

INFORMATION

Zainab Ali
210108062
EEE
Contents
1 INFORMATION
1.1 Basic Idea of Information Theory
1.2 Communication Engineering
1.2.1 General Model
1.3 Information Source
1.3.1 Discrete Memoryless Source
1.4 Measure of Information
1.4.1 Properties of Information
1.4.2 Self Information
1.5 Entropy
1.5.1 Binary Entropy
1.5.2 Joint and Conditional Entropy
1.6 Studied So far

1 INFORMATION
In this lecture, we'll start by understanding information in the context of communication and how it is connected to the probability of occurrence of events. We'll then learn how to quantify it.

1.1 Basic Idea of Information Theory


Information is the basis of any communication system, whether it is analogue or digital, and information theory is a mathematical approach to the study of the coding of information, along with its quantification, storage, and communication. The most basic way to understand information is to first establish that the information content is different for different events: the smaller the chance of a specific event happening, the more information its occurrence carries.
When more information is shared, the range of possibilities becomes narrower. It is like playing a guessing game: you start with a large set of options, and as you get more clues, the possibilities decrease.
Information content of an event ∝ 1 / (Probability of occurrence of the event)
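To make the guessing-game intuition concrete, here is a small Python sketch (not from the lecture; the halving strategy and the function name are illustrative): the number of yes/no questions needed to single out one of N equally likely options grows like log2 N, which is exactly the information, in bits, revealed once the answer is known.

```python
import math

def questions_needed(n_options: int) -> int:
    """Yes/no questions a halving strategy needs to isolate one of n equally
    likely options; each answer rules out roughly half of the remaining ones."""
    questions = 0
    while n_options > 1:
        n_options = math.ceil(n_options / 2)
        questions += 1
    return questions

for n in (2, 8, 100, 1024):
    print(n, questions_needed(n), round(math.log2(n), 2))  # the count tracks log2(n)
```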

1.2 Communication Engineering


The main goal of communication is to transmit information with as little distortion and noise as possible. In any communication system, there exists an information source that produces the information, and the purpose of the communication system is to deliver the output of the source to the destination.

1.2.1 General Model

Source → Channel → Sink

The source generates the information, which is represented in the form of a signal.
The signal is transmitted through the channel, possibly encountering noise or interference along the way.
The sink receives the signal, processes it, and presents the information in a meaningful way to the user.
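A rough Python sketch of this chain (not part of the lecture; the binary source and the bit-flipping channel model are assumptions chosen only for simplicity): a random bit stream from the source passes through a noisy channel that flips each bit with a small probability, and the sink reports how many bits arrived corrupted.

```python
import random

def source(n_bits: int) -> list[int]:
    """Source: emit a stream of random bits (the information-bearing signal)."""
    return [random.randint(0, 1) for _ in range(n_bits)]

def channel(bits: list[int], flip_prob: float = 0.05) -> list[int]:
    """Channel: deliver each bit, flipping it with probability flip_prob (noise)."""
    return [bit ^ 1 if random.random() < flip_prob else bit for bit in bits]

def sink(sent: list[int], received: list[int]) -> float:
    """Sink: compare what was sent with what arrived and report the error rate."""
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)

sent = source(10_000)
received = channel(sent)
print(f"observed bit error rate: {sink(sent, received):.3f}")  # close to 0.05
```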

1.3 Information Source
Information only comes from randomness, and it originates from an information source, which is the provider of the data or message to be communicated. The information can take various forms, such as text, audio, video, or any other data type. For example, an image can serve as the source of information, with its content carried in the form of pixels.

1.3.1 Discrete Memoryless Source


Understanding information can be slightly hard, so we begin with the simplest model for the information source, the discrete memoryless source (DMS). A DMS is a discrete-time, discrete-amplitude random process in which all outputs Xi are generated independently and with the same distribution.
Let A = {a1, a2, ..., aN} denote the set in which the random variable X takes its values, and let the probability mass function for the discrete random variable X be denoted by pi = p(X = ai) for all i = 1, 2, ..., N. The set A is called the alphabet, and the probabilities {pi}, i = 1, ..., N, give the full description of the Discrete Memoryless Source.
We can understand this by taking the example of tossing a coin before a cricket match: the information held by the outcome is which team will bat first, and the coin is a Discrete Memoryless Source.
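A minimal Python sketch of a DMS (the class name and the example alphabet are mine, chosen for illustration): every emitted symbol is drawn independently from the same fixed distribution, which is precisely the memoryless property.

```python
import random

class DiscreteMemorylessSource:
    """Emits i.i.d. symbols from a fixed alphabet with a fixed PMF."""

    def __init__(self, alphabet, probabilities):
        assert abs(sum(probabilities) - 1.0) < 1e-9, "PMF must sum to 1"
        self.alphabet = alphabet
        self.probabilities = probabilities

    def emit(self, n: int):
        # Each symbol is drawn independently of all previous ones (memoryless).
        return random.choices(self.alphabet, weights=self.probabilities, k=n)

# Example: the fair coin tossed before a cricket match.
coin = DiscreteMemorylessSource(["heads", "tails"], [0.5, 0.5])
print(coin.emit(10))
```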

1.4 Measure of Information


We all understand what information is, but when we want to analyze how well communication systems work, we need precise ways to measure and model it. Hartley, Nyquist, and Shannon were the first to come up with ways to put numbers to information and to create mathematical models for information sources.

1.4.1 Properties of Information


In order to quantify information, we must start by understanding the basic properties of information.

1. The information contained in an event ai depends only on the probability of occurrence of the event, not on its value. We denote this function by I(pi) and call it self-information.

2. Since we know that

   Information content of an event ∝ 1 / (probability of the event),

   an event that is certain to occur carries no information. Hence, I(pi) is a continuous, decreasing function of pi.

3. If pj = pj1 · pj2, then I(pj) = I(pj1) + I(pj2). We can understand this using the event of rolling a die. The outcome j (the number showing on the die) can be broken into two independent parts: whether the number is even or odd (j1) and whether it is prime or non-prime (j2). Since the components are independent, revealing the information about one component (odd/even) does not provide any information about the other component (prime or non-prime), and therefore, intuitively, the amount of information held by the outcome is the sum of the information held by the two independent parts.

It can be proved that the only function that satisfies all the above properties is the logarithmic function; i.e., I(x) = − log(x).
The base of the logarithm is not important and defines the unit by which
the information is measured. If the base is 2, the information is expressed
in bits, and if the natural logarithm is employed, the unit is nats. From this
point on we assume all logarithms are in base 2 and all information is given
in bits.
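As a quick numerical sketch of the last two points (the probabilities below are arbitrary, chosen only for illustration): the logarithm turns the product of independent probabilities into a sum of information values, and the base of the logarithm only changes the unit (bits for base 2, nats for base e).

```python
import math

# Additivity: I(p1 * p2) = I(p1) + I(p2) when I(p) = -log(p).
p1, p2 = 0.5, 0.25
print(-math.log2(p1 * p2))                 # 3.0 bits
print(-math.log2(p1) + -math.log2(p2))     # 1.0 + 2.0 = 3.0 bits

# The same information expressed in nats (natural logarithm); 1 bit = ln 2 nats.
print(-math.log(p1 * p2))                  # ~2.079 nats
```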

1.4.2 Self Information


In information theory (developed by Claude E. Shannon, 1948), self-information is a measure of the information content associated with the outcome of a random variable.

I(pi ) = − log(pi ) (1.4.1)

The expression above gives the information contained in the event that the random variable takes the value ai.
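A minimal Python sketch of equation (1.4.1) (the example probabilities are my own, for illustration): rarer events yield larger self-information, while a certain event yields none.

```python
import math

def self_information(p: float) -> float:
    """Self-information I(p) = -log2(p), in bits, of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)

print(self_information(0.5))     # fair coin flip: 1 bit
print(self_information(1 / 6))   # one face of a fair die: ~2.585 bits
print(self_information(1.0))     # a certain event: 0 bits
print(self_information(0.001))   # a rare event: ~9.97 bits
```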

But what about the information contained in the random variable as a whole? That is where the concept of entropy comes in, which we cover in the next section.

1.5 Entropy
We can define the information content of the source as the weighted average
of the self-information of all source outputs. This is justified by the fact
that various source outputs appear with their corresponding probabilities.
Therefore, the information revealed by an unidentified source output is the
weighted average of the self-information of the various source outputs; i.e.,
H(X) = Σ_{i=1}^{N} pi I(pi) = − Σ_{i=1}^{N} pi log pi    (1.5.1)

H(X), i.e., entropy, is a measure of the uncertainty or randomness associated with a random variable or a data source. It quantifies the average amount of information contained in the outcomes of that variable or source.

Definition The entropy of a discrete random variable X is a function of its PMF and is defined by

H(X) = − Σ_{i=1}^{N} pi log pi,

where 0 log 0 = 0. Note that there exists a slight abuse of notation here.
One would expect H(X) to denote a function of the random variable X and,
hence, be a random variable itself. However, H(X) is a function of the PMF
of the random variable X and is, therefore, a number.
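A direct Python transcription of this definition (a sketch; the function name and the example PMFs are mine), including the convention 0 log 0 = 0:

```python
import math

def entropy(pmf) -> float:
    """H(X) = -sum(p_i * log2(p_i)) in bits, with the convention 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))          # fair coin: 1.0 bit
print(entropy([0.25] * 4))          # uniform over 4 symbols: 2.0 bits
print(entropy([1.0, 0.0]))          # deterministic source: 0.0 bits
print(entropy([0.7, 0.2, 0.1]))     # skewed source: ~1.157 bits
```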

1.5.1 Binary Entropy


For a binary memoryless source with probabilities p and 1 − p, respectively,
we have
H(X) = −p log p − (1 − p) log(1 − p) (1.5.3)
This function, denoted by Hb (p), is known as the binary entropy function,
and a plot of it is given in Figure 1.

Figure 1: The binary entropy function.

As we can clearly observe from the plot, the entropy attains a maximum at a certain value of p.
To find the maximum of the binary entropy function Hb (p), we differentiate
it with respect to p:

dHb(p)/dp = − d/dp [ p log p + (1 − p) log(1 − p) ]
Now, we set the derivative equal to zero to find the maximum:

dHb(p)/dp = 0
Solving for p:
− d/dp [ p log p + (1 − p) log(1 − p) ] = 0
Carrying out the differentiation (with base-2 logarithms) gives dHb(p)/dp = log((1 − p)/p), and setting this to zero yields (1 − p)/p = 1, i.e., p = 1/2. So, for a Bernoulli random variable, the binary entropy function Hb(p) is maximized when both outcomes are equally likely. Hence,
When randomness is maximum, Entropy is maximum.
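A brief numerical sketch of this result (the grid of p values and the helper name are mine): evaluating Hb(p) over a grid confirms that the maximum of 1 bit occurs at p = 0.5.

```python
import math

def binary_entropy(p: float) -> float:
    """Hb(p) = -p*log2(p) - (1-p)*log2(1-p), with 0*log(0) taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Evaluate Hb(p) on a grid of probabilities and locate its maximum.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=binary_entropy)
print(best_p, binary_entropy(best_p))   # 0.5 and 1.0 bit
```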

1.5.2 Joint and Conditional Entropy


Till now we have been dealing with just a single random variable. When dealing with two or more random variables, exactly in the same way that joint and conditional probabilities are introduced, one can introduce joint and conditional entropies. These concepts are especially important when dealing with sources with memory.

Joint Entropy The joint entropy measures the total uncertainty of a pair of random variables X and Y considered together, i.e., the average amount of information needed to specify the outcomes of both. It is written as H(X, Y).

Definition The joint entropy of two discrete random variables (X, Y) is defined by

H(X, Y) = − Σ_x Σ_y p(x, y) log p(x, y)    (1.5.4)

In this definition, p(x, y) represents the joint probability mass function of the random variables X and Y.
We can extend this definition to a random vector X = (X1, X2, ..., Xn), for which the joint entropy is defined as

H(X) = − Σ_x p(x) log p(x)

where x represents a specific outcome of the random vector X, and p(x) is the joint probability mass function of X.

Conditional Entropy The conditional entropy measures how much entropy a random variable X has remaining if we have already learned the value of a second random variable Y.
Definition The conditional entropy of the random variable X given the
random variable Y is defined as:

H(X|Y) = − Σ_x Σ_y p(x, y) log p(x|y)    (1.5.5)

In general, we have:

H(Xn | X1, ..., Xn−1) = − Σ_{x1,...,xn} p(x1, ..., xn) log p(xn | x1, ..., xn−1)
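As a numerical sketch of these definitions (the 2×2 joint PMF below is made up for illustration), the following code computes H(X, Y), H(Y), and H(X|Y) directly from a joint PMF and checks the chain rule H(X, Y) = H(Y) + H(X|Y):

```python
import math

# Joint PMF p(x, y) stored as a nested dict: joint[x][y].
joint = {
    0: {0: 0.4, 1: 0.1},
    1: {0: 0.2, 1: 0.3},
}

def marginal_y(joint):
    """Marginal PMF of Y: p(y) = sum_x p(x, y)."""
    p_y = {}
    for row in joint.values():
        for y, p in row.items():
            p_y[y] = p_y.get(y, 0.0) + p
    return p_y

def joint_entropy(joint) -> float:
    """H(X,Y) = -sum_{x,y} p(x,y) log2 p(x,y)."""
    return -sum(p * math.log2(p)
                for row in joint.values() for p in row.values() if p > 0)

def entropy_y(joint) -> float:
    """H(Y) computed from the marginal of Y."""
    return -sum(p * math.log2(p) for p in marginal_y(joint).values() if p > 0)

def conditional_entropy_x_given_y(joint) -> float:
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)."""
    p_y = marginal_y(joint)
    return -sum(p * math.log2(p / p_y[y])
                for row in joint.values() for y, p in row.items() if p > 0)

print(joint_entropy(joint))                                     # ~1.846 bits
print(entropy_y(joint) + conditional_entropy_x_given_y(joint))  # same value: chain rule
```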

1.6 Studied So far
In conclusion, we’ve covered the core principles of Information Theory in
this lecture. We began by introducing the fundamental idea of Information
Theory and its practical applications in Communication Engineering. From
there, we explored the concept of Information Sources, focusing on the sim-
plified Discrete Memoryless Source. We learned how to measure information
and discussed the properties of information, particularly Self Information.
Entropy, a central concept, was explained, including its application in Bi-
nary Entropy and more complex scenarios involving Joint and Conditional
Entropy. This knowledge lays the groundwork for understanding and using
information effectively in further lectures.

Did You Know?


Jamming in communication is somewhat like a drummer playing a
drum. Just as a drummer uses beats to create rhythms and patterns,
jammers use interference to disrupt communication signals. By trans-
mitting noise or signals at the same frequency as the target signal, they
“jam” the communication, making it difficult for the intended message
to be heard or understood.
