Ravi Ramanathan
Proof. Suppose |𝒦| < |ℳ| . We will show that there exists a distribution on ℳ, a message m, and a ciphertext c such that
Pr[M = m | C = c] ≠ Pr[M = m] .
Take the uniform distribution on ℳ, i.e. Pr[M = m] = 1/|ℳ| for all m ∈ ℳ .
Fix a ciphertext c with Pr[C = c] > 0 and let ℳ(c) := {m : m = Dec_k(c) for some k ∈ 𝒦} .
These are the only possible m that could yield the ciphertext c through any key k .
Optimality of the One-Time Pad
Theorem. If (Gen, Enc, Dec) is a perfectly secret encryption scheme with message space ℳ
and key space 𝒦, then | 𝒦 | ≥ | ℳ | .
Proof. Continued. Observe that |ℳ(c)| ≤ |𝒦| < |ℳ| . This is because Dec is deterministic, so each key decrypts c to at most one message.
Hence there exists some message m* ∈ ℳ with m* ∉ ℳ(c), and for this message Pr[M = m* | C = c] = 0 .
But by the assumption that the message distribution is uniform, we know that Pr[M = m*] = 1/|ℳ| > 0 . Therefore
Pr[M = m* | C = c] ≠ Pr[M = m*] .
Thus, the condition of perfect secrecy is not satisfied and the scheme is not perfectly secret. ◼
1. Every key k ∈ 𝒦 is chosen by Gen with equal probability 1/|𝒦| .
2. For every m ∈ ℳ and every c ∈ 𝒞 there exists a unique key k ∈ 𝒦 such that Enc_k(m) outputs c .
It can be readily seen that the one-time pad satisfies both conditions, since Gen chooses the key uniformly at random from {0,1}^l and, for every m and c, the unique key mapping m to c is k = m ⊕ c (see the sketch below).
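As a quick illustration (a minimal Python sketch, using a hypothetical 3-bit message space so everything can be enumerated), the following checks both conditions for the one-time pad Enc_k(m) = m ⊕ k:

import itertools

l = 3                                        # hypothetical small bit-length for exhaustive checking
keys = msgs = ctxts = list(itertools.product([0, 1], repeat=l))

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

# Condition 2: for every (m, c) exactly one key maps m to c, namely k = m XOR c.
for m in msgs:
    for c in ctxts:
        assert [k for k in keys if xor(m, k) == c] == [xor(m, c)]

# Condition 1: Gen draws the key uniformly, so Pr[K = k] = 1/|K| for every key.
print(len(keys), 1 / len(keys))              # |K| = 8, Pr[K = k] = 0.125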
Suppose that the distributions over ℳ and 𝒞 are such that every m and every c is assigned non-zero probability.
Firstly, we observe that for every m, c there must be at least one k ∈ 𝒦 such that Enc_k(m) = c ; otherwise Pr[C = c | M = m] = 0 ≠ Pr[C = c] > 0, contradicting perfect secrecy.
This implies that for any fixed m, the set of all ciphertexts {Enc_k(m)}_{k∈𝒦} satisfies |{Enc_k(m)}_{k∈𝒦}| ≥ |𝒞| .
Proof. Contd. Since |{Enc_k(m)}_{k∈𝒦}| ≤ |𝒦| and, by the assumption of Shannon's theorem, |𝒦| = |𝒞|, we have |{Enc_k(m)}_{k∈𝒦}| = |𝒦| . Hence, for every pair (m, c) there is exactly one key k with Enc_k(m) = c, which is Condition 2.
Fix ciphertext c . Let ℳ = {m1, …, mn} and let ki denote the unique key that encrypts mi to c . Note that the ki are distinct: a single key cannot encrypt two different messages to the same ciphertext, since decryption would then fail.
By perfect secrecy, Pr[C = c | M = mi] = Pr[C = c], and Pr[C = c | M = mi] = Pr[K = ki] . Therefore, we have Pr[K = ki] = Pr[C = c] . Similarly Pr[K = kj] = Pr[C = c] for every j .
Since the ki are distinct and n = |ℳ| = |𝒦|, they exhaust all of 𝒦 and their probabilities sum to 1. Thus, we obtain that Pr[K = ki] = 1/|𝒦| for every i, which is Condition 1.
Proof of Shannon’s Theorem
Proof. Contd. Conditions 1 and 2 ⟹ Perfect Secrecy .
When Conditions 1 and 2 hold, we have that for every m, c it holds that Pr[C = c | M = m] = 1/|𝒦| : by Condition 2 there is a unique key k with Enc_k(m) = c, and by Condition 1 this key is chosen with probability 1/|𝒦| .
This implies that for every pair of messages m, m′ ∈ ℳ and every ciphertext c, we have
Pr[C = c | M = m] = 1/|𝒦| = Pr[C = c | M = m′],
which is equivalent to perfect secrecy. ◼
Q1. Suppose the message space is ℳ = {0,…,4}, Gen chooses the key uniformly at random from 𝒦 = {0,…,5}, and Enc_k(m) = (m + k) mod 5 . Is this scheme perfectly secret?
A1. By definition, perfect secrecy requires Pr[C = c | M = m] = Pr[C = c], i.e. M and C must be independent random variables.
Pr[C = c | M = m] = ∑_{k : Enc_k(m) = c} Pr[K = k]
For M = 1, say, applying every possible key in {0,…,5} gives the ciphertexts {1,2,3,4,0,1}
We get Pr[C = 0 | M = 1] = 1/6 . This is because C = 0 occurs from M = 1 only when k = 4.
Taking the message distribution to be uniform, C = 0 can only be obtained from the following 6 of the 30 equally likely (m, k) combinations: {(0,0), (0,5), (1,4), (2,3), (3,2), (4,1)} .
Therefore, Pr[C = 0] = 6/30 = 1/5 ≠ 1/6 = Pr[C = 0 | M = 1] . This scheme is not perfectly secret. ◼
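The calculation can be checked mechanically. A small Python sketch, assuming (as above) Enc_k(m) = (m + k) mod 5 and a uniform message distribution:

from fractions import Fraction
from itertools import product

M, K = range(5), range(6)
enc = lambda m, k: (m + k) % 5

# Pr[C = 0 | M = 1]: fraction of the 6 equally likely keys mapping 1 to 0
pr_c0_given_m1 = Fraction(sum(enc(1, k) == 0 for k in K), len(K))

# Pr[C = 0]: fraction of the 30 equally likely (m, k) combinations giving ciphertext 0
pr_c0 = Fraction(sum(enc(m, k) == 0 for m, k in product(M, K)), len(M) * len(K))

print(pr_c0_given_m1, pr_c0)   # 1/6 vs 1/5, so the scheme is not perfectly secret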
Perfect Secrecy Exercise
Consider an encryption scheme with message space ℳ := {a, b, c} (with respective probabilities for a, b, c being 0.6, 0.3, 0.1) and key space 𝒦 := {k1, k2, k3} (with respective probabilities for choosing keys k1, k2, k3 being 0.4, 0.3, 0.3).
The encryption table takes the form (rows are messages, columns are keys):

     k1   k2   k3
a     1    2    3
b     2    1    3
c     3    1    2
What is the probability that the ciphertext is 2? Calculate Pr[M = b | C = 2], Pr[M = b | C = 3] .
Pr[C = 2] = Pr[K = k2] Pr[M = a] + Pr[K = k1] Pr[M = b] + Pr[K = k3] Pr[M = c] = 0.3 ⋅ 0.6 + 0.4 ⋅ 0.3 + 0.3 ⋅ 0.1 = 0.18 + 0.12 + 0.03 = 0.33.
Pr[M = b | C = 2] = Pr[C = 2 | M = b] Pr[M = b] / Pr[C = 2] = (0.4 × 0.3) / 0.33 = 12/33 ≈ 0.364.
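The same numbers can be verified programmatically. The sketch below encodes the table and probabilities given above and applies Bayes' rule; it also prints the remaining requested quantity Pr[M = b | C = 3]:

from itertools import product

pM = {'a': 0.6, 'b': 0.3, 'c': 0.1}
pK = {'k1': 0.4, 'k2': 0.3, 'k3': 0.3}
enc = {('a', 'k1'): 1, ('a', 'k2'): 2, ('a', 'k3'): 3,
       ('b', 'k1'): 2, ('b', 'k2'): 1, ('b', 'k3'): 3,
       ('c', 'k1'): 3, ('c', 'k2'): 1, ('c', 'k3'): 2}

def pr_C(c):                                   # Pr[C = c]
    return sum(pM[m] * pK[k] for m, k in product(pM, pK) if enc[m, k] == c)

def pr_M_given_C(m, c):                        # Pr[M = m | C = c] via Bayes' rule
    return sum(pM[m] * pK[k] for k in pK if enc[m, k] == c) / pr_C(c)

print(pr_C(2))                                 # 0.33
print(pr_M_given_C('b', 2))                    # ≈ 0.364
print(pr_M_given_C('b', 3))                    # the remaining posterior asked for above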
Entropy
Definition. The entropy H(X) of a random variable X taking values in a set 𝒳 is defined as
H(X) = − ∑_{x∈𝒳} Pr[X = x] log₂ Pr[X = x] .
The entropy is equivalently stated as H(X ) = 𝔼 (−log2 Pr[X ]), where 𝔼 denotes the
expectation value.
E.g. If X takes one value with probability 1 and other values with probability 0, then
the entropy of X is zero.
If X takes n values each with probability 1/n then the entropy of X is log2 n .
In general, we note that H(X ) ≤ log2 | 𝒳 |
Entropy: Example
Example. Suppose we have a horse race with eight horses taking part. Assume that the probabilities of winning for the eight horses are (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64) .
Suppose that we want to send a message indicating which horse won the race.
How many bits on average should we use in the best description as to which horse won?
We want to send the index of the winning horse. It makes sense to use shorter descriptions for
more probable horses and longer descriptions for the less probable ones.
We could use the following set of bit strings: 0, 10, 110, 1110, 111100, 111101, 111110, 111111.
The average description length above is 2 bits as opposed to 3 for the uniform code.
Entropy of a r.v. is a lower bound on the average number of bits required to represent the r.v.
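A short Python check that the entropy of this distribution and the average length of the code above are both 2 bits:

import math

p = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
code = ['0', '10', '110', '1110', '111100', '111101', '111110', '111111']

H = -sum(pi * math.log2(pi) for pi in p)                # entropy of the winning horse's identity
avg_len = sum(pi * len(w) for pi, w in zip(p, code))    # expected description length
print(H, avg_len)                                       # 2.0 and 2.0 (vs. 3 bits for a uniform code)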
The binary entropy H(p) = − p log₂ p − (1 − p) log₂(1 − p) of a random variable taking two values with probabilities p and 1 − p is shown in Figure 2.1. The figure illustrates some of the basic properties of entropy: it is a concave function of the distribution and equals 0 when p = 0 or 1. This makes sense, because when p = 0 or 1, the variable is not random and there is no uncertainty. Similarly, the uncertainty is maximum when p = 1/2, which also corresponds to the maximum value of the entropy.
Entropy: Example
Example 2.1.2. Calculate the entropy of a r.v. X with distribution
X = a with probability 1/2, b with probability 1/4, c with probability 1/8, d with probability 1/8.
The entropy of X is H(X) = − (1/2) log₂(1/2) − (1/4) log₂(1/4) − (1/8) log₂(1/8) − (1/8) log₂(1/8) = 7/4 bits.
Suppose we wish to determine the value of X with the minimum number of binary questions.
An efficient first question is: Is X = a?
The expected number of binary questions required to determine the value of X in the best possible questioning scheme is 1.75.
In general, the expected number of binary questions required to determine a random variable X lies between H(X) and H(X) + 1 .
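For the distribution above, the natural questioning scheme "Is X = a?", then "Is X = b?", then "Is X = c?" needs 1, 2, 3 and 3 questions for a, b, c, d respectively; a one-line Python check that its expected cost matches H(X) = 7/4 bits:

p = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/8}
questions = {'a': 1, 'b': 2, 'c': 3, 'd': 3}   # questions needed to identify each outcome
print(sum(p[x] * questions[x] for x in p))     # 1.75 expected binary questions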
Conditional Entropy
Definition. The conditional entropy of X given Y is defined as
H(X | Y) = − ∑_{x∈𝒳, y∈𝒴} Pr[X = x, Y = y] log₂ Pr[X = x | Y = y] .
It is also customary to use the notation H(X | Y = y) := − ∑_{x∈𝒳} Pr[X = x | Y = y] log₂ Pr[X = x | Y = y] .
We observe that H(X | Y) = ∑_{y∈𝒴} Pr[Y = y] H(X | Y = y) .
Chain Rule and other entropy relations
Chain rule: H(XY) = H(X) + H(Y | X) .
Here H(XY) = H(X, Y) = − ∑_{x∈𝒳, y∈𝒴} Pr[X = x, Y = y] log₂ Pr[X = x, Y = y] denotes the entropy of the joint random variable (X, Y) .
In general, we have H(X1X2…Xn) = H(X1) + H(X2 | X1) + H(X3 | X1X2) + … + H(Xn | X1X2…Xn−1) .
The uncertainty of a random variable X can never increase by knowledge of the outcome of
another random variable Y, i.e. H(X | Y ) ≤ H(X ) with equality iff X, Y are independent.
We thus have H(XY ) ≤ H(Y ) + H(X ) with equality iff X, Y are independent.
Proof of the chain rule: write log₂ Pr[X = x, Y = y] = log₂ Pr[X = x] + log₂ Pr[Y = y | X = x] and take the expectation of both sides of the equation to obtain the theorem. ◼
Corollary. H(X, Y | Z) = H(X | Z) + H(Y | X, Z) .
Proof: The proof follows along the same lines as the theorem. ◼
Conditional Entropy and Joint Entropy: Example
Example 2.2.1. Let (X, Y) have the following joint distribution p(x, y) (columns x = 1, …, 4, rows y = 1, …, 4):

        x=1    x=2    x=3    x=4
y=1     1/8    1/16   1/32   1/32
y=2     1/16   1/8    1/32   1/32
y=3     1/16   1/16   1/16   1/16
y=4     1/4    0      0      0

The marginal distribution of X is (1/2, 1/4, 1/8, 1/8). So H(X) = 7/4 bits.
The marginal distribution of Y is (1/4, 1/4, 1/4, 1/4). So H(Y) = 2 bits.
What are H(X | Y), H(Y | X) and H(X, Y)?
Conditional Entropy and Joint Entropy: Example
H(X | Y) = ∑_{y} Pr[Y = y] ⋅ H(X | Y = y)
        = (1/4) H(1/2, 1/4, 1/8, 1/8) + (1/4) H(1/4, 1/2, 1/8, 1/8) + (1/4) H(1/4, 1/4, 1/4, 1/4) + (1/4) H(1, 0, 0, 0)
        = (1/4) × (7/4) + (1/4) × (7/4) + (1/4) × 2 + (1/4) × 0 = 11/8 bits.
Similarly, we find that H(Y | X) = 13/8 bits and H(X, Y) = 27/8 bits. Note that H(X | Y) ≠ H(Y | X) .
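A minimal Python sketch, assuming the joint table as written above, that recomputes these entropies via the chain rule H(X | Y) = H(X, Y) − H(Y):

import math
from fractions import Fraction as F

# joint distribution p(x, y); rows are y = 1..4, columns are x = 1..4
p = [[F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
     [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
     [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
     [F(1, 4),  0,        0,        0       ]]

def H(dist):
    return -sum(q * math.log2(q) for q in dist if q > 0)

px = [sum(row[j] for row in p) for j in range(4)]    # marginal of X: (1/2, 1/4, 1/8, 1/8)
py = [sum(row) for row in p]                         # marginal of Y: (1/4, 1/4, 1/4, 1/4)
Hxy = H([q for row in p for q in row])               # H(X, Y)
print(H(px), H(py), Hxy - H(py), Hxy - H(px), Hxy)   # 1.75, 2.0, 1.375, 1.625, 3.375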
Definition. The relative entropy D (p(X )∥q(X )) between two probability distributions p(X ), q(X )
of a random variable X is defined as
D (p(X) ∥ q(X)) = ∑_{x∈𝒳} p(X = x) log₂ ( p(X = x) / q(X = x) ) .
The relative entropy is a measure of the distance between two probability distributions.
It can be thought of as the inefficiency of assuming distribution q(X ) when the correct
distribution is p(X ) .
Note that D (p(X) ∥ q(X)) = 𝔼_p ( log₂ ( p(X) / q(X) ) ) .
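For instance, taking p to be the distribution (1/2, 1/4, 1/8, 1/8) of Example 2.1.2 and q the uniform distribution on four values, the inefficiency is D(p∥q) = log₂ 4 − H(p) = 1/4 bit. A tiny Python check:

import math

def D(p, q):   # relative entropy in bits; assumes q(x) > 0 wherever p(x) > 0
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(D([1/2, 1/4, 1/8, 1/8], [1/4] * 4))   # 0.25 bits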
Definition. The Mutual Information I(X; Y ) between random variables X and Y is defined as
I(X; Y) = D (Pr[X, Y] ∥ Pr[X] Pr[Y]) = ∑_{x∈𝒳, y∈𝒴} Pr[X = x, Y = y] log₂ ( Pr[X = x, Y = y] / (Pr[X = x] Pr[Y = y]) )
The mutual information I(X; Y) measures the information (in bits) we receive about the random variable X when observing the random variable Y.
It also describes the information we receive about the random variable Y when observing the random variable X.
Mutual information is also a measure of the price for encoding (X, Y) as a pair of independent random variables when in fact they are not.
I(X; X ) = H(X )
Analogously, the conditional relative entropy is defined as
D (p[X | Z] ∥ q[X | Z]) := ∑_{z∈𝒵} p(Z = z) ∑_{x∈𝒳} p(X = x | Z = z) log₂ ( p(X = x | Z = z) / q(X = x | Z = z) ) .
The mutual information is the relative entropy between the joint distribution and the product distribution.
I(X; Y) = ∑_{x,y} p(x, y) log ( p(x, y) / (p(x) p(y)) )
        = ∑_{x,y} p(x, y) log ( p(x | y) / p(x) )
        = − ∑_{x,y} p(x, y) log p(x) + ∑_{x,y} p(x, y) log p(x | y)
        = − ∑_x p(x) log p(x) − ( − ∑_{x,y} p(x, y) log p(x | y) ) = H(X) − H(X | Y) .
Mutual Information
Thus the mutual information I(X; Y ) is the reduction in uncertainty of X due to knowledge of Y .
Therefore, the entropy is also sometimes referred to as the self-information.
The relationship between H(X), H(Y), H(X, Y), H(X | Y), H(Y | X) and I(X; Y) is expressed in a Venn diagram (Figure 2.2). Notice that the mutual information I(X; Y) corresponds to the intersection of the information in X with the information in Y.
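Using the joint distribution of Example 2.2.1 once more, a short sketch verifying that the defining formula for I(X; Y) and the identity H(X) − H(X | Y) give the same value, 3/8 bit:

import math
from fractions import Fraction as F

p = [[F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
     [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
     [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
     [F(1, 4),  0,        0,        0       ]]
px = [sum(row[j] for row in p) for j in range(4)]
py = [sum(row) for row in p]

I = sum(p[i][j] * math.log2(p[i][j] / (px[j] * py[i]))       # definition of I(X; Y)
        for i in range(4) for j in range(4) if p[i][j] > 0)
print(I)                                                     # 0.375 = H(X) - H(X|Y) = 7/4 - 11/8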
We have the key entropy H(K) = − ∑_{k∈𝒦} Pr[K = k] log₂ Pr[K = k]
and the message entropy H(M) = − ∑_{m∈ℳ} Pr[M = m] log₂ Pr[M = m] .
The key entropy describes the uncertainty Eve faces regarding the unknown key a priori
and the message entropy describes the uncertainty regarding the transmitted message.
The key equivocation H(K | C) = − ∑_{k∈𝒦, c∈𝒞} Pr[K = k, C = c] log₂ Pr[K = k | C = c]
and the message equivocation H(M | C) = − ∑_{m∈ℳ, c∈𝒞} Pr[M = m, C = c] log₂ Pr[M = m | C = c]
describe the remaining uncertainty after Eve observes the transmitted ciphertext.
Perfect Secrecy in Information-Theoretic terms
We have H(K | C) ≤ H(K) and H(M | C) ≤ H(M). This reflects the fact that the uncertainties never increase by knowledge of the ciphertext.
In a system with perfect secrecy, the plaintext and the ciphertext are independent.
When Eve observes the ciphertext, she obtains no information at all about the plaintext, i.e. H(M | C) = H(M) .
Theorem. For an encryption scheme with perfect secrecy we have H(M ) ≤ H(K ) .
Proof. When key and ciphertext are given, the plaintext is uniquely determined since Dec is deterministic. Hence H(M | C, K) = 0.
Since H(M | C) ≤ H(K, M | C) (the uncertainty about the joint variable (K, M) is at least as large as the uncertainty about M alone), and since by the chain rule H(K, M | C) = H(K | C) + H(M | K, C) = H(K | C), we have that H(M | C) ≤ H(K | C) .
Since H(K | C) ≤ H(K ) (conditioning cannot increase entropy), we have that H(M | C) ≤ H(K ) .
Now, by definition of perfect secrecy we have H(M | C) = H(M ) . So H(M ) ≤ H(K ) . ◼
For perfect secrecy, the length N of the key bit string should be at least the entropy of the plaintext language.
Exercise: Information-Theory for Cryptography
Q . Let K, M, C be the random variables denoting the key, message and ciphertext respectively.
(a) . For a general private-key encryption scheme, is it the case that H(M | K, C) = 0? Explain your answer.
(b) . For a general private-key encryption scheme, is it the case that H(C | K, M ) = 0? Explain your answer.
(c) . In the One-Time Pad scheme with ℳ = 𝒦 = 𝒞 = {0,1}^l, how many bits of information
about the message and key are revealed by a single ciphertext on average? Explain your answer.
Assume that the key is chosen uniformly at random, independent of the message.
Exercise: Information-Theory for Cryptography
A . (a) . Yes: if you know the ciphertext and the key, then you know the plaintext.
This must hold since otherwise decryption will not work correctly, so H(M | K, C) = 0.
A . (b) . Not necessarily. H(C | K, M) = 0 means that if you know the plaintext and key, then you know the ciphertext.
This holds when encryption is deterministic, but not for general private-key encryption schemes, where Enc may be randomized.
Exercise: Information-Theory for Cryptography
A . (c) . In the One-Time Pad, where the key is chosen uniformly from {0,1}l, we have H(K ) = l, the maximum.
To find how many bits of information are revealed about the key on average by a ciphertext,
we must compute H(K ) − H(K | C), i.e., the mutual information between key and ciphertext.
H(K | C) is the key equivocation, i.e., the amount of uncertainty about the key left after one ciphertext is observed.
Similarly, to find how many bits of information are revealed about the message on average by a ciphertext,
we must compute H(M ) − H(M | C), i.e., the mutual information between message and ciphertext.
Exercise: Information-Theory for Cryptography
A . (c) . Contd. Since the One-Time Pad is a perfectly secret scheme, it holds that
H(M ) = H(M | C) .
That is, the message equivocation is equal to the message entropy, where the message equivocation
denotes the amount of uncertainty about the message left after one ciphertext is observed.
Therefore H(M ) − H(M | C) = 0 bits of information about the message are revealed on average by
a single ciphertext.
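A numerical illustration, with a hypothetical non-uniform 2-bit message distribution and a uniform independent key, of both quantities in part (c): I(M; C) is zero, while I(K; C) = H(K) − H(K | C) need not be:

import math
from itertools import product
from collections import defaultdict

l = 2
bits = list(product([0, 1], repeat=l))
pM = dict(zip(bits, [0.5, 0.25, 0.15, 0.10]))    # hypothetical message distribution
pK = {k: 1 / len(bits) for k in bits}            # uniform, independent key

pMC, pKC = defaultdict(float), defaultdict(float)
for m, k in product(bits, bits):
    c = tuple(mi ^ ki for mi, ki in zip(m, k))   # one-time pad: C = M XOR K
    pMC[m, c] += pM[m] * pK[k]
    pKC[k, c] += pM[m] * pK[k]

H = lambda probs: -sum(q * math.log2(q) for q in probs if q > 0)
pC = [sum(v for (_, c2), v in pMC.items() if c2 == c) for c in bits]

print(H(pM.values()) - (H(pMC.values()) - H(pC)))   # I(M;C) ≈ 0 bits (up to rounding)
print(H(pK.values()) - (H(pKC.values()) - H(pC)))   # I(K;C) ≈ 0.26 bits > 0 here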
A communication channel is a system in which the output depends probabilistically on its input.
The channel is characterised by a transition matrix with elements p(y | x) that determine the probability of observing the output y given the input x.
For a communication channel with input X and output Y we define the capacity C by
C = max_{p(x)} I(X; Y) .
The capacity is the maximum rate at which we can send information over the channel and recover the information at the output with a vanishingly small probability of error.
[Figure: a channel with inputs 1, 2, 3, 4 and outputs 1, 2, 3, 4.]
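A small Python sketch computing I(X; Y) from a transition matrix and an input distribution. For a noiseless channel like the one the figure appears to depict (each input delivered unchanged), the uniform input gives I(X; Y) = log₂ 4 = 2 bits, which is the capacity:

import math

def mutual_information(p_x, channel):
    # channel[x][y] = p(y | x); returns I(X; Y) in bits for input distribution p_x
    n_in, n_out = len(p_x), len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    return sum(p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
               for x in range(n_in) for y in range(n_out)
               if p_x[x] > 0 and channel[x][y] > 0)

noiseless = [[1.0 if x == y else 0.0 for y in range(4)] for x in range(4)]
print(mutual_information([0.25] * 4, noiseless))   # 2.0 bits = log2(4)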