
PART A

Q1. Histograms showing the relative frequency of letters in all four prepared long texts, together
with textual listings of the 26 most frequent bigrams and trigrams, are given below. Three relatively
long English texts (each of 15,000 or more letters) were taken from a novel, a news report, and a
technical report (Bitcoin), respectively; in addition, a Spanish report on Bitcoin was selected. The
comparison is shown below:

Fig 1: Relative frequency, most frequent bigrams and trigrams for novel (Sherlock Holmes)
Fig 2: Relative frequency, most frequent bigrams and trigrams for news article(BBC)
Fig 3: Relative frequency, most frequent bigrams and trigrams for tech article (Bitcoin)
Fig 4: Relative frequency, most frequent bigrams and trigrams for tech article in Spanish
(Bitcoin)
The findings from the respective analysis are given below:
● The frequency distributions of letters can depend slightly on the type of text in English.
Different types of text, such as scientific papers, news articles, and literary works, may
have distinct patterns of letter usage. For example, in scientific papers, there may be a
higher frequency of technical terms and abbreviations, which can skew the frequency
distribution towards certain letters. In contrast, literary works may have a higher frequency
of certain vowels or consonants, depending on the author's writing style or the language
used.
● It is important to note that while there may be some variations in the frequency distributions
of letters in different types of English text, there are still overall patterns and trends that
can be observed. For example, the letters E, T, and A are consistently among the most frequently
used letters in the English language, regardless of the type of text. In our case, E, T, A, O were
the most frequent letters for the novel and E, T, A, I for the news article, while E, T, O, A were
the most frequent letters in the article about Bitcoin.
● Also, the frequency distributions of letters can depend significantly on the language in
which the message was written. Different languages have different letter frequencies, and
therefore, the frequency distribution of letters in a message will be influenced by the
language in which it was written. For example, in English, the most common letters are E,
T, A, O, and I, whereas in Spanish, the most common letters are E, A, O, S, and N.
Therefore, if a message was written in English, its frequency distribution of letters would
be different from that of a message written in Spanish, even if they were the same length
and contained the same letters.
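The relative frequencies and n-gram lists behind Figs 1-4 can be reproduced with a short script. The sketch below uses an illustrative sample string, not the report's actual texts, and the helper name `letter_stats` is an assumption:

```python
from collections import Counter

def letter_stats(text, n_top=26):
    """Relative letter frequencies and the most frequent bigrams/trigrams."""
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    freqs = {c: round(n / total, 4) for c, n in Counter(letters).items()}
    joined = "".join(letters)
    bigrams = Counter(joined[i:i + 2] for i in range(len(joined) - 1))
    trigrams = Counter(joined[i:i + 3] for i in range(len(joined) - 2))
    return freqs, bigrams.most_common(n_top), trigrams.most_common(n_top)

freqs, top_bigrams, top_trigrams = letter_stats(
    "The quick brown fox jumps over the lazy dog")
```

Running this over each of the four source texts, and plotting `freqs` as a histogram, produces figures of the kind shown above.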

Q2. The five most frequent letters in the text, for first 500, 1K, 2K, 5K, 10K, and 15K letters, are
given below:
(i) 500 letters (ii) 1000 letters

(iii) 2000 letters (iv) 5000 letters

(v) 10000 letters (vi) 15000 letters

Fig 5: Top 5 most frequent letters for novel (Sherlock Holmes)


Fig 6: The estimated probabilities against the number of letters for the five most probable letters.
From the plot, we can observe that the estimated probabilities for each of the five letters stabilize
as the number of letters in the sample increases. The letter "e" is the most frequent letter in the
text, and its estimated probability is relatively constant across all sample sizes. The other four
letters (t, a, o, and i) have estimated probabilities that are lower than "e," but still relatively high
compared to the other letters in the alphabet.

Q3.

(i) 10 letters (ii) 20 letters


(iii) 50 letters (iv) 100 letters

(v) 200 letters (vi) 500 letters

(vii) 1000 letters (viii) 2000 letters

(ix) 5000 letters (x) 10000 letters

(xi) 15000 letters


Observed characteristics:
● Entropy increases gradually as the number of letters increases.
● At the beginning, the entropy was low because only a few letters (and few distinct letters) had been observed.
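The entropy figures above can be reproduced from the letter counts. This is a minimal sketch with an illustrative sample string (not the report's text):

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy (bits per letter) of the letter distribution in `text`."""
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = "in the beginning the entropy was low because few distinct letters appear"
for size in (10, 20, 50):
    # Entropy computed over a growing prefix, as in subfigures (i)-(xi).
    print(size, round(shannon_entropy(sample[:size]), 3))
```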
Q4.

(i) Single letters (ii) Digrams


(iii)Trigrams
Fig: Frequency distribution of the 10 most frequent single letters, digrams and trigrams for the
plaintext.

(i) Single letters (ii) Digrams

(iii)Trigrams
Fig: Frequency distribution of the 10 most frequent single letters, digrams and trigrams for the
Vigenere ciphertext.

The Vigenere cipher is a polyalphabetic substitution cipher that uses a keyword to encrypt the
plaintext. The characteristic feature of the frequency distributions of letters after the Vigenere
cipher is that the frequency distribution of letters is masked due to the repeated use of different
Caesar shift ciphers (each shift determined by the letter of the keyword).
To determine which cipher was used to obtain a given ciphertext, one could use frequency analysis
on the ciphertext to estimate the period of the Vigenere cipher, i.e. the length of the keyword used
to encrypt the plaintext (for example via the Kasiski examination or the index of coincidence). Once
the period n is known, one can collect every nth letter of the ciphertext into a subsequence; each
subsequence is enciphered with a single Caesar shift, so its frequency distribution is simply a
rotated copy of the plaintext letter distribution. Matching each subsequence's distribution against
standard English letter frequencies then recovers the corresponding key letter, and a flattened
overall distribution combined with this periodic structure is itself a strong indicator that a
Vigenere cipher was used.
(i) Single letters (ii) Digrams

(iii) Trigrams
Fig: Frequency distribution of the 10 most frequent single letters, digrams and trigrams for the
Hill ciphertext.

The Hill cipher is a polygraphic substitution cipher that operates on blocks of letters rather than
individual letters. Therefore, the frequency distribution of letters in the ciphertext after the Hill
cipher does not have the same characteristic features as a monoalphabetic substitution or a Vigenere
cipher. Instead, the Hill cipher uses matrix operations to encrypt blocks of letters (in our case a
key built from the letters P, Q, V, T was used), which suppresses single-letter statistics while
introducing block-level patterns and correlations in the ciphertext. One possible approach to
analyzing ciphertext produced by the Hill cipher is to examine the correlation between adjacent
letters, which can reveal the size and structure of the key matrix used in the encryption process.
Another is to use statistical techniques such as the chi-squared test or the index of coincidence to
measure the degree of randomness in the ciphertext: if the ciphertext exhibits a high degree of
randomness, the Hill cipher is a likely candidate, as it can produce ciphertext that appears almost
random.
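A toy 2x2 Hill encryption shows the block structure that produces these statistics; the key matrix here is a common textbook example, not the key used in this report:

```python
# Toy 2x2 Hill cipher sketch (illustrative key, for demonstration only).
KEY = [[3, 3], [2, 5]]  # invertible mod 26: det = 9, gcd(9, 26) = 1

def hill_encrypt(plaintext, key=KEY):
    nums = [ord(c) - 65 for c in plaintext.upper() if c.isalpha()]
    if len(nums) % 2:
        nums.append(ord('X') - 65)  # pad the final block
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append(chr((key[0][0] * x + key[0][1] * y) % 26 + 65))
        out.append(chr((key[1][0] * x + key[1][1] * y) % 26 + 65))
    return "".join(out)

print(hill_encrypt("HELP"))  # → HIAT
```

Because each output letter depends on a whole block, single-letter frequencies are flattened, which is exactly what the figures above show.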

(i) Single letters (ii) Digrams


(iii) Trigrams
Fig: Frequency distribution of the 10 most frequent single letters, digrams and trigrams for the
substitution ciphertext.
After a substitution cipher, the frequency distribution of letters is likely to be significantly altered.
The most common letters in the plaintext may be replaced by less common letters in the ciphertext,
and vice versa. However, there may still be some discernible patterns in the frequency distribution
that can be used to determine which cipher was used.
One characteristic feature of the frequency distribution after a substitution cipher is that the
frequency of certain letters is likely to be higher than others. For example, in English, the letters
E, T, A, O, and I are the most commonly used letters. In a ciphertext produced by a substitution
cipher, these letters may still be more frequent than others, even if they have been replaced by
different letters.
Another characteristic feature is the presence of repeating patterns in the ciphertext. For example,
if the same plaintext letter is replaced by the same ciphertext letter throughout the text, this will
result in a repeating pattern in the ciphertext. Such patterns can be used to identify which letters
have been substituted and can help to reconstruct the original plaintext.
To determine which cipher was used to obtain the given ciphertext, one can compare the frequency
distribution of letters in the ciphertext to the expected frequency distribution for English or other
languages. If there are significant differences, this may indicate that a substitution cipher has been
used. One can then apply frequency analysis and other cryptanalytic techniques to try to determine
the specific substitution cipher that was used.
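A first pass at this frequency comparison can be sketched as a rank-based mapping; `rank_mapping_guess` and the ranking string are illustrative assumptions, and the resulting mapping is usually wrong in places, serving only as a starting point for refinement with bigram/trigram statistics:

```python
from collections import Counter

ENGLISH_BY_FREQ = "ETAOINSHRDLCUMWFGYPBVKJXQZ"  # typical English ranking

def rank_mapping_guess(ciphertext):
    """Map ciphertext letters to English letters by frequency rank."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    ranked = [c for c, _ in Counter(letters).most_common()]
    return {c: ENGLISH_BY_FREQ[i] for i, c in enumerate(ranked)}
```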

Q5.
Given the plaintext, the unicity distance depends on the chosen encryption algorithm and key length.
For a substitution-type cipher with a five-letter key (a key space of 26^5), the unicity distance
can be calculated as follows:
A random string over the English alphabet carries log2(26) ≈ 4.7 bits per character, while the
true entropy of English text is only about 1.5 bits per character, giving a redundancy of
D ≈ 3.2 bits per character.
With a key length of 5 there are 26^5 possible keys, or approximately 11.9 million, so the key
entropy is log2(26^5) ≈ 23.5 bits.
The unicity distance is the length of ciphertext at which, on average, only one key remains
consistent with the observed ciphertext.
To calculate the unicity distance, we can use the formula:
unicity distance = log2(N) / D
where N is the number of possible keys and D is the redundancy per character of the plaintext.
Plugging in the numbers, we get:
unicity distance = 23.5 / 3.2 ≈ 7.3
Therefore, if we encrypt the plaintext using this cipher with a key length of 5, we would need at
least 8 characters of ciphertext to have a reasonable probability of determining the key.
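This calculation can be checked in a few lines, assuming a 26^5 key space and an English redundancy of about 3.2 bits per letter:

```python
import math

# Unicity distance U = log2(#keys) / D, where D is the redundancy of
# English: log2(26) - 1.5 ≈ 3.2 bits per letter (assumed values).
num_keys = 26 ** 5
redundancy = math.log2(26) - 1.5
unicity = math.log2(num_keys) / redundancy
print(round(unicity, 1))  # ≈ 7.3 letters of ciphertext
```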

Q6. Applying Caesar cipher decryption to Ciphertexts 1, 2, 3, and 4, we recovered the plaintext from
Ciphertext 3.

Fig: Plaintext from Ciphertext3 (Derived Caesar Key = G)


Similarly, Vigenere decryption recovers the plaintext from Ciphertext 3 as well (Key =
HHHHHH).

Hill decryption succeeded on Ciphertext 2. The keys are given below:
Fig: Hill decryption of Ciphertext 2.
Ciphertext 4 can be decrypted by substitution. The key is ZOXWVUCARBYNTSQPMLKJIDGHEF.

Fig: Substitution decryption of Ciphertext 4.


Finally, Ciphertext 5 can be decrypted with the Playfair cipher.

Q7. Steps for analysis:


● Firstly, among all the ciphertexts, Ciphertext 3 was decrypted successfully using Caesar
cipher decryption (Key = G) in CrypTool; analyzing Ciphertext 3 with Vigenere likewise
yielded the plaintext.
● Then, Ciphertext 2 was broken by Hill decryption, using a known-plaintext attack.
● Finally, Ciphertext 4 was decrypted by substitution. The key is
ZOXWVUCARBYNTSQPMLKJIDGHEF.

Q8. By performing XOR operation on the plaintext and ciphertext, we get -

S XOR W = 100 XOR 011 = 111 = A


U XOR I = 010 XOR 000 = 010 = U
S XOR O = 100 XOR 001 = 101 = E
A XOR E = 111 XOR 101 = 010 = U
N XOR S = 110 XOR 100 = 010= U

So, the encryption key is AUEUU.
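Using the 3-bit code table from the worked example above (plaintext SUSAN, ciphertext WIOES), the key recovery can be scripted:

```python
# 3-bit letter codes taken from the worked example above.
CODE = {'I': 0b000, 'O': 0b001, 'U': 0b010, 'W': 0b011,
        'S': 0b100, 'E': 0b101, 'N': 0b110, 'A': 0b111}
LETTER = {v: k for k, v in CODE.items()}

def recover_key(plaintext, ciphertext):
    """Key = plaintext XOR ciphertext, letter by letter."""
    return "".join(LETTER[CODE[p] ^ CODE[c]]
                   for p, c in zip(plaintext, ciphertext))

print(recover_key("SUSAN", "WIOES"))  # → AUEUU
```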

Q9.
I XOR U = U XOR I = 010 XOR 000 = 010 = U
E XOR W = 101 XOR 011 = 110 = N
So the key is U- - N
So, first message = I - - N
Second message = U - - E

Q10. In part A of the practical work, we explored various classical ciphers (Caesar, Vigenere, Hill,
Substitution, and Playfair) and analyzed their weaknesses in terms of their susceptibility to
cryptanalysis. We also looked at how entropy and unicity distance are calculated and how they can
be used to evaluate the strength of a given cipher.
From our analysis, we can conclude that classical ciphers are generally weak and can be easily
broken with known plaintext attacks or other cryptanalysis techniques. The Caesar cipher, for
example, is particularly weak due to its simplicity and lack of key complexity. The Vigenere
cipher, although more complex than the Caesar cipher, is still vulnerable to cryptanalysis
techniques such as frequency analysis. The Hill cipher is stronger than the previous two ciphers
but can still be broken if the key matrix is known. The Substitution cipher is vulnerable to
frequency analysis, and the Playfair cipher can be broken with known plaintext attacks or by
analyzing the bigrams in the ciphertext.
Furthermore, we can see that the unicity distance is an important factor in determining the strength
of a given cipher, as it provides an estimate of the amount of ciphertext required to uniquely
determine the key. A cipher with a low unicity distance is more vulnerable to cryptanalysis than a
cipher with a high unicity distance.
Overall, the practical work has demonstrated the importance of evaluating the strength of a given
cipher and the value of known plaintext in cryptanalysis. It has also highlighted the weaknesses of
classical ciphers and the need for more advanced encryption techniques to ensure secure
communication.

PART B
Q1.

Fig: Encryption and decryption via DES(ECB) where key = 00 00 00 00 00 00 00 00


Q2.

Fig: Encryption and decryption via DES(CBC) where key = 00 00 00 00 00 00 00 00

Plaintext: Open a new file, type a plaintext message and save the file.
DES(ECB): )©!î’„%vê ¦åȲá4¾*jIàÕ~OÌ›Vÿ¯Ý×I‡OO3Žï%F}v%Œ‘•=qÐ|aCåùƒkB^
DES(CBC): )©!î’„%¤f4°o€±NWÊrypÕÊåÆþÙJŒ<Ù«Äá†MªŒün‹¯Ïß4)EOs6:õ[gn˜½S¹&
The ciphertexts are different, despite being encrypted with the same DES key, because the two modes
(ECB and CBC) chain the blocks differently.

Q3.
Fig: Encryption and decryption via DES(ECB) where key = 00 00 00 00 00 00 00 00
Q4.

Fig: Encryption and decryption via DES(CBC) where key = 00 00 00 00 00 00 00 00

Q5. The Data Encryption Standard (DES) is a symmetric key block cipher that uses a Feistel
network structure to encrypt data. The encryption process consists of 16 rounds, each of which
involves a series of operations on the 64-bit plaintext block, using a 56-bit key (supplied as 64
bits, 8 of which are parity bits).
The input plaintext block is first permuted using the initial permutation (IP) matrix. The resulting
block is divided into two halves, left and right, each of 32 bits. In each round, the right half is
expanded to 48 bits using an expansion permutation (E) and XORed with a 48-bit subkey
generated from the main key by the key schedule algorithm. The resulting 48-bit block is then fed
into the eight S-boxes, which replace it with a 32-bit block based on the values in the boxes.
The 32-bit block is then subjected to a permutation known as the P-box permutation, which
rearranges the bits. The left half is then XORed with the output of the P-box permutation, and the
resulting block is swapped with the right half to form the new input block for the next round.
After the 16 rounds are completed, the output block is subjected to a final permutation (IP^-1) to
produce the ciphertext.
Fig: Encryption and decryption via DES
The DES encryption process is reversible, meaning that the ciphertext can be decrypted using the
same key and algorithm by applying the operations in reverse order.
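The 16-round structure described above can be condensed into a generic Feistel skeleton; `toy_round` below is a deliberately simplified stand-in for DES's expansion, S-box, and permutation steps, not the real round function. It illustrates why decryption is just the same algorithm with the subkeys reversed:

```python
# Minimal Feistel sketch (toy round function, not real DES internals).
def toy_round(right, subkey):
    return (right * 31 + subkey) & 0xFFFFFFFF  # stand-in for E, S-boxes, P

def feistel_encrypt(block64, subkeys):
    left, right = block64 >> 32, block64 & 0xFFFFFFFF
    for k in subkeys:
        left, right = right, left ^ toy_round(right, k)
    return (right << 32) | left   # final swap, as in DES

def feistel_decrypt(block64, subkeys):
    # Same structure, subkeys applied in reverse order.
    return feistel_encrypt(block64, list(reversed(subkeys)))
```

Note that `toy_round` need not be invertible: the Feistel structure only ever XORs its output into one half, which is what makes the whole cipher reversible.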

Q6. The main differences between AES and DES encryption are:

● Key size: AES supports key sizes of 128, 192, or 256 bits, while DES has a fixed key
size of 56 bits.
● Security: AES is considered more secure than DES because of its larger key size and more
complex encryption algorithm.
● Speed: DES's bit-level permutations are cheap in dedicated hardware but costly in
software, so AES is generally the faster of the two in software implementations.

Q7. The total size of the S-boxes in DES is 8 × 4 × 16 × 4 = 2048 bits, as there are 8 S-boxes, each
containing 4 rows of 16 entries of 4 bits each.

In AES there is a single S-box mapping 8-bit inputs to 8-bit outputs, i.e. 256 entries of 8 bits
each, for a total of 256 × 8 = 2048 bits; during a round it is applied to each of the 16 state bytes
in turn.

So while the total table sizes are comparable, the AES S-box is eight times larger than any single
DES S-box (2048 bits versus 256 bits) and substitutes a full byte at a time, whereas each DES S-box
compresses 6 input bits to 4 output bits. The larger, carefully constructed AES S-box provides a
greater level of non-linearity, which is one of the reasons why AES is considered to be more secure
than DES and more resistant to cryptanalytic attacks.

Q8.
Fig: Encryption via DES and AES
By decrypting the ciphertexts, we get-

Fig: Decryption via DES (ECB).

Q9. Prime modulus = 448828247752399623229538938095521367599


Generator = 165241684265116747181316017919853165086
Alice’s Secret = 133898785863340768314517820567391863323
Bob’s Secret = 53086240634235269280638285728852127659

By calculating A = g^a mod p and B = g^b mod p, the public values are:

Alice: A = 256850123833353658890851835333721211228
Bob: B = 441672826309949745683399859388626165919

Fig : Diffie-Hellman key demonstration
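For reference, the key-agreement arithmetic can be sketched with Python's built-in three-argument `pow`; the prime and secrets below are small demo values, not the report's:

```python
# Diffie-Hellman sketch; demo parameters only (real use needs a
# 2048-bit+ prime and randomly generated secrets).
p = 0xFFFFFFFFFFFFFFC5   # 2^64 - 59, a 64-bit prime (demo size only)
g = 5

alice_secret = 133898785
bob_secret = 530862406

A = pow(g, alice_secret, p)  # Alice sends A = g^a mod p
B = pow(g, bob_secret, p)    # Bob sends B = g^b mod p

alice_key = pow(B, alice_secret, p)
bob_key = pow(A, bob_secret, p)
assert alice_key == bob_key  # both sides derive g^(a*b) mod p
```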

Q10. In the Diffie-Hellman key exchange protocol, the exchanged values YA and YB are used by
Alice and Bob to compute a shared secret key. In this scenario, the attacker wants to establish a
single Diffie-Hellman key, KABM = g^(XAXBXM) mod p, that the attacker, Alice and Bob all
share. To accomplish this, the attacker chooses a secret exponent XM and re-keys both directions
of the exchange.
The attacker performs a man-in-the-middle (MITM) attack: it intercepts the exchanged values YA
and YB and, instead of forwarding them unchanged, sends YAM = YA^XM mod p = g^(XAXM)
mod p to Bob and YBM = YB^XM mod p = g^(XBXM) mod p to Alice.
Alice then computes the shared secret key as KA = YBM^XA mod p = (g^(XBXM) mod
p)^XA mod p = g^(XAXBXM) mod p, and Bob computes KB = YAM^XB mod p = (g^(XAXM)
mod p)^XB mod p = g^(XAXBXM) mod p.
Thus Alice and Bob both arrive at the single key KABM = g^(XAXBXM) mod p, which is exactly
the key the attacker set out to establish, so the attacker, Alice and Bob all share the same
secret key.
It is important to note that this attack is possible because the attacker can intercept the exchanged
values and perform a Man In The Middle attack. To prevent this attack, the Diffie-Hellman key
exchange protocol can be augmented with authentication mechanisms such as digital signatures
or message authentication codes.
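With toy numbers (a small prime and illustrative secrets), the re-keyed exchange can be checked directly: Alice and Bob both end up with g^(XA·XB·XM) mod p:

```python
# MITM re-keying sketch; all values are toy/illustrative.
p, g = 10007, 5          # small prime and generator, demo only
xa, xb, xm = 1234, 5678, 4321

ya, yb = pow(g, xa, p), pow(g, xb, p)
yam = pow(ya, xm, p)     # attacker forwards g^(XA*XM) to Bob
ybm = pow(yb, xm, p)     # attacker forwards g^(XB*XM) to Alice

alice_key = pow(ybm, xa, p)
bob_key = pow(yam, xb, p)
assert alice_key == bob_key == pow(g, xa * xb * xm, p)
```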

Q11.

M1 =

M2 =

M3 =

M4 =

M5 =

M6 =

Q16. No, the CBC and ECB modes of encryption do not ensure data integrity on their own. These
modes of encryption only provide confidentiality and do not include any integrity checks. An
attacker can modify the ciphertext, and the modified message can still be successfully decrypted
without any indication that it has been tampered with.
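As a toy illustration of this malleability (a simple XOR stream standing in for any unauthenticated cipher, with a made-up message and key), tampered ciphertext still decrypts without any error, just to altered plaintext:

```python
# Toy demo: encryption alone gives no integrity.
key = bytes([0x5A] * 16)  # illustrative fixed key

def xor_crypt(data, key):
    return bytes(b ^ k for b, k in zip(data, key))

ct = bytearray(xor_crypt(b"PAY ALICE $100  ", key))
ct[11] ^= ord('1') ^ ord('9')         # attacker flips bits in the ciphertext
tampered = xor_crypt(bytes(ct), key)  # decrypts "successfully"
print(tampered)  # → b'PAY ALICE $900  '
```

The same principle applies to ECB and CBC: block or bit manipulation of the ciphertext yields a decryptable message, which is why an authentication mechanism (e.g. a MAC) is needed on top.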
Q19. Let
p = 503, q = 997, and e = 65537
These values are all prime, and e is coprime to both p-1 and q-1.

The modulus is calculated by:

N = p * q = 501491

The value of Ø(N) is calculated as:

Ø(N) = (p-1) * (q-1) = 499992.
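These values, and the corresponding private exponent, can be checked directly (the modular-inverse form of `pow` needs Python 3.8+):

```python
# RSA key derivation for p = 503, q = 997, e = 65537.
p, q, e = 503, 997, 65537
N = p * q                 # 501491
phi = (p - 1) * (q - 1)   # 499992
d = pow(e, -1, phi)       # private exponent (Python 3.8+)
assert (e * d) % phi == 1

m = 42                    # illustrative message, m < N
assert pow(pow(m, e, N), d, N) == m   # encrypt/decrypt round trip
```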

Q20.

Fig : RSA encryption

Fig : RSA decryption

Q22. The ciphertext is 134635.


The ciphertext value is less than N, as required for RSA.

Q23. The security of the RSA modulus depends on the size of the modulus, which is determined
by the bit length of the prime factors p and q used to compute the modulus. Generally, the larger
the size of the modulus, the more secure the RSA encryption. The security of RSA also depends
on the strength of the random number generator used to generate the prime factors.
Let N = 4960345275737677027

P = 853759537 and Q = 5810002771

Ø(N) = 4960345269073914720
E = 65537
D = 3730571401916233313

Plaintext = 12345623456
Ciphertext = 3913838836646879578
However, this modulus is only about 63 bits long and could be factored quickly with modern tools,
so in practice RSA security requires a much larger modulus (2048 bits or more).
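The stated factorization and totient, plus a full encrypt/decrypt round trip on the given plaintext, can be verified in a few lines (Python 3.8+ for the modular inverse):

```python
# Checking the report's RSA values.
N = 4960345275737677027
P, Q, e = 853759537, 5810002771, 65537
assert P * Q == N

phi = (P - 1) * (Q - 1)
assert phi == 4960345269073914720

d = pow(e, -1, phi)      # can be compared with the report's D value
m = 12345623456
assert pow(pow(m, e, N), d, N) == m   # round trip succeeds
```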

Q29. In Part B of the practical work, we explored and reinforced our understanding of various
symmetric and asymmetric key cryptographic algorithms, as well as their modes of operation.
We learned about the DES symmetric key cipher, its history, design, and implementation. We also
explored the AES symmetric key cipher, its features, and how it improves upon the DES cipher.
We compared and contrasted the two ciphers, highlighting their strengths and weaknesses.
Furthermore, we examined the ECB and CBC modes of operation of a symmetric block cipher and
how they are vulnerable to attacks. We learned about the importance of using more secure modes
of operation such as CTR and GCM.
We also reinforced our understanding of the Diffie-Hellman key exchange algorithm, which allows
two parties to establish a shared secret key over an insecure channel. We learned how it works and
how, when used with ephemeral keys, it can provide forward secrecy.
Lastly, we reinforced our understanding of the RSA public key cipher, which is widely used for
secure communication and digital signatures. We learned how it works, the importance of choosing
large prime numbers for its key generation, and the significance of its modulus in determining its
security.
Overall, we gained a better understanding of various cryptographic algorithms and their modes of
operation, which will be useful in designing and implementing secure communication systems.
