Entropy
1 INTRODUCTION
Encryption Algorithm (3DES), and Blowfish. Then, all the encrypted files are
sent into network traffic, captured with network sniffing tools, and their
entropy values are calculated. From the resulting ranges of values, we can
identify the encryption algorithms.
It is assumed that Alice and Bob are sharing messages over a public network,
with one portion of the message encrypted and the other unencrypted.
Further, another user, Eve (i.e. the adversary), can passively intercept Alice
and Bob’s communication. This method is based on the attacker’s point of view.
When an attacker eavesdrops on a conversation between two valid users, the
attacker can record the files in the conversation and attempt to decipher the
message by analysing the data. Given only the ciphertext, the attacker can try
to determine which encryption technique was used for such files and then
attempt to decrypt them to acquire the data. Twenty different files of the same
size, each containing more than 100 words of data, are used here and encrypted
with four different encryption algorithms: AES, 3DES, RC4 and Blowfish. The
encryption is done in Python. The main aim is to analyse the encrypted files
and classify the encryption algorithms.
Entropy is a metric for measuring randomness. This notion originated in
thermodynamics, but it was adapted to digital communications by Claude E.
Shannon in 1948 [1]. Shannon was interested in figuring out how much a digital
file could theoretically be compressed. Entropy values can also be used to
identify traffic as encrypted or non-encrypted [18].
A file is compressed by replacing longer patterns of bits with shorter patterns
of bits. As a result, the more entropy a data file has, the less it can be
compressed. Determining a file’s entropy [8] can also help to figure out
whether it is likely to be encrypted. There are formal proofs in cryptography
indicating that if an adversary can distinguish an encrypted file from a
genuinely random file with probability better than 50%, then the adversary has
“the advantage”. The adversary can then exploit this advantage to attack the
encryption. The mathematical analysis of encryption schemes uses this concept
of advantage. In practice, however, files containing purely random data are of
little use in a file system. As a result, files with high entropy are very
likely to be encrypted or compressed.
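The link between entropy and compressibility described above can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as the compressor and uses `os.urandom` output as a stand-in for ciphertext; the helper name `compression_ratio` is hypothetical and not taken from the paper.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower = more compressible)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive input: low entropy, compresses very well.
low_entropy = b"abcd" * 4096
# Random bytes (a stand-in for ciphertext): high entropy, barely compresses.
high_entropy = os.urandom(16 * 1024)

print(f"repetitive: {compression_ratio(low_entropy):.3f}")
print(f"random:     {compression_ratio(high_entropy):.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the random input stays at roughly its original size, which is why a high-entropy file is usually either encrypted or already compressed.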
Many research papers use machine learning techniques [11] to classify traffic
[21], to classify traffic by application usage [19], and to classify traffic by
user actions [20]. All of these works aim at identifying the encryption
algorithms, but with different methods.
2 RELATED WORKS
3 METHODOLOGY
At first, twenty different files with plain-text data are created in text file
format. The files are then encrypted using four encryption algorithms. All the
encryption is done in Python. An entropy model for symmetric key cryptography
is presented in [15].
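The paper does not say which Python library performs the encryption. As a self-contained illustration of one of the four algorithms, the sketch below implements RC4 in pure Python; the function name `rc4`, the key, and the sample plaintext are hypothetical, and AES, 3DES and Blowfish would in practice require a cryptographic library. RC4 is no longer considered secure, so this is for experimentation only.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the same operation does both)."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 key must be 1 to 256 bytes long")

    # Key-scheduling algorithm: permute the state array using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation: XOR each data byte with a keystream byte.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


# Hypothetical plaintext and key, chosen only for illustration.
plaintext = b"sample plain text data " * 20
key = b"hypothetical-key"
ciphertext = rc4(key, plaintext)

# Applying RC4 again with the same key recovers the plaintext.
assert rc4(key, ciphertext) == plaintext
```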
Finally, all twenty files are encrypted with all four encryption methods, and
twenty encrypted files are obtained. The flow diagram of the proposed
methodology is shown in Figure 1. After all of the encrypted files are
collected, they are sent into network traffic alongside all other shared
network traffic. The transmitted files are then captured using any available
network sniffing tool. Here Wireshark is used for sniffing the files (Figure 2),
but it does not return the actual file. The captured data is therefore modified
and not identical to the original files; it has a few more characters added to
the original data.
algorithms as each encryption type will have different values. The Shannon
entropy values of all the files are calculated and listed in the tables
[1,2,3,4]. Shannon entropy can be calculated using the formula below:
$$H(X) = -\sum_{x} p(x) \log p(x)$$
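The formula above can be sketched in Python as follows. This is a minimal version, assuming that the symbols are byte values and that the logarithm is taken to base 2, which gives a value between 0 and 8 bits per byte; the function name `shannon_entropy` is hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # p * log2(1/p) summed over observed byte values equals -sum(p * log2(p)).
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(data).values()
    )


print(shannon_entropy(b"aaaaaaaa"))        # one repeated value
print(shannon_entropy(b"abababab"))        # two equally likely values
print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

Byte values that never occur are simply absent from the counter, which matches the usual convention that a term with zero probability contributes nothing to the sum.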
The results show that files encrypted using the same procedure fall in the same
range of values. Shannon’s entropy is a measure of how much information is
stored in data. Entropy measures how widely the data are dispersed over all
potential values. Thus, a higher entropy value means that the data are spread
more evenly across the possible values. In contrast, a lower entropy value
implies that the information is concentrated almost entirely on one value.
Entropy estimation for real-time traffic is discussed in [16].
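As a worked illustration of these two extremes, assume the symbols are byte values and the logarithm is taken to base 2. This assumption is not stated explicitly in the text, but it is consistent with the maximum of 8 mentioned for encrypted files.

```latex
% Data spread evenly over all 256 byte values: p(x) = 1/256 for every x.
\[
H_{\text{uniform}} = -\sum_{x=0}^{255} \frac{1}{256}\,\log_2\frac{1}{256}
                   = 256 \cdot \frac{1}{256} \cdot 8 = 8 \text{ bits per byte}
\]
% Data concentrated on a single value: p(x_0) = 1, all other terms vanish.
\[
H_{\text{single}} = -1 \cdot \log_2 1 = 0 \text{ bits per byte}
\]
```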
In this paper, real-time segregation of encrypted data is done using entropy
as a measure. Twenty different files are created and encrypted using Python
Analysis on Encrypted Files to Classify Encryption Algorithms Using Entropy 7
programming. Identifying the algorithm directly from the files as created would
have little relevance, so the files are sent into a network where all common
traffic is shared. Using the packet sniffing tool Wireshark, all twenty
encrypted files are captured, and their Shannon entropy values are calculated.
Python is used to calculate the Shannon entropy value of the files. Files
encrypted with the same algorithm are found to have similar entropy values,
falling in the same range. We have assigned each algorithm a particular range
of values. The ranges of entropy values are presented in table [5]. Encryption
algorithms can also be classified using pattern recognition [17].
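The step of assigning each algorithm a range of values can be sketched as below. This is only an illustration under stated assumptions: the helper names `shannon_entropy` and `entropy_ranges` are hypothetical, base-2 entropy over byte values is assumed, and the demo inputs are synthetic byte strings rather than data from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(data).values()
    )


def entropy_ranges(samples):
    """Map each label to the (min, max) entropy observed over its byte strings."""
    ranges = {}
    for label, blobs in samples.items():
        values = [shannon_entropy(blob) for blob in blobs]
        if values:  # skip labels that have no samples
            ranges[label] = (min(values), max(values))
    return ranges


# Synthetic inputs, not data from the paper: in practice each list would hold
# the captured bytes of the files encrypted with one algorithm.
demo = {
    "constant": [b"\x00" * 64],
    "two-symbols": [b"ab" * 32, b"aaab" * 16],
}
for label, (low, high) in entropy_ranges(demo).items():
    print(f"{label}: {low:.2f} - {high:.2f}")
```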
From these calculated entropy values, it is found that each encryption
algorithm has its own range of values. Hence, if a file’s entropy is in the
7.66 - 7.74 range, the file is classified as AES-encrypted. The other
encryption techniques are categorised according to their ranges in the same
way. The Shannon entropy values of the files were also measured before
encryption; they all ranged between 4.3 and 4.8 (tables [6,7]). This clearly
shows that the entropy values of encrypted files are higher than those of
unencrypted files. Usually, the entropy value of an encrypted file is near 8;
here it lies between 6 and 8.
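The range-based classification described above can be sketched as a simple lookup. Only the two ranges stated in this section are included; the ranges for 3DES, RC4 and Blowfish are not given here and would have to be filled in from the paper's tables. The names `ENTROPY_RANGES` and `classify_by_entropy` are hypothetical.

```python
from typing import Optional

# Ranges quoted in the text. Entries for 3DES, RC4 and Blowfish are
# deliberately omitted because this section does not state them.
ENTROPY_RANGES = {
    "AES": (7.66, 7.74),
    "unencrypted text": (4.3, 4.8),
}


def classify_by_entropy(value: float, ranges=ENTROPY_RANGES) -> Optional[str]:
    """Return the first label whose inclusive range contains value, else None."""
    for label, (low, high) in ranges.items():
        if low <= value <= high:
            return label
    return None


print(classify_by_entropy(7.70))
print(classify_by_entropy(4.5))
print(classify_by_entropy(6.0))
```

Returning `None` for a value outside every known range keeps the sketch from guessing; if the ranges of two algorithms overlapped, a lookup like this would report only the first match, so overlapping ranges would need a different decision rule.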
References
11. Cha, Seunghun, and Hyoungshick Kim. “Detecting encrypted traffic: a machine
learning approach”. International Workshop on Information Security Applications.
Springer, Cham, 2016.
12. Vinayakumar, R., K. P. Soman, and Prabaharan Poornachandran. “Secure shell
(ssh) traffic analysis with flow based features using shallow and deep networks”.
2017 International Conference on Advances in Computing, Communications and
Informatics (ICACCI). IEEE, 2017.
13. Mukundan, Puliparambil Megha, et al. “Hash-One: a lightweight cryptographic
hash function”. IET Information Security 10.5 (2016): 225-231.
14. Krishnan, Lekshmi R., M. Sindhu, and Chungath Srinivasan. “Analysis of sponge
function based authenticated encryption schemes”. 2017 4th International Confer-
ence on Advanced Computing and Communication Systems (ICACCS). IEEE, 2017.
15. Othman, Hiba, Youssef Hassoun, and Michel Owayjan. “Entropy model for
symmetric key cryptography algorithms based on numerical methods”. 2015
International Conference on Applied Research in Computer Science and Engineering
(ICAR). IEEE, 2015.
16. Dorfinger, Peter, Georg Panholzer, and Wolfgang John. “Entropy estimation for
real-time encrypted traffic identification”. International workshop on traffic moni-
toring and analysis. Springer, Berlin, Heidelberg, 2011.
17. Sharif, Suhaila O., L. I. Kuncheva, and S. P. Mansoor. “Classifying encryption
algorithms using pattern recognition techniques”. 2010 IEEE International
Conference on Information Theory and Information Security. IEEE, 2010.
18. Tang, Zhengzhi, Xuewen Zeng, and Yiqiang Sheng. “Entropy-based feature extrac-
tion algorithm for encrypted and non-encrypted compressed traffic classification”.
International Journal of ICIC 15.3 (2019): 845.
19. Peter Dorfinger, Georg Panholzer, Brian Trammell, and Teresa Pepe. “Entropy-
based traffic filtering to support real-time skype detection”. In Proceedings of the
6th International Wireless Communications and Mobile Computing Conference,
pages 747–751. ACM, 2010.
20. M. Conti, L. V. Mancini, R. Spolaor, and N. V. Verde. “Analyzing android en-
crypted network traffic to identify user actions”. IEEE Transactions on Information
Forensics and Security, 11(1):114–125, Jan 2016.
21. Jeffrey Erman, Anirban Mahanti, Martin Arlitt, Ira Cohen, and Carey
Williamson. “Offline/realtime traffic classification using semi-supervised
learning”. Perform. Eval., 64(9-12):1194–1213, October 2007.