Abstract: Advent of the Information Age and the Internet of Things has led to an unprecedented increase in the flow of data. This increase in data flow has been matched by an increase in computational power. However, low-end devices, usually implemented as a cost-saving alternative, have not been able to keep up with the increasing security and error detection and correction demands of the MD5 or SHA-1/2/3 algorithms. We present the performance characteristics of modified versions of the MD5 message-digest algorithm, which have been altered using two different approaches: reducing lateral register widths or reducing computation cycles. The metrics for comparison are collisions, synthesis reports and time required for message-digest calculation.

Index Terms: Internet of Things, IoT Edge Devices Security, Low End IoT Edge Devices, Modified MD5 Algorithm, Modified Message-Digest.

I. INTRODUCTION

For wireless communication between devices, cryptographic algorithms are widely used to monitor or assist communication. They play a major role in verifying the integrity of data, which can get corrupted at any of the many stages of transmission, owing to internal fault or external interference. For Internet of Things (IoT) applications, where data is streamed continuously from various sources, the data is monitored in real time. This is accomplished by utilizing Application Specific Integrated Circuits (ASIC) or Graphics Processing Units (GPU). All these hardware options are used to generate a hash for a given message signal, while the hash itself assists with data analysis and decision making during erroneous data transmission and reception.

There are several algorithms that have been made for this specific purpose, like Secure Hashing Algorithm (SHA) and Message-Digest 4 (MD4). These were designed with 32-bit processors in focus, as they require 32-bit variables and 32-bit logical operations. However, as greater progress was made in computing, these algorithms increased in complexity, like SHA-256 and MD5, which ensure better integrity in transmission.

Depending on the particular IoT application, the tolerance of erroneous data changes, so the required level of data integrity verification depends on the application. When a non-tolerant IoT application has various sources of data, and those sources do not have enough resources to commit to encryption along with integrity verification, a trade-off is made. Instead of both, only cryptographic hash functions are used to ensure integrity of data. For this, two algorithms are prevalent in usage: SHA and MD5 [1]. Several generations of the Secure Hashing Algorithms have been developed and are widely used. Internally, SHA-3 is quite different from SHA-1 and SHA-2, which makes it more suited to be implemented in hardware, although it takes more computation time in software [2]. However, due to the recent publication of SHA-3 (2015), there has not been widespread development of this algorithm. Thus, SHA-2 has been widely implemented as a software data verification tool, due to its improved software performance, which leaves IoT hardware applications to be handled solely by the MD5 algorithm.

To further justify the choice of the MD5 algorithm, we take into account our use case: IoT edge devices with low memory and power. This leads to cryptographic hash algorithms being disregarded, since the target implementation is data integrity verification. Furthermore, for the constant bit-rate data streams associated with the IoT edge device, a fixed-length output hash algorithm is appropriate. The MD5 algorithm satisfies both of these conditions, as it has been cryptographically broken [3] and should not be considered a cryptographic hash algorithm.

The MD5 message-digest algorithm was primarily made to differentiate between two messages using their respective message-digests. This was achieved by making it improbable for the same message-digest to be produced by two slightly differing messages [4]. This property has been used as an error-detecting and security mechanism in various forms of communication. It has 4 stages with 64 iterations in total at its core [4], followed by further trivial functions to achieve the properties the message-digest possesses. By modifying the MD5 architecture, we aim to make it more feasible to implement in hardware on low-power, low-memory devices, like IoT edge devices, to add the functionality of data integrity verification.

We will first discuss the Message-Digest cryptographic algorithms, their variations and working, followed by how this algorithm is implemented in Verilog, subsequently showing how this architecture can be scaled down. Finally, performance is compared on the metrics of collisions, synthesis reports and time required.
This algorithm was developed by Ronald Rivest in 1990, although it was published in 1992 [9]. The main contribution of the MD4 algorithm in this context is its significant influence on later designs like SHA-1 and MD5. For this reason, the MD4 algorithm is not discussed further here.

B. MD5

MD5 was designed by Ronald Rivest as a replacement for the MD4 algorithm [4]; it produces a 128-bit hash from a 512-bit message. A message longer than 448 bits is broken into pieces of 448-bit length. If the message length is not a multiple of 448, then it is padded with 1 followed by zeros until it reaches

Thus, the look-up table for i has rearranged values of i, which are fed to M to choose an input message chunk, while normally iterating values of i are used in the look-up tables for k and S to find the predetermined pseudo-random additions and the shift values to be used in the four MD5 blocks, namely A, B, C and D.

IV. VERILOG IMPLEMENTATION

A. Original

The unedited Verilog model [10] consists of multiple tiers of blocks, all of which contribute individually to the functioning of the MD5 algorithm. These blocks are md5_ctl,
A. Architecture

The relevant kLUT and sLUT are discussed here, as they are changed accordingly. The values for kLUT are calculated by

$K[i] = 2^{32} \times |\sin(i + 1)|$ (6)

In Eq. 6, all i are normally iterating constants, taken as radians for sin, thus giving predetermined pseudo-random values. K[i] here is a 32-bit variable; changing it to 8 bits in its definition transforms all random addition values to 8-bit without problems. Further, for sLUT, the shift values are in the range 5 to 23, which cannot be mapped straight to the range 1 to 8 without compromising the properties possessed by the hash algorithm as a whole. Initially, the following equation maps the 5 to 23 range values (for 32-bit blocks) to shift values in the range 0 to 7.

$S_8[i] = S_{32}[i] \bmod 8$ (7)

In Eq. 7, $S_8[i]$ denotes the sLUT values for 8-bit blocks, while $S_{32}[i]$ denotes the shift values for 32-bit blocks. Due to the non-ideal mapping in Eq. 7, some shift values come out as 0, which, like a shift of 8, means no shift at all, and some shift values come out the same for the previous and next cycle, which disqualifies this method from being applied. To devise the shift values for sLUT, the original shift values are studied to find their relation to the next shift values and their effect on the blocks. After understanding the relationships between shift values and their impact, values for $S_8[i]$ emulating similar relationships and having similar impact can be determined.

The original sLUT has 4 values which repeat 4 times for each of the 4 functions (F, G, H and I). Thus each of the 4 functions has a set of 4 values that repeat 4 times to complete 16 iterations. These 4 values (for each of the 4 functions) are each assigned from one quarter of the range 1 to 32, except the last value, which belongs to the same range as the third shift value. The first shift value belongs to 1 to 8, the second belongs to 9 to 16, and the third and fourth belong to 17 to 23. There is a further constraint that the fourth shift value be greater than the third. Using similar relationships, the sLUT values for 8-bit blocks are taken from 1 or 2 for the first value, 3 or 4 for the second, and 5 to 7 for the third and fourth. For the third and fourth shift values, to ensure that the fourth value can be greater than the third while maintaining some change between the 4 groups according to functions, an extra value from the fourth quarter of the 1 to 8 range is also included.

Similar techniques are used for scaling down the 8-bit block architecture to the 4-bit block architecture. With the 4-bit block architecture, the message size becomes 64 bits (16 × 4-bit), while the hash becomes 16 bits (4 × 4-bit). The values for kLUT can be found as in Eq. 6, by taking the lateral register width as 4 bits. However, while scaling down sLUT to shift values in the range 1 to 4, scaling down from 32 is not feasible, thus shift values are found by scaling down from 8.

$S_4[i] = S_8[i] \bmod 4$ (8)

In Eq. 8, due to the very small range (1 to 4), redundancies in shifting are unavoidable, even after conforming these shift values to the relationships and impact of the original shift values ($S_{32}[i]$). Thus, a greater negative impact on the message-digest is expected here (32-bit block to 4-bit block) than in going from 32-bit blocks to 8-bit blocks.

B. Iteration

The relevant look-up table, iLUT, is discussed here, along with feeding and computation characteristics, as they are also affected by the iteration scale-down.

In the 64-iteration cycle (×64), the four functions F, G, H and I each get 16 iterations, into which 16 message chunks are fed in the order outlined in iLUT. In iLUT, the values are given according to Eq. 5. For scaling these values down to 16 iterations (×16), we cannot simply use $I_{\times 16}[i] = I_{\times 64}[i] \bmod 4$, as the number of values present in iLUT×64 is 64, while for iLUT×16 it is 16. Moreover, as both have to choose from 16 message chunks, both must contain all values ranging from 1 to 16. iLUT×16 must have all values from 1 to 16, only in an order other than ascending. This order can be similar to the one described for iLUT×64, where instead of 4 groups of 16 values each, we take 2 groups, one of 4 values and the other of 12 values. This can be achieved in the following manner.

$\mathrm{iLUT}_{\times 16}[i] = \begin{cases} i, & 1 \le i \le 4 \\ (5 \times i) \bmod 12 + 5, & 5 \le i \le 16 \end{cases}$ (9)

In Eq. 9, we can use any one of the 3 remaining equations from Eq. 5. Any of these will give satisfactory randomness to the values, while exhausting all the values ranging from 5 to 16. Although this ×16 equation will not give the same message-digest characteristics as ×64, it is used for lack of other options.

Moreover, as stated earlier for ×64, the first 16 iterations are fed straight to the block md5_core while simultaneously being stored in sram. This cannot be done for ×16, as only the first 4 iterations take message chunks in ascending order due to normal iLUT values, and the remaining ones are out of order, which means these chunks must first be stored in sram and then fed to the block md5_core. This results in ×16 needing 16 iterations for storing, and then another 16 iterations to compute the message-digest, unlike ×64, which requires 64 iterations for both storage of message chunks (in the first 16) and computing the message-digest.

VI. PERFORMANCE

The measurements of performance for these different combinations of architecture need to be standardized. This is achieved by taking random values as input to the message-digest algorithm, and then comparing the collision rate of the message-digests that are output by the algorithm with different architectures. At the same time, an output signal of the message must be available for comparison and distinction of different message-digests. Here either a through output from sram can be established, however this is complicated to implement, thus the input message must
Fig. 7. Synthesis Report

This can be attributed to the 32-bit architecture needing more storage, while the 16-bit architecture requires more control signals, owing to less storage, thus requiring more multiplexers than the 32-bit architecture. Similarly, ×64 iterations require more registers, LUTs and flip-flops than ×16, while ×16 iterations require more multiplexers than ×64 iterations. This can be attributed to ×64 requiring architecture to manage through inputs of the message to sram, while ×16 requires no such architecture.

Fig. 8 shows the average time taken to calculate the message-digest for one input message when a data set of size 1,048,575 is taken. The architecture size has no effect on the time taken; only the number of iterations does. ×64 iterations take 680 ns ($6.8 \times 10^{-7}$ seconds), while ×16 iterations take 380 ns ($3.8 \times 10^{-7}$ seconds). For ×64, one iteration takes 10 ns, bringing the computation cycle to 640 ns, with 40 ns between two instances of computation to record, iterate the LFSR, and reset the architecture. Similarly, for ×16, the computation cycle is 160 ns long, while the data capture is just as long (160 ns), but to ensure flawless data capture another 20 ns are given before the computation cycle starts. The reset and LFSR iterate cycles take the same time (40 ns), thus bringing the total average cycle time to 380 ns.

VII. CONCLUSION

This paper presents the different possibilities of a modified MD5 algorithm aimed at implementation on low-end IoT edge devices. We have also shown the relative performance characteristics in terms of message-digest collisions, synthesis of components and time taken for hashing for the different scale-down architectures. Reducing the message-digest size has a severe impact on the integrity verification property of hashing and must be avoided, whereas reducing the number of iterations has minimal effect while greatly reducing architecture and time costs, and should be implemented.

However, it should be noted that all conclusions are drawn from simulation alone, and not from an FPGA board implementation, which can introduce unforeseen variables not captured by simulation. Also, fuzzy hashing can, more efficiently and effectively, help in identifying files that contain a high percentage of similarities; its hardware implementation is the future work of this publication.

REFERENCES

[1] Rogaway, Phillip, and Thomas Shrimpton. "Cryptographic hash-function basics: Definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance." International workshop on fast software encryption. Springer, Berlin, Heidelberg, 2004.
[2] Bertoni, Guido, et al. "The keccak sha-3 submission." Submission to NIST (Round 3) 6.7 (2011): 16.
[3] Wang, Xiaoyun, and Hongbo Yu. "How to break MD5 and other hash functions." Annual international conference on the theory and applications of cryptographic techniques. Springer, Berlin, Heidelberg, 2005.
[4] Rivest, Ronald. "The MD5 message-digest algorithm." No. RFC 1321. 1992.
[5] Yang, Kaiyuan, David Blaauw, and Dennis Sylvester. "Hardware designs for security in Ultra-Low-Power IoT systems: an overview and survey." IEEE Micro 37.6 (2017): 72-89.
[6] Didla, Shammi, Aaron Ault, and Saurabh Bagchi. "Optimizing AES for embedded devices and wireless sensor networks." Proceedings of the 4th International Conference on Testbeds and research infrastructures for the development of networks & communities. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2008.
[7] Rivest, Ronald. "The MD4 message-digest algorithm." No. RFC 1320. 1992.
[8] Sedov, Stanislav (2011). "MD5 core in verilog." GitHub repository, https://github.com/stass/md5_core
[9] "Linear-feedback shift register." Wikipedia contributors. Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 19 Feb. 2019.
[10] Goresky, Mark, and Andrew M. Klapper. "Fibonacci and Galois representations of feedback-with-carry shift registers." IEEE Transactions on Information Theory 48.11 (2002): 2826-2836.
[11] George, Maria, and Peter Alfke. "Linear feedback shift registers in virtex devices." Xilinx application note XAPP210 (2007).
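
As a supplementary check of the look-up-table scale-down described in Section V, the following Python sketch evaluates the direct modulo mapping of Eq. 7 and the iteration order of Eq. 9. It is an illustration under stated assumptions, not the Verilog model used in this work: the 32-bit shift table is taken from RFC 1321 [4] (which also contains a shift of 4), and the names `S32`, `naive_s8` and `ilut_x16` are hypothetical.

```python
# Assumption: per-round left-rotate amounts of standard 32-bit MD5,
# as listed in RFC 1321, in round order F, G, H, I.
S32 = [
    [7, 12, 17, 22],
    [5, 9, 14, 20],
    [4, 11, 16, 23],
    [6, 10, 15, 21],
]


def naive_s8(s32_rounds):
    """Eq. 7: reduce every 32-bit shift amount modulo 8."""
    return [[s % 8 for s in rnd] for rnd in s32_rounds]


def ilut_x16(i):
    """Eq. 9 with 1-based i: identity for 1..4, (5*i) mod 12 + 5 for 5..16."""
    if not 1 <= i <= 16:
        raise ValueError("i must be in 1..16")
    return i if i <= 4 else (5 * i) % 12 + 5


s8 = naive_s8(S32)
order = [ilut_x16(i) for i in range(1, 17)]

# True if the direct mapping yields a shift of 0 (no rotation at all).
has_zero_shift = any(0 in rnd for rnd in s8)
# True if Eq. 9 visits each of the 16 message chunks exactly once.
is_permutation = sorted(order) == list(range(1, 17))

if __name__ == "__main__":
    print("S8 from Eq. 7:", s8)
    print("contains a zero shift:", has_zero_shift)
    print("iLUT x16 order:", order)
    print("covers chunks 1..16 exactly once:", is_permutation)
```

Under these assumptions the direct mapping produces a zero shift and loses the ascending order of shifts within a round (for example, the first round becomes 7, 4, 1, 6), which is consistent with Eq. 7 being rejected in favour of hand-chosen values. The order produced by Eq. 9 covers every chunk index from 1 to 16 exactly once.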